
Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution


HAL Id: hal-01300043

https://hal.sorbonne-universite.fr/hal-01300043

Submitted on 8 Apr 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0

Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution

Eduardo Corel, Philippe Lopez, Raphaël Méheust, Eric Bapteste

To cite this version:

Eduardo Corel, Philippe Lopez, Raphaël Méheust, Eric Bapteste. Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution. Trends in Microbiology, Elsevier, 2016, 24 (3), pp.224-237. ⟨10.1016/j.tim.2015.12.003⟩. ⟨hal-01300043⟩


Review

Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution

Eduardo Corel,1,* Philippe Lopez,1 Raphaël Méheust,1 and Eric Bapteste1

The tree model and tree-based methods have played a major, fruitful role in evolutionary studies. However, with the increasing realization of the quantitative and qualitative importance of reticulate evolutionary processes, affecting all levels of biological organization, complementary network-based models and methods are now flourishing, inviting evolutionary biology to experience a network-thinking era. We show how relatively recent comers in this field of study, that is, sequence-similarity networks, genome networks, and gene families–genomes bipartite graphs, already allow for a significantly enhanced usage of molecular datasets in comparative studies. Analyses of these networks provide tools for tackling a multitude of complex phenomena, including the evolution of gene transfer, composite genes and genomes, evolutionary transitions, and holobionts.

New Methods for Studying the Web of Life

The tree model has been largely and rightly used in evolutionary analyses since Darwin's seminal work [1]. The genealogical relationships between evolving objects are indeed critical to explain life's diversity, not only from a processual perspective (where common ancestry explains some similarities), but also as a powerful pattern to classify all related evolved forms [2]. However, the tree structure, especially when assumed to be universal, strongly constrains our description of the evolution of life [3–5]. By definition, a tree can only describe divergence from a last common ancestor (often with dichotomies, or with polytomies describing fast radiations). In vertical descent, the genetic material of a particular evolutionary unit is propagated by replication inside its own lineage. When such lineages split, and become genetically isolated from one another, this produces a tree. By contrast, in introgressive descent, the genetic material of a particular evolutionary unit propagates into different host structures and is replicated within these host structures [4]. However, a tree with a single ancestor for each object cannot represent such a merging of distinct lineages into a novel common host structure. Typically, organisms produced by sexual reproduction in eukaryotes originate from two parents which merged their genetic material. Genealogical trees with a single ancestor do not describe relationships within eukaryotic sexual populations. Indeed, this genuine genealogical relationship cannot be depicted with a traditional tree representation since this pattern would impose that one considers an offspring either more closely related to only one of its parents, or to be the progenitor of its own ascendants [4].

The distinction between vertical and introgressive descent is not a minor one; introgression (see Glossary) affects all levels of biological organization: from molecules, when sequences legitimately or illegitimately recombine, to genomes, when sequences enter genomes by lateral

Trends

Introgressive processes shape the microbial world at all levels of organisation.

This reticulated evolution is increasingly studied by sequence-similarity networks. They provide an inclusive, accurate multilevel framework to study the web of life.

Networks enhance analyses of microbial genes, genomes, communities, and of symbiosis.

1 Equipe AIRE, UMR 7138, Laboratoire Evolution Paris-Seine, Université Pierre et Marie Curie, 7 quai St Bernard, 75005 Paris, France

*Correspondence:


Glossary

Articulation point (or cut-vertex): node in a graph whose removal increases the number of connected components of the resulting graph.

Betweenness: centrality measure for a node in a graph, namely, the proportion of shortest paths between all pairs of nodes that pass through this specific node. Nodes having a betweenness close to 1 are said to be more central, and those close to 0, more peripheral.

Bipartite graph: a graph with two types of node (top nodes and bottom nodes) such that an edge only connects nodes of one type with nodes of the other type.

Club of genomes: a coalition of entities replicating in separate events and exploiting some common genetic material that does not necessarily trace back to a single last common ancestor.

Community: in graph theory, groups of nodes that are more connected between themselves than with the rest of the graph. This technical meaning should not be confused with its use in expressions such as 'microbial communities'.

Connected component: set of nodes in a graph for which there is always an interconnecting path.

Degree: number of incident edges to a given node.

Introgression: descent process through which the genetic material of a particular evolutionary unit propagates into different host structures and is replicated within these host structures.

Multiplex graph: a graph having possibly several edges of different types between two nodes.

Neighbors: nodes that are directly connected by an edge.

Public genetic goods: the common genetic material shared by a club of phylogenetically distant genomes.

Quotient graph: simplified graph whose nodes represent disjoint subsets of nodes of the original graph; an edge in this new graph connects two such new nodes whenever an edge in the original graph connects at least one element of a new node with at least one from the other.

Support: the common set of neighbors of a twin class.

Twins: nodes in a graph that have exactly the same set of neighbors.

gene transfer, and to holobionts, when organisms form a collective system (such as the tight association observed between host and endosymbionts) [6–8] (Box 1). Introgressive descent does not always imply lateral gene transfer: for example, independently replicated gene families, each having their own tree, can merge, and this results in a novel composite gene family. Since even introgressive descent is descent, it encompasses a vertical dimension. The tree representation emphasizes how entities evolve ex unibus plurum, whereas the network representation emphasizes how entities evolve ex pluribus unum. Of course, evolution progresses in both dimensions. Thus, the tree of life and the network of life are not mutually exclusive models. When lineages that evolved in a tree-like fashion merge, this creates reticulation between branches of trees; likewise, after a reticulation event, phylogenetically composite entities can undergo a tree-like evolution: a tree starts growing on the ground of an initial reticulation. Consequently, future synthetic representations could aim at displaying simultaneously both vertical and lateral parts of biological evolution.

Box 1. Mosaicism of Life

Introgression, the merging of entities from different lineages, affects multiple levels of biological organization. For example, Figure IA describes the introgression of genes from distinct gene families, which results in composite genes, such as multidomain genes [46]. These sequences can come from within a given genome; when they come from different genomes, such as the genomes of an endosymbiont and of its host, the resulting composite gene, composed partly of material from an endosymbiont, is a symbiogenetic gene. Figure IB describes the introgression of a gene into a host genome, occurring for instance during a lateral or an endosymbiotic gene transfer, which results in a composite genome [63], or when sequences transfer across mobile genetic elements, producing mosaic mobile elements [37]. Of note, more than one gene can be so acquired by a genome [59,60]. Figure IC describes the introgression of a genome into another genome, occurring for instance when the genome of Wolbachia pipientis becomes inserted into the genome of a Drosophila ananassae, or when the genome of a virophage such as Sputnik becomes inserted into the genome of a giant virus such as Mimivirus, which results in a composite genome [87–89]. Figure ID describes the introgression of a mobile element, such as a plasmid, within a host cell (or organism), occurring for instance when a symbiotic plasmid carrying hypermutagenesis determinants (e.g., the imuABC cassettes) invades soil bacteria, enhancing the ex planta phenotypic diversification of these novel composite cells [90,91]. Figure IE describes the introgression of a genome into a host cell, occurring for instance during events of kleptoplasty [92,93] or as a result of an extreme reductive evolution after secondary or tertiary plastid acquisition, which results in a (transient or persistent) composite organism [7]. Figure IF describes introgression of cells or organisms, occurring for instance during the evolution and growth of multispecies biofilms [94], endosymbiosis [8,95], or during the development and speciation of animals [82,83]. Typically, sequence-similarity networks can be used to investigate A; genome networks, B, C, and E; multiplex genome networks, B, C, and E; and bipartite networks, B, D, E, and F.

[Figure I, panels (A)–(F). Labels: sequence, composite sequence, genome, composite genome, mobile genetic element, cell or organism, composite cell or organism, holobiont.]

Figure I. Several Illustrations of Mosaicism through Merging Events. (A) Composite genes result from the fusion of different gene domains. (B) Composite genomes can result from the introgression of a gene into a genome, or (C) from the introgression of a genome into a genome. (D) Composite organisms can arise from the introgression of a mobile genetic element. Holobionts result from the introgression of a genome (E) or of another cell (F) into a cell.


Importantly, not all evolving objects entertain genealogical relationships: for instance, viruses and cells, both critical players of biological evolution, are not assumed to be related in this way [9–14], nor are plasmids and plasmid-like transferable objects, such as integrative-conjugative elements (ICEs). Cells, viruses, plasmids, and ICEs lack recognized genealogical relationships, either because they genuinely evolved from separate roots, or because their putative common ancestor(s) cannot be inferred from the data, for example, if other descendants of such ancestors became extinct, or have not been sequenced to date. This apparent genealogical disconnection does not exclude vertical evolution within lineages of mobile elements. There is, for example, evidence for both vertical and introgressive descent in plasmids and ICEs of firmicutes [15]. But it means that one genealogical tree cannot represent all the evolutionary history [6]. Therefore, a traditional approach to analyzing evolution incurs the risk of missing explananda (many phenomena that are not described by a genealogical tree) and missing explanans (many evolutionary processes responsible for life's diversity). Trees and networks are representations that allow for scientific analysis. Consistently, tree-thinking has already largely been exploited, and it is now timely and heuristic to turn to network-thinking to illuminate additional and complex aspects of the biology. In this review, we argue that sequence-similarity networks, already used to investigate the evolution of protein-coding genes, can also be used to analyze many mosaicisms of life, such as bacterial genome evolution, prokaryotes' and protists' organismal evolution, and the evolution of holobionts and communities in which microbes play a role, in particular as symbionts (Box 1). We explain that introgression results in at least three major phenomena: (i) microbial social life, understood here as genetic transfers between different genomes, (ii) chimerism (occasionally implying major evolutionary transitions), and (iii) holobionts. All three examples resist classic tree-based analyses and challenge our evolutionary knowledge.
A tree model alone does not describe these introgressive processes, that is, the fact that they involve multiple lineages, and their outcomes, that is, the fact that they produce collective, composite entities. We describe how and why these phenomena can be studied using three classes of networks [sequence-similarity networks, genome networks, and bipartite graphs (Figure 1, Key Figure)], enlarging the analytical toolkit of evolutionary microbiologists. On the one hand, the display of large networks will constitute a challenge for the future development of network-thinking. On the other hand, in terms of interpretation, even very large and dense networks can be effectively simplified, for example, using twin analyses. Thus, we expect a network-thinking era to soon be at the forefront in microbiology.

Investigating Microbial Social Life with Genome Networks

Gene transfer between prokaryotic organisms and mobile genetic elements (i.e., viruses and plasmids) has largely shaped cellular genome content, as illustrated by the observation of prokaryotic pangenomes [15,16], for which the collection of gene families used by the members of a given species is larger than the number of gene families present in any individual genome from that species. The flow of genes between genomes, often mediated by mobile genetic elements, explains this observation, but complicates classic inferences about the past (such as genome reconstruction attempts) [4,5,17–20]. For a given lineage, the contents of ancestral genomes may be largely different from the union of extant genomes because prokaryotic genomes act as 'read–write' storage organelles rather than 'read–only' memories [21], and genomes can lose genes. Thus, describing evolution requires not only the tracking of mutations that accumulate within gene families, or loss of gene families [22], but also genes that are gained by introgression [23]. The latter encourages exploring horizontal gene transfer within prokaryotic communities. This brings forward difficult questions [20,24–30] since there are many routes through which genes pass from one microbial host to the other, that is, multiple channels [31] for gene transmission. For example, is gene transmission random in terms of cellular, viral, or plasmidic targets (however producing asymmetrical results due to some further host selection acting on the incoming genetic material)? Is it random in terms of what gene families are transmitted? Can we find groups of cotransmitted genes?


Shared gene networks were introduced precisely to tackle these issues (Figure 1) [17,19,32]. These networks represent which genomes share genetic material, without prejudice regarding the processes involved (vertical descent, but also introgressions [19,33,34]). In genome networks, all entities are not necessarily genealogically related, allowing for simultaneous analysis of mobile genetic elements and cellular evolution. In that respect, the social microbial network is more inclusive than the tree of life, which is restricted to one type of relationship between one

Key Figure

Different Graph Representations of the Same Gene Sharing among Genomes

[Figure 1, panels (A)–(D). Node labels A, B, C denote genomes; CC 1, CC 2, CC 3 denote connected components; numbers are edge weights or gene family identifiers.]

Figure 1. (A) Sequence-similarity network (SSN): each node (circle) represents a protein-coding gene sequence; the color and the label of the node represent the genome where the gene is found. Two nodes are connected by an edge (a line linking two nodes) if the pair of sequences fulfills given similarity criteria such as a minimum percentage identity and coverage (i.e., the ratio between the length of the matching parts and the total length of any two sequences). Sequence-similarity networks are analyzed as a partition into connected components (CCs, highlighted as color halos). This partition defines groups of putative gene families, when reciprocal sequence coverage and identity percentage are high [68]: for instance, we can interpret CC1 as a gene family for which two copies are present both in genomes A and B. (B) Genome networks (GNs) can be obtained from SSNs: nodes are genomes (described by color and label); edges connect genomes that share at least one gene family; GNs can be weighted: weights count the number of gene families shared by the two genomes. In the example, A and B share three gene families, but the graph does not specify which ones. (C) Multiplexed networks (MNs) can be, in turn, obtained from GNs by labelling edges in order to identify what gene families are shared: nodes represent genomes; multi-edges represent distinct shared gene families (same color code as the CCs in the SSN); weights count the number of shared genes in each family: the blue edge between A and B corresponds to CC1 in (A) and has therefore weight 2. (D) Bipartite graphs can also be obtained from SSNs; top nodes are genomes; bottom nodes are gene families; edges connect a genome to a gene family if that genome contains at least one representative of the corresponding gene family; weights count the number of genes of that family present in that genome: in the example, node 1 corresponds to CC1 in (A), and has therefore edges incident to genomes A and B, each of weight 2.
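The chain of derivations in this caption (SSN, then connected components, then genome network, multiplex network, and bipartite graph) can be sketched on a toy example. The gene names, edges, and genome labels below are invented for illustration and are not the data of Figure 1. The multiplex weight is taken here as the smaller of the two copy numbers, which is only one possible reading of "number of shared genes".

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy SSN: gene -> genome of origin, plus similarity edges.
gene_genome = {"a1": "A", "a2": "A", "b1": "B", "b2": "B",
               "a3": "A", "b3": "B", "c1": "C", "a4": "A", "b4": "B"}
ssn_edges = [("a1", "a2"), ("a1", "b1"), ("a2", "b2"), ("b1", "b2"),
             ("a3", "b3"), ("b3", "c1"), ("a4", "b4")]

def connected_components(nodes, edges):
    """Partition nodes into connected components with a union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a reproducible family numbering.
    return sorted(groups.values(), key=lambda g: sorted(g))

# Each connected component is treated as a putative gene family.
families = connected_components(gene_genome, ssn_edges)

# Bipartite graph: (genome, family index) -> number of genes of that family.
bipartite = Counter((gene_genome[g], i)
                    for i, fam in enumerate(families) for g in fam)

# Multiplex network: one labelled edge per family shared by a genome pair.
# Weight = min copy number in the two genomes (an assumption of this sketch).
multiplex = {}
for i, fam in enumerate(families):
    genomes = sorted({gene_genome[g] for g in fam})
    for x, y in combinations(genomes, 2):
        multiplex[(x, y, i)] = min(bipartite[(x, i)], bipartite[(y, i)])

# Genome network: weight = number of families shared by the two genomes.
genome_network = Counter((x, y) for x, y, _ in multiplex)
```

In this toy dataset, genomes A and B end up sharing three families, and the first family has two copies in each of A and B, mirroring the situation the caption describes for CC1.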


fraction of the biological diversity [6]. Two genomes with a direct connection in such a graph are similar in the sense that they share at least one gene family, whereas two genomes connected only by an indirect path are not similar in terms of gene content. These genome networks display some structure. First of all, plasmids are more central (higher betweenness [35] for a given degree) and viruses more peripheral, testifying that plasmids are general couriers for gene transmission amongst microbes [19]. Second, genome networks have several connected components, that is, several sets of genomes for which there is always an interconnecting path. Each of these connected components groups genomes with exclusive, non-overlapping sets of gene families, and thus corresponds to pools of genes uniquely associated with these genomes [19]. The existence of different connected components suggests the existence of restrictions to introgression.
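To make the notion of betweenness concrete, here is a minimal sketch that follows the Glossary definition directly (the share of shortest paths between pairs of other nodes that pass through a given node). The toy genome network and its node names are hypothetical, and the normalization by the number of pairs of other nodes is a choice of this sketch, not necessarily the one used in [19] or [35].

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return shortest-path distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Fraction of shortest paths between pairs of other nodes through each node."""
    info = {s: bfs_counts(adj, s) for s in adj}
    n = len(adj)
    n_pairs = (n - 1) * (n - 2) / 2
    result = {}
    for v in adj:
        dist_v, sigma_v = info[v]
        total = 0.0
        others = [x for x in adj if x != v]
        for s, t in combinations(others, 2):
            dist_s, sigma_s = info[s]
            if t not in dist_s or v not in dist_s:
                continue                   # s and t (or v) are not connected
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        result[v] = total / n_pairs if n_pairs else 0.0
    return result

# Hypothetical genome network: a plasmid linking three cellular genomes,
# and a virus that shares genes with a single host only.
adj = {
    "plasmid": {"bact1", "bact2", "arch1"},
    "bact1": {"plasmid", "virus"},
    "bact2": {"plasmid"},
    "arch1": {"plasmid"},
    "virus": {"bact1"},
}
scores = betweenness(adj)
```

In this toy graph the plasmid obtains the highest score and the virus a score of zero, which is the qualitative pattern the paragraph reports for real genome networks.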

Within a connected component, a genome network only shows that genomes share genes, but not what the shared genes are. Typically, a triangle of three connected genomes (A, B, C) may result from the sharing of different genes for each pair (AB, BC, AC) within this triangle [4] (see

Figure 1B,C). Thus, genomes may form tightly clustered communities [20] in these graphs while sharing different genes. Genome networks provide general information about barriers to transmission and about genetic partnerships, suggesting clubs of genomes enjoying public genetic goods [4,20]. These genome networks require, however, further specifications (for example, on their edges) to address detailed questions about gene transmission and its barriers. A more informative representation displays the identity of shared genes along each edge of a genome network, like in [36], which showed some gene sharing between bacteriophages (as early as 1999), or as in [37], which unraveled genetic transmission between mobile genetic elements of giant viruses (as recently as 2013). Such multiplex graphs are unquestionably attractive and rather natural representations of genetic sharing. However, their display becomes rapidly complex for large datasets, and from an analytical point of view, other graphs can offer practical advantages to analyze gene transmission beyond the genome network framework.

Introducing Bipartite Graphs in Evolutionary Studies

The information on the identity of shared edges (here, gene families) can be conserved in a less cluttered fashion by using bipartite 'gene families–genomes' graphs. In these graphs, the precise information regarding gene sharing is directly encoded as edges between these two kinds of nodes. Multiplex genome networks can be seen as unimodal projections [38] of such bipartite 'gene families–genomes' graphs (Figure 1D). Bipartite graphs include the same diversity of genomes as the genome networks described above, but they are more accurate. Importantly, simple specific bioinformatic treatments of these multilevel graphs allow one to rapidly identify which groups of genes are shared by which groups of genomes [39], and to display and compare different channels of gene transmission, that is, the routes across generations through which hereditary resources or information pass from parent to offspring [31].

As in genome networks, connected components produce an informative partition of the data. This partition can moreover be examined at different levels of similarity by tuning, for example, the sequence identity percentage. When the data consist of all the protein sequences from all the complete viral (3749), plasmidic (4350), and archaeal (152) genomes, together with a representative subsample of the eubacteria (230) from NCBI, we get the numbers shown in Table 1. Assuming a rough molecular clock, these thresholds are useful for investigating events of different ages. Sequences with 90% identity have a relatively weak divergence with respect to sequences with 30% identity; indeed, these latter have likely diverged faster or for a longer period of time.
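The effect of tuning the identity threshold can be illustrated with a small sketch: the same set of pairwise hits is filtered at 30%, 60%, and 90% identity, and the connected components are recounted each time. The sequence names and identity values are invented for illustration; they are not taken from the dataset of Table 1.

```python
# Hypothetical pairwise hits: (sequence 1, sequence 2, percent identity).
sequences = ["s1", "s2", "s3", "s4", "s5"]
hits = [("s1", "s2", 95.0), ("s2", "s3", 65.0),
        ("s3", "s4", 35.0), ("s4", "s5", 92.0)]

def count_components(nodes, edges):
    """Count connected components with a union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})

def components_at(threshold):
    """Keep only hits at or above the identity threshold, then count CCs."""
    kept = [(u, v) for u, v, identity in hits if identity >= threshold]
    return count_components(sequences, kept)

cc_counts = {t: components_at(t) for t in (30, 60, 90)}
```

On this toy input the number of components grows as the threshold rises, because weaker edges are dropped and the graph fragments. Table 1 reports the same direction of change for the full dataset.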

This representation of gene families–genomes bipartite graphs is explicitly multilevel. Interestingly, its analysis does not require any graph clustering algorithm (whose results tend to vary


considerably with their implementation). Genetic transmission among microbes can be investigated by simple topological notions of bipartite graphs that result in biologically relevant observations: twins and articulation points [40] that we detail below.

We apply here these notions only to gene family nodes. 'Twin' is a notion of graph theory; applied to gene families–genomes graphs, it singles out 'fellow travellers': gene families are twins when they are present in exactly the same set of genomes. In the language introduced in [34], the support of such a twin defines a club of genomes. Clubs of genomes, when composed of individuals pertaining to different species, could encourage further studies of 'kin-coevolution', for example, the fact that genetic divergence affecting multiple ecologically coexisting lineages, that exchanged genes at some point of their evolution, produces multilineage persistent clubs. The bipartite graph can be simplified by grouping together sets of gene families that are shared by exclusive groups of genomes, and by replacing each such group of gene families by a super-node. Nodes that remain untouched by this reduction process are considered as trivial twin classes (and result in trivial super-nodes). Technically, there is no difference between trivial and non-trivial twins, although, from the biological perspective, the latter correspond to groups of gene families that are more likely to be transmitted together. The resulting quotient graph is reduced, because every club of genomes is now defined by one super-node (individual gene family or group of gene families hosted in this club of genomes) while no information is lost (Figure 2). This property means that even very large graphs can be investigated. In the dataset presented in Table 1, we typically find clubs, such as the one composed of the firmicute Enterococcus faecalis and nine plasmids (present in lactococci or enterococci) that simultaneously and exclusively share the following gene families (at 90% identity): ribose 5-phosphate isomerase RpiB, galactose mutarotase and related enzymes, β-glucosidase/6-phospho-β-glucosidase/β-galactosidase, and phosphotransferase system cellobiose-specific component IIA. These shared mobilized gene families are involved in neighbor pathways of sugar metabolism (specifically in glycolysis and in the pentose phosphate metabolic pathways), which likely explains their collective mobilization in plasmids.
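The twin reduction described above amounts to grouping gene families by their exact set of host genomes. A minimal sketch follows; the family and genome names are hypothetical placeholders, not the Enterococcus faecalis example itself.

```python
from collections import defaultdict

# Hypothetical bipartite graph: gene family -> set of genomes hosting it.
family_to_genomes = {
    "fam_sugar_1": {"cell_1", "plasmid_a", "plasmid_b"},
    "fam_sugar_2": {"cell_1", "plasmid_a", "plasmid_b"},
    "fam_sugar_3": {"cell_1", "plasmid_a", "plasmid_b"},
    "fam_transposase": {"plasmid_a", "cell_2"},
    "fam_private": {"cell_2"},
}

def twin_classes(family_to_genomes):
    """Group families that have exactly the same neighbors (their support)."""
    classes = defaultdict(list)
    for family, genomes in family_to_genomes.items():
        classes[frozenset(genomes)].append(family)
    return classes

def quotient_graph(family_to_genomes):
    """Collapse each twin class into one super-node attached to its support."""
    return {
        tuple(sorted(families)): set(support)
        for support, families in twin_classes(family_to_genomes).items()
    }

reduced = quotient_graph(family_to_genomes)

# Non-trivial twin classes: candidate sets of co-transmitted families,
# each paired with the club of genomes that hosts them.
clubs = {fams: support for fams, support in reduced.items() if len(fams) > 1}
```

Each super-node keeps the list of families it replaces, so the original bipartite graph can be rebuilt from the reduced one. That is the sense in which the reduction loses no information.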

Articulation points in a gene families–genomes bipartite graph correspond to gene families shared by many genomes with otherwise totally distinct gene contents (for a given similarity threshold). Although strictly topological, the notion of an articulation point is thus expected to help detect public genetic goods [34], that is, genetic material that is being shared by taxonomically distant genomes, which possibly benefit from the properties they confer, for some reason other than genealogy (i.e., genes coding for environmental adaptation or hitch-hiking with those). However, an articulation point can also detect selfish genes, such as the abundant transposases

[41], which are spreading across multiple distantly related genomes (Box 2).

Table 1. Statistics of the Prokaryote–Virus–Plasmid Gene Families–Genomes Bipartite Graphs (a)

Minimal identity percentage to connect sequences            30%     60%     90%
Number of connected components (CC)                         156     375     488
Number of CC having only plasmids                            25      73     155
Number of CC having only viruses                            130     299     297
Size of the giant connected component (number of nodes)    6362    5143    2769

(a) For reciprocal 80% length cover, and different identity thresholds.

The data consist of all protein sequences from all complete plasmidic, viral, and archaeal genomes from NCBI (as of 11/2013), as well as one complete eubacterial genome for each family. The identity percentage describes the similarity, in terms of the conservation of primary sequences, between pairs of molecules. The higher this 'identity threshold', the more similar pairs of sequences must be to be directly connected in a sequence-similarity network. For a high 'identity threshold', connected components consist of highly conserved sequences. In a first molecular clock-like approximation, higher 'identity thresholds' define groups of sequences that diverged more recently from one another than groups defined with a lower 'identity threshold'.


Investigating Composite Genes, Organisms, and Evolutionary Transitions with Sequence-Similarity Networks

Introgression can also be investigated below the gene level and above the organismal level. For instance, composite genes, such as the genes produced by evolutionary tinkering [42], famous for encoding multidomain proteins, are well documented in cellular genomes [43–45], and have been reported in viruses and plasmids [46,47]. Such genes are composed of genetic fragments (e.g., components, which can be domains or full genes) that are otherwise found in separate gene families [48]. The fusion of a receptor-binding protein with a tail fiber protein in the lactococcal bacteriophage bIBB29, producing a composite gene involved in host specificity, offers a good example of this sort of molecular mosaic [49]. While many substitution models have been developed to account for gradual evolution by point mutation in phylogenetic inferences, models describing the rules and rates of emergence (or fission) of composite genes are still rare [50–52], especially for unicellular organisms and mobile genetic elements [46,47]. However, many gene families are not just evolving gradually within the boundaries of a single gene family [53]. The accretion of two protein domains into a novel host structure constitutes a case of saltatory molecular evolution by introgression. The rules of evolution and fragment combination largely remain to be discovered [54,55]. Sequence-similarity networks could contribute to this task. Indeed, these graphs can: (i) provide a systematic description of both composite and component genes in genomes (and metagenomes); (ii) be used to polarize fusion and fission events (by comparing the taxonomical distribution of gene hosts in associated component and composite gene families); (iii) be directly used to compare the relative conservation of overlapping component and composite sequences, for example, to determine whether domains found in different combinations have different rates of evolution. The detection of composite genes using sequence-similarity networks can further contribute to understanding the rules of evolution of other biological networks, such as protein–protein interaction networks [56]. For instance, when, as a result of exon- or domain-shuffling, composite genes produce novel combinations of domains of interaction, composite genes can introduce novel nodes and edges in protein–protein interactions. Likewise, composite genes can impact the robustness of protein–protein interaction networks, when genes coding for separate proteins involved in a functional interaction become fused, 'crystallizing' an edge of the protein–protein interaction network.

[Figure 2, panels (A) and (B). Labels: Support 1, Support 2, Support 3; Twin 1, Twin 2, Twin 3; Super-node 1, Super-node 2, Super-node 3; Articulation point.]

Figure 2. Twins and Articulation Points in a Bipartite Graph. (A) Top nodes in this bipartite graph are genomes and bottom nodes gene families. Nodes in each colored ellipse at the bottom form a twin class, since their sets of neighbors (supports encircled by similarly colored ellipses on the top level) are identical (as highlighted by the coloring of their incident edges). (B) Collapsing twin nodes into super-nodes yields a reduced graph, without further bottom twin nodes. The supported groups of host genomes are unchanged, and are now defined as the neighbors of a single super-node. Due to the graph reduction, the green super-node is now an articulation point, since its removal disconnects the nodes in the pink and brown supports.
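Articulation points among gene family nodes can be found directly from the Glossary definition: remove one family at a time and check whether the number of connected components increases. The sketch below does this by brute force on a small, hypothetical reduced graph; for large graphs a linear-time depth-first search method would be preferable, but the brute-force version is easier to check by hand.

```python
# Hypothetical reduced bipartite graph: gene family -> genomes hosting it.
family_to_genomes = {
    "fam_group_1": {"g1", "g2"},
    "fam_group_2": {"g3", "g4"},
    "fam_hub": {"g1", "g2", "g3", "g4"},
}

def build_adjacency(family_to_genomes):
    """Bipartite adjacency with typed node keys to avoid name clashes."""
    adj = {}
    for family, genomes in family_to_genomes.items():
        f = ("family", family)
        adj.setdefault(f, set())
        for genome in genomes:
            g = ("genome", genome)
            adj[f].add(g)
            adj.setdefault(g, set()).add(f)
    return adj

def count_components(adj, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen = set() if removed is None else {removed}
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

def family_articulation_points(family_to_genomes):
    """Families whose removal increases the number of connected components."""
    adj = build_adjacency(family_to_genomes)
    baseline = count_components(adj)
    return sorted(
        family for family in family_to_genomes
        if count_components(adj, removed=("family", family)) > baseline
    )

hubs = family_articulation_points(family_to_genomes)
```

Here only the family shared by all four genomes is reported, because removing it separates the two pairs of genomes, whereas removing either of the other families leaves every genome reachable through the hub.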


This issue takes on particularly fundamental importance in organisms hosting genes from multiple origins. These introgressed genes have distinct evolutionary past histories and, hence, possibly different future evolvabilities. For example, eukaryotic genomes [57,58] as well as major archaeal lineages are composed of genes from both bacterial and archaeal origins [59,60]. Some studies have focused on the evolution of complete genes of distinct origins in these mosaic taxa [i.e., contrasting the essentiality or centrality of genes from bacterial and archaeal origins in regulatory or metabolic eukaryotic networks [61], or simply performing classic phylogenetic analyses of these genes to identify endosymbiotic gene transfer (or EGT) [8]]. A common fate for proteins derived from such transferred organellar genes is to be targeted back to the compartment of origin to perform their original function, but not only [62]. Regarding these proteins and genes, the study of composite organisms has opened the door to an exciting evolutionary question that, we argue, networks can now better address: what happens after distinct genetic material becomes integrated into a new host? Genes from distinct origins could have different propensities to be lost or to diverge during subsequent evolution of their novel composite host lineage [63]. Likewise, at the infragenic level, the evolutionary impact of introgression deserves consideration. Do composite organisms host novel symbiogenetic composite genes with components from different phylogenetic origins that could only be born in such genetic melting-pots as a result of the original mixing of gene fragments? A positive answer, that is, the detection of such novel composite genes in composite organisms, could

Box 2. Articulation Points Reveal Potential Public Genetic Goods

In a prokaryote–virus–plasmid dataset, we typically find clubs of genomes, such as the one (represented in Figure I) composed of two mesophilic sulphur-reducing acetate-metabolizing Proteobacteria (Geobacter sulfurreducens and Desulfobacca acetoxidans) and two thermophilic hydrogenotrophic methanogen Euryarchaeota (Methanocella conradii and Methanocella paludicola). These taxa are linked by an articulation point, which indicates the sharing of a conserved gene family (at >90% identity), functionally annotated as a tRNA (1-methyladenosine) methyltransferase. This kind of association between sulphate reducers and methanogens is well documented in the literature [96,97]. The sharing of genes between different prokaryotes suggested by this network analysis makes sense, since these prokaryotes are found in common anoxic environments, such as rice paddy soils [98]. Also, G. sulfurreducens and M. paludicola contain a laterally transferred two-gene cluster, hgcAB, related to the ability to methylate mercury [99]. Thus, a graph analysis produces a novel testable hypothesis, namely, to see if the shared tRNA methyltransferase is involved in the adaptation to the environment of these taxa, or if it hitch-hiked with other genes transferred between these taxa, such as the hgcAB cluster.

[Figure I. Top nodes: Geobacter sulfurreducens, Desulfobacca acetoxidans, Methanocella conradii, Methanocella paludicola. Bottom node: tRNA methyltransferase.]

Figure I. Excerpt of a Typical Reduced Gene Families–Genomes Bipartite Graph around an Articulation Point. The top nodes compose the club defined by the sharing of a conserved tRNA methyltransferase (bottom node in yellow). For simplicity, only the direct neighbors of the members of the club have been included in the picture of the graph. The removal of the articulation point (in yellow) isolates the two taxonomically homogeneous groups from each other.


[Figure 3, panels (A)–(D). Labels: >80% reciprocal coverage; partial coverage; related to endosymbiont; composite in host; related to host.]

Figure 3. Typical Patterns for Candidate Endosymbiotic Gene Transfer (EGT) and Composite Genes in Sequence-Similarity Networks. (A) Sequence-similarity networks can be used for the detection of distant homologues in eukaryotic genomes. Complete (left) and partial (right) sequence similarity, and how they are translated as different types of edges in the sequence-similarity network (SSN). In black, the percentage of reciprocal cover is high; the sequences are homologous over their entire length. In purple, the cover percentage is low; the sequences are only partly similar, that is, they share a homologous domain. (B) Shortest-path analysis in a sequence-similarity graph can be used for detecting possible endosymbiotic gene transfer (EGT). Indeed, EGT results in a characteristic network pattern: an indirect short path along which all edges indicate homology, connecting two nodes corresponding to diverged sequences present in a given host organism. Green nodes represent eukaryotic sequences; red, bacterial sequences; and yellow, archaeal sequences. Black edges denote complete sequence similarity (>80% length). All shortest paths between eukaryotic sequences that pass through the bacterial and archaeal components are likely candidates for EGT, because this indicates that a first type of eukaryotic sequence has affinities to bacterial sequences while a second type has affinities to archaeal ones. (C) Sequence-similarity networks with edges for complete and partial coverage are also useful for the detection of composite genes. The


revolutionize our understanding of the origins of biological traits. A negative answer, that is, the lack of novel composite genetic material from different organismal sources despite their new physical proximity, would indicate strong selective pressures preventing the birth of novel gene families in spite of changes in their genomic context. Thus, it would be worth testing if introgression at one level of biological organization (i.e., between cells) can favor introgression at another level (i.e., between genes). For example, organisms with composite genomes, or holobionts, might be composed of more composite symbiogenetic genes than organisms devoid of endosymbionts, or less subjected to gene transfer.

Sequence-similarity networks are ideal tools for investigating these issues (Figure 3). These very inclusive graphs [47,53] allow for comparative analyses of massive datasets without the need for multiple sequence alignments [4,64–66]. Similarity is typically detected in a BLAST all-versus-all analysis to produce a table of pairwise hits [67]. Sequence-similarity networks are displayed and analyzed as a set of connected components (Figure 1A) [68]. When the coverage between sequences is high, this partition of the nodes defines groups of putative homologous sequences or gene families. Thus, sequence-similarity networks have been used with relatively stringent criteria (i.e., hits between two sequences must show >30% identity, cover at least 80% of the length of both sequences, and have a maximal E-value of 10⁻⁵ in BLAST comparative analyses) coupled with clustering methods to identify clusters of nodes corresponding to homologous gene families [69–71]. In the past 20 years, sequence-similarity networks have indeed mainly been used to investigate the evolution of protein-coding genes [4,64–66,71–75], and to perform functional annotation. For instance, the COG categories correspond to groups of similar sequences (with remarkable topological properties in sequence-similarity networks) that have likely evolved from a single ancestral gene. In comparative analyses, COGs are often used as a proxy for functional annotations because their remarkable conservation suggests that sequences from the same COG may have preserved some common functions [71]. This standard approach, however, would not readily detect composite genes [76]. Using less stringent thresholds for mutual sequence coverage (Figure 3A) or identity percentage, sequence-similarity networks can be used to detect superfamilies [66,77–79], divergent homologues, or composite genes [when, for example, the length coverage condition is relaxed to take into account (partial) similarity (Figure 3C), such as domain sharing, between sequences] [46].
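The stringent construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published tool: the hit table is invented, the tuple layout (query, subject, percent identity, query coverage, subject coverage, E-value) is a hypothetical simplification of a BLAST tabular output, and the default thresholds are the ones quoted in the text.

```python
# Hypothetical all-versus-all hits; every value is invented for illustration.
# Layout: (query, subject, % identity, query coverage, subject coverage, E-value)
HITS = [
    ("a1", "a2", 45.0, 0.95, 0.90, 1e-40),
    ("a2", "a3", 38.0, 0.85, 0.88, 1e-20),
    ("b1", "b2", 60.0, 0.99, 0.97, 1e-80),
    ("a3", "b1", 35.0, 0.40, 0.30, 1e-8),  # partial similarity only
    ("b2", "c1", 25.0, 0.90, 0.90, 1e-6),  # identity too low
]


def build_ssn(hits, min_identity=30.0, min_coverage=0.8, max_evalue=1e-5):
    """Keep hits passing all thresholds; return an undirected adjacency map."""
    graph = {}
    for query, subject, identity, qcov, scov, evalue in hits:
        # Register both sequences as nodes even if the edge is rejected.
        graph.setdefault(query, set())
        graph.setdefault(subject, set())
        passes = (identity > min_identity
                  and min(qcov, scov) >= min_coverage
                  and evalue <= max_evalue)
        if passes and query != subject:
            graph[query].add(subject)
            graph[subject].add(query)
    return graph


def connected_components(graph):
    """Return connected components (putative gene families) as sorted lists."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


families = connected_components(build_ssn(HITS))
```

Lowering `min_coverage` in this sketch would merge the `a` and `b` groups through the partial hit, which mirrors how relaxed thresholds reveal superfamilies or domain sharing rather than full-length homology.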

These kinds of analyses with flexible definitions confirm that not all eukaryotic gene families have homologs in prokaryotes. When they do, sequence-similarity network analyses indicate that eukaryotic gene families homologous to those of bacteria (for which sequences of eukaryotes exclusively cluster with sequences from bacteria [63]) and eukaryotic gene families homologous to those of archaea (for which sequences of eukaryotes exclusively cluster with sequences from archaea) have different rates of evolution. For example, eukaryotic gene families with bacterial origins are more easily expanded or lost when eukaryotic genomes expand or shrink, while the number of eukaryotic gene families with archaeal origins is much more stable [22,63].

figure shows a pattern associated with the detection of composite genes. Black edges denote complete (>80% cover) and purple edges denote partial (<80% cover) sequence similarity. The green family is a candidate symbiogenetic composite gene, derived from endosymbiotic lateral gene transfer, since it displays one part with similarity to host-related sequences (yellow) and another part with similarity to endosymbiont-related (blue) genes. (D) A concrete example of a possible EGT: archaeal sequences are represented in blue, eubacterial in red, and eukaryotic genes in green (there is also a single plasmidic sequence in blue-green on the right). Eukaryotic sequences clearly form two groups, one closer to archaea, one more related to eubacteria. All the sequences have a generic annotation as RNA-pseudouridine synthase, but while the eubacterial (and related eukaryotic) sequences are exclusively tRNA-pseudouridine synthases (thus putatively of mitochondrial origin), on the archaeal side (thus possibly of host origin) we find tRNA- as well as rRNA-pseudouridine synthases. It indeed turns out that this family contains two pseudouridine synthase genes that are both present in Saccharomyces cerevisiae, having a similar function but acting on a different substrate: one on the archaeal side, coding for Cbf5p, that acts on large and small rRNA [100,101], and the other on the eubacterial side, coding for Pus4, that acts on mitochondrial and cytoplasmic tRNA-uridine [102].


Moreover, sequence-similarity networks have proved efficient at unraveling distant homologues in eukaryotic genomes, that is, gene families for which some present-day eukaryotes possess a version that originated from a bacterial progenitor, while other present-day eukaryotes possess a homologous version that originated from an archaeal progenitor, or for which the same eukaryote possesses both diverged versions in its nuclear genome, one of bacterial origin, the other of archaeal origin [63] (Figure 3B). The latter presence of such distant homologues characterizes the occurrence of EGT [7,59], an introgressive process where a gene from an organelle (such as mitochondria or plastids) has been imported into the eukaryotic nuclear DNA, where a homologous nuclear copy of archaebacterial origin was already present (Figure 3D). These networks are promising tools to look for possibly still-hidden EGT and past endosymbioses when they are applied to new genomic data.
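The shortest-path pattern associated with candidate EGT can be sketched as follows. This is a simplified illustration, not the procedure of [63]: the node names, domain labels, and edges are hypothetical, a plain breadth-first search returns one shortest path, and a path is flagged when it links two eukaryotic sequences through both bacterial and archaeal sequences.

```python
from collections import deque

# Toy network; node names, domain labels and edges are all hypothetical.
DOMAIN = {"euk1": "eukaryote", "euk2": "eukaryote",
          "bac1": "bacteria", "bac2": "bacteria", "arc1": "archaea"}
EDGES = [("euk1", "bac1"), ("bac1", "bac2"),
         ("bac2", "arc1"), ("arc1", "euk2")]


def adjacency(edges):
    """Undirected adjacency map built from a list of homology edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def is_egt_candidate(path, domain):
    """True if an indirect path between two eukaryotic sequences crosses
    both bacterial and archaeal sequences."""
    if path is None or len(path) < 3:
        return False
    ends_eukaryotic = domain[path[0]] == domain[path[-1]] == "eukaryote"
    inner = {domain[node] for node in path[1:-1]}
    return ends_eukaryotic and {"bacteria", "archaea"} <= inner


path = shortest_path(adjacency(EDGES), "euk1", "euk2")
candidate = is_egt_candidate(path, DOMAIN)
```

A real analysis would additionally require that the two eukaryotic sequences belong to the same host genome and would inspect all shortest paths rather than a single one.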

Sequence-similarity networks are also most useful for identifying composite genes (Figure 3C), and their use for detecting genes composed of parts from different origins will likely soon aid reticulate evolution analyses [46,47,53]. Indeed, the level of molecular intricacy between hosts and symbionts may well exceed whole gene introgression in the genome of composite organisms. Preliminary results show that photosynthetic eukaryotes contain some novel nuclear composite genes, featuring unique couplings of domains from plastid origin, without any counterpart in the prokaryotic world. For example, photosynthetic dinophytes contain a composite gene coding for a protein consisting of two domains: one SufE domain of cyanobacterial origin (i.e., probably originating from the chloroplast genome) and a tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase of proteobacterial origin. Interestingly, SufE displays desulfurylase activity [80], while the tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase possesses a thiol group (R-S-H) containing a sulfur atom. It is possible that the sulfur atom required for the thiol group is provided by the activity of SufE thanks to the new physical coupling of these domains in a symbiogenetic gene. Such findings encourage experimental studies to establish whether and which biological properties emerged from the physical coupling of domains in a novel eukaryotic gene with endosymbiotic origin.
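The network signature of a composite gene can be sketched as a simple test on partial hits. The sketch below is a deliberately simplified heuristic, not the MosaicFinder algorithm [46]: a gene is flagged when two partner sequences align to largely non-overlapping regions of it and are not themselves directly similar. Gene names, alignment coordinates, and the `max_overlap` tolerance are hypothetical.

```python
from itertools import combinations

# Hypothetical partial hits: (candidate gene, partner, start, end), where
# start/end delimit the region of the candidate aligned to the partner.
PARTIAL_HITS = [
    ("geneC", "sufE_like", 1, 140),
    ("geneC", "mtase_like", 160, 520),
    ("geneD", "sufE_like", 1, 130),
]
# Pairs of partner sequences directly similar to each other (none here).
SIMILAR = set()


def composite_candidates(partial_hits, similar, max_overlap=20):
    """Map each candidate composite gene to its pairs of unrelated partners."""
    regions = {}
    for gene, partner, start, end in partial_hits:
        regions.setdefault(gene, []).append((partner, start, end))
    result = {}
    for gene, hits in regions.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(sorted(hits), 2):
            # Overlap length of the two aligned regions (negative if disjoint).
            overlap = min(e1, e2) - max(s1, s2) + 1
            unrelated = p1 != p2 and frozenset((p1, p2)) not in similar
            if overlap <= max_overlap and unrelated:
                result.setdefault(gene, []).append((p1, p2))
    return result


candidates = composite_candidates(PARTIAL_HITS, SIMILAR)
```

Here only `geneC` is reported, because `geneD` has a single partner; declaring the two partners similar to each other would remove `geneC` as well, since the pattern would then be compatible with a single family.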

Understanding the entanglement of molecular building blocks, below and above the gene level, is probably the next step required to analyze molecular processes going on during evolutionary transitions mediated by the merging of lineages [4,57–60].

Concluding Remarks: Networks Enhance Our Comprehension of Life's Complexity

The complexity and diversity of phenomena acknowledged and investigated by evolutionary biologists is striking, and growing: it now goes well beyond the identification of lineage divergence from a single common ancestor, enhancing what is considered as the Darwinian paradigm. When pushed to its limits, introgression might result in the integration of laterally acquired features into a sustainable structure, controllable by regulatory systems, which may themselves be the result of introgression. A technical and theoretical transition has accompanied this broadening of scope within the evolutionary paradigm. Namely, network models and methods, never truly absent in biological studies [81], have been developed and implemented. Hence, they now offer powerful complementary approaches to evolutionary studies, which will enhance the exploitation of molecular datasets in multiple directions. The routes and genetic goods of microbial social life, the origins and combination rules of composite genes, and the genetic transformation coupled with major evolutionary transitions, can readily be investigated using powerful, inclusive, comparative network-based tools. The diversity of such tools is itself constantly increasing: the multi-thresholded sequence-similarity networks, (multiplex) genome networks, and the bipartite graphs presented here, allow one to perform multi-agent and multilevel comparative analyses, and may become as familiar to evolutionary biologists as phylogenetic trees in the near future. Importantly, these network tools have not yet been used

Outstanding Questions

What are the rules of domain and gene shuffling in microbes? Sequence-similarity networks provide fast and effective means for systematic analyses of the evolution of composite genes, by simultaneously detecting families of components contributing to composite gene families. The phylogenetic origins and the functional categories of these components will show whether microbes are using transferred genes to create new composite genes in their genomes. For example, do the notoriously mosaic haloarchaeal genomes harbor composite genes of bacterial origin? Does the proportion of composite genes in microbes change with the environment? Can one introduce models of nucleotide substitution into sequence-similarity networks in order to make them more realistic with regard to sequence evolution?

Is every gene everywhere? Gene-similarity networks applied to large-scale metagenomic data and gene-sharing networks featuring environments instead of genomes as their nodes will provide inclusive novel ways to address this important question. These graphs will show whether similar sequences are found in geographically or ecologically similar environments, and serve to detect ubiquitous and endemic gene sets.

What phenotypes in holobionts have multiple origins, that is, did not evolve within a single phylum but emerged from a biological collective? Bipartite graphs with microbial taxa or microbial gene families as bottom nodes and with animal or human hosts as top nodes will immediately allow for the identification of phylogenetically heterogeneous groups of microbes, or groups of gene families in microbes, always associated with a particular host-level phenotype.

How do processes of molecular evolution occurring at the level of the microbiota affect eukaryotic hosts? The microbial gene families–eukaryotic host bipartite graphs described above can be refined to take into account information about the molecular evolution of the gene families (e.g., their rate of evolution, or whether, to what extent, and by what mobile elements each gene family was eventually transferred). This adds an explicit evolutionary dimension to the bottom-level nodes, allowing one to evaluate, for example,


at their full potential (see Outstanding Questions). In particular, they could also be used to analyze the evolution of communities of synthetic microorganisms, biofilms, and holobionts. These latter collective systems encompass a challenging complexity. For example, holobionts rely on a multiplicity of interacting transmission systems and channels for their development and evolution that differ in the microbes and in their hosts. This heterogeneity complicates the understanding of the causes of holobionts' collective phenotypes by traditional methods, even in the metazoan world [82]. Applying a network analytical framework to holobiont studies may be an innovative way to decipher what traits, long held as characteristic of a single animal (i.e., species incompatibility, self-immunity, or possibly behavior [83,84]), or of an individual organism/biofilm (i.e., health conditions [85,86] or drug resistance), originate from complex interactions, at multiple biological levels, and how these involve microbes and their genes. More generally, network-thinking has lots to contribute to microbiology.

Acknowledgments

E.C. and E.B. are funded by FP7/2007-2013 Grant Agreement #615274.

References

1. Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, John Murray
2. O'Hara, R.J. (1997) Population thinking and tree thinking in systematics. Zool. Scr. 26, 323–329
3. Doolittle, W.F. and Bapteste, E. (2007) Pattern pluralism and the Tree of Life hypothesis. Proc. Natl. Acad. Sci. U.S.A. 104, 2043–2049
4. Bapteste, E. et al. (2012) Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc. Natl. Acad. Sci. U.S.A. 109, 18266–18272
5. Doolittle, W.F. (1999) Phylogenetic classification and the universal tree. Science 284, 2124–2129
6. Bapteste, E. (2014) The origins of microbial adaptations: How introgressive descent, egalitarian evolutionary transitions and expanded kin selection shape the network of life. Front. Microbiol. 5, 1–4
7. Archibald, J.M. (2015) Genomic perspectives on the birth and spread of plastids: Fig 1. Proc. Natl. Acad. Sci. U.S.A. 112, 10147–10153
8. Lane, C.E. and Archibald, J.M. (2008) The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol. 23, 268–275
9. Claverie, J-M. and Ogata, H. (2009) Ten good reasons not to exclude giruses from the evolutionary picture. Nat. Rev. Microbiol. 7, 615
10. Koonin, E.V. et al. (2009) Compelling reasons why viruses are relevant for the origin of cells. Nat. Rev. Microbiol. 7, 615
11. Moreira, D. and López-García, P. (2009) Ten reasons to exclude viruses from the tree of life. Nat. Rev. Microbiol. 7, 306–311
12. Navas-Castillo, J. (2009) Six comments on the ten reasons for the demotion of viruses. Nat. Rev. Microbiol. 7, 615
13. Raoult, D. (2009) There is no such thing as a tree of life (and of course viruses are out!). Nat. Rev. Microbiol. 7, 615
14. Villarreal, L.P. and Witzany, G. (2010) Viruses are essential agents within the roots and stem of the tree of life. J. Theor. Biol. 262, 698–710
15. Lukjancenko, O. et al. (2010) Comparison of 61 sequenced Escherichia coli genomes. Microb. Ecol. 60, 708–720
16. Tettelin, H. et al. (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome". Proc. Natl. Acad. Sci. U.S.A. 102, 13950–13955
17. Dagan, T. et al. (2008) Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proc. Natl. Acad. Sci. U.S.A. 105, 10039–10044
18. Dagan, T. and Martin, W. (2007) Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution. Proc. Natl. Acad. Sci. U.S.A. 104, 870–875
19. Halary, S. et al. (2010) Network analyses structure genetic diversity in independent genetic worlds. Proc. Natl. Acad. Sci. U.S.A. 107, 127–132
20. Skippington, E. and Ragan, M.A. (2011) Lateral genetic transfer and the construction of genetic exchange communities. FEMS Microbiol. Rev. 35, 707–735
21. Shapiro, J.A. (2007) Bacteria are small but not stupid: cognition, natural genetic engineering and socio-bacteriology. Stud. Hist. Philos. Biol. Biomed. Sci. 38, 807–819
22. Ku, C. et al. (2015) Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524, 427–432
23. Lobkovsky, A.E. et al. (2014) Estimation of prokaryotic supergenome size and composition from gene frequency distributions. BMC Genomics 15, S14
24. Kloesges, T. et al. (2011) Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths. Mol. Biol. Evol. 28, 1057–1074
25. Popa, O. and Dagan, T. (2011) Trends and barriers to lateral gene transfer in prokaryotes. Curr. Opin. Microbiol. 14, 615–623
26. Popa, O. et al. (2011) Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res. 21, 599–609
27. Jain, R. et al. (1999) Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. U.S.A. 96, 3801–3806
28. Park, C. and Zhang, J. (2012) High expression hampers horizontal gene transfer. Genome Biol. Evol. 4, 523–532
29. Sorek, R. et al. (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318, 1449–1452
30. Cohen, O. et al. (2011) The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Mol. Biol. Evol. 28, 1481–1489
31. Lamn, E. (2014) Inheritance systems. In The Stanford Encyclopedia of Philosophy (Winter 2014) (Zalta, E.N., ed.), http://plato.stanford.edu/archives/win2014/entries/inheritance-systems
32. Lima-Mendez, G. et al. (2008) Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777
33. Halary, S. et al. (2013) EGN: a wizard for construction of gene and genome similarity networks. BMC Evol. Biol. 13, 146
34. McInerney, J.O. et al. (2011) The public goods hypothesis for the evolution of life on Earth. Biol. Direct 6, 41
35. Brandes, U. (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc. Netw. 30, 136–145
36. Hendrix, R.W. et al. (1999) Evolutionary relationships among diverse bacteriophages and prophages: All the world's a phage. Proc. Natl. Acad. Sci. U.S.A. 96, 2192–2197
37. Yutin, N. et al. (2013) Virophages, polintons, and transpovirons: a complex evolutionary network of diverse selfish genetic elements with different reproduction strategies. Virol. J. 10, 158

the impact of lateral gene transfer, operating at the microbial level, on the phenotypes of the eukaryotic host. For example, it becomes easy to test whether laterally transferred genes, mobilized by a broader range of mobile elements, are more largely distributed in human hosts than are resident gene families of the microbiome.

Can one extend the methods from bipartite to tripartite graphs, to account for more levels of biological organization? This defines, as a realistic objective, the implementation of genes–genomes–environments tripartite graphs, which can then be clustered to provide a global yet accurate representation of the structure of genetic diversity on Earth in a single comparative analysis.


38. Ahn, Y-Y. et al. (2011) Flavor network and the principles of food pairing. Sci. Rep. 1, 196
39. Rivera, C.G. et al. (2010) NeMo: network module identification in cytoscape. BMC Bioinformatics 11 (Suppl. 1), S61
40. Diestel, R. (2006) Graph Theory, Springer Science & Business Media
41. Aziz, R.K. et al. (2010) Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res. 38, 4207–4217
42. Derouiche, A. et al. (2015) Evolution and tinkering: what do a protein kinase, a transcriptional regulator and chromosome segregation/cell division proteins have in common? Curr. Genet. Published online August 19, 2015. http://dx.doi.org/10.1007/s00294-015-0513-y
43. Kawashima, T. et al. (2009) Domain shuffling and the evolution of vertebrates. Genome Res. 19, 1393–1403
44. Chothia, C. (2003) Evolution of the protein repertoire. Science 300, 1701–1703
45. de Souza, S.J. (2012) Domain shuffling and the increasing complexity of biological networks. Bioessays 34, 655–657
46. Jachiet, P.A. et al. (2013) MosaicFinder: Identification of fused gene families in sequence similarity networks. Bioinformatics 29, 837–844
47. Jachiet, P. et al. (2014) Extensive gene remodeling in the viral world: new evidence for non-gradual evolution in the mobilome network. Genome Biol. Evol. 6, 2195–2205
48. Cheng, S. et al. (2014) Sequence similarity network reveals the imprints of major diversification events in the evolution of microbial life. Front. Ecol. Evol. 2, 1–13
49. Hejnowicz, M.S. et al. (2009) Analysis of the complete genome sequence of the lactococcal bacteriophage bIBB29. Int. J. Food Microbiol. 131, 52–61
50. Pasek, S. et al. (2006) Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins. Bioinformatics 22, 1418–1423
51. Kummerfeld, S.K. and Teichmann, S.A. (2005) Relative rates of gene fusion and fission in multi-domain proteins. Trends Genet. 21, 25–30
52. Snel, B. et al. (2000) Genome evolution. Trends Genet. 16, 9–11
53. Haggerty, L.S. et al. (2014) A pluralistic account of homology: adapting the models to the data. Mol. Biol. Evol. 31, 501–516
54. Patthy, L. (2003) Modular assembly of genes and the evolution of new functions. Genetica 118, 217–231
55. Nakamura, Y. et al. (2007) Rate and polarity of gene fusion and fission in Oryza sativa and Arabidopsis thaliana. Mol. Biol. Evol. 24, 110–121
56. Dohrmann, J. et al. (2015) Global multiple protein–protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 16, S11
57. McInerney, J.O. et al. (2014) The hybrid nature of the Eukaryota and a consilient view of life on Earth. Nat. Rev. Microbiol. 12, 449–455
58. Williams, T.A. et al. (2013) An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236
59. Nelson-Sathi, S. et al. (2012) Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. U.S.A. 109, 20537–20542
60. Nelson-Sathi, S. et al. (2015) Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80
61. Alvarez-Ponce, D. and McInerney, J.O. (2011) The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences. Genome Biol. Evol. 3, 782–790
62. Deane, J.A. et al. (2000) Evidence for nucleomorph to host nucleus gene transfer: light-harvesting complex proteins from cryptomonads and chlorarachniophytes. Protist 151, 239–252
63. Alvarez-Ponce, D. et al. (2013) Gene similarity networks provide tools for understanding eukaryote origins and evolution. Proc. Natl. Acad. Sci. U.S.A. 110, E1594–E1603
64. Yona, G. et al. (2000) ProtoMap: automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res. 28, 49–55
65. Frickey, T. and Lupas, A. (2004) CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics 20, 3702–3704
66. Atkinson, H.J. et al. (2009) Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS ONE 4, e4345
67. Forster, D. et al. (2014) Testing ecological theories with sequence similarity networks: marine ciliates exhibit similar geographic dispersal patterns as multicellular organisms. ISME J. 13, 1–16
68. Bittner, L. et al. (2010) Some considerations for analyzing biodiversity using integrative metagenomics and gene networks. Biol. Direct 5, 47
69. Altenhoff, A.M. et al. (2015) The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res. 43, D240–D249
70. Sayers, E.W. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39, D38–D51
71. Tatusov, R.L. et al. (1997) A genomic perspective on protein families. Science 278, 631–637
72. Bapteste, E. et al. (2013) Networks: expanding evolutionary thinking. Trends Genet. 29, 439–441
73. Enright, A.J. and Ouzounis, C.A. (2000) GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16, 451–457
74. Li, L. et al. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189
75. Enright, A.J. et al. (2003) Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 31, 4632–4638
76. Song, N. et al. (2008) Sequence similarity network reveals common ancestry of multidomain proteins. PLoS Comput. Biol. 4, e1000063
77. Sasson, O. et al. (2003) ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res. 31, 348–352
78. Matsui, M. et al. (2013) Comprehensive computational analysis of bacterial crp/fnr superfamily and its target motifs reveals stepwise evolution of transcriptional networks. Genome Biol. Evol. 5, 267–282
79. Rappoport, N. et al. (2013) ProtoNet: charting the expanding universe of protein sequences. Nat. Biotechnol. 31, 290–292
80. Ollagnier-de-Choudens, S. et al. (2003) Mechanistic studies of the SufS-SufE cysteine desulfurase: evidence for sulfur transfer from SufS to SufE. FEBS Lett. 555, 263–267
81. Ragan, M.A. (2009) Trees and networks before and after Darwin. Biol. Direct 4, 43
82. Selosse, M-A. et al. (2014) Microbial priming of plant and animal immunity: symbionts as developmental signals. Trends Microbiol. 22, 607–613
83. Brucker, R.M. and Bordenstein, S.R. (2012) Speciation by symbiosis. Trends Ecol. Evol. 27, 443–451
84. Brucker, R.M. and Bordenstein, S.R. (2013) The hologenomic basis of speciation: gut bacteria cause hybrid lethality in the genus Nasonia. Science 341, 667–669
85. Hur, K.Y. and Lee, M-S. (2015) Gut microbiota and metabolic disorders. Diabetes Metab. J. 39, 198–203
86. Gilbert, S.F. et al. (2012) A symbiotic view of life: we have never been individuals. Q. Rev. Biol. 87, 325–341
87. Dunning Hotopp, J.C. et al. (2007) Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes. Science 317, 1753–1756
88. La Scola, B. et al. (2008) The virophage as a unique parasite of the giant mimivirus. Nature 455, 100–104
89. Boyer, M. et al. (2011) Mimivirus shows dramatic genome reduction after intraamoebal culture. Proc. Natl. Acad. Sci. U.S.A. 108, 10296–10301
90. Lanza, V.F. et al. (2015) The plasmidome of Firmicutes: impact on the emergence and the spread of resistance to antimicrobials. Microbiol. Spectr. 3, PLAS-0039-2014
91. Remigi, P. et al. (2014) Transient hypermutagenesis accelerates the evolution of legume endosymbionts following horizontal gene transfer. PLoS Biol. 12, e1001942


92. Serôdio, J. et al. (2014) Photophysiology of kleptoplasts: photosynthetic use of light by chloroplasts living in animal cells. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 369, 20130242
93. Rauch, C. et al. (2015) Why it is time to look beyond algal genes in photosynthetic slugs. Genome Biol. Evol. 7, 2602–2607
94. Ereshefsky, M. and Pedroso, M. (2015) Rethinking evolutionary individuality. Proc. Natl. Acad. Sci. U.S.A. 112, 10126–10132
95. Martin, W.F. et al. (2015) Endosymbiotic theories for eukaryote origin. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 370, 20140330
96. Orphan, V.J. et al. (2001) Comparative analysis of methane-oxidizing archaea and sulfate-reducing bacteria in anoxic marine sediments. Appl. Environ. Microbiol. 67, 1922–1934
97. Ozuolmez, D. et al. (2015) Methanogenic archaea and sulfate reducing bacteria co-cultured on acetate: teamwork or coexistence? Front. Microbiol. 6, 492
98. Sun, M. et al. (2015) Microbial community analysis in rice paddy soils irrigated by acid mine drainage contaminated water. Appl. Microbiol. Biotechnol. 99, 2911–2922
99. Liu, Y.R. et al. (2014) Patterns of bacterial diversity along a long-term mercury-contaminated gradient in the paddy soils. Microb. Ecol. 68, 575–583
100. Lafontaine, D.L. et al. (1998) The box H+ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase. Genes Dev. 12, 527–537
101. Zebarjadian, Y. et al. (1999) Point mutations in yeast CBF5 can abolish in vivo pseudouridylation of rRNA. Mol. Cell. Biol. 19, 7461–7472
102. Becker, H.F. et al. (1997) The yeast gene YNL292w encodes a pseudouridine synthase (Pus4) catalyzing the formation of psi55 in both mitochondrial and cytoplasmic tRNAs. Nucleic Acids Res. 25, 4493–4499

