• Aucun résultat trouvé

OutbreakTools: A new platform for disease outbreak analysis using the R software

N/A
N/A
Protected

Academic year: 2021

Partager "OutbreakTools: A new platform for disease outbreak analysis using the R software"

Copied!
8
0
0

Texte intégral

(1)

HAL Id: hal-02641666

https://hal.inrae.fr/hal-02641666

Submitted on 28 May 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License

OutbreakTools: A new platform for disease outbreak

analysis using the R software

Thibaut Jombart, David M. Aanensen, Marc Baguelin, Paul Birrell, Simon

Cauchemez, Anton Camacho, Caroline Colijn, Caitlin Collins, Anne Cori,

Xavier Didelot, et al.

To cite this version:

Thibaut Jombart, David M. Aanensen, Marc Baguelin, Paul Birrell, Simon Cauchemez, et al..

Out-breakTools: A new platform for disease outbreak analysis using the R software. Epidemics, Elsevier,

2014, 7, pp.28-34. �10.1016/j.epidem.2014.04.003�. �hal-02641666�

(2)

ContentslistsavailableatScienceDirect

Epidemics

jo u rn al h om ep age :w w w . e l s e v i e r . c o m / l o c a t e / e p i d e m i c s

OutbreakTools:

A

new

platform

for

disease

outbreak

analysis

using

the

R

software

Thibaut

Jombart

a,∗

,

David

M.

Aanensen

r,s,1

,

Marc

Baguelin

b,e,1

,

Paul

Birrell

c,1

,

Simon

Cauchemez

d,1

,

Anton

Camacho

e,1

,

Caroline

Colijn

f,1

,

Caitlin

Collins

a,1

,

Anne

Cori

a,1

,

Xavier

Didelot

a,1

,

Christophe

Fraser

a,1

,

Simon

Frost

g,1

,

Niel

Hens

h,i,1

,

Joseph

Hugues

j,1

,

Michael

Höhle

k,1

,

Lulla

Opatowski

l,1

,

Andrew

Rambaut

m,1

,

Oliver

Ratmann

a,1

,

Samuel

Soubeyrand

n,1

,

Marc

A.

Suchard

o,p,1

,

Jacco

Wallinga

q,1

,

Rolf

Ypma

q,1

,

Neil

Ferguson

a

aMRCCentreforOutbreakAnalysisandModelling,DepartmentofInfectiousDiseaseEpidemiology,SchoolofPublicHealth,ImperialCollegeLondon,United

Kingdom

bImmunisation,HepatitisandBloodSafetyDepartment,PublicHealthEngland,London,UnitedKingdom cMRCBiostatisticsUnit,InstituteofPublicHealth,UniversityForvieSite,Cambridge,UnitedKingdom dMathematicalModellingofInfectiousDiseasesUnit,InstitutPasteur,Paris,France

eCentrefortheMathematicalModellingofInfectiousDiseases,DepartmentofInfectiousDiseaseEpidemiology,LondonSchoolofHygieneandTropical

Medicine,UnitedKingdom

fDepartmentofMathematics,ImperialCollegeLondon,UnitedKingdom gDepartmentofVeterinaryMedicine,UniversityofCambridge,UnitedKingdom

hInteruniversityInstituteofBiostatisticsandStatisticalBioinformatics,HasseltUniversity,Hasselt,Belgium

iCentreforHealthEconomicResearchandModellingInfectiousDiseases,VaccineandInfectiousDiseaseInstitute,UniversityofAntwerp,Antwerp,Belgium jMRC-UniversityofGlasgowCentreforVirusResearch,InstituteofInfection,InflammationandImmunity,CollegeofMedical,VeterinaryandLifeSciences,

UniversityofGlasgow,Glasgow,UnitedKingdom

kDepartmentofMathematics,StockholmUniversity,Stockholm,Sweden

lPharmacoepidemiologyandInfectiousDiseasesUnit,UniversitédeVersaillesSaintQuentinEA4499/InstitutPasteur,Paris,France mInstituteofEvolutionaryBiology,CenterforImmunity,InfectionandEvolution,UniversityofEdinburgh,UnitedKingdom nINRA,UR546BiostatisticsandSpatialProcesses,Avignon84914,France

oDepartmentsofBiomathematicsandHumanGenetics,DavidGeffenSchoolofMedicineatUCLA,UniversityofCalifornia,LosAngeles,CA,USA pDepartmentofBiostatistics,UCLAFieldingSchoolofPublicHealth,UniversityofCalifornia,LosAngeles,CA,USA

qCenterforInfectiousDiseaseControl,NationalInstituteofPublicHealthandtheEnvironment,Bilthoven,TheNetherlands rDepartmentofInfectiousDiseaseEpidemiology,SchoolofPublicHealth,ImperialCollegeLondon,UnitedKingdom sWellcomeTrustSangerInstitute,WellcomeTrustGenomeCampus,Hinxton,Cambridge,UnitedKingdom

a

r

t

i

c

l

e

i

n

f

o

Articlehistory: Received5March2014 Accepted3April2014 Availableonline18April2014 Keywords: Software Free Bioinformatics Epidemiology R Epidemics Publichealth Infectiousdisease

a

b

s

t

r

a

c

t

Theinvestigationofinfectiousdiseaseoutbreaksreliesontheanalysisofincreasinglycomplexanddiverse data,whichoffernewprospectsforgaininginsightsintodiseasetransmissionprocessesandinforming publichealthpolicies.However,thepotentialofsuchdatacanonlybeharnessedusinganumberof dif-ferent,complementaryapproachesandtools,andaunifiedplatformfortheanalysisofdiseaseoutbreaks isstilllacking.Inthispaper,wepresentthenewRpackageOutbreakTools,whichaimstoprovideabasis foroutbreakdatamanagementandanalysisinR.OutbreakToolsisdevelopedbyacommunityof epidemi-ologists,statisticians,modellersandbioinformaticians,andimplementsclassesandmethodsforstoring, handlingandvisualizingoutbreakdata.Itincludesrealandsimulatedoutbreakdatasets.Togetherwith anumberoftoolsforinfectiousdiseaseepidemiologyrecentlymadeavailableinR,OutbreakTools con-tributestotheemergenceofanew,freeandopen-sourceplatformfortheanalysisofdiseaseoutbreaks. CrownCopyright©2014PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCCBY license(http://creativecommons.org/licenses/by/3.0/).

∗ Correspondingauthorat:MRCCentreforOutbreakAnalysisandModelling,DepartmentofInfectiousDiseaseEpidemiologyImperialCollege,SchoolofPublicHealth,St Mary’sCampus,NorfolkPlace,LondonW21PG,UnitedKingdom.Tel.:+4402075943658.

E-mailaddresses:thibautjombart@gmail.com,t.jombart@imperial.ac.uk,tjombart@imperial.ac.uk(T.Jombart).

1 Authorsorderedalphabetically.

http://dx.doi.org/10.1016/j.epidem.2014.04.003

(3)

Introduction

Infectiousdiseaseoutbreakinvestigationisacomplextaskin whichavarietyofdatasourcescanbeexploitedforattemptingto uncoverthespatio-temporaldynamicsandtransmissionpathways ofapathogeninapopulation.Thesedatacanincludeinformation oncases’symptoms,theircontacts,resultsofdiagnostictestsand, increasingly,pathogengeneticsequences.Suchrichand diverse dataofferunprecedentedprospectsforunderstandingtheprocess ofdiseasetransmissionandultimatelydesigningadapted contain-mentstrategiesandprophylaxis.

Dedicated methodological approaches are traditionally used to analyze different types of data separately, and can exploit informationsuchasthegenerationtimedistributionandthe tim-ingofsymptomonsets(Wallingaand Teunis,2004;Hensetal., 2012),contactpatternsamongstindividuals(Calatayudetal.,2010; Cauchemezetal.,2011),geographiclocationsofthecases(Truscott etal.,2007;ChisSterandFerguson, 2007), orpathogengenetic sequences(Vegaetal., 2004;Jombartetal.,2011; Harrisetal., 2013).Interestingly,theadventofgeneticdatahasalsotriggereda numberofmethodologicaldevelopmentsaimingtoexploit differ-enttypesofdatasimultaneously(Ypmaetal.,2012;Morellietal., 2012;Teunisetal.,2013;Jombartetal.,2014; Mollentzeetal., 2014).Unfortunately,fewoftheseapproachesarewidelyavailable tothecommunityascomputersoftware,andaunifiedplatformfor theanalysisofdiseaseoutbreaksisstilllacking.

Becauseit is free, open-source,and hosts thelargest collec-tionoftoolsforstatisticalanalysis,theRsoftwareenvironment (RCoreTeam,2013a)appearsanidealhostforthedevelopment ofsuchaplatform.Besidesdedicatedpackagesfore.g.advanced linearmodelling(Faraway,2004), timeseries(Cowpertwaitand Metcalfe,2009),spatialprocesses(Bivandetal.,2008), multivari-atemethods(Karatzoglouetal.,2004;ZouandHastie,2012;Dray and Dufour, 2007), genetic data analysis (Paradis et al., 2004; Jombart, 2008; Jombart and Ahmed, 2011; Paradis, 2010) and graphics(Wickham,2009),Roffersthefullflexibilityofan inter-pretedcomputer language,alliedwiththepossibility of calling uponprecompiledroutines,e.g.inC,C++orFortran,whenever com-putationallyintensivetasksneedtobeundertaken.Risalready hosting a growing number of packages for infectious disease epidemiology, includingsurveillance (Höhle,2007)for temporal andspatio-temporalmodelling(includingoutbreakdetection),R0 (Obadiaetal.,2012),TreePar(StadlerandBonhoeffer,2013)and Epi-Estim(Corietal.,2013)forreproductionnumberestimation,and outbreaker(Jombartetal.,2014)fortransmissiontree reconstruc-tion.

Toensurecoherencebetweenthesedifferentapproachesand promotefurtherdevelopments,basictoolsforstoringand hand-ling outbreak data are needed. In order to fill this gap, a communityofepidemiologists,modellers,statisticiansand bioin-formaticians hasdeveloped theR package OutbreakTools. Here, “outbreakdata” is defined as theabove-described collectionof dataoriginatingfromasetofoutbreakcases.Thissoftware, ini-tiated duringa hackathon forthe analysisofdisease outbreaks in R (http://sites.google.com/site/hackoutwiki/), provides object classesimplementingaflexibleandcoherentrepresentationof out-breakdata,alongsideprocedurestomanipulate,summarizeand visualizethesedata.Inthispaper,weprovideanoverviewofthe mainfeatures ofOutbreakTools,anddiscussthefutureofRasa platformfortheanalysisofoutbreakdata.

Results

The mainpurpose ofOutbreakTools is toprovide a coherent yetflexiblewayofstoringoutbreakdata.Toachievethisgoal,a

Table1

Contentoftheformal(S4)class‘obkData’.InstancesoftheclassobkDatacanstore avarietyofdataintheindicatedslots.Fillingtheslotsisoptional,andemptyslots areallNULL.

Slotname Content

@individuals data.framecontainingpatientmeta-data(e.g.age,sex). @records listofdata.framecontainingtime-stampedobservations

madeoncases(e.g.fever,swabresults);allowsfor repeatedobservationsonthesameindividual. @dna obkSequencesobjectcontainingpathogengenetic

sequencesforoneorseveralgeneswithrecorded collectiondates;usestheclass‘DNAbin’tostoresequences; allowsformultiplesequencesforthesamecases. @contacts obkContactsobjectstoringcontactdatabetweenpatients,

storedasastaticordynamicnetwork;usestheclasses ‘network’and‘networkDynamic’.

@trees multiphyloobjectstoringoneorseveralphylogenetictrees ofpathogengenomes;usestheclass‘phylo’tostoretrees. @context alistofdata.framescontextualdatarelevantata

populationlevel(e.g.schoolclosure)

newformal(S4)class‘obkData’(shortfor‘outbreakdata’)hasbeen developed.Thisclassusesdifferentslots(Table1)tostore indi-vidualmetadata(e.g.age,sex),time-stampedobservationsmade ontheindividuals (e.g.fever, swabresults, oranswers onfood exposuresfromquestionnaires),contactsbetweenpatients,DNA sequencesofthepathogen,phylogenetictrees,andcontextualdata at thepopulationlevel(e.g.schoolclosures, climaticvariables). ComplexdatastructuressuchasdynamiccontactnetworksorDNA sequencesfromdifferentgenesarerespectivelystoredusingthe newclasses‘obkContacts’and‘obkSequences’.

To promote interoperability, okbDataobjects can becreated from standardinput filesvia proceduresalready availablein R. Datatablescanbeimportedfromtextfiles(extensions‘.txt’and ‘.csv’), from otherstatistical softwareusing thepackage foreign

Fig.1.TimelineofsamplesoftheNewmarketequineinfluenzaoutbreak(HorseFlu dataset).ThisfigurerepresentsthetemporaldistributionoftheVIRALshedding samplesgatheredduringtheoutbreak.Eachhorizontallinerepresentsanindividual. Individualsaresortedandcolouredbyyard.Repeatedsamplesgatheredonthesame individualarerepresentedusingdifferentsymbols.Thecodeforreproducingthis figureisprovidedinAppendix1.

(4)

(RCoreTeam,2013b),orfromXMLfilesusingthepackageXML (Butts,2008).AlignedDNAsequencesinFASTAformatcanberead usingape(Paradisetal.,2004)oradegenet(Jombart,2008;Jombart andAhmed,2011),andphylogenetictreescanbeimportedfrom NewickorNEXUSformatusingape(Paradisetal.,2004).Toensure thatobkDataobjectsarereadilycompatiblewithotherRpackages, existingclasseshavebeenusedforstoringdatawhenever possi-ble:theclass‘DNAbin’forDNAsequences(Paradisetal.,2004), theclasses‘network’and‘networkDynamic’forcontactdata(Butts, 2008),andtheclass‘phylo’forphylogenetictrees(Paradisetal., 2004).

Considerable efforts have been made to ensure that these differentpiecesofinformationarestoredinacoherentway.The useofaformal(S4)classsystemoffersmultipleadvantagesinthis respect,asitallowsonetoaccuratelydefinetheobject’scontent, and to perform consistency checks between thedifferent data sourceswhentheobjectiscreated.Thismeans,forinstance,that individualsdocumentedinthecontactorsymptomdatawillbe linked,throughuniqueindividualidentifiers,toavailable individ-ualmeta-data,orthattipsofthetreeswillbelinkedtoexisting DNA sequenceswheneverpossible. Similarly,datesprovided in differentformats areautomaticallystandardized,andsequences ofthesamegenesarecheckedforconsistentlength.AsobkData objectsallowforcoherentdatastorageandcanbesavedeasilyas compressedRobjects(usingthefunctionsave),theyalsooffera

newandefficientwayofsharingdataamongstcollaboratorsand makingstudiesreproducibleafterpublication.

Despite this complex data structure, accessing information stored in obkData objects is facilitated by a large number of accessors. These functions allow for the retrieval of specific data (get.data), including sampling dates (get.dates), con-tacts(get.contacts),individualmeta-data(get.individuals) orDNAsequencesfromgivengenes(get.dna),withoutrequiring knowledgeabouttheinternaldatastructure.Importantly, decou-plingtheaccesstoinformationfromtheinternaldatastoragealso ensures long-term code portability: future changes in the data structurewillnotaffectresultsaslongasaccessorsreturnthesame information.Thisapproachwillenablefuturedevelopmentsofthe obkDataclassandallowfortheincorporationofnewtypesofdata. Besidesaccessors,datahandlingisalsofacilitatedbyasubsetting procedure(functionsubset)whichallowsonetoisolatedatafor givensetsofindividuals,samples,genes,sequences,orfromagiven timewindow.

TheinformationcontainedinobkDataobjectscanbeeasily visu-alizedusingoptionsofthegenericfunctionplot,ordirectlyusing dedicatedfunctions.Individualtimelinescanbeusedtovisualize courseof illness and collection datesof samplesfor each indi-vidual(functionplotIndividualTimeline,Fig.1),mapscanbe drawn toassessthe geographicdistributionof thecases (func-tionplotGeo),contactdatacanbevisualizedasgraphs(function

Fig.2. PhylogenyofpandemicinfluenzaH1N1sequences(FluH1N1pdm2009dataset).Thisphylogenetictreebasedon514hemagglutininsegmentsofpandemicinfluenza H1N1wasplottedusingthefunctionplotggphy.ThecodeforreproducingthisfigureisprovidedinAppendix1.

(5)

Fig.3.SimulatedoutbreakusingsimuEpi.ThisoutbreakwassimulatedunderaSIRmodelwith100hosts,contactrateˇ=0.005andrecoveryrate=0.1.(a)Dynamicsof theoutbreakshowingthenumbersofsusceptibles,infectedandrecoveredovertime.(b)Transmissiontree,whereeachdotisalabelledcasewithcoloursrepresentingthe dateofinfection.(c)Neighbour-JoiningphylogenyreconstructedfromthesimulatedDNAsequences,ladderizedandrootedtothefirstcase.Thecodeforreproducingthese figuresisprovidedinAppendix1.

plotforobkContactsobjects),andgeneticdatacanbevisualizedas phylogenies(functionplotggphy,Fig.2)andminimumspanning trees(functionplotggMST).Mostofthesegraphstakeadvantage ofthehigh-qualitycustomisablegraphicsimplementedinggplot2 (Wickham,2009).

WhileOutbreakToolsfocusesonstoring,handlingand visualiz-ingdata,thepackagealsoimplementsbasictoolsfordataanalysis. Adaptedsummaries(functionsummary)havebeenimplemented toprovidequickinsightsintothedata,make.phylocanbeusedto obtainphylogeniesforallgenesofthedataset,andget.incidence canbeusedtocomputeincidencefromdatesofsymptomonsets, butalsofromanytime-stampeddata.Inthelattersituation, pos-itivecasescanbedefinedfromeitherquantitativeorcategorical data,byspecifyingarangeofnumericalvalues,alistofcharacter stringsorevenregularexpressions.Inpractice,thisallowsforthe computationofincidencebasedonanysymptomdataorsample analysis.Thisfeaturethereforeallowsforadirectuseofprocedures implementedinR0 (Obadiaetal.,2012)orEpiEstim(Corietal., 2013)forestimatingreproductionnumbers.

To illustrate its features, OutbreakTools is released with bothsimulatedand empiricaldatasets,including514annotated DNA sequences of the 2009 influenza pandemic (dataset FluH1N1pdm2009,Fig.2)anddatafromalargeNewmarket(UK) outbreak of equineinfluenza (dataset HorseFlu; Hughes etal., 2012, Fig. 1). Finally, OutbreakTools also includes a simulation tool(functionsimuEpi)whichallowsforthegenerationof out-breaks(includingpathogengenomesequences)underastandard

SIRmodel(Fig.3),andcaneasilybeextendedtouseothermodels (e.g.SIS,SEIR).OutbreakToolsisdocumentedina50-pagemanual andreleasedwithatutorialintroducingthedatastructuresandthe mainfunctionalitiesofthepackage.

Discussion

While a number of packagesfor infectiousdisease epidemi-ology haverecentlybeendeveloped intheRsoftware(Jombart et al., 2014;Obadia et al.,2012; Stadler and Bonhoeffer, 2013; Cori et al.,2013),basic toolsfor storing,handlingand visualiz-ingoutbreakdatahavesofarbeenlacking.OutbreakToolsfillsthis gapbyimplementingnewformalclassesallowingforacoherent yetflexiblerepresentation ofdiseaseoutbreakdata,alongsidea number offunctionsfor manipulatingandvisualizingthatdata. Assuch,itrepresentsasignificantsteptowardsbuildinga com-prehensiveplatformforoutbreakanalysisinR.Thecollaborative and open nature of this project, together with the possibility of modifying internal data structures seamlessly for the user, ensures that OutbreakTools willbe ableto evolve and adapt to incorporatenewtypesofdataandapproachesusedforoutbreak analysis.

The new availability of basic tools for outbreak analysis will hopefully encourage the further development of tools for investigatingepidemics.Itshouldinparticularfacilitatethe imple-mentationofnovelintegrativeapproachesabletoexploitvarious typesofdatasimultaneously(Ypmaetal.,2012;Morellietal.,2012;

(6)

Teuniset al.,2013;Mollentzeetal.,2014).Comparingthetools emergingfromthisstill-burgeoningmethodologicalfieldwilllikely beuseful,aswasrecentlydemonstratedbytheHIVmodelling com-munity(Eatonetal.,2012).Inthisrespect,theexistenceofaunified platformfortheanalysisofdiseaseoutbreaksshouldprovidethe commongroundneededforsuchcomparisonstobedrawn.More generally,theprovisionofacoherentstructureforstoringoutbreak datawilldrasticallyimprovetheeaseofdataexchangeamongst collaborators and hopefully encourage data sharing withinthe community.

Arguably,thechoiceofRfor developing anew platformfor outbreakanalysismayinitiallyappealmostlytoacommunityof Rexperts,andconsiderable effortsshouldbemadetoreachas broadanaudienceaspossible.First,providingfreetutorialsand teachingmaterialisparamountformakingnewtoolsaccessibleto thecommunityatlarge.Thisistheobjectiveofthe“R-epiproject” (http://sites.google.com/site/therepiproject/),awebsitedeveloped collaborativelyandaimingtoprovidefreeresourcesfortheanalysis ofdiseaseoutbreaksprimarilyinR,butalsousingotherfree soft-ware.Interestingly,recentdevelopmentssuchasthepackageshiny (Beeley,2013)dramaticallyaidinthedevelopmentofuser-friendly webinterfacesrunningRtools.Suchapproachescouldbe consid-eredforreachingouttoanevenbroaderaudienceandtryingand maximizetheavailabilityofleading-edgemethodsforepidemics analysistothecommunityatlarge,includingnotonlymodellers andstatisticians,butalsoepidemiologistsandpublichealth agen-cies.

Resources

Availability: OutbreakTools 0.1–0 is distributed on CRAN (http://cran.r-project.org/)andavailableforR3.0.2onWindows, MacOSX,and Linux platforms.It can beinstalled asanyother packageusingthegraphicaluserinterfaceortypingtheinstruction: install.packages(“OutbreakTools”)

Licence:GNUGeneralPublicLicence(GPL)≥2.

Website:http://sites.google.com/site/therepiproject/r-pac/about Documentation: besides the usual package documentation, OutbreakToolsisreleasedwithatutorialwhichcanbeopenedby typing:vignette(“OutbreakTools”).Moredocumentationcan befoundontheproject’swebsite.

Development:thedevelopmentofOutbreakToolsishostedon Sourceforge:http://sourceforge.net/projects/hackout/

Newcontributionsarewelcomeandencouraged.

Acknowledgments

We arethankful toTanjaStadler andtwo anonymous refer-eesfortheirthoroughreviewsand usefulcomments.Wethank Sourceforge(http://sourceforge.net/)forprovidingagreatplatform forsoftwaredevelopment.OutbreakToolshasbeencreatedduring “Hackout: ahackathonforthe analysis of diseaseoutbreaks inR” (http://sites.google.com/site/hackoutwiki/),aneventfundedbythe MedicalResearchCouncil.T.JombartandC.Collinsarefundedby theMedicalResearchCouncilandMIDAS.D.Aanensenisfunded by theWellcomeTrust (grant099202/Z/12/Z). M.Baguelinand A.CamachoaresupportedbytheWellcomeTrust(projectgrant WR/094527).A.CamachoisalsofundedbytheMedicalResearch Council(fellowshipMR/J01432X/1).A.CorithankstheNIHfor fund-ingthroughtheNIAIDcooperativeagreementUM1AI068619.M.A. SuchardissupportedthroughNationalScienceFoundationgrants DMS126153andIIS1251151.N.Hensacknowledgessupportfrom theUniversityofAntwerpscientificchairinEvidence-Based Vac-cinology,financedin2009–2014byagiftfromPfizer.O.Ratmann issupportedbytheWellcomeTrust(fellowshipWR092311MF).C. ColijnacknowledgessupportfromtheEngineeringand Physical SciencesResearchCouncil(EP/I031626/1;EP/K026003/1).S.Frost issupportedinpartbytheRoyalSociety,theERSC(ES/J011266/1), andtheMRC(MR/J013862/1).

AppendixA. Rcodeforreproducingfigures

## LOAD PACKAGES ##

library("OutbreakTools")

library("ggplot2"

)

library("ape")

library("adegenet")

library("adephylo")

## FIGURE 1: TIMELINE OF HORSEFLU DATA ##

## LOAD DATA

data("HorseFlu")

## CREATE BASIC FIGURE

figure1 <- plot(HorseFlu, orderBy="yardID", colorBy="yardID",

what="shedding", size=3)

## CUSTOMIZE LEGEND

S

figure1 <- figure1 + scale_color_discrete("Yard identifier") +

guides(col=guide_legend(ncol = 3))

figure1 <- figure1 + scale_shape("Shedding data", labels=paste("sample",

1:6)) + labs(x="Collection date")

## DISPLAY FIGURE

figure1

(7)

## CREATE FIGURE

figure2 <- plotggphy(x, ladderize = TRUE, branch.unit = "year", major.breaks

= "2 month", axis.date.format = "%b%Y", tip.color = "location", tip.size=3,

tip.alpha=0.75)

## DISPLAY FIGUR

E

figure2

## FIGURE 3: SIMULATED OUTBREAK ##

## SIMULATE OUTBREAK

set.seed(1)

epi <- simuEpi(N=100, beta=.005, D=50

)

## CREATE BASIC FIGURE - PANEL

A

figure3a <- epi$plot + labs(y="Number of individuals"

)

## DISPLAY FIGURE

- PANEL

A

figure3

a

## CREATE BASIC FIGURE - PANEL

B

net <- epi$x@contact

s

## DEFINE COLORS AND ANNOTATIONS

dates <- get.data(epi$x, "DateInfected"

)

days <- as.integer(date

s-min(dates)

)

col.info <

col.graph <- col.info$col[as.numeric(get.individuals(net))]

- any2col(days, col.pal=seasun

)

annot <- min(dates) + col.info$leg.tx

t

annot

<- format(annot, "%b %d"

)

## DISPLAY FIGURE - PANEL

B

par(mar=c(.2,1,1,0.2),xpd=TRUE)

plot(net, vertex.cex=1.5, vertex.col=col.graph)

legend("topleft", fill=col.info$leg.col, leg=annot, bg=transp("white"),cex=1,

title="Date of infection", ncol=2, inset=c

(-.02)

)

## BUILD NEIGHBOUR JOINING TREE - PANEL

C

tre <- nj(dist.dna(get.dna(epi$x)$locus.1)

)

tre <- ladderize(tre) # ladderize the tre

e

tre <- root(tre,1) # root tree to first cas

e

## DISPLAY FIGURE - PANEL

C

par(mar=rep(2,4))

bullseye(tre, tip.color=col.info$col, circ.bg=transp("grey",.1), font=2

)

legend("bottomright", fill=col.info$leg.col, leg=annot,

bg=transp("white"),cex=1, title="Date of infection", ncol=2, inset=c(-.02)

)

box("figure")

## FIGURE 2: PHYLOGENY OF PANDEMIC H1N1 DATA #

#

## LOAD DATA

data("FluH1N1pdm2009")

attach(FluH1N1pdm2009)

## CREATE OBKDATA OBJECT

x <- new("obkData", individuals = individuals, dna = FluH1N1pdm2009$dna

,

dna.individualID = samples$individualID, dna.date = samples$date

,

trees = FluH1N1pdm2009$trees)

detach(FluH1N1pdm2009)

(8)

References

Beeley,C.,2013.WebApplicationDevelopmentwithRusingShiny.PacktPublishing.

Bivand,R.,Pebesma,E.,Gómez-Rubio,V.,2008.AppliedSpatialDataAnalysiswith R.Springer.

Butts,C.,2008.network:apackageformanagingrelationaldatainR.J.Stat.Softw. 24.

Calatayud,L.,Kurkela,S.,Neave,P.E.,Brock,A.,Perkins,S.,etal.,2010.Pandemic (H1N1)2009virusoutbreakinaschoolinLondon,April–May2009:an obser-vationalstudy.Epidemiol.Infect.138,183–191.

Cauchemez,S.,Bhattarai,A.,Marchbanks,T.L.,Fagan,R.P.,Ostroff,S.,etal.,2011.

Roleofsocialnetworksinshapingdiseasetransmissionduringacommunity outbreakof2009H1N1pandemicinfluenza.Proc.Natl.Acad.Sci.U.S.A.108, 2825–2830.

ChisSter,I.,Ferguson,N.M.,2007.Transmissionparametersofthe2001footand mouthepidemicinGreatBritain.PLoSONE2,e502.

Cori,A.,Ferguson,N.M.,Fraser,C.,Cauchemez,S.,2013.Anewframeworkand soft-waretoestimatetime-varyingreproductionnumbersduringepidemics.Am.J. Epidemiol.178,1505–1512.

Cowpertwait,P.,Metcalfe,A.,2009.IntroductoryTimeSerieswithR.Springer.

Dray,S.,Dufour,A.,2007.Theade4package:implementingthedualitydiagramfor ecologists.J.Stat.Softw.22.

Eaton,J.W.,Johnson,L.F.,Salomon,J.A.,Bärnighausen,T.,Bendavid,E.,etal.,2012.

HIVtreatmentasprevention:systematiccomparisonofmathematicalmodelsof thepotentialimpactofantiretroviraltherapyonHIVincidenceinSouthAfrica. PLoSMed.9,e1001245.

Faraway,J.,2004.ExtendingtheLinearModelwithR:GeneralizedLinear,Mixed EffectsandNonparametricRegressionModels.Taylor&Francis.

Harris, S.R., Cartwright, E.J., Török, M.E., Holden, M.T., Brown, N.M., et al., 2013.Whole-genomesequencingforanalysisofanoutbreakof meticillin-resistantStaphylococcusaureus:adescriptivestudy.Lancet Infect.Dis.13, 130–136.

Hens,N.,Calatayud,L.,Kurkela,S.,Tamme,T.,Wallinga,J.,2012.Robust recon-structionandanalysisofoutbreakdata:influenzaA(H1N1)vtransmissionin aschool-basedpopulation.Am.J.Epidemiol.176,196–203.

Höhle,M.,2007.surveillance:anRpackageforthemonitoringofinfectiousdiseases. Comput.Stat.22,571–582.

Hughes,J.,Allen,R.C.,Baguelin,M.,Hampson,K.,Baillie,G.J.,etal.,2012. Trans-missionofequine influenzavirus duringan outbreak ischaracterized by frequentmixedinfectionsandloosetransmissionbottlenecks.PLoSPathog.8, e1003081.

Jombart,T.,2008.adegenet:aRpackageforthemultivariateanalysisofgenetic markers.Bioinformatics24,1403–1405.

Jombart,T.,Ahmed,I.,2011.adegenet1.3-1:newtoolsfortheanalysisof genome-wideSNPdata.Bioinformatics27,3070–3071.

Jombart,T.,Eggo,R.M.,Dodd,P.J.,Balloux,F.,2011.Reconstructingdiseaseoutbreaks fromgeneticdata:agraphapproach.Heredity106,383–390.

Jombart,T.,Cori,A.,Didelot,X.,Cauchemez,S.,Fraser,C.,Ferguson,N.,2014.Bayesian reconstructionofdiseaseoutbreaksbycombiningepidemiologicandgenomic data.PLoSComput.Biol.10,e1003457.

Karatzoglou,A.,Smola,A.,Hornik,K.,Zeileis,A.,2004.kernlab—AnS4Packagefor KernelMethodsinR.J.Stat.Softw.11,1–20.

Mollentze,N.,etal.,2014.ABayesianapproachforinferringthedynamicsofpartially observedendemicinfectiousdiseasesfromspace–time–geneticdata.Proc.Biol. Sci.281,20133251.

Morelli,M.J.,Thébaud,G.,Chadœuf,J.,King,D.P.,Haydon,D.T.,etal.,2012.ABayesian inferenceframeworktoreconstructtransmissiontreesusingepidemiological andgeneticdata.PLoSComput.Biol.8,e1002768.

Obadia,T.,Haneef,R.,Boëlle,P-Y.,2012.TheR0package:atoolboxtoestimate repro-ductionnumbersforepidemicoutbreaks.BMCMed.Inform.Decis.Mak.12, 147.

Paradis,E.,2010.pegas:anRpackageforpopulationgeneticswithan integrated-modularapproach.Bioinformatics26,419–420.

Paradis,E.,Claude,J.,Strimmer,K.,2004.APE:analysesofphylogeneticsand evolu-tioninRlanguage.Bioinformatics20,289–290.

RCoreTeam,2013a.R:ALanguageandEnvironmentforStatisticalComputing.R FoundationforStatisticalComputing,Vienna,Austria.

RCoreTeam,2013b.foreign:ReadDataStoredbyMinitab,S,SAS,SPSS,Stata,Systat, dBase.Rpackageversion0.8-61,pp.8–53.

Stadler,T.,Bonhoeffer,S.,2013.Uncoveringepidemiologicaldynamicsin heteroge-neoushostpopulationsusingphylogeneticmethods.Philos.Trans.R.Soc.Lond. B:Biol.Sci.368,20120198.

Teunis,P.,Heijne,J.C.M.,Sukhrie,F.,vanEijkeren,J.,Koopmans,M.,etal.,2013.

Infectiousdiseasetransmissionasaforensicproblem:whoinfectedwhom?J. R.Soc.Interface10,20120955.

Truscott,J.,Garske,T.,Chis-Ster,I.,Guitian,J.,Pfeiffer,D.,etal.,2007.Controlofa highlypathogenicH5N1avianinfluenzaoutbreakintheGBpoultryflock.Proc. Biol.Sci.274,2287–2295.

Vega,V.B.,Ruan,Y.,Liu,J.,Lee,W.H.,Wei,C.L.,etal.,2004.Mutationaldynamicsof theSARScoronavirusincellcultureandhumanpopulationsisolatedin2003. BMCInfect.Dis.4,32.

Wallinga,J.,Teunis,P.,2004.Differentepidemiccurvesforsevereacuterespiratory syndromerevealsimilarimpactsofcontrolmeasures.Am.J.Epidemiol.160, 509–516.

Wickham,H.,2009.ggplot2:ElegantGraphicsforDataAnalysis.Springer,NewYork.

Ypma,R.J.F.,Bataille,A.M.A.,Stegeman,A.,Koch,G.,Wallinga, J.,etal.,2012.

Unravellingtransmissiontreesofinfectiousdiseasesbycombininggeneticand epidemiologicaldata.Proc.Biol.Sci.279,444–450.

Zou,H.,Hastie,T.,2012.elasticnet:ElasticNetregularizationandvariableselection. RPackageVersion11.

Figure

Fig. 1. Timeline of samples of the Newmarket equine influenza outbreak (HorseFlu dataset)
Fig. 2. Phylogeny of pandemic influenza H1N1 sequences ( FluH1N1pdm2009 dataset). This phylogenetic tree based on 514 hemagglutinin segments of pandemic influenza H1N1 was plotted using the function plotggphy
Fig. 3. Simulated outbreak using simuEpi . This outbreak was simulated under a SIR model with 100 hosts, contact rate ˇ = 0.005 and recovery rate  = 0.1

Références

Documents relatifs

To the best of our knowledge, we provide for the first time sufficient conditions ensuring the existence of a steady state regime for the Green-Keldysh correlation functions

There are two main methods that are used to model the spread of an infectious disease: agent-based modelling and equation based modelling.. In this paper, we compare the results from

The WHO Emergency Response Framework will be updated with new performance targets in the areas of leadership (for example, an incident manager with delegated authorities is to

The document describes the elements of contact tracing; the procedures for conducting contact tracing up to the point of discharging the contacts; precautions to

Calculez le pourcentage ou la valeur

To resume, with an imperfect farm credit market due to asymmetric information, a FMD outbreak will have short run effects on production levels if the temporary

MRC Centre for Outbreak Analysis and Modelling, NIHR Health Protection Research Unit in Modelling Methodology, and Department of Infectious Disease Epidemiology, Imperial

The students’ variation of using negative and positive strategies in the three different situations, propose that the strategy used by students while dealing with their teachers and