HAL Id: hal-02641666
https://hal.inrae.fr/hal-02641666
Submitted on 28 May 2020
Distributed under a Creative Commons Attribution 4.0 International License
OutbreakTools: A new platform for disease outbreak
analysis using the R software
Thibaut Jombart, David M. Aanensen, Marc Baguelin, Paul Birrell, Simon
Cauchemez, Anton Camacho, Caroline Colijn, Caitlin Collins, Anne Cori,
Xavier Didelot, et al.
To cite this version:
Thibaut Jombart, David M. Aanensen, Marc Baguelin, Paul Birrell, Simon Cauchemez, et al.
OutbreakTools: A new platform for disease outbreak analysis using the R software. Epidemics, Elsevier,
2014, 7, pp.28-34. ⟨10.1016/j.epidem.2014.04.003⟩. ⟨hal-02641666⟩
OutbreakTools: A new platform for disease outbreak analysis using the R software
Thibaut Jombart a,∗, David M. Aanensen r,s,1, Marc Baguelin b,e,1, Paul Birrell c,1, Simon Cauchemez d,1, Anton Camacho e,1, Caroline Colijn f,1, Caitlin Collins a,1, Anne Cori a,1, Xavier Didelot a,1, Christophe Fraser a,1, Simon Frost g,1, Niel Hens h,i,1, Joseph Hugues j,1, Michael Höhle k,1, Lulla Opatowski l,1, Andrew Rambaut m,1, Oliver Ratmann a,1, Samuel Soubeyrand n,1, Marc A. Suchard o,p,1, Jacco Wallinga q,1, Rolf Ypma q,1, Neil Ferguson a
a MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
b Immunisation, Hepatitis and Blood Safety Department, Public Health England, London, United Kingdom
c MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Cambridge, United Kingdom
d Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Paris, France
e Centre for the Mathematical Modelling of Infectious Diseases, Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, United Kingdom
f Department of Mathematics, Imperial College London, United Kingdom
g Department of Veterinary Medicine, University of Cambridge, United Kingdom
h Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium
i Centre for Health Economic Research and Modelling Infectious Diseases, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
j MRC-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
k Department of Mathematics, Stockholm University, Stockholm, Sweden
l Pharmacoepidemiology and Infectious Diseases Unit, Université de Versailles Saint Quentin EA 4499/Institut Pasteur, Paris, France
m Institute of Evolutionary Biology, Center for Immunity, Infection and Evolution, University of Edinburgh, United Kingdom
n INRA, UR546 Biostatistics and Spatial Processes, Avignon 84914, France
o Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
p Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, CA, USA
q Center for Infectious Disease Control, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
r Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
s Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Article info
Article history: Received 5 March 2014; Accepted 3 April 2014; Available online 18 April 2014
Keywords: Software; Free; Bioinformatics; Epidemiology; R; Epidemics; Public health; Infectious disease
Abstract
The investigation of infectious disease outbreaks relies on the analysis of increasingly complex and diverse data, which offer new prospects for gaining insights into disease transmission processes and informing public health policies. However, the potential of such data can only be harnessed using a number of different, complementary approaches and tools, and a unified platform for the analysis of disease outbreaks is still lacking. In this paper, we present the new R package OutbreakTools, which aims to provide a basis for outbreak data management and analysis in R. OutbreakTools is developed by a community of epidemiologists, statisticians, modellers and bioinformaticians, and implements classes and methods for storing, handling and visualizing outbreak data. It includes real and simulated outbreak datasets. Together with a number of tools for infectious disease epidemiology recently made available in R, OutbreakTools contributes to the emergence of a new, free and open-source platform for the analysis of disease outbreaks.
Crown Copyright © 2014 Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).
∗ Corresponding author at: MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College, School of Public Health, St Mary’s Campus, Norfolk Place, London W2 1PG, United Kingdom. Tel.: +44 020 7594 3658.
E-mail addresses: thibautjombart@gmail.com, t.jombart@imperial.ac.uk, tjombart@imperial.ac.uk (T. Jombart).
1 Authors ordered alphabetically.
http://dx.doi.org/10.1016/j.epidem.2014.04.003
Introduction
Infectious disease outbreak investigation is a complex task in which a variety of data sources can be exploited for attempting to uncover the spatio-temporal dynamics and transmission pathways of a pathogen in a population. These data can include information on cases’ symptoms, their contacts, results of diagnostic tests and, increasingly, pathogen genetic sequences. Such rich and diverse data offer unprecedented prospects for understanding the process of disease transmission and ultimately designing adapted containment strategies and prophylaxis.
Dedicated methodological approaches are traditionally used to analyze different types of data separately, and can exploit information such as the generation time distribution and the timing of symptom onsets (Wallinga and Teunis, 2004; Hens et al., 2012), contact patterns amongst individuals (Calatayud et al., 2010; Cauchemez et al., 2011), geographic locations of the cases (Truscott et al., 2007; Chis Ster and Ferguson, 2007), or pathogen genetic sequences (Vega et al., 2004; Jombart et al., 2011; Harris et al., 2013). Interestingly, the advent of genetic data has also triggered a number of methodological developments aiming to exploit different types of data simultaneously (Ypma et al., 2012; Morelli et al., 2012; Teunis et al., 2013; Jombart et al., 2014; Mollentze et al., 2014). Unfortunately, few of these approaches are widely available to the community as computer software, and a unified platform for the analysis of disease outbreaks is still lacking.
Because it is free, open-source, and hosts the largest collection of tools for statistical analysis, the R software environment (R Core Team, 2013a) appears an ideal host for the development of such a platform. Besides dedicated packages for e.g. advanced linear modelling (Faraway, 2004), time series (Cowpertwait and Metcalfe, 2009), spatial processes (Bivand et al., 2008), multivariate methods (Karatzoglou et al., 2004; Zou and Hastie, 2012; Dray and Dufour, 2007), genetic data analysis (Paradis et al., 2004; Jombart, 2008; Jombart and Ahmed, 2011; Paradis, 2010) and graphics (Wickham, 2009), R offers the full flexibility of an interpreted computer language, allied with the possibility of calling upon precompiled routines, e.g. in C, C++ or Fortran, whenever computationally intensive tasks need to be undertaken. R is already hosting a growing number of packages for infectious disease epidemiology, including surveillance (Höhle, 2007) for temporal and spatio-temporal modelling (including outbreak detection), R0 (Obadia et al., 2012), TreePar (Stadler and Bonhoeffer, 2013) and EpiEstim (Cori et al., 2013) for reproduction number estimation, and outbreaker (Jombart et al., 2014) for transmission tree reconstruction.
To ensure coherence between these different approaches and promote further developments, basic tools for storing and handling outbreak data are needed. In order to fill this gap, a community of epidemiologists, modellers, statisticians and bioinformaticians has developed the R package OutbreakTools. Here, “outbreak data” is defined as the above-described collection of data originating from a set of outbreak cases. This software, initiated during a hackathon for the analysis of disease outbreaks in R (http://sites.google.com/site/hackoutwiki/), provides object classes implementing a flexible and coherent representation of outbreak data, alongside procedures to manipulate, summarize and visualize these data. In this paper, we provide an overview of the main features of OutbreakTools, and discuss the future of R as a platform for the analysis of outbreak data.
Results
The main purpose of OutbreakTools is to provide a coherent yet flexible way of storing outbreak data. To achieve this goal, a new formal (S4) class ‘obkData’ (short for ‘outbreak data’) has been developed. This class uses different slots (Table 1) to store individual metadata (e.g. age, sex), time-stamped observations made on the individuals (e.g. fever, swab results, or answers on food exposures from questionnaires), contacts between patients, DNA sequences of the pathogen, phylogenetic trees, and contextual data at the population level (e.g. school closures, climatic variables). Complex data structures such as dynamic contact networks or DNA sequences from different genes are respectively stored using the new classes ‘obkContacts’ and ‘obkSequences’.
Table 1
Content of the formal (S4) class ‘obkData’. Instances of the class obkData can store a variety of data in the indicated slots. Filling the slots is optional, and empty slots are all NULL.
Slot name       Content
@individuals    data.frame containing patient meta-data (e.g. age, sex).
@records        list of data.frame containing time-stamped observations made on cases (e.g. fever, swab results); allows for repeated observations on the same individual.
@dna            obkSequences object containing pathogen genetic sequences for one or several genes with recorded collection dates; uses the class ‘DNAbin’ to store sequences; allows for multiple sequences for the same cases.
@contacts       obkContacts object storing contact data between patients, stored as a static or dynamic network; uses the classes ‘network’ and ‘networkDynamic’.
@trees          multiphylo object storing one or several phylogenetic trees of pathogen genomes; uses the class ‘phylo’ to store trees.
@context        list of data.frames containing contextual data relevant at a population level (e.g. school closure).
To promote interoperability, obkData objects can be created from standard input files via procedures already available in R. Data tables can be imported from text files (extensions ‘.txt’ and ‘.csv’), from other statistical software using the package foreign (R Core Team, 2013b), or from XML files using the package XML (Butts, 2008). Aligned DNA sequences in FASTA format can be read using ape (Paradis et al., 2004) or adegenet (Jombart, 2008; Jombart and Ahmed, 2011), and phylogenetic trees can be imported from Newick or NEXUS format using ape (Paradis et al., 2004). To ensure that obkData objects are readily compatible with other R packages, existing classes have been used for storing data whenever possible: the class ‘DNAbin’ for DNA sequences (Paradis et al., 2004), the classes ‘network’ and ‘networkDynamic’ for contact data (Butts, 2008), and the class ‘phylo’ for phylogenetic trees (Paradis et al., 2004).
Fig. 1. Timeline of samples of the Newmarket equine influenza outbreak (HorseFlu dataset). This figure represents the temporal distribution of the viral shedding samples gathered during the outbreak. Each horizontal line represents an individual. Individuals are sorted and coloured by yard. Repeated samples gathered on the same individual are represented using different symbols. The code for reproducing this figure is provided in Appendix A.
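As an illustration of the import step for data tables, the following base R sketch reads a small line list in ‘.csv’ format and harmonises onset dates recorded in two different formats. The column names, the values and the helper parse_mixed_dates are hypothetical and only serve the example; the sketch does not reproduce the internal procedures of OutbreakTools, and it assumes that dates containing slashes are written as day/month/year.

```r
# Toy line list as it might appear in a '.csv' file (fabricated values)
csv_text <- "individualID,onset,sex
case1,2014-01-03,f
case2,05/01/2014,m
case3,2014-01-09,f"

# read.csv() accepts the file content directly through its 'text' argument
linelist <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: try several candidate formats per entry and keep
# the first one that parses (entries matching no format stay NA)
parse_mixed_dates <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

linelist$onset <- parse_mixed_dates(linelist$onset)
```

Ambiguous entries (e.g. 05/01/2014 read as 5 January or as 1 May) cannot be resolved automatically; the order of the candidate formats encodes the assumption made here.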
Considerable efforts have been made to ensure that these different pieces of information are stored in a coherent way. The use of a formal (S4) class system offers multiple advantages in this respect, as it allows one to accurately define the object’s content, and to perform consistency checks between the different data sources when the object is created. This means, for instance, that individuals documented in the contact or symptom data will be linked, through unique individual identifiers, to available individual meta-data, or that tips of the trees will be linked to existing DNA sequences whenever possible. Similarly, dates provided in different formats are automatically standardized, and sequences of the same genes are checked for consistent length. As obkData objects allow for coherent data storage and can be saved easily as compressed R objects (using the function save), they also offer a new and efficient way of sharing data amongst collaborators and making studies reproducible after publication.
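The kind of consistency check made possible by a formal class system can be sketched in a few lines of base R. The class toyOutbreak, its two slots and the column individualID are hypothetical and deliberately simplified; this is not the obkData implementation, only an instance of the general S4 mechanism in which a validity function is run when an object is created.

```r
library(methods)

# Hypothetical toy class (NOT the obkData class): two slots linked by an ID
setClass("toyOutbreak",
         slots = c(individuals = "data.frame", records = "data.frame"),
         validity = function(object) {
           known <- rownames(object@individuals)
           ids <- as.character(object@records$individualID)
           unknown <- setdiff(ids, known)
           if (length(unknown) > 0) {
             return(paste("records refer to unknown individuals:",
                          paste(unknown, collapse = ", ")))
           }
           TRUE
         })

individuals <- data.frame(age = c(34, 7), sex = c("f", "m"),
                          row.names = c("case1", "case2"))
records <- data.frame(individualID = c("case1", "case2", "case1"),
                      date = as.Date(c("2014-01-03", "2014-01-04", "2014-01-06")),
                      fever = c(TRUE, FALSE, TRUE))

# Consistent inputs: the object is created
ok <- new("toyOutbreak", individuals = individuals, records = records)

# Inconsistent inputs: creation fails and the validity message is captured
bad_records <- rbind(records,
                     data.frame(individualID = "case9",
                                date = as.Date("2014-01-07"), fever = TRUE))
res <- tryCatch(new("toyOutbreak", individuals = individuals,
                    records = bad_records),
                error = function(e) conditionMessage(e))
```

Because the check runs at construction time, an inconsistent object is never created, and downstream code can rely on every record pointing to a documented individual.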
Despite this complex data structure, accessing information stored in obkData objects is facilitated by a large number of accessors. These functions allow for the retrieval of specific data (get.data), including sampling dates (get.dates), contacts (get.contacts), individual meta-data (get.individuals) or DNA sequences from given genes (get.dna), without requiring knowledge about the internal data structure. Importantly, decoupling the access to information from the internal data storage also ensures long-term code portability: future changes in the data structure will not affect results as long as accessors return the same information. This approach will enable future developments of the obkData class and allow for the incorporation of new types of data. Besides accessors, data handling is also facilitated by a subsetting procedure (function subset) which allows one to isolate data for given sets of individuals, samples, genes, sequences, or from a given time window.
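The benefit of decoupling access from storage can be illustrated with a minimal sketch. The class toyRecords, the generic getDates and the layout of the slot are hypothetical names introduced for this example and are not part of the OutbreakTools API; the point is only that user code calls the accessor and never touches the slot directly.

```r
library(methods)

# Hypothetical toy class: all data sit in a single list-valued slot
setClass("toyRecords", slots = c(store = "list"))

# Hypothetical accessor: the only place that knows where dates are stored
setGeneric("getDates", function(x, ...) standardGeneric("getDates"))
setMethod("getDates", "toyRecords", function(x, ...) {
  sort(as.Date(x@store$swab$date))
})

x <- new("toyRecords",
         store = list(swab = data.frame(
           id = c("a", "b", "c"),
           date = c("2014-02-10", "2014-02-01", "2014-02-05"))))

# User code relies on the accessor, not on the slot layout
dates <- getDates(x)

# Restricting to a time window, in the spirit of a subsetting procedure
in_window <- dates[dates >= as.Date("2014-02-03") &
                   dates <= as.Date("2014-02-10")]
```

If the internal layout later changed (for instance, dates moved to a dedicated slot), only the method body would need to be updated, and the two lines of user code would keep working unchanged.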
The information contained in obkData objects can be easily visualized using options of the generic function plot, or directly using dedicated functions. Individual timelines can be used to visualize the course of illness and collection dates of samples for each individual (function plotIndividualTimeline, Fig. 1), maps can be drawn to assess the geographic distribution of the cases (function plotGeo), contact data can be visualized as graphs (function plot for obkContacts objects), and genetic data can be visualized as phylogenies (function plotggphy, Fig. 2) and minimum spanning trees (function plotggMST). Most of these graphs take advantage of the high-quality customisable graphics implemented in ggplot2 (Wickham, 2009).
Fig. 2. Phylogeny of pandemic influenza H1N1 sequences (FluH1N1pdm2009 dataset). This phylogenetic tree based on 514 hemagglutinin segments of pandemic influenza H1N1 was plotted using the function plotggphy. The code for reproducing this figure is provided in Appendix A.
Fig. 3. Simulated outbreak using simuEpi. This outbreak was simulated under a SIR model with 100 hosts, contact rate β = 0.005 and recovery rate = 0.1. (a) Dynamics of the outbreak showing the numbers of susceptibles, infected and recovered over time. (b) Transmission tree, where each dot is a labelled case with colours representing the date of infection. (c) Neighbour-Joining phylogeny reconstructed from the simulated DNA sequences, ladderized and rooted to the first case. The code for reproducing these figures is provided in Appendix A.
While OutbreakTools focuses on storing, handling and visualizing data, the package also implements basic tools for data analysis. Adapted summaries (function summary) have been implemented to provide quick insights into the data, make.phylo can be used to obtain phylogenies for all genes of the dataset, and get.incidence can be used to compute incidence from dates of symptom onsets, but also from any time-stamped data. In the latter situation, positive cases can be defined from either quantitative or categorical data, by specifying a range of numerical values, a list of character strings or even regular expressions. In practice, this allows for the computation of incidence based on any symptom data or sample analysis. This feature therefore allows for a direct use of procedures implemented in R0 (Obadia et al., 2012) or EpiEstim (Cori et al., 2013) for estimating reproduction numbers.
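The idea of deriving incidence from arbitrary time-stamped data can be sketched in base R as follows. The data, the column names and the helper daily_incidence are fabricated for illustration and do not reproduce the behaviour or the arguments of get.incidence; for simplicity the sketch counts positive observations per day and does not account for repeated observations on the same individual.

```r
# Toy time-stamped observations (fabricated for illustration only)
obs <- data.frame(
  date = as.Date(c("2014-03-01", "2014-03-01", "2014-03-02",
                   "2014-03-04", "2014-03-04", "2014-03-04")),
  temperature = c(38.6, 36.9, 39.1, 37.0, 38.2, 40.0),
  swab = c("positive", "positive", "weak positive",
           "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: daily counts of positive observations over the full
# date range, so that days without positives appear with a count of zero
daily_incidence <- function(dates, positive) {
  days <- seq(min(dates), max(dates), by = "day")
  counts <- table(factor(as.character(dates[positive]),
                         levels = as.character(days)))
  data.frame(date = days, incidence = as.integer(counts))
}

# Positives defined by a range of numerical values ...
inc_fever <- daily_incidence(obs$date,
                             obs$temperature >= 38 & obs$temperature <= 42)
# ... or by a regular expression applied to a categorical variable
inc_swab <- daily_incidence(obs$date, grepl("positive$", obs$swab))
```

Each result is a plain table of dates and counts, which is the kind of input typically expected by procedures estimating reproduction numbers from incidence time series.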
To illustrate its features, OutbreakTools is released with both simulated and empirical datasets, including 514 annotated DNA sequences of the 2009 influenza pandemic (dataset FluH1N1pdm2009, Fig. 2) and data from a large Newmarket (UK) outbreak of equine influenza (dataset HorseFlu; Hughes et al., 2012, Fig. 1). Finally, OutbreakTools also includes a simulation tool (function simuEpi) which allows for the generation of outbreaks (including pathogen genome sequences) under a standard SIR model (Fig. 3), and can easily be extended to use other models (e.g. SIS, SEIR). OutbreakTools is documented in a 50-page manual and released with a tutorial introducing the data structures and the main functionalities of the package.
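For readers unfamiliar with the SIR model, a minimal discrete-time stochastic version can be written in base R. This sketch is not the algorithm implemented in simuEpi and does not simulate pathogen sequences or transmission trees; the function simulate_sir, the use of daily binomial draws and the interpretation of the last argument as a duration in days are assumptions made for the example. The default values mirror those quoted for Fig. 3 (100 hosts, contact rate 0.005, recovery rate 0.1).

```r
# Minimal discrete-time stochastic SIR sketch (illustrative assumption,
# not the simuEpi implementation)
simulate_sir <- function(N = 100, beta = 0.005, gamma = 0.1, duration = 50) {
  S <- integer(duration + 1)
  I <- integer(duration + 1)
  R <- integer(duration + 1)
  S[1] <- N - 1L   # everyone susceptible except one initial case
  I[1] <- 1L
  for (t in seq_len(duration)) {
    p_inf <- 1 - exp(-beta * I[t])  # daily infection probability per susceptible
    p_rec <- 1 - exp(-gamma)        # daily recovery probability per infected
    new_inf <- rbinom(1, S[t], p_inf)
    new_rec <- rbinom(1, I[t], p_rec)
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  data.frame(time = 0:duration, S = S, I = I, R = R)
}

set.seed(1)
out <- simulate_sir()
```

Extending such a sketch to an SEIR model would amount to adding an exposed compartment with its own transition probability between infection and infectiousness.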
Discussion
While a number of packages for infectious disease epidemiology have recently been developed in the R software (Jombart et al., 2014; Obadia et al., 2012; Stadler and Bonhoeffer, 2013; Cori et al., 2013), basic tools for storing, handling and visualizing outbreak data have so far been lacking. OutbreakTools fills this gap by implementing new formal classes allowing for a coherent yet flexible representation of disease outbreak data, alongside a number of functions for manipulating and visualizing that data. As such, it represents a significant step towards building a comprehensive platform for outbreak analysis in R. The collaborative and open nature of this project, together with the possibility of modifying internal data structures seamlessly for the user, ensures that OutbreakTools will be able to evolve and adapt to incorporate new types of data and approaches used for outbreak analysis.
The new availability of basic tools for outbreak analysis will hopefully encourage the further development of tools for investigating epidemics. It should in particular facilitate the implementation of novel integrative approaches able to exploit various types of data simultaneously (Ypma et al., 2012; Morelli et al., 2012; Teunis et al., 2013; Mollentze et al., 2014). Comparing the tools emerging from this still-burgeoning methodological field will likely be useful, as was recently demonstrated by the HIV modelling community (Eaton et al., 2012). In this respect, the existence of a unified platform for the analysis of disease outbreaks should provide the common ground needed for such comparisons to be drawn. More generally, the provision of a coherent structure for storing outbreak data will drastically improve the ease of data exchange amongst collaborators and hopefully encourage data sharing within the community.
Arguably, the choice of R for developing a new platform for outbreak analysis may initially appeal mostly to a community of R experts, and considerable efforts should be made to reach as broad an audience as possible. First, providing free tutorials and teaching material is paramount for making new tools accessible to the community at large. This is the objective of the “R-epi project” (http://sites.google.com/site/therepiproject/), a website developed collaboratively and aiming to provide free resources for the analysis of disease outbreaks primarily in R, but also using other free software. Interestingly, recent developments such as the package shiny (Beeley, 2013) dramatically aid in the development of user-friendly web interfaces running R tools. Such approaches could be considered for reaching out to an even broader audience and trying to maximize the availability of leading-edge methods for epidemics analysis to the community at large, including not only modellers and statisticians, but also epidemiologists and public health agencies.
Resources
Availability: OutbreakTools 0.1-0 is distributed on CRAN (http://cran.r-project.org/) and available for R 3.0.2 on Windows, MacOSX, and Linux platforms. It can be installed as any other package using the graphical user interface or typing the instruction: install.packages("OutbreakTools")
Licence: GNU General Public Licence (GPL) ≥ 2.
Website: http://sites.google.com/site/therepiproject/r-pac/about
Documentation: besides the usual package documentation, OutbreakTools is released with a tutorial which can be opened by typing: vignette("OutbreakTools"). More documentation can be found on the project’s website.
Development: the development of OutbreakTools is hosted on Sourceforge: http://sourceforge.net/projects/hackout/
New contributions are welcome and encouraged.
Acknowledgments
We are thankful to Tanja Stadler and two anonymous referees for their thorough reviews and useful comments. We thank Sourceforge (http://sourceforge.net/) for providing a great platform for software development. OutbreakTools has been created during “Hackout: a hackathon for the analysis of disease outbreaks in R” (http://sites.google.com/site/hackoutwiki/), an event funded by the Medical Research Council. T. Jombart and C. Collins are funded by the Medical Research Council and MIDAS. D. Aanensen is funded by the Wellcome Trust (grant 099202/Z/12/Z). M. Baguelin and A. Camacho are supported by the Wellcome Trust (project grant WR/094527). A. Camacho is also funded by the Medical Research Council (fellowship MR/J01432X/1). A. Cori thanks the NIH for funding through the NIAID cooperative agreement UM1AI068619. M.A. Suchard is supported through National Science Foundation grants DMS126153 and IIS1251151. N. Hens acknowledges support from the University of Antwerp scientific chair in Evidence-Based Vaccinology, financed in 2009–2014 by a gift from Pfizer. O. Ratmann is supported by the Wellcome Trust (fellowship WR092311MF). C. Colijn acknowledges support from the Engineering and Physical Sciences Research Council (EP/I031626/1; EP/K026003/1). S. Frost is supported in part by the Royal Society, the ERSC (ES/J011266/1), and the MRC (MR/J013862/1).
Appendix A. R code for reproducing figures
## LOAD PACKAGES ##
library("OutbreakTools")
library("ggplot2")
library("ape")
library("adegenet")
library("adephylo")

## FIGURE 1: TIMELINE OF HORSEFLU DATA ##
## LOAD DATA
data("HorseFlu")
## CREATE BASIC FIGURE
figure1 <- plot(HorseFlu, orderBy="yardID", colorBy="yardID",
                what="shedding", size=3)
## CUSTOMIZE LEGENDS
figure1 <- figure1 + scale_color_discrete("Yard identifier") +
    guides(col=guide_legend(ncol = 3))
figure1 <- figure1 + scale_shape("Shedding data", labels=paste("sample", 1:6)) +
    labs(x="Collection date")
## DISPLAY FIGURE
figure1

## FIGURE 2: PHYLOGENY OF PANDEMIC H1N1 DATA ##
## LOAD DATA
data("FluH1N1pdm2009")
attach(FluH1N1pdm2009)
## CREATE OBKDATA OBJECT
x <- new("obkData", individuals = individuals, dna = FluH1N1pdm2009$dna,
         dna.individualID = samples$individualID, dna.date = samples$date,
         trees = FluH1N1pdm2009$trees)
detach(FluH1N1pdm2009)
## CREATE FIGURE
figure2 <- plotggphy(x, ladderize = TRUE, branch.unit = "year",
                     major.breaks = "2 month", axis.date.format = "%b%Y",
                     tip.color = "location", tip.size=3, tip.alpha=0.75)
## DISPLAY FIGURE
figure2

## FIGURE 3: SIMULATED OUTBREAK ##
## SIMULATE OUTBREAK
set.seed(1)
epi <- simuEpi(N=100, beta=.005, D=50)
## CREATE BASIC FIGURE - PANEL A
figure3a <- epi$plot + labs(y="Number of individuals")
## DISPLAY FIGURE - PANEL A
figure3a
## CREATE BASIC FIGURE - PANEL B
net <- epi$x@contacts
## DEFINE COLORS AND ANNOTATIONS
dates <- get.data(epi$x, "DateInfected")
days <- as.integer(dates-min(dates))
col.info <- any2col(days, col.pal=seasun)
col.graph <- col.info$col[as.numeric(get.individuals(net))]
annot <- min(dates) + col.info$leg.txt
annot <- format(annot, "%b %d")
## DISPLAY FIGURE - PANEL B
par(mar=c(.2,1,1,0.2),xpd=TRUE)
plot(net, vertex.cex=1.5, vertex.col=col.graph)
legend("topleft", fill=col.info$leg.col, leg=annot, bg=transp("white"), cex=1,
       title="Date of infection", ncol=2, inset=c(-.02))
## BUILD NEIGHBOUR JOINING TREE - PANEL C
tre <- nj(dist.dna(get.dna(epi$x)$locus.1))
tre <- ladderize(tre) # ladderize the tree
tre <- root(tre,1) # root tree to first case
## DISPLAY FIGURE - PANEL C
par(mar=rep(2,4))
bullseye(tre, tip.color=col.info$col, circ.bg=transp("grey",.1), font=2)
legend("bottomright", fill=col.info$leg.col, leg=annot,
       bg=transp("white"), cex=1, title="Date of infection", ncol=2, inset=c(-.02))
box("figure")
References
Beeley, C., 2013. Web Application Development with R using Shiny. Packt Publishing.
Bivand, R., Pebesma, E., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer.
Butts, C., 2008. network: a package for managing relational data in R. J. Stat. Softw. 24.
Calatayud, L., Kurkela, S., Neave, P.E., Brock, A., Perkins, S., et al., 2010. Pandemic (H1N1) 2009 virus outbreak in a school in London, April–May 2009: an observational study. Epidemiol. Infect. 138, 183–191.
Cauchemez, S., Bhattarai, A., Marchbanks, T.L., Fagan, R.P., Ostroff, S., et al., 2011. Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc. Natl. Acad. Sci. U.S.A. 108, 2825–2830.
Chis Ster, I., Ferguson, N.M., 2007. Transmission parameters of the 2001 foot and mouth epidemic in Great Britain. PLoS ONE 2, e502.
Cori, A., Ferguson, N.M., Fraser, C., Cauchemez, S., 2013. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178, 1505–1512.
Cowpertwait, P., Metcalfe, A., 2009. Introductory Time Series with R. Springer.
Dray, S., Dufour, A., 2007. The ade4 package: implementing the duality diagram for ecologists. J. Stat. Softw. 22.
Eaton, J.W., Johnson, L.F., Salomon, J.A., Bärnighausen, T., Bendavid, E., et al., 2012. HIV treatment as prevention: systematic comparison of mathematical models of the potential impact of antiretroviral therapy on HIV incidence in South Africa. PLoS Med. 9, e1001245.
Faraway, J., 2004. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Taylor & Francis.
Harris, S.R., Cartwright, E.J., Török, M.E., Holden, M.T., Brown, N.M., et al., 2013. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. Lancet Infect. Dis. 13, 130–136.
Hens, N., Calatayud, L., Kurkela, S., Tamme, T., Wallinga, J., 2012. Robust reconstruction and analysis of outbreak data: influenza A(H1N1)v transmission in a school-based population. Am. J. Epidemiol. 176, 196–203.
Höhle, M., 2007. surveillance: an R package for the monitoring of infectious diseases. Comput. Stat. 22, 571–582.
Hughes, J., Allen, R.C., Baguelin, M., Hampson, K., Baillie, G.J., et al., 2012. Transmission of equine influenza virus during an outbreak is characterized by frequent mixed infections and loose transmission bottlenecks. PLoS Pathog. 8, e1003081.
Jombart, T., 2008. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405.
Jombart, T., Ahmed, I., 2011. adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics 27, 3070–3071.
Jombart, T., Eggo, R.M., Dodd, P.J., Balloux, F., 2011. Reconstructing disease outbreaks from genetic data: a graph approach. Heredity 106, 383–390.
Jombart, T., Cori, A., Didelot, X., Cauchemez, S., Fraser, C., Ferguson, N., 2014. Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Comput. Biol. 10, e1003457.
Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A., 2004. kernlab - An S4 Package for Kernel Methods in R. J. Stat. Softw. 11, 1–20.
Mollentze, N., et al., 2014. A Bayesian approach for inferring the dynamics of partially observed endemic infectious diseases from space–time–genetic data. Proc. Biol. Sci. 281, 20133251.
Morelli, M.J., Thébaud, G., Chadœuf, J., King, D.P., Haydon, D.T., et al., 2012. A Bayesian inference framework to reconstruct transmission trees using epidemiological and genetic data. PLoS Comput. Biol. 8, e1002768.
Obadia, T., Haneef, R., Boëlle, P-Y., 2012. The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks. BMC Med. Inform. Decis. Mak. 12, 147.
Paradis, E., 2010. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26, 419–420.
Paradis, E., Claude, J., Strimmer, K., 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290.
R Core Team, 2013a. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
R Core Team, 2013b. foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase. R package version 0.8-61, pp. 8–53.
Stadler, T., Bonhoeffer, S., 2013. Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 368, 20120198.
Teunis, P., Heijne, J.C.M., Sukhrie, F., van Eijkeren, J., Koopmans, M., et al., 2013. Infectious disease transmission as a forensic problem: who infected whom? J. R. Soc. Interface 10, 20120955.
Truscott, J., Garske, T., Chis-Ster, I., Guitian, J., Pfeiffer, D., et al., 2007. Control of a highly pathogenic H5N1 avian influenza outbreak in the GB poultry flock. Proc. Biol. Sci. 274, 2287–2295.
Vega, V.B., Ruan, Y., Liu, J., Lee, W.H., Wei, C.L., et al., 2004. Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003. BMC Infect. Dis. 4, 32.
Wallinga, J., Teunis, P., 2004. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am. J. Epidemiol. 160, 509–516.
Wickham, H., 2009. ggplot2: Elegant Graphics for Data Analysis. Springer, New York.
Ypma, R.J.F., Bataille, A.M.A., Stegeman, A., Koch, G., Wallinga, J., et al., 2012. Unravelling transmission trees of infectious diseases by combining genetic and epidemiological data. Proc. Biol. Sci. 279, 444–450.
Zou, H., Hastie, T., 2012. elasticnet: Elastic Net regularization and variable selection. R Package Version 11.