2 approaches. Using the data collected during the September 2011 campaign,
we simultaneously estimated the population size and the sampling rate,
taking into account the effects of vegetation type and time of sampling
as fixed effects, and we introduced a random effect to model
random variation in the rate associated with the observation unit.
We estimated a sampling rate between 33.9% and 47.4% for
shrubs and between 53.6% and 66.7% for dead leaves.
journal homepage: www.elsevier.com/locate/ecolmodel
Bayesian estimation of abundance based on removal sampling under
weak assumption of closed population with catchability depending on
environmental conditions. Application to tick abundance
S. Bord a,∗, P. Druilhet b, P. Gasqui a, D. Abrial a, G. Vourc’h a
a INRA, UR346 Unité Epidémiologie Animale, Centre INRA de Theix, F-63122 Saint Genès Champanelle, France
b Laboratoire de mathématiques, UMR CNRS 6620 Campus des Cézeaux, B.P. 80026, F-63171 Aubière cedex, France
article info
Article history:
Received 8 July 2013
Received in revised form 15 November 2013
Accepted 1 December 2013
Available online 28 December 2013
Keywords:
Abundance
Sampling-rate
Removal sampling
Sampling design
Hierarchical Bayesian approach
Hayne method
abstract
The estimation of animal abundance is essential to understand population dynamics, species interactions and disease patterns in populations. Estimations of relative abundance classically are based on a single observation of several sites. In this case, the mapping of abundance assumes that the probability of detecting an individual, hence the sampling rate, remains constant across the observed sites. In practice, however, this assumption is often not satisfied as the sampling rate may fluctuate between sites due to random fluctuations and/or fluctuations associated with the sampling process, notably associated with the characteristics of the site. It is therefore important to account for variations in detection probability. Using a removal sampling design, we studied the performance of a Bayesian approach to estimate both sampling rates and abundance under the assumption of a closed population. The assumption of a closed population often is weakened when the number of successive samplings is large. The number of samplings has to be limited and optimal. We therefore examined the minimal number of successive samplings needed to achieve sufficient statistical accuracy while respecting underlying model assumptions. Using the same simulations, we also compared the performance of the Bayesian approach to the performance of the frequentist Hayne method based on linear regression. We show that the Bayesian approach proposed gives generally better estimations of population size than the Hayne method. The two methods give approximately the same results for the estimation of sampling rate. We then studied the variability of detection probability of Ixodes ricinus ticks sampled under several environmental conditions by using a hierarchical Bayesian model with a random effect. The estimated sampling rate $\hat{\tau}_c$ varied between 33.9% and 47.4% for shrubs and 53.6% and 66.7% for dead leaves. The variability of the sampling rate due to the site decreased when the number of successive samplings considered in the model increased. The variability was lower in dead leaves than shrubs. This approach could be used routinely for ecological or epidemiological studies of ticks and species with comparable life histories.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
The estimation of animal abundance is essential in ecology to understand fundamental processes, such as population dynamics and species interactions, as well as in epidemiology to understand and generate disease patterns in populations (Anderson, 1991). In the majority of biological systems, relevant indicators of abundance are based on count point surveys (Alldredge et al., 2007) obtained using convenient and calibrated sampling methods (Anderson, 2001; Pollock et al., 2002). As a part of the population is often not observable, the probability of detecting an individual
Royle, 2010; Pellet and Schmidt, 2005). Consequently, indicators calculated in this way only give an index of relative abundance. These indicators of abundance are implicitly based on the assumption that the sampling rate is constant from site to site (Williams et al., 2002; Pollock et al., 2002; Royle and Dorazio, 2006). However, the sampling rate may depend on environmental conditions, such as weather, season, sampler and habitats. If this is the case, considering the sampling rate to be constant leads to confusion between the variability of the rate and the variability of abundance (Thompson et al., 1998; MacKenzie and Kendall, 2002). Therefore, an effort needs to be made to estimate both the abundance and the
intensive (Dodd and Dorazio, 2004) and may modify the observed site when the number of successive samplings is high. The choice of a given protocol (CMR or RS) and its ease of implementation depend on the species studied. Although available, CMR and RS methods are rarely used for certain species. Ticks, which are the most important vectors of human and animal diseases after mosquitoes (Parola and Raoult, 2001), are one example. The classical index of tick abundance is the number of tick nymphs collected by dragging a piece of cloth once over the vegetation of a delimited area, generally 10 m² (Vassallo et al., 2000), in a selection of sites. Host-seeking nymphs, i.e. those waiting for a host on the top of the vegetation, are collected by the drag. The numbers of nymphs collected on the different sites are then compared. The drag method is distinguished from RS methods in that the cloth is dragged only once over each site. To our knowledge, the sampling rate of the drag sampling method has been little studied. Only one study (Talleklint-Eisen and Lane, 2000) has used an RS design to estimate the abundance and the sampling rate of the drag method. In this study, 17 successive samplings were conducted over 23 days. This protocol could not guarantee that the population remained closed over the sampling period. The authors estimated the sampling rate to be 5.9% using the Hayne method (Hayne, 1949).
To estimate parameters, the Hayne method fits a linear regression of the number of captures at each successive sampling on the cumulative number of captures. The sampling rate and the population size are estimated, respectively, as the absolute value of the slope of the regression line and as the point where the regression line intersects the horizontal axis. However, the Hayne method is known to produce poor estimations of population size (White et al., 1982), especially when the sampling rate is potentially low (less than 10%) and variable. Moreover, this method does not allow covariate effects to be taken into account, nor does it provide confidence intervals for estimates.
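As a rough illustration of the regression just described, the following Python sketch regresses each catch on the cumulative number of individuals captured before it, then reads the sampling rate from the slope and the population size from the intercept with the horizontal axis. The function name `hayne_estimate` and the example counts are hypothetical; the counts are the exact expected catches for a population of 100 and a sampling rate of 0.5, not data from the study.

```python
def hayne_estimate(catches):
    """Hayne-type regression estimate from a removal sequence.

    Regresses the catch at each sampling on the cumulative catch
    obtained before that sampling. Returns (sampling_rate, population_size).
    """
    k = len(catches)
    if k < 2:
        raise ValueError("at least two samplings are needed")

    # Cumulative number captured before each sampling: 0, X1, X1+X2, ...
    cum_prev, total = [], 0
    for x in catches:
        cum_prev.append(total)
        total += x

    mean_c = sum(cum_prev) / k
    mean_x = sum(catches) / k
    sxx = sum((c - mean_c) ** 2 for c in cum_prev)
    if sxx == 0:
        raise ValueError("cumulative catch does not vary; no line can be fitted")
    sxy = sum((c - mean_c) * (x - mean_x) for c, x in zip(cum_prev, catches))

    slope = sxy / sxx
    intercept = mean_x - slope * mean_c
    if slope >= 0:
        raise ValueError("catches do not decline; the estimate is undefined")

    sampling_rate = -slope                 # absolute value of the slope
    population_size = -intercept / slope   # intercept with the horizontal axis
    return sampling_rate, population_size


# Hypothetical, noise-free catches (expected values for N0 = 100, rate = 0.5).
rate_hat, n0_hat = hayne_estimate([50, 25, 12.5])
print(rate_hat, n0_hat)
```

With noise-free input the fitted line recovers the generating values up to rounding; with real counts the slope can be flat or positive, which is why the sketch refuses to return an estimate in that case.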
In this paper, a hierarchical Bayesian approach was used to estimate both the sampling rate and the population size. This approach allows the inclusion of prior knowledge and provides posterior distributions of parameter estimates. Moreover, because it is hierarchical, one can take into account parameters which are either observable or not observable, and which are located at different scales such as the area sampled and the site of sampling (Gelman and Hill, 2006; Cressie et al., 2009). First, we studied the
per- […] sampling and $N_k$ the remaining population after the $k$th sampling, where $N_k = N_{k-1} - X_k$ for $k \in \{1, \ldots, K\}$. Furthermore, each individual of the closed population was assumed to be captured independently with the same probability of capture $\tau$ (Moran, 1951; Zippin, 1956). Hence, we assumed that $X_k$ followed a binomial distribution with population size $N_{k-1}$ and probability of capture $\tau$ (Eq. (1)):

\[ (X_k \mid N_{k-1}, \tau) \sim \mathcal{B}(N_{k-1}, \tau), \quad \text{where } N_k = N_{k-1} - X_k. \tag{1} \]
The capture probability $\tau$ was considered as independent and the same for all individuals, so we considered that it was equal to the sampling rate, i.e. the percent of captured individuals in the population.
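The binomial removal model of Eq. (1) can be sketched in a few lines of Python. The function names `simulate_removal` and `log_likelihood`, the seed and the parameter values are illustrative assumptions; `tau` stands for the sampling rate and `n0` for the initial population size.

```python
import math
import random


def simulate_removal(n0, tau, k, rng):
    """Simulate K successive removal samplings from a closed population.

    At each sampling, every remaining individual is captured
    independently with probability tau, then removed.
    """
    remaining, catches = n0, []
    for _ in range(k):
        x = sum(1 for _i in range(remaining) if rng.random() < tau)
        catches.append(x)
        remaining -= x  # N_k = N_{k-1} - X_k
    return catches


def log_likelihood(catches, n0, tau):
    """Log-likelihood of a removal sequence: a product of binomial terms.

    Requires 0 < tau < 1.
    """
    if n0 < sum(catches):
        return -math.inf  # more captures than individuals is impossible
    remaining, ll = n0, 0.0
    for x in catches:
        ll += (math.log(math.comb(remaining, x))
               + x * math.log(tau)
               + (remaining - x) * math.log(1.0 - tau))
        remaining -= x
    return ll


rng = random.Random(42)  # fixed seed so the sketch is reproducible
catches = simulate_removal(n0=100, tau=0.5, k=5, rng=rng)
print(catches, log_likelihood(catches, n0=100, tau=0.5))
```

The same log-likelihood is the building block a Bayesian fit would combine with priors on the population size and the sampling rate; the priors themselves are not specified here.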
2.2. Hierarchical Bayesian models
A first HBM (HBM1) assumed that the population size $N_{0s}$ and the sampling rate $\tau_s$ were specific to each site $s$. A second HBM (HBM2) assumed that the sampling rate $\tau_s$ of a given site $s$ was associated with both the effect of sampling conditions $c$ (considered to be a fixed effect) and a small variation due to the site sampled (considered to be a random effect). The logit transformation of the sampling rate $\tau_s$, denoted $\operatorname{logit}(\tau_s)$, was used. The $\operatorname{logit}(\tau_s)$ was decomposed as the sum of the logit transformation of the sampling rate $\tau_c$ under the sampling conditions $c$ and a random effect $\varepsilon_s$ (Eq. (2)):

\[ \operatorname{logit}(\tau_s) = \operatorname{logit}(\tau_c) + \varepsilon_s. \tag{2} \]

The random effect $\varepsilon_s$ was introduced to add a fluctuation of the sampling rate $\tau_s$ due to the observed site $s$. The range of variation of the random effect $\varepsilon_s$, denoted by $\sigma^2_c$, was considered to be specific to each sampling condition $c$: $\varepsilon_s$ was assumed to follow a normal distribution with zero mean and variance $\sigma^2_c$, which depended on the sampling conditions $c$. The HBM1 and HBM2 models described above are summarised in Figs. 1 and 2 by Directed Acyclic Graphs (DAGs) (Thulasiraman and Swamy, 1992; Clark and Gelfand, 2006). These DAGs represent relationships (lines) between the observed data and the unknown parameters or hypotheses of the model (nodes). The lines represent the relations and the hierarchy between nodes. The nodes symbolise the observed data $X_k$, the parameters to estimate $N_{0s}$, $\tau_s$, $\tau_c$, the prior distributions of the parameters to estimate for $N_{0s}$, $\tau_s$, $\tau_c$ and $\varepsilon_s$, and the distribution of the hyper-parameter $\sigma^2_c$. If $\varepsilon_s = 0$ and $c$ was specific to each site $s$, then the DAG corresponded to the HBM1 model. HBM1 and HBM2 were
imple- […]

Fig. 1. Directed Acyclic Graph for the hierarchical Bayesian model denoted by HBM1 and a priori distributions of parameters $N_0$ and $\tau$. $N_0$ represents the size of population; $\tau$ represents the sampling rate; $X_k$ represents the observed number of captures at the $k$th sampling; $N_k$ represents the size of population after the $k$ first samplings.

[…] by inspecting trace and autocorrelation plots.
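The decomposition in Eq. (2), where the logit of a site's sampling rate is the logit of the condition-level rate plus a zero-mean normal random effect, can be illustrated with a short Python sketch. The function names and the numerical values of `tau_c` and `sigma_c` are assumptions chosen for illustration only; they are not estimates from the study.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def site_sampling_rate(tau_c, sigma_c, rng):
    """Draw a site-level sampling rate around the condition-level rate.

    The random effect is normal with mean 0 and standard deviation
    sigma_c, added on the logit scale as in Eq. (2).
    """
    random_effect = rng.gauss(0.0, sigma_c)
    return inv_logit(logit(tau_c) + random_effect)


rng = random.Random(0)
# Hypothetical condition-level rate and between-site standard deviation.
rates = [site_sampling_rate(tau_c=0.4, sigma_c=0.5, rng=rng) for _ in range(10)]
print([round(r, 3) for r in rates])
```

Working on the logit scale keeps every site-level rate strictly between 0 and 1 whatever the size of the random effect, and setting the standard deviation to zero returns the condition-level rate itself.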
2.3. Model checking for observed data
To check whether the model fit the data (Gelman et al., 2004), we tested whether the simulations denoted by $X_{rep} = (X_{rep,1}, \ldots, X_{rep,K})$ generated under the model were similar to the observed data denoted by $X = (X_1, \ldots, X_K)$. Consequently, we simulated 25,000 sequences of removal sampling $X_{rep}$ from the posterior predictive distribution. The observed data were compared to the simulated $X_{rep}$ through predictive bands of overall coverage level $1 - \alpha$, for some $0 < \alpha < 1$. The shape of the band was constructed by using equal-tailed predictive intervals for each sampling rank $k \in \{1, \ldots, K\}$ with the same coverage level $1 - \alpha_{rank}$. More precisely, for a given $\alpha_{rank}$, the inferior and superior limits of the Predictive Interval for rank $k$, denoted by $PI_k$, were given by the $\alpha_{rank}/2$, resp. $1 - \alpha_{rank}/2$, quantile of the 25,000 simulated values $X_{rep,k}$. The overall level $1 - \alpha$ of the predictive band was then estimated by the proportion of $X_{rep}$ inside the band, i.e. the $X_{rep}$ such that each $X_{rep,k}$ belongs to $PI_k$. The value of $\alpha_{rank}$ was such that the overall risk was equal to $\alpha$. The validity of the model was accepted if the proportion of the observed sequences $X = (X_1, \ldots, X_K)$ that belonged to the predictive band was at least $1 - \alpha$.
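One possible way to build such a band is sketched below in Python. Several points are assumptions: the replicated sequences are drawn from a fixed population size and sampling rate as a stand-in for posterior predictive draws, the per-rank level is found by a simple grid search from the overall risk downwards, quantiles are taken without interpolation, and the "observed" sequence is hypothetical.

```python
import math
import random


def simulate_removal(n0, tau, k, rng):
    """Stand-in generator: one removal sequence under the binomial model."""
    remaining, catches = n0, []
    for _ in range(k):
        x = sum(1 for _i in range(remaining) if rng.random() < tau)
        catches.append(x)
        remaining -= x
    return catches


def quantile(sorted_vals, q):
    """Empirical quantile of an already sorted list (no interpolation)."""
    idx = math.ceil(q * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def band_at(sorted_cols, alpha_rank):
    """Equal-tailed predictive interval for every sampling rank."""
    return [(quantile(col, alpha_rank / 2), quantile(col, 1 - alpha_rank / 2))
            for col in sorted_cols]


def inside(seq, band):
    """True when every count of the sequence lies in its rank interval."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(seq, band))


def calibrate_band(sims, alpha=0.05, steps=50):
    """Largest per-rank risk on a grid whose band keeps >= 1 - alpha of sims."""
    sorted_cols = [sorted(col) for col in zip(*sims)]
    for i in range(steps, 0, -1):
        alpha_rank = alpha * i / steps
        band = band_at(sorted_cols, alpha_rank)
        cover = sum(inside(s, band) for s in sims) / len(sims)
        if cover >= 1 - alpha:
            return alpha_rank, band, cover
    # Fallback: the full range of simulated values contains every sequence.
    return 0.0, [(col[0], col[-1]) for col in sorted_cols], 1.0


rng = random.Random(2011)
sims = [simulate_removal(50, 0.5, 3, rng) for _ in range(2000)]
alpha_rank, band, cover = calibrate_band(sims)
observed = [24, 13, 6]  # hypothetical observed sequence, not study data
print(alpha_rank, band, cover, inside(observed, band))
```

Because the counts are discrete, the achieved coverage is usually somewhat above the target rather than exactly equal to it.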
2.4. Simulation scheme
To compare the performance of the Hayne method to that of HBM1, simulations were performed. Different levels of population size $N_0$ and sampling rate $\tau$ were tested to evaluate their impact on the quality of estimators. Three population sizes $N_0$ were considered: 50, 100 and 500. Four values of $\tau$ were tested: 0.10, 0.30, 0.50, 0.70. For each combination of $N_0$ and $\tau$ values, we simulated 1000 sequences of removal sampling $X_1, \ldots, X_{10}$ according to the model described in HBM1 (Eq. (1)). The first $k$ successive samplings of each simulated sequence were considered to estimate $N_0$ and $\tau$, where $k$ ranged from 2 to 10. Each combination of $N_0$, $\tau$ and $k$ values was considered as one scenario. In the end, 108 scenarios were constructed. Based on the 108 scenarios and the 1000 sequences simulated, the estimation of $N_0$ and $\tau$ was performed by both the Hayne method and HBM1.
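The scenario count follows directly from the design: 3 population sizes, 4 sampling rates and 9 values of the number of samplings give 3 × 4 × 9 = 108 combinations. A minimal Python sketch of the enumeration, with variable names chosen here for illustration:

```python
from itertools import product

POPULATION_SIZES = (50, 100, 500)
SAMPLING_RATES = (0.10, 0.30, 0.50, 0.70)
NUMBERS_OF_SAMPLINGS = range(2, 11)  # first k samplings, k = 2, ..., 10

# One scenario per (population size, sampling rate, k) combination.
scenarios = list(product(POPULATION_SIZES, SAMPLING_RATES, NUMBERS_OF_SAMPLINGS))
print(len(scenarios))


def first_k(sequence, k):
    """Keep only the first k samplings of a simulated 10-sampling sequence."""
    return sequence[:k]
```

Each simulated sequence of 10 samplings is reused for all nine values of the number of samplings by truncation, so only 12 sets of 1000 sequences need to be generated.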
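2.5. Performance of estimators
The classical criterion used to evaluate the performance of an estimator $\hat{\theta}$ of a parameter $\theta$ is the Mean Square Error, denoted by $\mathrm{MSE}_{\hat{\theta}}(\theta)$ (Lehmann and Casella, 1998). $\mathrm{MSE}_{\hat{\theta}}(\theta)$ has two components: the variability of the estimator, and the bias, i.e. the distance between the estimator's expected value and the true value of the parameter being estimated (Eqs. (3) and (4)):

\[ \mathrm{MSE}_{\hat{\theta}}(\theta) = E\big((\hat{\theta} - \theta)^2\big) = \big(\mathrm{bias}(\hat{\theta})\big)^2 + \mathrm{var}(\hat{\theta}) \tag{3} \]

where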
2.6. Case study: abundance estimation of Ixodes ricinus host-seeking nymphs
Ixodes ricinus host-seeking nymphs were collected from the 15th to the 23rd of September 2011 in the “Parc de la Faisanderie” located in the Forêt de Sénart, France. The vegetation on the observed sites was composed of either dead leaves or young oak (Quercus) and/or hornbeam (Carpinus betulus), considered to be “shrub”. During the sampling period, 60 sites were sampled either between 8 a.m. and 11 a.m. (“morning”) or between 12 p.m. and 5 p.m. (“afternoon”). The 60 sites therefore fell into 4 categories: “shrub sampled in the morning” (n = 8), “shrub sampled in the afternoon” (n = 15), “dead leaves sampled in the morning” (n = 17), “dead leaves sampled in the afternoon” (n = 20).
Nymphs were captured using the drag sampling method described by Vassallo et al. (2000), which consists of dragging a 1 m × 1 m cloth over the vegetation for 10 m. Unfed host-seeking nymphs respond to the mechanical stimuli created by the sweep of the drag cloth by attaching themselves to it. Each 1 × 10 meter sampling site was marked with boundary markers. For each sampling site, 16 successive removal samplings were carried out to collect host-seeking nymphs. One sampling was made every two minutes and thirty seconds. The number of nymphs captured at each sampling was recorded.
The variable of interest was the sequence of consecutive numbers of nymphs captured during the successive samplings on a given observation site. Only the first $n$ samplings were kept for the estimation of the host-seeking population size and the sampling rate. The value of $n$ varied between 3 and 5 to avoid the emergence of uncontrolled phenomena that could contradict the hypothesis of a closed population, such as a modification of environmental conditions or the arrival of new host-seeking ticks (Sonenshine, 1994). Data were modelled through HBM1 and HBM2. The posterior distribution of the parameters was obtained by thinning three Markov chains and keeping every 50th of the last 500,000 values (burn-in was set to 500,000). The sampling rate $\tau$ and the size of population $N_0$ were estimated by the mean of the posterior distribution.
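The thinning arithmetic can be checked with a short Python sketch. It assumes that each chain ran for 1,000,000 iterations in total (500,000 burn-in iterations followed by the 500,000 values mentioned above); the function names are hypothetical.

```python
def retained_indices(n_iter, burn_in, step):
    """Indices of the draws kept after discarding burn-in and thinning."""
    return range(burn_in, n_iter, step)


def posterior_mean(chains, burn_in, step):
    """Mean of the retained draws pooled over all chains."""
    kept = [chain[i]
            for chain in chains
            for i in retained_indices(len(chain), burn_in, step)]
    return sum(kept) / len(kept)


# Assumed run length: 500,000 burn-in + 500,000 retained iterations per chain.
per_chain = len(retained_indices(n_iter=1_000_000, burn_in=500_000, step=50))
total_draws = 3 * per_chain  # three chains
print(per_chain, total_draws)

# Tiny made-up chains to show the pooling: burn-in of 2, keep every 2nd draw.
print(posterior_mean([[0, 0, 1, 3], [0, 0, 5, 7]], burn_in=2, step=2))
```

Under that assumption, each chain contributes 10,000 retained draws, so the posterior mean is computed from 30,000 values.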
To compare the estimated sampling rates of two different conditions $c_1$ and $c_2$, we monitored the posterior distribution of contrasts. The sampling rates of two conditions $c_1$ and $c_2$ are considered to be the same when 0 belongs to the 95% equal-tailed credible set of
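than the Hayne method. The $\mathrm{RRMSE}(\hat{\tau})$ was globally comparable for both methods although it was generally lower for the HBM1 estimator. For $\tau \le 0.3$, the $\mathrm{RRMSE}(\hat{N}_0)$ and $\mathrm{RRMSE}(\hat{\tau})$ of the HBM1 estimator were globally greater than those of the Hayne estimator. When the true value of $N_0$ was equal to 50 or 100, the $\mathrm{RRMSE}(\hat{N}_0)$ of the Hayne estimator was very large.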
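The performance criteria can be computed from a set of simulated estimates as in the Python sketch below. The RRMSE is taken here to be the square root of the MSE divided by the true parameter value, which is the usual definition but is an assumption in this sketch; the function name and the four example estimates are made up.

```python
import math


def performance(estimates, true_value):
    """Bias, variance, MSE and relative root MSE of a set of estimates."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,                            # equals bias**2 + variance
        "rrmse": math.sqrt(mse) / true_value,  # assumed definition
    }


# Made-up population size estimates for a true value of 100.
result = performance([90, 100, 110, 120], true_value=100)
print(result)
```

For these four values the bias is 5, the variance 125 and the MSE 150, which matches the decomposition of the MSE into squared bias plus variance.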
3.2. Bayesian model estimations for I. ricinus abundance
The comparison of the observed data with the predictive bands of an overall coverage level of 95% showed that one sequence $X = (X_1, \ldots, X_k)$ out of the 60 observed ones was outside the predictive bands for $k = 3$. For $k = 4$ and $k = 5$, respectively 6 and 9 observed sequences were outside the predictive bands. The validity of the model considering the first 3 samplings was thus accepted at a 95% confidence level.
The results presented in the preceding section demonstrate that for a population size of $N_0 = 50$ and a sampling rate $\tau = 0.5$, 4 successive samplings are needed to achieve a good estimation quality. Three models were therefore considered using the first three, four and five successive samplings for the application.
The estimated sampling rate $\hat{\tau}_c$ varied between 33.9% and 47.4% for shrubs and 53.6% and 66.7% for dead leaves (Fig. 3). No significant difference in the estimated sampling rate $\hat{\tau}_c$ between the morning and the afternoon was detected whatever the type of vegetation or the number of successive samplings considered. Moreover, there was no significant difference between dead leaves and shrubs sampled in the morning (see Table 2 and Fig. 4). However, in the afternoon, the sampling rate of shrub sites was significantly lower than that of dead leaf sites sampled in the morning or in the afternoon (for $k = 4$, the median of the posterior distribution of $\hat{\tau}$ was equal to 34.1% for shrubs in the afternoon versus 63.1% and 58.3% for dead leaves in the morning and in the afternoon respectively). The variability of the sampling rate $\hat{\sigma}^2_c$ due to the site observed decreased when the number of successive samplings considered in the model increased (Table 2). In addition, the variability was lower in dead leaves than in shrubs (Table 2).
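The comparison of two conditions through the credible interval of a contrast can be sketched as follows in Python. The contrast is taken here as the simple difference between the two sampling rates, which is an assumption, and the posterior draws are synthetic normal samples with made-up means and spread, not the posterior of the study.

```python
import math
import random


def equal_tail_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws (no interpolation)."""
    s = sorted(draws)
    n = len(s)
    tail = (1.0 - level) / 2.0
    lower = s[max(0, math.floor(tail * n))]
    upper = s[min(n - 1, math.ceil((1.0 - tail) * n) - 1)]
    return lower, upper


def rates_differ(draws_c1, draws_c2, level=0.95):
    """True when 0 lies outside the credible interval of the contrast."""
    contrast = [a - b for a, b in zip(draws_c1, draws_c2)]
    lower, upper = equal_tail_interval(contrast, level)
    return not (lower <= 0.0 <= upper), (lower, upper)


rng = random.Random(7)
# Synthetic draws standing in for posterior samples of three conditions.
low_rate = [rng.gauss(0.34, 0.05) for _ in range(4000)]
high_rate = [rng.gauss(0.63, 0.05) for _ in range(4000)]
similar_rate = [rng.gauss(0.63, 0.05) for _ in range(4000)]

differs, interval = rates_differ(low_rate, high_rate)
same_flag, same_interval = rates_differ(high_rate, similar_rate)
print(differs, interval)
print(same_flag, same_interval)
```

Pairing the draws element by element is appropriate when both rates come from the same joint posterior sample; the two conditions are then declared different only when the whole interval of the contrast lies on one side of zero.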
4. Discussion
pop-

[Figure 3 plot: panels labelled K = 4 and K = 5; axis label “Number of sampling”; axes scaled from 0.0 to 1.0]

Fig. 3. Posterior distribution of sampling rate $\hat{\tau}$ according to conditions for HBM2 considering the 3, 4 and 5 first samplings for the 4 conditions: shrub sampled in the morning (Sh. Mor.), shrub sampled in the afternoon (Sh. Aft.), dead leaves sampled in the morning (DL Mor.), dead leaves sampled in the afternoon (DL Aft.). The red line represents the mean of the posterior distribution of sampling rate $\hat{\tau}$.
[Figure 4 plot, K = 3: panels “Sh, Aft vs Mor”, “Mor, DF vs Sh” and “DF−Mor vs Sh−Aft”; horizontal axis scaled from −1.0 to 1.0]