HAL Id: hal-02806241
https://hal.inrae.fr/hal-02806241
Submitted on 6 Jun 2020
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Inference for diffusions with small diffusions coefficient.
Application to epidemics
Romain Guy, Catherine Laredo, Elisabeta Vergu
To cite this version:
Romain Guy, Catherine Laredo, Elisabeta Vergu. Inference for diffusions with small diffusions
co-efficient. Application to epidemics. Dynstoch Meeting 2012, Jun 2012, Paris, France. 19 diapos.
�hal-02806241�
Application to Epidemics
Inference for diusions with small diusion coecient and
application to epidemics
Romain GUY
1,2
with C. Laredo
1,2
and E. Vergu
1
1Unité Mathématiques et Informatique Appliquées, INRA, Jouy-en-Josas
2Laboratoire de Probabilités et modèles aléatoires (Paris Diderot)
Dynstoch 2012
8 juin 2012
Application to Epidemics
Outline
1
Parametric inference for discretely observed diusion processes
2
Application to Epidemics
Inferen eforepidemi datausingdiffusionpro esseswithsmalldiffusion oeffi ent RomainGuy1,2,CatherineLaredo1,2andElisabetaVergu1 1MathématiquesetInformatiqueAppliquées,INRA,Jouy-en-Josas,Fran e 2LaboratoiredeProbabilitésetModélisationAléatoire,UniversitéParisDiderot,Paris,Fran e Introdu tion&Maingoal Epidemi data:oftenpartiallyobserved&temporallyaggregated→parametri inferen ethrough likelihood-basedapproa hesrarelystraightforward,whateverthemathemati alrepresentationused. Theselastyears:developmentofmethodsbasedondataaugmentation,abletodealwithdierent patternsofmissingness,buttheydonotprovidesystemati solutions(limitationsduetotheamountof augmenteddata& omputationtimes[CF08 ℄. Inthis ontext:diusionpro essesprovideagoodapproximationofepidemi dynami sand,dueto theiranalyti alpower,allowsheddingnewlightoninferen eproblemsofepidemi data. We onsideredasimpliedframeworkwithdis reteobservationsonaxedtimeintervalandlowand highfrequen ydata.Ourmaingoalistoprovidegoodestimatorsofepidemi -relatedparameters. Context Epidemi s Notations N:populationsize(supposedlarge),m:initialnumberofinfe tives λ:transmissionrate,µ:re overyrate S(t), I(t):numbersofsus eptibles,infe teds;s(t) = S(t)N , i(t) =I(t)N n:numberofobservations(generallysmall)
Assumptions Homogenousmixingina losedpopulation Dis reteobservationsofbothSandIonaxedinterval[0, T ],withsamplinginterval∆(T = n∆)(if ∆ → 0,itgoeswithslowrate) SimpleSIR-likemathemati almodelsforepidemi sandrelatedinferen e results ODEmodel(x(t) = (s(t), i(t)), x(0) = (N−mN ,m N )) Atea htimepoint,x(t)issolutionof: dsdt = −λsi di dt = λsi − µi Observations:n,attimepointstk = k∆:Xtk = x(tk) + ǫkwithǫk ∼ iidN 0,σ21 00 σ22 . Asymptoti s:n → ∞ LeastSquareEstimator(LSE):√nˆλLSE − λ0ˆµLSE − µ0 −→ n→∞N 0,σ21 00 σ22 Markovpurejumppro ess(Xt = (S(t), I(t)), X0 = (N − m, m)) Exponentialholdingtimesandpossibletransitions:(S, I) λ N SI −→ (S − 1, I + 1)and(S, I)µI−→ (S, I − 1). Observations:thewholesamplepath Asymptoti s:N → ∞
MaximumLikelihoodEstimator(MLE):ˆλMLE = NN − m − S(T)R T0 S(t)I(t)dt,ˆµMLE =N − S(T ) − I(T )) R T0 I(t)dt, √NˆλMLE − λ0ˆµMLE − µ0 −→ N→∞N 0,V ar(λ0)0 0V ar(µ0)
,withV ar(λ0)andV ar(µ0)expli it. Diusionpro ess(Xǫt = (st, it), Xǫ0 = (N−mN ,mN )) In[EK86 ℄: dst = −λstitdt + 1√ N √λstit dB1(t) dit = (λstit − µit)dt − 1√N√λstit dB1(t) + 1√N√µit dB2(t) (1)
withB1, B2twoindependentBrownianmotions Observations:n,Xǫtk, at tk = k∆, on a xed interval [0, T ] (T = n∆) Asymptoti s:letǫ = 1√N , ǫ → 0, where ∆ xed or ∆ goes to zero slowly.
Inferen efordiusionpro esseswithsmalldiusion oe ient Existingresultsforsmallǫ
[GC90 ℄:unidimensionaldiusionpro essesinlowfrequen yandhighfrequen ywithσ(β, x) = 1and ǫ√nbounded.
[SU03 ℄:multidimensionaldiusionpro essinhighfrequen ywith1ǫ√n bounded, and1ǫn → 0. [GS09℄:multidimensionaldiusionpro essinhighfrequen ywith1ǫnρ bounded where ρ > 0. Framework:diusiononRp dXǫt = b(α, Xǫt)dt + ǫσ(β, Xǫt)dBt, X0 = x0 ∈ Rp (2) Epidemi s:α = β Σ(β, x) = tσ(β, x)σ(β, x),p × pinvertiblematrix. xαdeterministi solutionofdxα(t)dt = b(α, xα(t)), x(0) = x0 ∈ Rp ΦαtheinvertiblematrixsolutionofdΦα dt (t, t0) =∂b ∂x(α, xα(t))Φα(t, t0), Φα(t0, t0) = Ip. Observations:n,Xǫtattimestk = k∆onaxedinterval[0, T ](T = n∆). Mainidea:usingTaylor'ssto hasti formula[Aze82 ℄,denetheGaussianpro ess: Y ǫt = xα(t) + ǫZ t0σ(β, xα(s))dBs. LikelihoodoftheGaussianve tors(Y ǫtk)k∈{1,..,n} dene a ontrast pro ess for (Xtk)k∈{1,..,n}:
U∆,ǫ(α, β)) =Xn k=1log det Σ(β, Xtk−1) +1 ǫ2∆ nX k=1tNk(α)Σ−1(β,Xtk−1)Nk(α) withNk(α) = Xtk − xα(tk) − Φα(tk, tk−1)Xtk−1 − xα(tk−1). (ˆαǫ,∆, ˆβǫ,∆) = argmin (α,β)∈ΘU∆,ǫ (α, β) (3) Inferen eresults Forhighfrequen ydata(∆ → 0 ∼ n → ∞) Under lassi alregularityandidentiabilityassumptionsandǫ√n → 0 ǫ−1(˘αǫ,∆ − α0) √ n( ˘βǫ,∆ − β0) ! −→ n→∞,ǫ→0N0, I−1 b (α0, β0)0 0I−1σ (α0, β0) !! .
Ib(α0, β0)optimalandIσ(α0, β0)thesameasin[GS09 ℄.
Consequen e:for(1)bettertoestimate(λ, µ)withthedriftestimator. Forlowfrequen ydata(∆andnxed),βnonidentiable Knownβ ˆαǫ,∆(β0) = argmin α∈Θa U∆,ǫ (α, β0):ǫ−1 ˆαǫ,∆(β0) − α0−→ ǫ→0N (0, I−1∆ (α0, β0)) I∆(α0, β0) −→ ∆→0Ib(α0, β0) Unknownβ
Forepidemi sα = β: onsiderU∆,ǫ(α, α)andtheaboveresultholds
GeneralCase: onditionalleastsquare-like ontrastpro ess: ˜Uǫ,∆ α, (Xtk)k∈{1,..,n}=1 ǫ2 n X k=1tNk(X,α)Nk(X,α) (4) ˜αǫ,∆ = argminα∈Θa˜Uǫ,∆ (α):ǫ−1 ˜αǫ,∆ − α0
−→ ǫ→0N (0, J−1∆ (α0, β0)). Ba ktoepidemi s: omparisonofestimatorson simulateddata Simulateddata:epidemi traje toriesusingtheMarkovjumppro essfordierentvaluesofN, (λ, µ) andaxedm N ratio. Comparisonofdierentestimators: •Referen eestimator:(ˆλMLE, ˆµMLE)analyti ally omputedwiththewholesamplepath •Numeri al omputation(viaMatlabfmisear hfun tion)of:LSE,[GS09 ℄(ˆα),ˆαǫ,∆inthespe ial ase α = β,and˜αǫ,∆,basedondis retiseddata(1observation/day)realisti numberofobservations
S enariosExplored:N = {1000, 10000}, λ = {0.4, 2/3}, µ = 1/3, mN = 1%, (ǫ << 1, ∆ = 1, T ∈ [35, 105])
Simulationresults(on100runs) ForN = 1000,m = 10,λ = 2/3,µ = 1/3,1 observation/day, n = T = 40 days
Methodˆλ CI95 empiric CI95 theoretical ˆµ CI95 empiric CI95 theoretical MLE(all data) 0.668 [0.662; 0.674][0.622; 0.714]0.335 [0.329; 0.341][0.317; 0.354] LSE0.628[0.607; 0.648][0.618; 0.637]0.312[0.300; 0.324][0.301; 0.323]
[GS09](ˆα)0.665[0.664; 0.666][0.316; 1.010]0.335[0.334; 0.336][0.165; 0.504] ˆαǫ,∆(α = β) 0.661 [0.656; 0.668][0.316; 1.005]0.340[0.337; 0.343][0.170; 0.511] ˜αǫ,∆0.650[0.647; 0.653][0.371; 0.929]0.329[0.327; 0.331][0; 0.660] ForN = 10000,m = 100,λ = 0.4,µ = 1/3,1 observation/day, n = T = 105 days
Methodˆλ CI95 empiric CI95 theoretical ˆµ CI95 empiric CI95 theoretical MLE(all data) 0.399 [0.397; 0.401][0.386; 0.413]0.332 [0.332; 0.333][0.328; 0.335] LSE0.381[0.369; 0.393][0.379; 0.383]0.316[0.306; 0.326][0.312; 0.321] [GS09](ˆα)0.389[0.388; 0.390][0.244; 0.536]0.321[0.320; 0.322][0.201; 0.441] ˆαǫ,∆(α = β) 0.389[0.388; 0.390][0.243; 0.536]0.328[0.326; 0.330][0.207; 0.448] ˜αǫ,∆0.396[0.395; 0.397][0.325; 0.466]0.331[0.330; 0.332][0.279; 0.382] Green:Bestpon tualestimator(6=MLE) Blue:C.I.not ontainingthetruevalueoftheparameter Perspe tives Consideringotherme hanisti modelsforepidemi spread(SEIR...) Studyofintegratedandpartiallyobserveddiusionmodel(newinfe ted asesreported) A knowledgments:Partialnan ialsupportforthisresear hwasprovidedbyINRA,CemagrefandBasse-Normandie,Bretagne,PaysdeleLoireandPoitou-Charentes RegionalCoun ilsunderSANCREproje t,intheframeworkof"ForandAboutRegionalDevelopment"programs. Referen es [Aze82℄R.Azen ott.Formuledetaylorsto hastiqueetdéveloppementasymptotiqued'intégralesdefeynman.Séminairedeprobabilités(Strasbourg),1982. [CF08℄S.Cau hemezandN.Ferguson.Likelihood-basedestimationof ontinuous-timeepidemi modelsfromtime-seriesdata:appli ationtomeaslestransmissionin london.J.R.So .Interfa e,2008. [EK86℄S.N.EthierandT.G.Kurtz.MarkovPro essesChara terizationand onvergen e.WileyInters ien e,1986. [GC90℄V.Genon-Catalot.Maximum ontrastestimationfordiusionpro essesfromdis reteobservations.Statisti s,1990. [GS09℄A.GloterandM.Sorensen.Estimationforsto hasti dierentialequationswithasmalldiusion oe ient.Sto hasti Pro essesandtheirAppli ations,2009. [SU03℄M.SorensenandM.U hida.Smalldiusionasymptoti sfordis retelysampledsto hasti dierentialequations.Bernoulli,2003.