• Aucun résultat trouvé

Inference for diffusions with small diffusions coefficient. Application to epidemics

N/A
N/A
Protected

Academic year: 2021

Partager "Inference for diffusions with small diffusions coefficient. Application to epidemics"

Copied!
20
0
0

Texte intégral

(1)

HAL Id: hal-02806241

https://hal.inrae.fr/hal-02806241

Submitted on 6 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Inference for diffusions with small diffusions coefficient.

Application to epidemics

Romain Guy, Catherine Laredo, Elisabeta Vergu

To cite this version:

Romain Guy, Catherine Laredo, Elisabeta Vergu. Inference for diffusions with small diffusions

co-efficient. Application to epidemics. Dynstoch Meeting 2012, Jun 2012, Paris, France. 19 diapos.

�hal-02806241�

(2)

Application to Epidemics

Inference for diusions with small diusion coecient and

application to epidemics

Romain GUY

1,2

with C. Laredo

1,2

and E. Vergu

1

1Unité Mathématiques et Informatique Appliquées, INRA, Jouy-en-Josas

2Laboratoire de Probabilités et modèles aléatoires (Paris Diderot)

Dynstoch 2012

8 juin 2012

(3)

Application to Epidemics

Outline

1

Parametric inference for discretely observed diusion processes

2

Application to Epidemics

Inferen eforepidemi datausingdiffusionpro esseswithsmalldiffusion oeffi ent RomainGuy1,2,CatherineLaredo1,2andElisabetaVergu1 1MathématiquesetInformatiqueAppliquées,INRA,Jouy-en-Josas,Fran e 2LaboratoiredeProbabilitésetModélisationAléatoire,UniversitéParisDiderot,Paris,Fran e Introdu tion&Maingoal Epidemi data:oftenpartiallyobserved&temporallyaggregated→parametri inferen ethrough likelihood-basedapproa hesrarelystraightforward,whateverthemathemati alrepresentationused. Theselastyears:developmentofmethodsbasedondataaugmentation,abletodealwithdierent patternsofmissingness,buttheydonotprovidesystemati solutions(limitationsduetotheamountof augmenteddata& omputationtimes[CF08 ℄. Inthis ontext:diusionpro essesprovideagoodapproximationofepidemi dynami sand,dueto theiranalyti alpower,allowsheddingnewlightoninferen eproblemsofepidemi data. We onsideredasimpliedframeworkwithdis reteobservationsonaxedtimeintervalandlowand highfrequen ydata.Ourmaingoalistoprovidegoodestimatorsofepidemi -relatedparameters. Context Epidemi s Notations N:populationsize(supposedlarge),m:initialnumberofinfe tives λ:transmissionrate,µ:re overyrate S(t), I(t):numbersofsus eptibles,infe teds;s(t) = S(t)N , i(t) =I(t)N n:numberofobservations(generallysmall)

Assumptions Homogenousmixingina losedpopulation Dis reteobservationsofbothSandIonaxedinterval[0, T ],withsamplinginterval∆(T = n∆)(if ∆ → 0,itgoeswithslowrate) SimpleSIR-likemathemati almodelsforepidemi sandrelatedinferen e results ODEmodel(x(t) = (s(t), i(t)), x(0) = (N−mN ,m N )) Atea htimepoint,x(t)issolutionof:   dsdt = −λsi di dt = λsi − µi Observations:n,attimepointstk = k∆:Xtk = x(tk) + ǫkwithǫk ∼ iidN 0,σ21 00 σ22  . Asymptoti s:n → ∞ LeastSquareEstimator(LSE):√nˆλLSE − λ0ˆµLSE − µ0  −→ n→∞N 0,σ21 00 σ22  Markovpurejumppro ess(Xt = (S(t), I(t)), X0 = (N − m, m)) Exponentialholdingtimesandpossibletransitions:(S, I) λ N SI −→ (S − 1, I + 1)and(S, I)µI−→ (S, I − 1). Observations:thewholesamplepath Asymptoti s:N → ∞

MaximumLikelihoodEstimator(MLE):ˆλMLE = NN − m − S(T)R T0 S(t)I(t)dt,ˆµMLE =N − S(T ) − I(T )) R T0 I(t)dt, √NˆλMLE − λ0ˆµMLE − µ0  −→ N→∞N 0,V ar(λ0)0 0V ar(µ0) 

,withV ar(λ0)andV ar(µ0)expli it. Diusionpro ess(Xǫt = (st, it), Xǫ0 = (N−mN ,mN )) In[EK86 ℄:   dst = −λstitdt + 1√ N √λstit dB1(t) dit = (λstit − µit)dt − 1√N√λstit dB1(t) + 1√N√µit dB2(t) (1)

withB1, B2twoindependentBrownianmotions Observations:n,Xǫtk, at tk = k∆, on a xed interval [0, T ] (T = n∆) Asymptoti s:letǫ = 1√N , ǫ → 0, where ∆ xed or ∆ goes to zero slowly.

Inferen efordiusionpro esseswithsmalldiusion oe ient Existingresultsforsmallǫ

[GC90 ℄:unidimensionaldiusionpro essesinlowfrequen yandhighfrequen ywithσ(β, x) = 1and ǫ√nbounded.

[SU03 ℄:multidimensionaldiusionpro essinhighfrequen ywith1ǫ√n bounded, and1ǫn → 0. [GS09℄:multidimensionaldiusionpro essinhighfrequen ywith1ǫnρ bounded where ρ > 0. Framework:diusiononRp dXǫt = b(α, Xǫt)dt + ǫσ(β, Xǫt)dBt, X0 = x0 ∈ Rp (2) Epidemi s:α = β Σ(β, x) = tσ(β, x)σ(β, x),p × pinvertiblematrix. xαdeterministi solutionofdxα(t)dt = b(α, xα(t)), x(0) = x0 ∈ Rp ΦαtheinvertiblematrixsolutionofdΦα dt (t, t0) =∂b ∂x(α, xα(t))Φα(t, t0), Φα(t0, t0) = Ip. Observations:n,Xǫtattimestk = k∆onaxedinterval[0, T ](T = n∆). Mainidea:usingTaylor'ssto hasti formula[Aze82 ℄,denetheGaussianpro ess: Y ǫt = xα(t) + ǫZ t0σ(β, xα(s))dBs. LikelihoodoftheGaussianve tors(Y ǫtk)k∈{1,..,n} dene a ontrast pro ess for (Xtk)k∈{1,..,n}:

U∆,ǫ(α, β)) =Xn k=1log det Σ(β, Xtk−1) +1 ǫ2∆ nX k=1tNk(α)Σ−1(β,Xtk−1)Nk(α) withNk(α) = Xtk − xα(tk) − Φα(tk, tk−1)Xtk−1 − xα(tk−1). (ˆαǫ,∆, ˆβǫ,∆) = argmin (α,β)∈ΘU∆,ǫ (α, β) (3) Inferen eresults Forhighfrequen ydata(∆ → 0 ∼ n → ∞) Under lassi alregularityandidentiabilityassumptionsandǫ√n → 0 ǫ−1(˘αǫ,∆ − α0) √ n( ˘βǫ,∆ − β0) ! −→ n→∞,ǫ→0N0, I−1 b (α0, β0)0 0I−1σ (α0, β0) !! .

Ib(α0, β0)optimalandIσ(α0, β0)thesameasin[GS09 ℄.

Consequen e:for(1)bettertoestimate(λ, µ)withthedriftestimator. Forlowfrequen ydata(∆andnxed),βnonidentiable Knownβ ˆαǫ,∆(β0) = argmin α∈Θa U∆,ǫ (α, β0):ǫ−1 ˆαǫ,∆(β0) − α0−→ ǫ→0N (0, I−1∆ (α0, β0)) I∆(α0, β0) −→ ∆→0Ib(α0, β0) Unknownβ

Forepidemi sα = β: onsiderU∆,ǫ(α, α)andtheaboveresultholds

GeneralCase: onditionalleastsquare-like ontrastpro ess: ˜Uǫ,∆ α, (Xtk)k∈{1,..,n}=1 ǫ2 n X k=1tNk(X,α)Nk(X,α) (4) ˜αǫ,∆ = argminα∈Θa˜Uǫ,∆ (α):ǫ−1 ˜αǫ,∆ − α0

−→ ǫ→0N (0, J−1∆ (α0, β0)). Ba ktoepidemi s: omparisonofestimatorson simulateddata Simulateddata:epidemi traje toriesusingtheMarkovjumppro essfordierentvaluesofN, (λ, µ) andaxedm N ratio. Comparisonofdierentestimators: •Referen eestimator:(ˆλMLE, ˆµMLE)analyti ally omputedwiththewholesamplepath •Numeri al omputation(viaMatlabfmisear hfun tion)of:LSE,[GS09 ℄(ˆα),ˆαǫ,∆inthespe ial ase α = β,and˜αǫ,∆,basedondis retiseddata(1observation/day)realisti numberofobservations

S enariosExplored:N = {1000, 10000}, λ = {0.4, 2/3}, µ = 1/3, mN = 1%, (ǫ << 1, ∆ = 1, T ∈ [35, 105])

Simulationresults(on100runs) ForN = 1000,m = 10,λ = 2/3,µ = 1/3,1 observation/day, n = T = 40 days

Methodˆλ CI95 empiric CI95 theoretical ˆµ CI95 empiric CI95 theoretical MLE(all data) 0.668 [0.662; 0.674][0.622; 0.714]0.335 [0.329; 0.341][0.317; 0.354] LSE0.628[0.607; 0.648][0.618; 0.637]0.312[0.300; 0.324][0.301; 0.323]

[GS09](ˆα)0.665[0.664; 0.666][0.316; 1.010]0.335[0.334; 0.336][0.165; 0.504] ˆαǫ,∆(α = β) 0.661 [0.656; 0.668][0.316; 1.005]0.340[0.337; 0.343][0.170; 0.511] ˜αǫ,∆0.650[0.647; 0.653][0.371; 0.929]0.329[0.327; 0.331][0; 0.660] ForN = 10000,m = 100,λ = 0.4,µ = 1/3,1 observation/day, n = T = 105 days

Methodˆλ CI95 empiric CI95 theoretical ˆµ CI95 empiric CI95 theoretical MLE(all data) 0.399 [0.397; 0.401][0.386; 0.413]0.332 [0.332; 0.333][0.328; 0.335] LSE0.381[0.369; 0.393][0.379; 0.383]0.316[0.306; 0.326][0.312; 0.321] [GS09](ˆα)0.389[0.388; 0.390][0.244; 0.536]0.321[0.320; 0.322][0.201; 0.441] ˆαǫ,∆(α = β) 0.389[0.388; 0.390][0.243; 0.536]0.328[0.326; 0.330][0.207; 0.448] ˜αǫ,∆0.396[0.395; 0.397][0.325; 0.466]0.331[0.330; 0.332][0.279; 0.382] Green:Bestpon tualestimator(6=MLE) Blue:C.I.not ontainingthetruevalueoftheparameter Perspe tives Consideringotherme hanisti modelsforepidemi spread(SEIR...) Studyofintegratedandpartiallyobserveddiusionmodel(newinfe ted asesreported) A knowledgments:Partialnan ialsupportforthisresear hwasprovidedbyINRA,CemagrefandBasse-Normandie,Bretagne,PaysdeleLoireandPoitou-Charentes RegionalCoun ilsunderSANCREproje t,intheframeworkof"ForandAboutRegionalDevelopment"programs. Referen es [Aze82℄R.Azen ott.Formuledetaylorsto hastiqueetdéveloppementasymptotiqued'intégralesdefeynman.Séminairedeprobabilités(Strasbourg),1982. [CF08℄S.Cau hemezandN.Ferguson.Likelihood-basedestimationof ontinuous-timeepidemi modelsfromtime-seriesdata:appli ationtomeaslestransmissionin london.J.R.So .Interfa e,2008. [EK86℄S.N.EthierandT.G.Kurtz.MarkovPro essesChara terizationand onvergen e.WileyInters ien e,1986. [GC90℄V.Genon-Catalot.Maximum ontrastestimationfordiusionpro essesfromdis reteobservations.Statisti s,1990. [GS09℄A.GloterandM.Sorensen.Estimationforsto hasti dierentialequationswithasmalldiusion oe ient.Sto hasti Pro essesandtheirAppli ations,2009. [SU03℄M.SorensenandM.U hida.Smalldiusionasymptoti sfordis retelysampledsto hasti dierentialequations.Bernoulli,2003.

(4)

Application to Epidemics

Settle the context

Classical SIR Epidemics S.D.E.

dst

=

−λ

stitdt +

1

N

pop

λ

stitdB1

(

t)

dit

=

stit

− γ

it

)

dt −

1

N

pop

λ

stitdB1

(

t) +

1

N

pop

γ

it

dB2

(

t)

Specicity : Multidimensionnal process, small diusion coecient, parameters (λ, γ)

both in drift and diusion function, (few observations available)

Theoretical Context

dX



t

=

b(α, X

t



)

dt + σ(β, X

t



)

dBt

,

X0

=

x0

∈ R

p

Discrete observation : X



t

at times tk

=

k∆ on a xed interval [0, T ] (T = n∆)

σ(β,

x) ∈ Mp

(R), b(α, x) ∈ R

p

, Σ(β,

x) = σ(β, x)

t

σ(β,

x) invertible.

Existing results : Maximum Contrast Estimators for high frequency data (linking  and

n)

(5)

Application to Epidemics

Main idea of our inference approach (extension of Genon-Catalot(90))

Taylor's stochastic expansion formula (Azencott (82),Wentzell-Freidlin(79))

X



t

=

x

α(

t) + gα,β(

t) + 

2

R

α,β



(

t)

where xα(t) is the deterministic solution

dx

α

(t)

dt

=

b(α, xα(t)), x(0) = x0

∈ R

p

,

dgα,β(t) =

b

x

(α,

xα(t))gα,β(t)dt + σ(β, xα(t))dBt

,

gα,β(0) = 0R

p

and R



α,β

satises

sup

t∈[0,T ]

{k

R



α,β

(t)k} −→

P,→0

0, E

h

k

R



α,β

(t + h) − R

α,β



(t)k

2

i

Ch.

Main Idea

Compute the likelihood of the Gaussian process Y



t

=

xα(

t) + gα,β

(

t)

Derive a Contrast process for X



(6)

Application to Epidemics

Properties of g

α,β

Φ

α

the Resolvent matrix of the linearized ODE

Let Φα

be the invertible matrix solution of

α

dt

(

t, t0

) =

∂b

x

(α,

xα(

t))Φα(

t, t0

),

with Φα(t0

,

t0

) =

Ip.

Properties of g

α,β

gα,β

is a Gaussian process for which we can obtain the analytic expression.

gα,β(

tk

) = Φα(

tk

,

tk−1

)

gα,β(

tk−1

) +

Z

k

α,β

Z

α,β

k

are independent N

“0, S

α,β

k

” variables.

S

α,β

k

=

1

Z

t

k

t

k−1

Φα(

tk

,

s)Σ(β, x

α(

s))

t

Φα(

tk

,

s)ds

(7)

Application to Epidemics

Likelihood of the Gaussian process and consequences of the approach

log-likelihood of the Gaussian process Y



t

L∆,

(α, β)

=



2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)(

S

α,β

k

)

1

Nk

(α)

with N

k

(α)

=

Yt

k

xα(

t

k

) − Φα(

t

k

,

t

k−1

)

hYt

k−1

xα(

t

k−1

)

i

.

Dene MLE estimators

( ˆ

α,∆, ˆ

β,∆) =

argmin

(α,β)∈Θ

U∆,

(α, β)

Remark on low frequency data

(and n) is xed : no asymptotic result on ˆ

β,∆

Adaptation needed

(8)

Application to Epidemics

Construct the contrast for low frequency data (∆ and n xed)

Contrast process for the diusion X



t

(β unknown)

¯

U



(α)

=













2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

N

k

(α)







(

S

k

α,β

)

1

N

k

(α)

with N

k

(α)

=

X

t

k

x

α

(

t

k

) − Φ

α

(

t

k

,

t

k−1

)

h

X

t

k−1

x

α

(

t

k−1

)

i

.

Dene MCE estimator

¯

α

=

argmin

(α)∈Θ

α

¯

U

(α)

(9)

Application to Epidemics

Construct the contrast for low frequency data (∆ and n xed)

Contrast process for the diusion X



t

(β unknown)

¯

U



(α)

=













2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

N

k

(α)







(

S

k

α,β

)

1

N

k

(α)

with N

k

(α)

=

X

t

k

x

α

(

t

k

) − Φ

α

(

t

k

,

t

k−1

)

h

X

t

k−1

x

α

(

t

k−1

)

i

.

Dene MCE estimator

¯

α

=

argmin

(α)∈Θ

α

¯

U

(α)

(10)

Application to Epidemics

Construct the contrast for low frequency data (∆ and n xed)

Contrast process for the diusion X



t

(β unknown)

¯

U



(α)

=













2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)







(

S

k

α,β

)

1

Nk

(α)

with Nk

(α)

=

X

t

k

xα(

tk

) − Φα(

tk

,

tk−1

)

h

X

t

k−1

xα(

tk−1

)

i

.

Dene MCE estimator

¯

α

=

argmin

(α)∈Θ

α

¯

U

(α)

(11)

Application to Epidemics

Construct the contrast for low frequency data (∆ and n xed)

Contrast process for the diusion X



t

(β unknown)

¯

U



(α)

=













2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)







(

S

k

α,β

)

1

Nk

(α)

with Nk

(α)

=

X

t

k

xα(

tk

) − Φα(

tk

,

tk−1

)

h

X

t

k−1

xα(

tk−1

)

i

.

Dene MCE estimator

¯

α

=

argmin

(α)∈Θ

α

¯

U

(α)

(12)

Application to Epidemics

Results for low frequency data (∆ and n xed)

Contrast process for the diusion X



t

(β unknown)

¯

U

`α, (

Xt

k

)

k∈{1,..,n}

´ =

1

n

X

k=1

t

Nk

(α)

Nk

(α).

Results for low frequency data and β unknown

Under classical regularity assumptions on b and σ,

and identiability assumption : α 6= α

0

⇒ {∃

k, 1 ≤ k ≤ n, xα(

t

k

) 6=

0

(

t

k

)}.

¯

α

=

argmin

α∈Θ

a

¯

U

(α)

satises



1

( ¯

α

− α

0

) −→

→0

N (

0, J

1

0

, β

0

))

where J∆(α

0

, β

0

)

do not converges toward Ib

0

, β

0

)

as ∆ → 0(the Fisher

Information Matrix) .

(13)

Application to Epidemics

Additionnal information on β for low frequency data

In SIR-Epidemics α = (λ, γ) = β

Case of useful additionnal information

β =

f (α)

Σ(β,

x) = f (β)Σ

0

(

x)

Return on the log-likelihood of the Gaussian process Y



t

L∆,

(α, β)

=



2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)(

S

α,β

k

)

1

Nk

(α)

with Nk

(α)

=

Yt

k

xα(

tk

) − Φα(

tk

,

tk−1

)

hYt

k−1

xα(

tk−1

)

i

.

Dene MLE estimators

( ˆ

α

,∆

, ˆ

β

,∆

) =

argmin

(α,β)∈Θ

(14)

Application to Epidemics

Additionnal information on β for low frequency data

In SIR-Epidemics α = (λ, γ) = β

Case of useful additionnal information

β =

f (α)

Σ(β,

x) = f (β)Σ

0

(

x)

Return on the log-likelihood of the Gaussian process Y



t

L∆,

(α, β)

=



2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)(

S

α,β

k

)

1

Nk

(α)

with Nk

(α)

=

Yt

k

xα(

tk

) − Φα(

tk

,

tk−1

)

hYt

k−1

xα(

tk−1

)

i

.

Dene MLE estimators

(15)

Application to Epidemics

Additionnal information on β for low frequency data

In SIR-Epidemics α = (λ, γ) = β

Case of useful additionnal information

β =

f (α)

Σ(β,

x) = f (β)Σ0

(

x)

Return on the log-likelihood of the Gaussian process Y



t

L∆,

(α, β)

=



2

n

X

k=1

log

hdet “S

α,β

k

”i

+

1

n

X

k=1

t

Nk

(α)(

S

α,β

k

)

1

Nk

(α)

with Nk

(α)

=

Yt

k

xα(

tk

) − Φα(

tk

,

tk−1

)

hYt

k−1

xα(

tk−1

)

i

.

Dene MLE estimators

( ˆ

α

,∆

, ˆ

β

,∆

) =

argmin

(α,β)∈Θ

(16)

Application to Epidemics

Additionnal information on β for low frequency data

In SIR-Epidemics α = (λ, γ) = β

Case of useful additionnal information

β =

f (α)

Σ(β,

x) = f (β)Σ0

(

x)

ff

S

α,β

k

= ˜

S

k

α

Contrast process for the diusion X



t

(with information on β)

˜

U

(α)

=













2

n

X

k=1

log

hdet “

S

˜

α

k

”i

+

1

n

X

k=1

t

N

k

(α)(

S

˜

k

α

)

1

N

k

(α)

with Nk

(α)

=

Xt

k

x

α(

tk

) − Φα(

tk

,

tk−1

)

hXt

k−1

x

α(

tk−1

)

i

.

Dene MCE estimator

˜

(17)

Application to Epidemics

Results for low frequency data (∆ and n xed)

Contrast with information on β

˜

U

(α) =

1

n

X

k=1

t

Nk

(α)(˜

S

α

k

)

1

Nk

(α)

Results

Under the same assumptions (regularity and identiability)

˜

α

=

argmin

(α)∈Θ

a

˜

U

(α)

satises



1

( ˜

α

− α

0

) −→

→

0

N (

0, I

1

0

, β

0

))

with I∆(α

0

, β

0

) −→

∆→

0

Ib

0

, β

0

)

(18)

Application to Epidemics

Results for high frequency data (∆ → 0) (without linking  and n)

Contrast process

Using kS

α

0

0

k

− Σ(β

0

,

Xt

k−1

)k −→

,∆→

0

0, we consider :

ˇ

U∆,(α, β))

=



2

n

X

k=1

log

hdet “Σ(β, Xt

k−1

)

”i

+

1

n

X

k=1

t

Nk

(α)Σ

1

(β,

Xt

k−1

)

Nk

(α)

Asymptotic Normality

Under classical regularity and identiability assumptions on b and σ

( ˇ

α,∆, ˇ

β,∆) =

argmin

(α,β)∈Θ

ˇ

U∆,

(α, β)

satises

„

1

( ˇ

α,∆

− α

0

)

«

−→

N

0,

I

1

b

0

, β

0

)

0

««

(19)

Application to Epidemics

Some generalities

Transmissible disease : a world of incomplete and aggregated data

Date of infection and recovery of an infected individual unknown

Total Number of new infected cases collected at regular time interval (days or

week)

Modelisation : the simpler, the better

S

λ

I /N

→ I

pop

→ R

γ

Ethier & Kurtz diusion approximation

dst

=

−λ

stitdt +

1

N

pop

λ

stitdB1

(

t)

dit

=

stit

− γ

it

)

dt −

1

N

pop

λ

stitdB1

(

t) +

1

N

pop

γ

it

dB2

(

t)

(20)

Application to Epidemics

Estimation over simulations

Simulations using Matlab

Simulations over 1000 runs of a scenario close to inuenza (λ = 0.4 days

1

,

γ =

1/3 days

1

, T = 50)

Références

Documents relatifs