Estimation and Forecasting of a Lag 2 Dynamic Model for Infectious Diseases
by
©Chen Zhang
A thesis submitted to the School of Graduate Studies
in partial fulfilment of the requirements for the degree of
faster of Science
Department of M athernatics f!j Statistics l\!Iemorial University of Newfoundland
A'ugust, 2012
ewfoundland
Abstract
When a common inf
ec
tious disease is first detected
ina community,
it may quickly spread out througha
ir, water ,
public facilities and personalcontacts
. At agiven
time point, each infected individual may or may not infect other individuals inthe community. Meanwhile,
it is also possible that some individuals whocarr y the same
disease travel intothe community.
In the present work, we
discuss estimationand
forecasting of an extension to the
lag 1 longitudinal
dynamic model for correla
ted data used by Oyet & Sutradhar (2011) for
modelling the spreadof infectious disease.
The lag 1 model only allow individuals with infection
a
tt
ime pointt -
1 to cause new infectionsat t
ime pointt.
Clearly, if attime point t -
2, there isan
individual who
is still infected by the disease, it is also possible for this individual to infectothers at
time pointt .
The present model discussed inthis work allows for such a
possibility.During the modelling, we
consider stationa
ry and nonstationary covariates. We alsoext end
the model to situations where unobservable community effect and t
he latentco
mmunity effect is present. The regressionpara
meter/3 a
nd the parameter of latentcommunity effect
CJ~are estimated by generalized quasi-likelihood
( GQL) approach.The correlation parameters p1
and
p2a
re estimated by using method of moments. Ineach of the cases,
we examined the accuracy ofthe estimates a
nd forecasts through simulationstud
ies.A cknow le d gem e nts
I want to express my deep
est
gratitude
to my supervisor, Dr. Alwell Oyet in paticular for his guidance, inspirations andencouragement.
Thethesis
would be impossible without his intellectual, practical and financial support.My sincere thanks also goes to Dr. Brajendra C. Sutradhar for providing me the very useful background used in
the
thesis. I have learned a lot from his severalcourses
.It is my pleasure to thank Dr. Yuan Yuan, Dr. Zhaozhi Fan, Dr. Jie Xiao, Dr. Xiaoqiang Zhao, Dr. Chunhua Ou,
and Haiyan
Yang for their advice and en courage-
ment. I have really enjoyed my time with them.I would like to extend my thanks to all my friends, especially Zhen Wang, for her
constan
t encouragement and support during my masters study.I wish to acknowledge with thanks the administrative staff and computer
tech-
nicians in the Department of Mathematics and Statistics for making their time and
helpavailable
to me.Words fail me to
express
my appreciationto
my parents whose dedication,
loveand persistent co
nfidence in me, have taken
the loadofl my shoulder.
The financial support of
the
MemorialUniversity
of Newfoundland
is gratefullyacknowledged.
Finally, I would like to thank everybody who was important to th
e successful real-
ization of thesis, as wellas expressing my ap ology t
hat I could not mention personally one by one.Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
Poisson AR( 1) Process 1.1
1.2 1.3
Generalized Quasi-Likelihood ( GQL) Motivation . . . . . . . . . . . . . . .
2 Lag 2 Dynamic Binary Sum Infectious Disease Model 2.1 Preliminaries . . . . . . . . . . . . . . . . . . 2.2 Lag 2 Fixed Binary Sum Infectious Disease Model
11
111
V11
Vl11
1
2 3 5
6
6
72.2.1.2 The Variance . 2.2.1.3 The Covariance 2.2.1.4 The Correlation.
2.2.2 Estimation of Parameters of the Lag 2 Fixed Binary Sum In- fectious Disease Model . . . . .
2.2.2.1 GQL Estimation of
fJ
2.2.2.2 MM Estimation of Pt and P2 2.2.3 Forecasting Performance
2.2.4 Simulation Study . . . .
2.3 Lag 2 Mixed Binary Sum Infectious Disease Mixed Model . 2.3.1 Basic Properties of the Proposed Mixed Model .
2.3.1.1 Conditional Properties . . 2. 3 .1. 2 Unconditional Properties . 2.3.2 Estimation of Parameters
2.3.2.1 2.3.2.2 2.3.2.3
Estimation of
fJ .
Estimation of P1 and P2 Esitmation of (}2
2.3.3 Simulation Study . . . . .
3 Lag 2 Dynamic Binomial Sum Infectious Disease Model
3.1 Preliminaries . . . . . . . . . . . . . . . . . . 3.2 Lag 2 Binomial Sum Infectious Disease Model
11 14 16
17 17 18
20 22
31 33 33 3335
36
37
39 43
46
46
47
3.2.1 Moments of Lag 2 Binary Sum Infectious Disease Model 50 3.2.1.1 The Mean . . . . . . . . . . . . . . . . . . . 503.2.1.2 The Variance . 3.2.1.3 The Covariance 3.2.1.4 The Correlation .
3.2.2 Estimation of the Parameters of the Lag 2 Binary Sum Infec- tious Disease Model . . . . . . .
3.2.2.1 GQL Estimation of
fJ
3.2.2.2 MM Estimation of p1 and P2 3.2.3 Forecasting Performance
3.2.4 Simulation Study
4 Concluding Remarks
Bibliography
51 54
57
58 58
59
61
63
71
74
List of Tables
2.1
Stationary Model Parameters Estimation Results.24 2.2
Nonstationary Model Parameters Estimation Results.25 2.3
Stationary Model Forecasting Error . .26 2.4
N onstationary Model Forecasting Error.27 2.5
Non-stationary/3
Estimation for fixed p1 , p2 and 0"244 2.6
Nonstationary p1 and p2 Estimation for fixed f] and 0"244 2.7
Non-stationary 0"2 Estimation for fixed/3,
p1 and P244
2.8
Non-stationary Parameters Estimation45
3.1
Stationary Model Parameters Estimation Results.64 3.2
Non-stationary Model Parameters Estimation Results.65
3.3
Stationary Model Forecasting Error ..66
3.4
Non-stationary Model Forecasting Error.67
List of Figures
2.1 P
lots for forecasting performance of binary sum model with stationary covariates .. . .
..
. . .. .
. .. . . . .
. . ..
..
. ..
. ..
.. . 29 2.2 P
lots for forecasting performance of binary summodel
withnonsta-
tionary covariates . . . . . . .
.
. ..
. . . . . ..
. . . .. .
. . .. . 30
3.1 P
lots forforecasting performance of binomial sum model with station-
ary covariates. . .
. . . . . . . . . . . ..
. .. . .
.. .
. . . . . 683.2 Plots of forecasting performance of binomial
sum model with nonsta-tionary covariates
. . ..
. . . . . . . .. .
. . . .. . . .
. .. . . .
.69
Chapter 1
Introduction
In recent years, statistical models have shown to be of great value in the modelling of infectious disease. In particular, Oyet & Sutradhar (2011) proposed a branching process with immigration type longitudinal count data model for modelling the num- ber of infections in each community. In their paper, they assumed that one infected person may possibly infect none, one or more individuals in a small time interval.
Some immigrants with the same disease may enter the community during the same time interval which will increase the number of infected individuals in the commu- nity. Let Yit. be the number of infected individuals at time
t (t
= 2, 3, ... , T) in the community i (i=
1, 2, ... , K), Oyet & Sutradhar (2011) model is given bywith assumptions:
Yi,t-1
Y.it =
L
BJ(nt>p) +
ditj=l
Assumption 1. Y'il rv Poi(p,i1 =
exp(x;,d3) ).
(1. 1)
Assumption 2. dit rv Poi(P,it - Pl/-Li,t-d fort= 2, ... , T with /-Lit = exp(x~1(3), for all
t
= 1, ... ,T.
Assumption 3. dit. and Yi,t.-1 are independent fort = 2, ... , T.
For model (1.1), they found that for all t = 1,2, ... ,T and k = 1,2, ... ,T-1
E(Yil,)
Cov(Yit, Yi , t- k)
/-Lit'
(
k-l )
IT n t -l l
l=O
ai,t-k,t-k a itt
Note that when n1 = 1, for all
t
= 1, 2, .. ,T ,
the binomial random variable BJ(n1,p)
will become a binary variable with the probability of infection p. The model (1.1) will then reduce to an autoregressive, of order 1, (AR(1)) type Poisson process. This reduced model implies that the infectious individuals at timet -
1 can only infect none or one individual at time pointt.
1.1 Poisson AR(1) Process
the process can be written as
Yi .. t-1
YiL
= L
Bj(p)+
diL·j=l
(1.2)
Sutradhar (2011) (see also McKenzie (1988)) used (1.2) for
t =
1, 2, ... ,T
to model longitudinal count data over time. The basic properties of the model arck d
c ( '' '\/" )
k f iit.p P,u an orr I it, I i,L-k
=
p - . /-LikMckenzie (1988) showed that the distribution of the process (1.2) is Poisson with mean p,;1,
=
exp(x~1
/3) by using alternate probability generating function (a.p.g.f.s). We can consider this model as a model for spread of disease for only some limited cases because it only allows each of the Yi,L-l infected individuals at timet-
1 to infect at most one individual. When Yi,L- l is considered to be an offspring variable at timet-1 and d;t is the immigration variable, the model (1.2) represents a branching process with immigration for ]( = 1 and largeT .
This model -vvas recently considered by Sutradhar, Oyct and Gadag (2010) as a special case of a negative binomial branching process with immigration.1.2 Generalized Quasi-Likelihood (GQL)
For the longitudinal regression setup, interest may be focused on the regression parameters for the marginal expectations of the longitudinal responses and the longi-
ing" correl
a
tion matrix basedgeneralized estima
tingequ at
ion (GEE) approach for
theestimation of
the regression parameters and generalized quasi-likelihood (GQL) estimation approach. The
GEEapproach
was proposedby
Liang & Zeger(1
986).It has been used
extensively
in recent years inestimation for longitudinal count re- sponse
. However,as
demonstrated by Crowder (1995), because of th e
uncertainty in the definition of the workingcorrelation
matrix, the GEE approach may
in somecases
lead toa co
mplete
breakdown of theestimation of t
he regression parameters.Furthermore, Sutradhar
a
nd Das (1999) have demonstrated that even t
hough
the GEE approach in many situations yieldsconsistent estimators for t
he regressionpa-
rameters, the GEE approach may, however, produce less efficient estimates than the
independenceassumption
based quasi-likelihood(
QL) or moment estimates. Sutrad- har((2011) p
.4)suggests t
hatbased on s
tudies byCrowder (1995)
, Sutradh ar an d
Das (1999),
Sutradhar(2003), and
Sutradh ar (2010),
theGEE approach can not
be trusted for regressionestimatio
n in discrete models suchas
longitudinalbinary or
count
data. Sutradhar (2003, Section3) therefore suggested an efficient GQL ap-
proach for time independe
ntcovari ates which
is anextens
ionof the QL approach
(or weighted leastsqu ares approach) for the
independent data
introduced by Wedder-
burn(1974).
Sutradhar (2010) introduced nonstationary autoco
rrelation structures for the cases whencovariates are time dependent, a
nd appliedthe GQL approach
forconsistent a
ndefficient estimation of the regression effects.
1.3 Motivation
From the model ( 1.1), ( 1. 2) and ( 1. 3), we can see that these lag 1 models only allow individuals with infection at time point
t -
1 to cause new infections at time pointt.
Clearly, if at time pointt -
2, there is an individual who is still infected by the disease, it is possible for this individual to infect others at timet
as well. Wedevelop a model which include infections from time
t -
2. For simplicity, we first extend the Poisson AR( 1) process to a lag 2 model. Since the number of infections in each community may also be affected by unobserved community effects such as environmental pollution, we also extend the lag 2 model a little further by introducing a random variable to represent the latent community effect. However, these AR(1) type extended models will have similar limitations as a Poisson AR( 1) process when used to model the spread of infectious disease. That is, the infected individuals at timet -
1 ort -
2 can only infect at most one individual at timet .
Consequently, we also consider an extension to Oyet & Sutradhar's (2011) lag 1 model for infectious disease which would be more appropriate to model the number of infections in reality.Chapter 2
Lag 2 Dynamic Binary Sum Infectious Disease Model
2.1 Pr e limina ries
In this
chapter
, we begin to construct new models which will include the infor- mation from previous two stages. Suppose that we have K communities. First, we begin witha
simple case, which deals witha
n infectiou s
individual whocan
only infecta
t most one persona
t time point t. Insection
2, we consider both stationaryand nonstationary covariates assuming
that nocommunity effect
is present. That is,all communities ar e assumed to be independent
of each other, butthey a
retime-wise
simulated data.. Iu section 3,
we assumethere
isan un observable community effect
whichcan a.ff ct
our responses. For example,this
unobservable community effect 'Yi couldbe
wealth differenceor
education levels.The
mean functiondepends
on the regression effect and community effect. Wediscuss t
he structures and assumption for t
his dynamic mixed model and obtain properties of the mixed model. Finally,we
estimate the parameters involved in this mixed model with nonstationary covariates.The !llain goal of this chapter is to estimate the parameter involved in the model
and to
forecastthe
numberof infections
in community at timepo
intt.
We have foundout that the
Generalized Quasi-likelihood (GQL) method of Sutradhar (2003) for estimatingthe
regression parameters of longitudinal response works
verywell
for estimatingthe
regressioneffects
of this model. Inthe same year,
Sutradhar and Jowahccr(2003)
have also used the GQL method to estimate the variance parameterof the communi ty random effect
'Yi· 'vVc will usc this GQL approach for the estimationof our
model parameters. We estimate the longitudinal correlation parametersby
usingthe
Method of Moments (Mf) .
Once allparameters
areproperly
estimated, wet
hen useth
e information fromt
and t - 1 to examine the forecast performance of this model.2.2 Lag 2 Fixed Binary Sum Infectious Disease Model
We begin our modelling by considering the simple case where the offspring random
varia
ble is binarywith correlation
index parameters p1 and P2for two consecut
ive gen-erations. 'vVe assume that K independent communities are at the risk of an infectious disease. Suppose that at initial time point, t
=
1, y;1 individuals in the ith community developed the disease where y;1 is assumed to follow a Poisson distribution with mean parameter f./,;1=
exp(<1f3 ).
Because we assume an infected individual can only effectnone or one individual for two time intervals, and also because there may be other infected individuals arriving from other communities, we shall model the number of infected people in ith community at time t as
Yi1
Poi(p ..;1),
wheTe f-l·il = ex;1!3Y;t
Yi2
L
blj (Pl)+
d;2j=l
}/i,t-1 }/i,t-2
L
blj(PJ)+ L
b2j(p2)+
d;t, fort=3, 4, ... , T ,
j=l j=l
(2.1)
where
b
1j rvBin(p
1 ),b
21 rvBin(p
2 ). This is an extension of the AR(l) model used by Staudenmayer and Buonaccorsi (2005) and Sutradhar (2003). In model (2.1), we make the following assumptions:(3)
lf.,~,_1 & d;t are independent fort = 2, 3, .. . , T .
parameter must satisfy /-Li2 - Pli-Lil ~ 0 , yielding Pt ::=:; ~:;~. Similarly, for /-Lit -
>
0 t b t. fi d d<
{Lit-PZf'i t- ' Tl f thPti-Li,t-1 - p2J-Li,t.- 2 _ o e sa 1s e , we nee Pt _
11;
1 • -. 1ere ore, e range of P1 is
0
< < . (
/-Li2 /-Lit - P2/-Li,t-2 1) j'>
3 d1 ..
d_ p1 _ 1mn - , , , or
t _
an zxe P2·/-Lil Jl·i,t-1
For stationary case, if we let I-Lil
=
I-Li2= · · ·
= /-LiT=
J-Li, then the range of P1 simplifies to0 ::::; P1 ::::; ( 1 - P2).
2.2.1 Bas ic Prope rties of the Lag 2 Fixe d Bina ry Sum Inf e c- tious Disease Mode l
2.2.1.1 The Mean
Based on the previous discussion, we know that E[Y,;1]
=
1-Ln and E[Y,;2]=
/-L·i2·Then, it follows that for
t =
3, 4, ... T,E[Y;t,j E [ %' b
1;(pi) + % '
b,;(p,)+ d <rl
E [ y t
1b11(P1)] + E [ y t.
2b 2 1(P2)] + E [ dit .]
]=l ]=l
E,.,, _, E [ %' b
1; (Pt , ,_,] + E,.,,_, E [% ' b ,;( p,t ,_J + E[d,]
PtE[Yi,t-d
+
P2E[Yi,t-2J+
Jl·it - Pl/-L·i,t.-1 - P2/-Li,t-2·(2.2)
---l
I I I
-l
I
Next, we
consider some specific cases:For t=
3,
Pl/Li2
+
P2/Lil+
/Li3 - Pl/Li2 - P2/Lil/Li3·
Fort= 4,
/Li4.
By mathema tical induction
, if we haveE
[Yi,t-1] = f.Li,t-l and E[Yi,t-2] = f.Li,t-2, then:E[Y;t]
Pl/Li,t- 1+
P2/Li,t-2 +/Lit - P1Pi,t-l - P2/Li,t-2/Lit
exp( x ; tfJ ). (2.3)
So
,E[Y;t]
= fLit = exp(x~tf3), for allt
= 1, 2, ... ,T .
2.2.1.2 The Variance
The variance of this model can be derived by finding the conditional and uncon- ditional variances. By assumptions (2) and (3), we have
(
yi 1- 1 Y; t-2
I )
Cov L b1 j , L b 2.i Yi,t-1 , Yi ,t- 2 =
0J=l J=l
and
Cov(Yi,t,-I,dit) =
0 ,and Cov (Yi , t-2 , dit) =
0.Then
+!-Lit - Pl/-Li,t-1 - P2/-Li,t-2·
(2.4)
Letting O"i,t.- l,t-
1
represent the variance ofYi , t- 1,
thenEyi,t- l [V ar·(Yi t.I Yi ,t- 1, Yi,t-2 )] + V ary i,t-t ( E [ Yi t IYi,t-I, Yi, t-2])
+(!-Lit.- Pl/-Li,t.-1 - P2/-ti,t-2)]
+ /-Lit - P1 /-Li,t-1 - P2/-Li,t-2 ·
(2 .5)
Similarly, letting D"i,t-2,t-2 represent the variance of
Yi ,t-2,
we have2
l
+f-li t - P1/1i,t-1- P2/1i,t-2
+
p 1ai,t-l,t-l+f-lit-Ptf-li,t.- 1 - P2/1i,t-2
+
p2 lai,t-t,t-1+
V aT(JL;,t-1 P1+
1'i,t-2P2 + !"it - fh J-li,t-1 - P2/1i,t-2) /1i,t-1PI(1-
P1)+
Piai,t-l,t-1+ p~ai,t-2,t-2
+!1·i,t.-2P2(1- P2) +f-lit - P1f-l·i,t-l - P2/1i,t-2
f-lit- (J-li,t-1- ai,t-l,t-dPi - (J-li,t-2-
ai,t- 2,t-2 )P~- (2.6)
From this formula, we can see that the variance of }'i1 has a recursive relationship with the variance of
Yi ,t- J
and the variance of Yi,t-2 . We know that VaT(YiL) =
J-liland V
aT(Yi2) =
P,i2
from our assumptions. It turns out that whent =
3,When
t
= 4,/-Li4.
If
we continue doing the calculation, by mathematical induction if we assume thatO'i,t- l,i,t-1
=
/-Li,t-1 and O'i,t- 2,i,t-2=
/-Li,l.-2, then using the formula of the variance, for t = 3, 4, ... , T, we have:+P,i,t-2P2(1-P2) + J.kit - PlJ.ki,t.-1 - P2l-"i,t-2
(2.7)
So, V ar(Yit)
=
J.kit=
exp(x~tf3) for allt =
1, 2, ... , T.2.2.1.3 The Covariance
The lag k covariance between Yit and Yi,t-k will also have a recursive relationship in terms of covariance between Yi,t-l and Yi.t-k & Yi,t.-2 and Yi,t-k· By assumption (2), we have
j=l
+Cov(dit, Yi,t-d
j=l
Yi,t -l Y:i,t-2
Cov(
L
bt.i(PI ), Yi,t-k)+
Cov(L
b2j(p2), Yi,t-d ·.i=l j=l
Consider only the first term of the equation,
PtCov(Yi,t- 1, Yi,t-k)
P1 CJ;,t-t ,t-k.
Similarly, we can show that:
=
P2CJi,t.-2,t-k·Ther
efore,
we have:(2.8)
We can use this formula to calculate the covariances for
some specific cases
. For t =2,
by using the
properties of AR( 1) Poisson process, we haveFort
=
3,Fort= 4,
By summarizing lots of the specific covarianccs, We h
ave found a gener al
formula forcova
riance of Yit and Yi,t-k for allt =
2, ..., T a
nd k = 1, 2, ... , T- 1to be
t-k
Cov(Yit, Yi,t-k) = a;k
L
PtPr1 f.Li,t-j+l-A:+
ai,k-1P2/.Li,t-k,j=l
wher
e
a;0 = 0, a;1=
1, a;k = p 1a;,k-1+
p2ai,k-2, fork = 2, 3, ... ,T -
1.2.2.1.4 The Correlation
(2.9)
Once we fou
nd
outthe covari ance
between
Yitand
Yi,t-k,the
lag k correlation between Yita
nd Yi,t-k willsimply
be
We again consider some specific cases such as lag 1 and lag 2 correlations. These correlation formulas will be needed in the estimation section.
t ·-1
C ·('-/' '\/' ) - L j=l Pl~
/-li,t+l- jon
1 ·it., 1 i,t+ 1 - --'~-;::=====--J
/-lit/-li,t+ 1~~ j-1
C . ('-/' ,/' ) _
P1 L-.i=t P1fJ2 /1-i,t+l-:i+
P2/-lit01T 1ii,1i,t+2 - .
.J
!-lit/-li ,t+ 1ate that when p2
=
0, the lag k covariance and correlation will reduce to(2.11)
(2.12)
which is the same for AR(1) based count data model considered by Sutra.clhar (2010, eqns, (15)-(16), p.178). Thus, this model is an extension to the AR(1) based count data model.
2.2.2 Estimation of Parameters of the Lag 2 Fixed Binary Sum Infectious Disease Model
2.2.2.1 GQL Estimation of (3
Following Sutradhar (2011, s c.6.4.2), let J-li
=
(Mit,P,i2, · · · ,/Jit, · · · ,J-l;T)' be theT
x 1 dimensional mean vector of Yi = (Y'il,Y i2, · · · ,
Yit, · · · , YiT )'. If we assume Pt, and p2 are known, a consistent and efficient estimate of (3 can be obtained by solvingth
e so-called generali
zed quasi-likelihood (GQL)estimating equ a
tionf(
0
I"\:""' ~ 8(3 1-Li
I.:;
-1(p)(:IJi - J.L;) =
0i=l
(2.13)
as the true
correlation structure
1 Pit2 Pit:3 PilT
1 Pi23 Pi2T
Ci(P)
=
1 P·i,t-l,T 1
with Pi,t-k,t
=
Corr(Yi,t.-k, Y;t) fort =
2, ... , Tand
k=
1, ... , T-
1. It is clear that E[2::::::
1~I.:i
1(p) (:y;- J.L;)J =
0, hence the GQLestimate
will bea consisten t estima
te. This GQL estimating equation (2.13)
can be solved iteratively by usingthe
Newton Raphson iterative equationwhere ~(r-) is the value
of (3 at rth iteratio
n.obtain a good stima te
for [3. These twopa ra
meters canbe
consistently estimatedby
using themeth od of moments
. Let Sil.t, Sit.,t+l,a
nd Sit,t.+2b
e the standardizedsample varia nce, the standardi
zed lag 1 sample autocovariance and the standardized lag2 sample au tocovari
ance,resp
ectively, defined asSilt.
Sit,t+2
E[Sitt]
11\" T- !
L L
CoTT(Yit, Yi,t.+J)/ K (T-1)
;=] t=l
/\" T-2
E[Sit, t+2 ] L L
CoTr(Yit, Yi,t+2)/ K (T-2).
i=l t=l
Using first order approximation of th
e expectation of the ratio of two sample variance
,we will have moment
equations
Sit,t+l
=
E[ Sit,t . +t]
;:::j E[S;t,t+d=
E[S·]
S ;u S itt.
E[s •
• "tt]
tt,t+!Sit,t+2 _ E
[ Sit,t . +2 ] ,..., E [Sit,t. +2] _
E[S]
- - - - - ,..._, - -i 'it
t+2 .
Siu Silt
E [Sitt] ,.
Then, one may obtain the estirnates for p1 and p2 by solving the marginal moment equations
Sit,t,+l E[S ] - 0 - - - itt+l -
Sitt, '
(2.15)
Sit,t+2 [ ]
-5- - - E Sit,t+2
=
0itt
(2.16)
Due to the nonlinearity of the estimating equations (2.15) and (2.16), the solutions can be obtained by using Newton's iteration method.
(2.17)
(2.18)
where Pl(r) and
/5
2(r) are the values of p1 and p2 at rth iteration respectively.2.2.3 Forecasting Performance
Once all parameters of the model (2.1) have been estimated, we can carry out a one-step forecast for the purpose of planning and control. From model (2.1), it is clear that the conditional mean of
Yi
1 givenYi ,t-l
andYi,t-
2 will have the formulaNext
,
if we define an l-step ahead forecasting function of Yi,t+l as Yit(l) = Yi,t+l E(Yi,t+tiYi,t+l-l,Yi,t+L-2), then, from (2.17), the one step ahe
ad forecasting function is given byYit( 1)
E(Yi,t+11Yit, Yi,t.-1)fLi,t.+l + Pl(Yit - fLit) + P2(Yi,t.-l - f./,i,t-1),
(2
.20)with
Y it(O) =
Yit· Once we have a onestep
ahead forecast, we cancalculate
the forecast error
e.iL ( 1) by usingeit .(1)
Yi,t+l -Y;, .(1)
(Yi,t+l - fLi,t+d-Pl(Yit - fLit)- P2(Yi,t-l - f./,i,t-J). (2.21)
From the above equation, we noti
ced
that the conditional meanand the mean of ei1• ( 1) is
The conditional variance of e;1
(1) lyil. ,
Yi,t-1 is given byV ar(Yi,t+liYiL, Yi,t-d
Then, the variance of C;t(1) follows the formula
VaT (Cit ( 1))
2 2
/-Li,t+l - P1 /-Lit - P2fJ·i,t-l· (2.22)
2.2.4 Simulation Study
In this
section ,
we perform a simulation study. Weconsider the
case of K=
100
communities
and T=
5t
ime points. We will use fit,t =
1,2, 3
,4
for the purpose of estimation and tryto
forecast the number of infections att = 5
in eachcomrnunity.
First, we consider atirr1 e
independentcovariate
vector x:t= (x;n,
X;t2) for the stationarycase,
where X;nand x;
12a
regenerated
as follows:and
0, t = 1,2,3,4,5;i = 1,2, ...
,if
(2.24) 1,
t
= 1,2,3,4,5;i=if+
1, ... ,K.Then we consider a time dependent covariate vector x~L = (xitl, xit2) for studying the nonstationary case, where X-in and Xit 2 are generated as follows:
and
- 1,
1,
;:z;iLl = 0, 0.5,
1,
- 1,
Xit.2 = 0,
0.5,
0.5+(L-1)0.5 T
t
= 1, 2; i = 1, 2, ...,if
t
= 3, 4, 5; i = 1, 2, ...,if
t
= 1; 'i =if+
1, ... , J(t
= 2, 3; i =if+
1, ... , J(t
= 4, 5; i =if +
1, ... , J(t =
1, 2, 3, 4, 5; i=
1, 2, ... , ~- 1· . - /( 1 31(
t -
'z- 4+ ' . .. ,
4. [( 3I<
t
= 2.3; z = 4+
1, ... , 4 4 5. . [( 1 3[(t = ' ' z =
4+ ' . .. ,
4. 3[(
t
= 1,2,3,4,5;z = 4+
1, ... ,K.(2.25)
(2.26)
Even though we have only two covariates in simulation, however, in practical cases,
factors
, such as geographic
locations and policy restrictions, or, they can also be timedependen
tfactors,
such as economic situations
and age.From t
he assumptions,
weknow
that0 <
p<
min( l:!:i1.
J.L;,-pzJ.Li,t-21)
j"ort >
3. Since P·i
s the lag 2- I - J.Lil' J.Li,t-1 ' ' - 2
correlation,
we
can naturally assumethat
p2<
p1 .So
,we
choose a small P2,then
compute the upper bond
p*=
min(l:!:i1. ,,. ;,-p
21~i. t-z 1).
Then we use this
p2 and) JLil ) JLi,l-1 ) )
P1 =
p; -0.1
or p1=
p~-0.2 as t he
true values of P1 and P2 for
the simulation. Using suitableinitial
valuesof /3,
p1 and p
2,we
solvethe ma rginal
estimating equation for/3 by using Newton Raphson
algorithm. Then
byusing
theinitial
valu s of p1 & p2and
the estimat
e of/3 obtained from previous
step,we
obtain estimatesfor
p1 & P2by using moment es timating equat ions. We
use estimated p1 ,p2 to estimate/3
again,then
use thisnew (3
and repeat the above steps to get theimproved estimates of p
1 and p2.This
itera tive
step continuesuntil
convergence. The table below shows estimated/3
and
p1,p
2from 1000
simulations.Table 2.1: Stationary Model Parameters Estimation Results.
P
arameter Esti mation
(3
PIP 2 {3
SE8 P1 SEPIP2
SEP2 (0.5,1.0)
0.400.10 (0.508, 0.994)
(0.224, 0.123)0.388 0
.060 0.106 0.059 (1.0,1.0) 0.40 0.10
(1.000,1.000) (0.
260,0.136
) 0.3880
.0600.1 02 0.057
(0.5,1.0) 0.35 0.20
(0.517,0.992) (0.224, 0.128)
0.345 0.0550.178 0.064
(1.0,1.0) 0.35 0.20
(1.018,0.990)
(0.257,0.138)
0.3500.
0570.176 0.064
(0.5,1.0) 0.60 0.20
(0.522,0.987)
(0.283,0.155
)0
.5710.
0570.177 0.070
(1.0,1.0) 0.60 0.20
(1.052,0.974)
(0.304,0.162) 0.
5750.059 0.1
700.067
(0.5,1.0) 0.75 0
.20 (0.524, 0.991) (0.320, 0.178) 0.7050.051 0.
1630.040
Table 2.2: onstationary Model Parameters Estimation Results.
Parameter Estimation
/3
PI P2s
SEfiP1
SEPI P2 SEP2(0.5, 1.0) 0.40 0.10
(0.500,
0.997) (0.070, 0.131) 0.393 0.074 0.128 0.080(1.0,1.0)
0.40 0.10(1.003
, 0.995)(0.070, 0.123)
0.393 0.084 0.145 0.091(0.5,
1.0) 0.35 0.20(0.502,
0.996)(0.072, 0.135) 0.362 0.073 0.174 0.084 (1.0,1.0)
0.35 0.20(1.002,
0.992)(0.073,
0.128) 0.366 0.075 0.175 0.093(0.5
, 1.0) 0.60 0.20(0.497,
1.000) (0.069,0.131
) 0.5870.078 0.197 0.104 (1.0
,1.0) 0.60 0.20(1.002
, 0.995) (0.067,0.119)
0.586 0.081 0.210 0.119(0.5
, 1.0) 0.75 0.20(0.500,
0.994) (0.064,0.124)
0.725 0.072 0.189 0.102(1.0,1.0)
0.75 0.20(1.002
, 0.996) (0.067,0.117) 0.723 0.078 0.209 0.129 (0.5, 1.0)
0.60 0.30(0.495,
1.004)(0.070, 0.140
)0.593 0.075 0.264 0.107 (1.0,1.0)
0.60 0.30 (1.002, 0.994) (0.071,0.125)
0.596 0.0820.266
0.120(0.5,
1.0) 0.45 0.40 (0.498, 0.998) (0.071, 0.142) 0.4730.067 0.307 0.091 (1.0
, 1.0) 0.45 0.40(0.996
, 1.003) (0.072,0.131
)0.475 0.076
0.288 0.105From Table 2.1 and Table 2.2, we can see that the estimates for
f3
are very closeto the
true value off3
irrespectiveof the combinations of parameters.
However,for
some combinationsof p
1 andp
2,the estimates for p
1 and p2 maynot
be as accurateas the
others,for
instance, understationary
case,when true
p2 =0.20,
accordingto the rho
1 restriction from our model assumption, rho1 shouldbe less
the 1 - p2 =0.8.
If the true combination
is
PI = 0.75and
PI = 0.20 ,the estimates are
lessclose to the
truevalues
compareto other
combinations.Similar result
happens when true
p1 =0.45 or 0.55,
p2 = 0.40.The
upper boundary for p2is 0.
5because
weh
aveassumed
that p12: p
2.These estimates arc less close to the
true values than others becausep
1or
p2 are close toits boundary.
For the purpose of examining the forecast performance of the model (2.1)
in fore- casting the future infections, we usethe parameter
estimates obtained by using onlyfunction in Section 2.2.3 to compute a one-step ahead forecast of the fifth observation.
The sum of squares of the forecast error as well as the variance of the forecast error for these 100 communities were calculated for each simulation run. We denote the average sum of squares of the forecast errors and the average variance of the forecast error by ASS and AV respectively. The results.summarized from 1000 simulations, are reported in Table 2.3 and Table 2.4.
Table 2.3: Stationary Model Forecasting Error.
(3 P1 P2
ASS AV
(0.5, 1.0) 0.40 0.10 1.658 1.743 (1.0, 1.0) 0.40 0.10 2.153 2.122 (0.5, 1.0) 0.35 0.20 1.802 1.798 (1.0, 1.0) 0.35 0.20 2.159 2.134 (0.5, 1.0) 0.60 0.20 1.298 1.353 (1.0, 1.0) 0.60 0.20 1.561 1.605 (0.5, 1.0) 0.75 0.20 0.872 1.007 (1.0, 1.0) 0.75 0.20 1.036 1.189 (0.5, 1.0) 0.60 0.30 1.190 1.288 (1.0, 1.0) 0.60 0.30 1.431 1.543 (0.5, 1.0) 0.45 0.40 1.381 1.456 (1.0, 1.0) 0.45 0.40 1.658 1.743
Table 2.4: Nonstationary Model Forecasting Error.
(3 P1
P2 ASS AV
(0.5, 1.0) 0.40 0.10
2.726 2.654 (1.0,
1.0) 0.400
.10 4.539 4.360(0.5,
1.0) 0.35 0.20 2.775 2.695(1.0,
1.0) 0.35 0.204.623 4.422 (0.5,
1.0) 0.60 0.20 2.1032.047
(1.0, 1.0) 0.600.20 3.477 3.368
(0.5, 1.0) 0.750.20
1.490 1.534(1.0 ,
1.0) 0.75 0.202.5
182.513 (0
.5,1.0)
0.60 0.30 1.993 1.971(1.0, 1.0)
0.600.30 3.329
3.326 (0.5, 1.0) 0.45 0.402
.329 2.298 (1.0, 1.0)
0.450.40 3.906 3.831
From Table
(2.3) and Table (2
.4), we see t
hat the value of average sum of squa res
andthe
averagevariance
of theforecast errors
are very close to each otherf or all
different combinations of p arameters. This indi cates
that the average sum ofsqua res
ofthe
forecast
errors canclosely estimate the average
variance of the forecast errorsand a
satisfactoryp erformance of the estimation of the
parametersof the model. It
could a
lsobe seen that t
he average variance andsum of squa res of forecast errors for
thenonstationary model a re generally
larger thanthe
stationarycase which
meanswe
havemore accurate estima tes for the pa ramete rs
in stationary case.
This makes sense because in s ta tionary case, the covariates does not chan ge with
respect to timet .
Hence less variation was introd
ucedinto the modelling.
Noteth a t for
(3=
(0.5, 1),the a verage
varianceof for ecasting error is sm aller
than that of (3 = (1, 1). This is
because t
hemean function
contains (3and th e forecasting varia nce
isa
functionof
means.
In ourparticular se tup, (3 = (0.
5, 1) will lead to a smaller sum of means than(3
=
(1, 1). For example, in stationary case, we have x~,t+l=
x~t=
x;,~,_1 =
x~ and f.Li,t+l=
f.L;t=
f.Li,t-l=
~ti· If we let ~tio, f.L·ib and represent the means when (3=
(0.5, 1) and (3 = (1, 1) respectively. Var(eia(1)) and Var(eib) are defined by (2.22) using Jl.ia and f.Lib respectively. We also let AVa and AVb be the average variance of forecast errors when (3 = (0.5, 1) and (3 = (1, 1) respectively, ThenConsidering the stationary covariate structure as (2.21) and (2.22), we find out that
0, and we know that p~
:S
p1 and p~:S
p2 since our p1 and p2 are numbers within interval 0 to 1. So 1 -P I -
p~2:
0. Therefore,Therefore, under stationary case, we expect to see that the average variance of fore- casting error is smaller when (3
=
(0.5, 1) than when (3=
(1, 1), regardless the combi- nation of p1 and p2 . This situation still holds for nonstationary case in our particular setup.~ l
II d
I20 40 60 80 100 0 20 40 60 80 100
(a) (b)
~ --- .-.-.--.·.·---·- ----
.. ----... -.·-----
--···- ---
·-.. ······----
·-·--- ..-
-. 1. ~ --- --- - ---- --- - --
I1,(') 0 - - - -·- - - - - - - - - - -
<'")
ci --·---·- - - - · --· - - - - · - - - .. N
6
~ ~
ci I I I I I I o I I I I I I
20 40 60 80 100 20 40 60 80 100
(c) (d)
~ ~---
~
0l
~~~~,~~~-,----.1---~1- . . ... ~ 1
20 40 60 80 100 0 20 40 60 80 100
(e) (f)
Figure 2.1: A plot of (a) values of stationary mean fort = 1 (solid line), t = 2 (dashed line), t = 3 (dotted line), t = 4 (dotted dashed line); (b) values of stationary variance for t = 1 (solid line), t = 2 (dashed line), t = 3 (clotted line), t = 4 (dotted dashed line); (c) values of stationary lag 1 correlation for t= 1 (solid line), t = 2 (dashed line), t = 3 (clotted line); (d) values of stationary lag 2 correlation for t= 1 (solid line), t = 2 (dashed line); (e) Average forecast overlaid on average of longitudinal data; and (f) proportion of absolute values of forecast error that are 0 or 1 (solid line) and
>
1 (clotted line); by communities obtained from 1000 simulations with p1 = 0.35, P2=0.20, ;3=(1,1), stationary covariates (2.23)-(2.24).
'It~
· -·- · -·-·I .
"'t~
- - - - · - - - ,I
...... -.-. --~ ------.-----.-.-.-.----- ....... -~.-.-.---.---.---.-.-.-.-
: • . . -. ... .. . ~._ ;~-... -,.L. ... -... ~ : '· . . . :- - j"'."4'.&.".L" ... ~
I ~ I I I I I : I I I I
0 20 40 60 80 100 0 20 40 60 80 100
(a) (b)
0 20 40 60 80 100 0 20 40 60 80 100
(c) (d)
]~~~~I
I I I I I I
. 0'(f'
~
~~ • • • • • - - · · · I;
. .·
... · ... •.················~o I I I I I I
0 20 40 60 80 100 0 20 40 60 80 100
(e) (f)
Figure 2.2: A plot of (a) values of nonstationary mean fort = 1 (solid line), t = 2 (clashed line), t = 3 (clotted line), t = 4 (clotted clashed line); (b) values of nonstationary variance fort = 1 (solid line), t = 2 (clashed line), t = 3 (clotted line), t = 4 (clotted clashed line);
(c) values of nonstationary lag 1 correlation for t
=
1 (solid line), t=
2 (clashed line), t = 3 (clotted line); (d) values of nonstationa.ry lag 2 correlation for t = 1 (solid line), t = 2 (clashed line); (e) A vera.ge forecast overlaid on average of longi tuclina.l data.; and (f) proportion of absolute values of forecast error that are 0 or 1 (solid line) and>
1 (dottedline); by communities obtained from 1000 simulations with p1 = 0.35, P2=0.20, ,8=(1,1), nonsta.tiona.ry cova.ria.tes (2.25)-(2.26)
Figure 2.1(a),(b),(c) and (d) show the stationary patterns in the rnean f..Lit,
varia
ncethe general pattern of the infections at the fifth time point. In order
to assess the
accuracy of out forecasts, we have also displayeda
graphshowing
the average ofthe
proportions of the forecast error e;t with absolute deviations 0,1and greater
than 1. Figure 2.1(f)s
hows that the deviations of magnitude
0and 1
appearto be
over 90% for the
first 50communities and around 60% for
the remaining 50 communities.
The deviations of m
agnitude
for 0 & 1 is about 60% for the last 50 communities
is caused by the large variation inthe number of infections
for these communities asseen
in Figure 2.1(e)
. For the purpose of comparing
the difference betweenth
e stationary case and nonstationary case, we constructed similar plots
in Figure 2.2for
a nonstationary case obtained fromcovaria
tes generated by using (2.25)and (2.26).
2.3 Lag 2 Mixed Binary Sum Infectious Disease Mixed Model
In Section 2.2 we have discussed the model under the
assu mption that
there is nocommunity effect.
In this section, we
will discuss the mode
l withan unobservable
community effect.
Supposetha
tfo
r the ·ithcommunity,
there exists a communityeffect
liand
li ~N( O ,
(J2). Conditionalon this
ithcommuni ty
effect /i, adynamic
mixed model for the number of infections at time t can be written as
Yi1l
1i Poi(~t.;1
),wheTe /-<l = e'";li3+r;yil
Yi2l,; L blj(PI)I,; +
di21rij=l
Yi,t.-1 Yi,t-2
Yitl
1;L blj(pi)I, , + L
b2j(nt,P2)1
1i+
ditl1i, foTt =
3, 4, ... , T, (2.27)j=l j=l
where b1j"" Bin(p1), b2j"" Bin(p2 ), with the following assumptions:
(2) d it r v Po,;(" l""it -u* p l n 1./-Li,/.-* l - p 2 n t/-Li,t-2 * ) · W l1er·e f-Li* t -- ex'.J' 3+r; '
(3)
Yi,t- I I ,,&
diilri are independent fort= 2, 3, ... , T.(4)
Yi,t- 2l
1i&
dit.lri are independent fort = 3, 4, ... , T.''~
1 • For the''i2
model to be well-defined, we require that
J-L;
2- Ptf-Li12
0, yieldingPI :S
!!:fl,,: . Similarly,'t l
the condition f-Lit - PI/.l'i,t-L - P2/-Li,t-2
2
0 leads to p1:S
''i'-::':'·'-2. Therefore, the range of P1 is0 < . (
f-Li2 f-Lit - P2f-Li,t-21 )
j'> 3 :S PI _
m,zn * , * , , OTt _ .
f-l·il f-Li,t-I
2.3.1 Basic Properties of the Proposed Mixed Model
2.3.1.1 Conditional Properties
This model is a generalization of model (2.1). Conditional on li, the model become exactly the same as model (2.1) with a different mean parameter f-i;t
=
exp(x~1
/3+ !; ).
Therefore, all the conditional properties are the same as the properties in Section 2.3.1. That is
(2.28) (2.29) (2.30)
(2.31)
where aw = 0, a;1 = 1, a·ik = p1a.i,k- l
+
p2ai,k-2, for k = 2, 3, ... , T - 1.2.3.1.2 Unconditional Properties
In order to proceed to the estimation and the forecasting, we need to find the unconditional properties of the mixed model. By using the assumption that /;
~
N(O, 0"2), we can use moment generating function or direct integration to easily find
out that
(2.32)
Then, we can find out the unconditional properties by taking expectation over 'Y.i·
The unconditional mean of 1~t is
E[Yit.]
E,, E[Yitl,i]
E[fL7 tJ
E[exp(x;t,e + /'i)]
exp(
.'E~t,B)E ( e 'Yi) exp(x:t,B)exp(o-
2/ 2)
e:r;p(x~t,B+ o-
2/ 2)
J.lil ..
The variance of Yi1. can be calculated in a similar way by conditioning on /'i
E,i VaT(Yiti , J + V aT
1;E(Yiti , J
E[JJ· 7 tl +
VadfL7 t l
exp(x; , t,e + o-
2/ 2) + E[(f.l7t )
2] -(E[J.L7t])
2fLit +
exp(2x~t,B)E(exp(2!'i))+ f-l ;t J.Lit +
e.r;p(2x~1
.,B+ 2o-
2)+
~L;t(2.33)