Model of

(1)

(2)

(3)

(4)

(5)

Estimation and Forecasting of a Lag 2 Dynamic Model for Infectious Diseases

by

©Chen Zhang

A thesis submitted to the School of Graduate Studies

in partial fulfilment of the requirements for the degree of

faster of Science

Department of M athernatics f!j Statistics l\!Iemorial University of Newfoundland

A'ugust, 2012

ewfoundland

(6)

Abstract

When a common inf

ec

tious disease is first detect

ed

in

a community,

it may quickly spread out through

a

ir, wat

er ,

public facilities and personal

contacts

. At a

given

time point, each infected individual may or may not infect other individuals in

the community. Meanwhile,

it is also possible that some individuals who

carr y the same

disease travel into

the community.

In the present work

, we

discuss estimation

and

for

ecasting of an extension to the

lag 1 longitudin

al

dynamic mod

el for correla

ted data used by Oyet & Sutradhar (2011

) for

modelling the spread

of infectious disease.

The lag 1 model only allow individuals with infection

a

t

ime point

t -

1 to cause new infections

at t

ime point

t.

Clearly, if at

time point t -

2, there is

an

individual wh

o

is still infected by the disease, it is also possible for this individual to infect

others at

time point

t .

The present model discussed in

this work allows for such a

possibility.

During the modelling, we

consider stationa

ry and nonstationary covariates. We also

ext end

the model to situations where unobservable communi

ty effect and t

he latent

co

mmunity effect is present. The regression

para

meter

/3 a

nd the parameter of latent

community effect

CJ~

are estimated by generalized quasi-likelihood

( GQL) approach.

The correlation parameters p1

and

p2

a

re estimated by using method of moments. In

each of the cases,

we examined the accuracy of

the estimates a

nd forecasts through simulation

stud

ies.

(7)

A cknow le d gem e nts

I want to express my deep

est

gratitud

e

to my supervisor, Dr. Alwell Oyet in paticular for his guidance, inspirations and

encouragement.

The

thesis

would be impossible without his intellectual, practical and financial support.

My sincere thanks also goes to Dr. Brajendra C. Sutradhar for providing me the very useful background used in

the

thesis. I have learned a lot from his several

courses

.

It is my pleasure to thank Dr. Yuan Yuan, Dr. Zhaozhi Fan, Dr. Jie Xiao, Dr. Xiaoqiang Zhao, Dr. Chunhua Ou,

and Haiyan

Yan

g for their advice and en courage-

ment. I have really enjoyed my time with them.

I would like to extend my thanks to all my friends, especially Zhen Wang, for her

constan

t encouragement and support during my masters study.

I wish to acknowledge with thanks the administrative staff and computer

tech-

nicians in the Department of Mathematics and Statistics for making their time an

d

help

available

to me.

Words fail me to

express

my appreciation

to

my parents whose dedication

,

love

and persistent co

nfidence in me, h

ave taken

the load

ofl my shoulder.

The financial support of

the

Memorial

University

of Newfoundlan

d

is gratefully

acknowledged.

Finally, I would like to thank everybody who was important to th

e successful real-

ization of thesis, as well

as expressing my ap ology t

hat I could not mention personally one by one.

(8)

6

7

(9)

2.2.1.2 The Variance . 2.2.1.3 The Covariance 2.2.1.4 The Correlation.

2.2.2 Estimation of Parameters of the Lag 2 Fixed Binary Sum In- fectious Disease Model . . . . .

2.2.2.1 GQL Estimation of

fJ

2.2.2.2 MM Estimation of Pt and P2 2.2.3 Forecasting Performance

2.2.4 Simulation Study . . . .

2.3 Lag 2 Mixed Binary Sum Infectious Disease Mixed Model . 2.3.1 Basic Properties of the Proposed Mixed Model .

2.3.1.1 Conditional Properties . . 2. 3 .1. 2 Unconditional Properties . 2.3.2 Estimation of Parameters

2.3.2.1 2.3.2.2 2.3.2.3

Estimation of

fJ .

Estimation of P1 and P2 Esitmation of (}²

2.3.3 Simulation Study . . . . .

3 Lag 2 Dynamic Binomial Sum Infectious Disease Model

3.1 Preliminaries . . . . . . . . . . . . . . . . . . 3.2 Lag 2 Binomial Sum Infectious Disease Model

11 14 16

17 17 18

20 22

31 33 33 33

35

36

37

39 43

46

47

3.2.1 Moments of Lag 2 Binary Sum Infectious Disease Model 50 3.2.1.1 The Mean . . . . . . . . . . . . . . . . . . . 50

(10)

3.2.1.2 The Variance . 3.2.1.3 The Covariance 3.2.1.4 The Correlation .

3.2.2 Estimation of the Parameters of the Lag 2 Binary Sum Infec- tious Disease Model . . . . . . .

3.2.2.1 GQL Estimation of

fJ

3.2.2.2 MM Estimation of p1 and P2 3.2.3 Forecasting Performance

3.2.4 Simulation Study

4 Concluding Remarks

Bibliography

51 54

57

58 58

59

61

63

71

74

(11)

List of Tables

2.1

Stationary Model Parameters Estimation Results.

24 2.2

Nonstationary Model Parameters Estimation Results.

25 2.3

Stationary Model Forecasting Error . .

26 2.4

N onstationary Model Forecasting Error.

27 2.5

Non-stationary

/3

Estimation for fixed p1 , p2 and 0"2

44 2.6

Nonstationary p1 and p₂Estimation for fixed f] and ^0"²

44 2.7

Non-stationary ^0"²Estimation for fixed

/3,

p1 and P2

44

2.8

Non-stationary Parameters Estimation

45

3.1

Stationary Model Parameters Estimation Results.

64 3.2

Non-stationary Model Parameters Estimation Results.

65

3.3

Stationary Model Forecasting Error ..

66

3.4

Non-stationary Model Forecasting Error.

67

(12)

List of Figures

2.1 P

lots for forecasting performance of binary sum model with stationary covariates .

. . .

.

. . .

. .

. . . . .

. . .

.

. .

.

. .

.

. . 29 2.2 P

lots for forecasting performance of binary sum

model

with

nonsta-

tionary covariates . . . . . . .

.

. .

.

. . . . . .

.

. . . .

. .

. . .

. . 30

3.1 P

lots for

forecasting performance of binomial sum model with station-

ary covariates

. . .

. . . . . . . . . . . .

.

. .

. . .

.

. .

. . . . . 68

3.2 Plots of forecasting performance of binomial

sum model with nonsta-

tionary covariates

. . .

.

. . . . . . . .

. .

. . . .

. .

. . . .

.

69

(13)

Chapter 1 Introduction

In recent years, statistical models have shown to be of great value in the modelling of infectious disease. In particular, Oyet & Sutradhar (2011) proposed a branching process with immigration type longitudinal count data model for modelling the number of infections in each community. In their paper, they assumed that one infected person may possibly infect none, one or more individuals in a small time interval.

Some immigrants with the same disease may enter the community during the same time interval which will increase the number of infected individuals in the community. Let Yit. be the number of infected individuals at time

t (t

= 2, 3, ... , T) in the community i (i

=

1, 2, ... , K), Oyet & Sutradhar (2011) model is given by

with assumptions:

Yi,t-1

Y.it =

L

^BJ(n^t>

^p) ⁺

^dit

j=l

Assumption 1. Y'il ^rvPoi(p,i1 =

exp(x;,d3) ).

(1. 1)

(14)

Assumption 2. dit ^rvPoi(P,it - Pl/-Li,t-d fort= 2, ... , T with /-Lit = exp(x~1⁽³⁾^,for all

t

= 1, ... ,

T.

Assumption 3. dit. and Yi,t.-1 are independent fort = 2, ... , T.

For model (1.1), they found that for all t = 1,2, ... ,T and k = 1,2, ... ,T-1

E(Yil,)

Cov(Yit, Yi , t- k)

/-Lit'

(

k-l )

IT n t -l ^l

l=O

ai,t-k,t-k a itt

Note that when n₁= 1, for all

t

= 1, 2, .. ,

T ,

the binomial random variable BJ(n1,

p)

will become a binary variable with the probability of infection p. The model (1.1) will then reduce to an autoregressive, of order 1, (AR(1)) type Poisson process. This reduced model implies that the infectious individuals at time

t -

1 can only infect none or one individual at time point

t.

1.1 Poisson AR(1) Process

(15)

the process can be written as

Yi .. t-1

YiL

= L

^Bj^(p)

⁺

^dⁱ^L·

j=l

(1.2)

Sutradhar (2011) (see also McKenzie (1988)) used (1.2) for

t =

1, 2, ... ,

T

to model longitudinal count data over time. The basic properties of the model arc

k d

c ( '' '\/" )

^{k f i}^it^.

p P,u an orr ^Iit, I i,L-k

=

p - . /-Lik

Mckenzie (1988) showed that the distribution of the process (1.2) is Poisson with mean p,;₁,

=

exp(x~

1

^/3)^by^using^a^lternate^probabili^tygenerating function (a.p.g.f.s). We can consider this model as a model for spread of disease for only some limited cases because it only allows each of the Yi,L-l infected individuals at time

t-

1 to infect at most one individual. When Yi,L- l is considered to be an offspring variable at timet-1 and d;t is the immigration variable, the model (1.2) represents a branching process with immigration for ]( = 1 and large

T .

This model -vvas recently considered by Sutradhar, Oyct and Gadag (2010) as a special case of a negative binomial branching process with immigration.

1.2 Generalized Quasi-Likelihood (GQL)

For the longitudinal regression setup, interest may be focused on the regression parameters for the marginal expectations of the longitudinal responses and the longi-

(16)

ing" correl

a

tion matrix based

generalized estima

ting

equ at

ion (GEE

) approach for

the

estimation of

the regression parameters and generalized quasi-li

kelihood (GQL) estimation approach. The

GEE

approach

was proposed

by

Liang & Zeger

(1

986).

It has been used

extensively

in recent years in

estimation for longitudinal count re- sponse

. However,

as

demonstrated by Crowder (1995), b

ecause of th e

uncertainty in the definition of the working

correlation

matrix, the GEE approach m

ay

in some

cases

lead to

a co

mplet

e

breakdown of the

estimation of t

he regression parameters.

Furthermore, Sutradhar

a

nd Das (1999) have d

emonstrated that even t

h

ough

the GEE approach in many situations yields

consistent estimators for t

he regression

pa-

rameters, the GEE approach may, however, produce less efficient estimates th

an the

independence

assumption

based quasi-likelihood

(

QL) or moment estimates. Sutrad- har

((2011) p

.4)

suggests t

hat

based on s

tudies by

Crowder (1995)

, Sutr

adh ar an d

Das (1999)

,

Sutradhar

(2003), and

Su

tradh ar (2010),

the

GEE approach can not

be trusted for regression

estimatio

n in discrete models such

as

longitudinal

binary or

count

data. Sutradhar (2003, Section

3) therefore suggested an efficient GQL ap-

proach for time indep

ende

nt

covari ates which

is an

extens

ion

of the QL approach

(or weighted least

squ ares approach) for the

independ

ent data

introduced by Wedd

er-

burn

(1974).

Sutradhar (2010) introduced nonstationary au

toco

rrelation structures for the cases when

covariates are time dependent, a

nd applied

the GQL approach

for

consistent a

nd

efficient estimation of the regression effects.

(17)

1.3 Motivation

From the model ( 1.1), ( 1. 2) and ( 1. 3), we can see that these lag 1 models only allow individuals with infection at time point

t -

1 to cause new infections at time point

t.

Clearly, if at time point

t -

2, there is an individual who is still infected by the disease, it is possible for this individual to infect others at time

t

as well. We

develop a model which include infections from time

t -

2. For simplicity, we first extend the Poisson AR( 1) process to a lag 2 model. Since the number of infections in each community may also be affected by unobserved community effects such as environmental pollution, we also extend the lag 2 model a little further by introducing a random variable to represent the latent community effect. However, these AR(1) type extended models will have similar limitations as a Poisson AR( 1) process when used to model the spread of infectious disease. That is, the infected individuals at time

t -

1 or

t -

2 can only infect at most one individual at time

t .

Consequently, we also consider an extension to Oyet & Sutradhar's (2011) lag 1 model for infectious disease which would be more appropriate to model the number of infections in reality.

(18)

Chapter 2 Lag 2 Dynamic Binary Sum Infectious Disease Model

2.1 Pr e limina ries

In this

chapter

, we begin to construct new models which will include the information from previous two stages. Suppose that we have K communities. First, we begin with

a

simple case, which deals with

a

n inf

ectiou s

individual who

can

only infect

a

t most one person

a

t time point ^t^.In

section

2, we consider both stationary

and nonstationary covariates assuming

that no

community effect

is present. That is,

all communities ar e assumed to be independent

of each other, but

they a

re

time-wise

(19)

simulated data.. Iu section 3,

we assume

there

is

an un observable community effect

which

can a.ff ct

our responses. For example,

this

unobservable community effect 'Yi could

be

wealth difference

or

education levels.

The

mean function

depends

on the regression effect and community effect. We

discuss t

he structures and assumption f

or t

his dynamic mixed model and obtain properties of the mixed model. Finally,

we

estimate the parameters involved in this mixed model with nonstationary covariates.

The !llain goal of this chapter is to estimate the parameter involved in the model

and to

forecast

the

number

of infections

in community at time

po

int

t.

We have found

out that the

Generalized Quasi-likelihood (GQL) method of Sutradhar (2003) for estimating

the

regression parameters of longitu

dinal response works

very

well

for estimating

the

regression

effects

of this model. In

the same year,

Sutradhar and Jowahccr

(2003)

have also used the GQL method to estimate the variance parameter

of the communi ty random effect

'Yi· 'vVc will usc this GQL approach for the estimation

of our

model parameters. We estimate the longitudinal correlation parameters

by

using

the

Method of Moments (M

f) .

Once all

parameters

are

properly

estimated, we

t

hen use

th

e information from

t

and t - 1 to examine the forecast performance of this model.

2.2 Lag 2 Fixed Binary Sum Infectious Disease Model

We begin our modelling by considering the simple case where the offspring random

varia

ble is binary

with correlation

index parameters p1 and P2

for two consecut

ive gen-

(20)

erations. 'vVe assume that K independent communities are at the risk of an infectious disease. Suppose that at initial time point, t

=

1, y;1 individuals in the ith community developed the disease where y;₁is assumed to follow a Poisson distribution with mean parameter f./,;₁

=

exp(<₁

f3 ).

Because we assume an infected individual can only effect

none or one individual for two time intervals, and also because there may be other infected individuals arriving from other communities, we shall model the number of infected people in ith community at time t as

Yi1

Poi(p .

.;1),

wheTe f-l·il = ex;¹!3

Y;t

Yi2

L

^bl^j^(P^l⁾

⁺

^d;2

j=l

}/i,t-1 }/i,t-2

L

^blj^(PJ)

⁺ L

^b2j(p2)

⁺

d;t, fort=

3, 4, ... , T ,

j=l j=l

(2.1)

where

b

_1j ^rv

Bin(p

_{1 ),}

b

₂₁^rv

Bin(p

_{2 )}. This is an extension of the AR(l) model used by Staudenmayer and Buonaccorsi (2005) and Sutradhar (2003). In model (2.1), we make the following assumptions:

(3)

lf.,~,_1 & d;t are independent for

t = 2, 3, .. . , T .

(21)

parameter must satisfy /-Li2 - ^Pli-Lil ~ 0 , yielding Pt ^::=:; ~:;~. Similarly, for ^/-Lit^-

>

0 t b t. fi d d

<

^{Lit^-PZf'i^t^{- '} Tl f th

Pti-Li,t-1 - p₂J-Li,t.- 2 _ o e sa 1s e , we nee Pt _

11;

1 • -. 1ere ore, e range of P1 is

0

< < . (

/-Li2 /-Lit - P2/-Li,t-2 1) j'

>

₃ d

1 ..

d

_ p1 _ 1mn - , , , or

t _

an zxe P2·

/-Lil ^Jl·^i,t-¹

For stationary case, if we let ^I-Lil

=

I-Li2

= · · ·

⁼ ^/-LiT

=

^J-Li^,then the range of P1 simplifies to

0 ::::; ^P1^{::::; (}1 - P2).

2.2.1 Bas ic Prope rties of the Lag 2 Fixe d Bina ry Sum Inf e c- tious Disease Mode l

2.2.1.1 The Mean

Based on the previous discussion, we know that E[Y,;1]

=

^1-Ln and E[Y,;2]

=

/-L·i2·

Then, it follows that for

t =

3, 4, ... T,

E[Y;t,j _E [ %' ^b

¹

^;(pi) ⁺ % '

^b,;^(p,)

⁺ ^d ^<rl

E [ y t

¹

^b11(P1)] ⁺ ^E [ y t.

²

^b ² ^1(P2)] ⁺ ^E ^[ ^dit ^.]

]=l ]=l

E,.,, _, E [ %' b

¹

; (Pt , ,_,] ⁺ Ê,.,,_, Ê [% ' b ,;( p,t ,_J ⁺ Ê[d,]

PtE[Yi,t-d

+

P2E[Yi,t-2J

+

^Jl·it ^- ^Pl^/-L·^i,t^{.-1 -} P2/-Li,t-2·

(2.2)

---l

I I I

(22)

-l

I

Next, we

consider some specific cases:

For t=

3,

Pl/Li2

+

^P2/Li^l

+

^/Li3^- ^Pl/Li2^- ^P^2/Li^l

/Li3·

Fort= 4,

/Li4.

By mathema tical induction

, if we have

E

[Yi,t-1] = ^f.Li,t-^land E[Yi,t-2] = ^f.Li,t^-²^,then:

E[Y;t]

Pl/Li,t- 1

+

^P^2/Li,t-2^+/Lit ^- ^P^1P^i,t-^l^- ^P^2/Li,t-2

/Lit

exp( x ; tfJ ). (2.3)

So

,

E[Y;t]

= fLit = exp(x~tf3), for all

t

= 1, 2, ... ,

T .

(23)

2.2.1.2 The Variance

The variance of this model can be derived by finding the conditional and unconditional variances. By assumptions (2) and (3), we have

(

yi 1- 1 Y; t-2

I )

Cov L ^b1 ^j ^, L ^b ^2.i ^Yi,t-1 ^, ^Yi ^,t- ² ⁼

⁰

J=l J=l

and

Cov(Yi,t,-I,dit) =

0 ,

and Cov (Yi , t-2 , dit) =

0.

Then

+!-Lit - Pl/-Li,t-1 - P2/-Li,t-2·

(2.4)

Letting O"i,t.- l,t-

1

^r^epresent^the^varia^nce^of

Yi , t- 1,

then

Eyi,t- l [V ar·(Yi t.I Yi ,t- 1, Yi,t-2 )] + V ary i,t-t ( E [ Yi t IYi,t-I, Yi, t-2])

+(!-Lit.- Pl/-Li,t.-1 - P2/-ti,t-2)]

+ /-Lit - P1 /-Li,t-1 - P2/-Li,t-2 ·

(2 .5)

(24)

Similarly, letting ^D"i^,t-2,t-2represent the variance of

Yi ,t-2,

we have

2

l

+f-li t - P1/1i,t-1- P2/1i,t-2

+

p 1ai,t-l,t-l

+f-lit-Ptf-li,t.- 1 - P2/1i,t-2

+

^p2 ^l^ai^,t-^t,t-1

+

V aT(JL;,t-1 P1

+

1'i,t-2P2 + !"it - fh J-li,t-1 - P2/1i,t-2) /1i,t-1PI

(1-

^P¹⁾

+

^Piai,t^-l^,t-1

+ p~ai,t-2,t-2

+!1·i,t.-2P2(1- P2) +f-lit - P1f-l·i,t-l - P2/1i,t-2

f-lit- (J-li,t-1- ai,t-l,t-dPi - (J-li,t-2-

ai,t- 2,t-2 )P~- (2.6)

From this formula, we can see that the variance of }'i₁ has a recursive relationship with the variance of

Yi ,t- J

and the variance of Yi,t-2 . We know that V

aT(YiL) =

^J-lil

(25)

and V

aT(Yi2) =

P,i

2

^fromour assumptions. It turns out that when

t =

3,

When

t

= 4,

/-Li4.

If

we continue doing the calculation, by mathematical induction if we assume that

O'i,t- l,i,t-1

=

^/-Li,t^-¹and O'i,t- 2,i,t-2

=

^/-Li,l.-^2,then using the formula of the variance, for t = 3, 4, ... , T, we have:

(26)

+P,i,t-2P2(1-P2) + J.kit - PlJ.ki,t.-1 - P2l-"i,t-2

(2.7)

So, V ar(Yit)

=

J.kit

=

exp(x~tf3) for all

t =

1, 2, ... , T.

2.2.1.3 The Covariance

The lag k covariance between Yit and Yi,t-k will also have a recursive relationship in terms of covariance between Yi,t-l and Yi.t-k & Yi,t.-2 and Yi,t-k· By assumption (2), we have

j=l

+Cov(dit, Yi,t-d

j=l

Yi,t -l Y:i,t-2

Cov(

L

^bt.i^(PI⁾^,^Yi,t^-^k)

⁺

^Cov⁽

L

^b2j(p2)^,^Yi^,t-d ^·

.i=l j=l

(27)

Consider only the first term of the equation,

PtCov(Yi,t- 1, Yi,t-k)

P1 CJ;,t-t ,t-k.

Similarly, we can show that:

=

P2CJi,t.-2,t-k·

Ther

efore,

we have:

(2.8)

We can use this formula to calculate the covariances for

some specific cases

. For t =

2,

by using th

e

properties of AR( 1) Poisson process, we have

(28)

Fort

=

3,

Fort= 4,

By summarizing lots of the specific covarianccs, We h

ave found a gener al

formula for

cova

riance of Yit and Yi,t-k for all

t =

2, ...

, T a

nd k = 1, 2, ... , T- 1

to be

t-k

Cov(Yit, Yi,t-k) = a;k

L

^Pt^Pr¹^f.Li,t-j+^l^-A:

⁺

^a^i,k-^1P2/.Li,t-^k,

j=l

wher

e

_a;0= 0, a;1

=

1, a;k = p 1a;,k-1

+

^p2ai,k-^2,fork = 2, 3, ... ,

T -

1.

2.2.1.4 The Correlation

(2.9)

Once we fou

nd

out

the covari ance

b

etween

Yit

and

Yi,t-k,

the

lag k correlation between Yit

a

nd Yi,t-k will

simply

b

e

(29)

We again consider some specific cases such as lag 1 and lag 2 correlations. These correlation formulas will be needed in the estimation section.

t ·-1

C ·('-/' '\/' ) - L j=l Pl~

^/-li^,t+^l^{- j}

on

^{1 ·}^it^.,^{1 i,t+}¹ ^- --'~-;::=====--

J

/-lit/-li,t+ 1

~~ j-1

C . ('-/' ,/' ) _

P1 L-.i=t P1fJ2 /1-i,t+l-:i

+

^P²^/-lit

01T 1ii,1i,t+2 - .

.J

!-lit/-li ,t+ 1

ate that when p₂

=

0, the lag k covariance and correlation will reduce to

(2.11)

(2.12)

which is the same for AR(1) based count data model considered by Sutra.clhar (2010, eqns, (15)-(16), p.178). Thus, this model is an extension to the AR(1) based count data model.

2.2.2 Estimation of Parameters of the Lag 2 Fixed Binary Sum Infectious Disease Model

2.2.2.1 GQL Estimation of (3

Following Sutradhar (2011, s c.6.4.2), let J-li

=

(Mit,P,i₂, · · · ,/Jit, · · · ,J-l;T)' be the

T

x 1 dimensional mean vector of Yi = (Y'il,

Y i2, · · · ,

Yit, · · · , YiT )'. If we assume Pt, and p₂are known, a consistent and efficient estimate of (3 can be obtained by solving

(30)

th

e so-called generali

zed quasi-likelihood (GQL)

estimating equ a

tion

f(

0

I

"\:""' ~ 8(3 1-Li

I.:;

-¹

(p)(:IJi - J.L;) =

0

i=l

(2.13)

as the true

correlation structure

1 Pit2 Pit:3 PilT

1 Pi23 Pi2T

Ci(P)

=

1 P·i,t-l,T 1

with Pi,t-k,t

=

Corr(Yi,t.-k, Y;t) for

t =

2, ... , T

and

k

=

^{1, ...}^,T

-

1. It is clear that E

[2::::::

1

~I.:i

¹

(p) (:y;- J.L;)J =

0, hence the GQL

estimate

will be

a consisten t estima

te. This GQL estimating equ

ation (2.13)

can be solved iteratively by using

the

Newton Raphson iterative equation

where ~(r-) is the value

of (3 at rth iteratio

n.

(31)

obtain a good stima te

for [3. These two

pa ra

meters can

be

consistently estimated

by

using the

meth od of moments

. Let Sil.t, Sit.,t+l,

a

nd Sit,t.+2

b

e the standardized

sample varia nce, the standardi

zed lag 1 sample autocovariance and the standardized lag

2 sample au tocovari

ance,

resp

ectively, defined as

Silt.

Sit,t+2

E[Sitt]

1

1\" T- !

L L

^CoTT(Yit^,^Yi,t.+J)/^K ^(T-

¹⁾

;=] t=l

/\" T-2

E[Sit, t+2 ] L L

CoTr(Yit, Yi,t+2)/ K (T-

2).

i=l t=l

Using first order approximation of th

e expectation of t

he ratio of two sample variance

,

we will have moment

equatio

ns

Sit,t+l

=

E

[ Sit,t . +t]

^;:::j E[S;t,t+d

=

^E[S^·

^]

S ;u S itt.

E[s ^•

• ^"

^tt]

^tt,t+!

Sit,t+2 _ E

[ Sit,t . +2 ] ,..., E [Sit,t. +2] _

E[S

]

- - - - - ,..._, - -i 'it

t+2 .

Siu Silt

E [Sitt] ,.

(32)

Then, one may obtain the estirnates for p1 and p2 by solving the marginal moment equations

Sit,t,+l E[S ] - 0 - - - itt+l -

Sitt, '

(2.15)

Sit,t+2 [ ]

-5- - - E Sit,t+2

=

0

itt

(2.16)

Due to the nonlinearity of the estimating equations (2.15) and (2.16), the solutions can be obtained by using Newton's iteration method.

(2.17)

(2.18)

where Pl(r) and

/5

₂(r) are the values of p₁and p2 at rth iteration respectively.

2.2.3 Forecasting Performance

Once all parameters of the model (2.1) have been estimated, we can carry out a one-step forecast for the purpose of planning and control. From model (2.1), it is clear that the conditional mean of

Yi

1 given

Yi ,t-l

and

Yi,t-

²will have the formula

(33)

if we define an l-step ahead forecasting function of Yi,t+l as Yit(l) = Yi,t+l E(Yi,t+tiYi,t+l-l,Yi,t+L-2), then, from (2.17)

, the one step ahe

ad forecasting function is given by

Yit( 1)

E(Yi,t+11Yit, Yi,t.-1)

fLi,t.+l + Pl(Yit - fLit) + P2(Yi,t.-l - f./,i,t-1),

(2

.20)

with

Y it(O) =

Yit· Once we have a one

step

ahead forecast, we can

calculate

the for

ecast error

e.iL ( 1) by using

eit .(1)

Yi,t+l -

Y;, .(1)

(Yi,t+l - fLi,t+d-Pl(Yit - fLit)- P2(Yi,t-l - f./,i,t-J). (2.21)

From the above equation, we noti

ced

that the conditional mean

and the mean of ei1• ( 1) is

(34)

The conditional variance of e;1

(1) lyil. ,

Yi,t-1 is given by

V ar(Yi,t+liYiL, Yi,t-d

Then, the variance of C;t(1) follows the formula

VaT (Cit ( 1))

2 2

/-Li,t+l - P1 /-Lit - P2fJ·i,t-l· (2.22)

2.2.4 Simulation Study

In this

section ,

we perform a simulation study. We

consider the

case of K

=

100

communities

and T

=

5

t

ime points. We will use fit,

t =

1,

2, 3

,

4

for the purpose of estimation and try

to

forecast the number of infections at

t = 5

in each

comrnunity.

First, we consider a

tirr1 e

independent

covariate

vector ^x:t

= (x;n,

^X;t2⁾ for the stationary

case,

where ^X;ⁿ

and x;

12

a

re

generated

as follows:

(35)

and

0, t = 1,2,3,4,5;i = 1,2, ...

,if

(2.24) 1,

t

= 1,2,3,4,5;i

=if+

^{1, ..}^.^,^K.

Then we consider a time dependent covariate vector x~L = (xitl, xit2) for studying the nonstationary case, where ^X-iⁿ and Xit 2 are generated as follows:

and

- 1,

1,

;:z;iLl = 0, 0.5,

1,

- 1,

Xit.2 = 0,

0.5,

0.5+(L-1)0.5 T

t

= 1, 2; i = 1, 2, ...

,if

t

= 3, 4, 5; i = 1, 2, ...

,if

t

= 1; 'i =

if+

^{1, ...}^,^J(

t

= 2, 3; i =

if+

^1,^...^,^J(

t

= 4, 5; i =

if +

^{1, ...}^,^J(

t =

1, 2, 3, 4, 5; i

=

1, 2, ... , ~

- 1· . - /( 1 ³¹(

t -

'z- 4

+ ' . .. ,

4

. [( 3I<

t

= 2.3; z = 4

+

1, ... , 4 4 5. . [( 1 3[(

t = ' ' z =

4

+ ' . .. ,

4

. 3[(

t

= 1,2,3,4,5;z = ₄

+

^1,^...^,^K.

(2.25)

(2.26)

Even though we have only two covariates in simulation, however, in practical cases,

(36)

factors

, such as geogra

phic

locations and policy restrictions, or, they can also be time

dependen

t

factors,

such as economic situa

tions

and age.

From t

he assumptio

ns,

we

know

that

0 <

p

<

min

( l:!:i1.

J.L;,-pzJ.Li,t-2

1)

j"or

t >

3. Since P·

i

s the lag 2

- I - J.Lil' J.Li,t-1 ' ' - 2

correlation,

we

can naturally assume

that

p2

<

p1 .

So

,

we

choose a small P2,

then

compute t

he upper bond

p*

=

^min

(l:!:i1. ,,. ;,-p

²¹

~i. t-z 1).

Then we use th

is

p₂ and

) JLil ) JLi,l-1 ) )

P1 =

^{p; -}

0.1

or p1

=

p~-

0.2 as t he

true values of P1 a

nd P2 for

the simulation. Using suitable

initial

values

of /3,

p1 a

nd p

2,

we

solve

the ma rginal

estimating equation for

/3 by using Newton Raphson

algo

rithm. Then

by

using

the

initial

valu s of p1 & p2

and

the estimat

e of

/3 obtained from previous

step,

we

obtain estimates

for

p1 & P2

by using moment es timating equat ions. We

use estimated p1 ,p2 to estimate

/3

again,

then

use this

new (3

and repeat the above steps to get the

improved estimates of p

1 and p2.

This

ite

ra tive

step continues

until

convergence. The table below shows estimated

/3

a

nd

p1,

p

2

from 1000

simulations.

Table 2.1: Stationary Model Parameters Estimation Results.

P

ara

meter Esti mation

(3

_PI

P 2 {3

SE8 ^P1 SEPI

P2

^SEP2 (0.5,

1.0)

0.40

0.10 (0.508, 0.994)

(0.224, 0.123)

0.388 0

.060 0.106 0.059 (1.0,

1.0) 0.40 0.10

(1.000,

1.000) (0.

260,

0.136

) 0.388

0

.060

0.1 02 0.057

(0.5,

1.0) 0.35 0.20

(0.517,

0.992) (0.224, 0.128)

0.345 0.055

0.178 0.064

(1.0,

1.0) 0.35 0.20

(1.018,

0.990)

(0.257,

0.138)

0.350

0.

057

0.176 0.064

(0.5,

1.0) 0.60 0.20

(0.522,

0.987)

(0.283,

0.155

)

0

.571

0.

057

0.177 0.070

(1.0,

1.0) 0.60 0.20

(1.052,

0.974)

(0.304,

0.162) 0.

575

0.059 0.1

70

0.067

(0.5,

1.0) 0.75 0

.20 (0.524, 0.991) (0.320, 0.178) 0.705

0.051 0.

163

0.040

(37)

Table 2.2: onstationary Model Parameters Estimation Results.

Parameter Estimation

/3

PI P2

s

^SEfi

^P1

^SEP^I ^P2 ^SEP2

(0.5, 1.0) 0.40 0.10

(0.500,

0.997) (0.070, 0.131) 0.393 0.074 0.128 0.080

(1.0,1.0)

0.40 0.10

(1.003

, 0.995)

(0.070, 0.123)

0.393 0.084 0.145 0.091

(0.5,

1.0) 0.35 0.20

(0.502,

0.996)

(0.072, 0.135) 0.362 0.073 0.174 0.084 (1.0,1.0)

0.35 0.20

(1.002,

0.992)

(0.073,

0.128) 0.366 0.075 0.175 0.093

(0.5

, 1.0) 0.60 0.20

(0.497,

1.000) (0.069,

0.131

) 0.587

0.078 0.197 0.104 (1.0

,1.0) 0.60 0.20

(1.002

, 0.995) (0.067,

0.119)

0.586 0.081 0.210 0.119

(0.5

, 1.0) 0.75 0.20

(0.500,

0.994) (0.064,

0.124)

0.725 0.072 0.189 0.102

(1.0,1.0)

0.75 0.20

(1.002

, 0.996) (0.067,

0.117) 0.723 0.078 0.209 0.129 (0.5, 1.0)

0.60 0.30

(0.495,

1.004)

(0.070, 0.140

)

0.593 0.075 0.264 0.107 (1.0,1.0)

0.60 0.30 (1.002, 0.994) (0.071,

0.125)

0.596 0.082

0.266

0.120

(0.5,

1.0) 0.45 0.40 (0.498, 0.998) (0.071, 0.142) 0.473

0.067 0.307 0.091 (1.0

, 1.0) 0.45 0.40

(0.996

, 1.003) (0.072,

0.131

)

0.475 0.076

0.288 0.105

From Table 2.1 and Table 2.2, we can see that the estimates for

f3

are very close

to the

true value of

f3

irrespective

of the combinations of parameters.

However,

for

some combinations

of p

₁and

p

2,

the estimates for p

₁and p2 may

not

be as accurate

as the

others,

for

instance, under

stationary

case,

when true

p2 =

0.20,

according

to the rho

₁restriction from our model assumption, rho₁should

be less

the 1 - p2 =

0.8.

If the true combination

is

PI = 0.75

and

PI = 0.20 ,

the estimates are

less

close to the

true

values

compare

to other

combinations.

Similar result

ha

ppens when true

p1 =

0.45 or 0.55,

p₂ = 0.40.

The

upper boundary for p2

is 0.

5

because

we

h

ave

assumed

that p1

2: p

₂.

These estimates arc less close to the

true values than others because

p

1

or

p₂are close to

its boundary.

For the purpose of examining the forecast performance of the model (2.1)

in forecasting the future infections, we use

the parameter

estimates obtained by using only

(38)

function in Section 2.2.3 to compute a one-step ahead forecast of the fifth observation.

The sum of squares of the forecast error as well as the variance of the forecast error for these 100 communities were calculated for each simulation run. We denote the average sum of squares of the forecast errors and the average variance of the forecast error by ASS and AV respectively. The results.summarized from 1000 simulations, are reported in Table 2.3 and Table 2.4.

Table 2.3: Stationary Model Forecasting Error.

(3 P1 P2

ASS AV

(0.5, 1.0) 0.40 0.10 1.658 1.743 (1.0, 1.0) 0.40 0.10 2.153 2.122 (0.5, 1.0) 0.35 0.20 1.802 1.798 (1.0, 1.0) 0.35 0.20 2.159 2.134 (0.5, 1.0) 0.60 0.20 1.298 1.353 (1.0, 1.0) 0.60 0.20 1.561 1.605 (0.5, 1.0) 0.75 0.20 0.872 1.007 (1.0, 1.0) 0.75 0.20 1.036 1.189 (0.5, 1.0) 0.60 0.30 1.190 1.288 (1.0, 1.0) 0.60 0.30 1.431 1.543 (0.5, 1.0) 0.45 0.40 1.381 1.456 (1.0, 1.0) 0.45 0.40 1.658 1.743

(39)

Table 2.4: Nonstationary Model Forecasting Error.

(3 P1

P2 ASS AV

(0.5, 1.0) 0.40 0.10

2.726 2.654 (1.0,

1.0) 0.40

0

.10 4.539 4.360

(0.5,

1.0) 0.35 0.20 2.775 2.695

(1.0,

1.0) 0.35 0.20

4.623 4.422 (0.5,

1.0) 0.60 0.20 2.103

2.047

(1.0, 1.0) 0.60

0.20 3.477 3.368

(0.5, 1.0) 0.75

0.20

1.490 1.534

(1.0 ,

1.0) 0.75 0.20

2.5

18

2.513 (0

.5,

1.0)

0.60 0.30 1.993 1.971

(1.0, 1.0)

0.60

0.30 3.329

3.326 (0.5, 1.0) 0.45 0.40

2

.329 2.298 (1.0

, 1.0)

0.45

0.40 3.906 3.831

From Table

(2.3

) and Table (2

.4)

, we see t

ha

t the value of average sum of squa res

and

the

average

variance

of the

forecast errors

are very close to each other

f or all

different combinations of p arameters. This indi cates

that the average sum of

squa res

of

the

for

ecast

errors can

closely estimate the average

variance of the forecast errors

and a

satisfactory

p erformance of the estimation of the

parameters

of the model. It

could a

lso

be seen that t

he average variance and

sum of squa res of forecast errors for

the

nonstationary model a re generally

larger than

the

stationary

case which

means

we

have

more accurate estima tes for the pa ramete rs

in stationa

ry case.

This makes sense b

ecause in s ta tionary case, the covariates does not chan ge with

respect to time

t .

Hence less varia

tion was introd

uced

into the modelling.

Note

th a t for

(3

=

(0.5, 1),

the a verage

variance

of for ecasting error is sm aller

than that of (3 = (

1, 1). This is

because t

he

mean function

contains (3

and th e forecasting varia nce

is

a

function

of

means.

In our

particular se tup, (3 = (0.

5, 1) will lead to a smaller sum of means than

(40)

(3

=

(1, 1). For example, in stationary case, we have x~,t+l

=

x~t

=

x;,~,_

1 =

x~ and f.Li,t+l

=

^f.L;t

=

^f.Li^,t-^l

=

~ti· If we let ~tio, f.L·ib and represent the means when (3

=

(0.5, 1) and (3 = (1, 1) respectively. Var(eia(1)) and Var(eib) are defined by (2.22) using Jl.ia and f.Lib respectively. We also let AVa and AVb be the average variance of forecast errors when (3 = (0.5, 1) and (3 = (1, 1) respectively, Then

Considering the stationary covariate structure as (2.21) and (2.22), we find out that

0, and we know that p~

:S

p₁and p~

:S

p2 since our p1 and p2 are numbers within interval 0 to 1. So 1 -

P I -

p~

2:

0. Therefore,

Therefore, under stationary case, we expect to see that the average variance of forecasting error is smaller when (3

=

(0.5, 1) than when (3

=

(1, 1), regardless the combination of p₁and p_{2 .} This situation still holds for nonstationary case in our particular setup.

(41)

~ l

^I

^I d

^I

20 40 60 80 100 0 20 40 60 80 100

(a) (b)

~ ^--- .-.-.--.·.·---·- ^----

^.^.^----.^{.. -}^.·--

^---

^--···

^- ^---

^·-..^···^···-

^---

^{·-·--- ..}

^-

^-. ¹

. ~ ^--- ^--- ^- ^---- ^--- ^- ^--

^I

1,(') 0 - - - -·- - - - - - - - - - -

<'")

ci --·---·- - - - · --· - - - - · - - - .. N

6

~ ~

ci I I I I I I o I I I I I I

20 40 60 80 100 20 40 60 80 100

(c) (d)

~ ~---

~

⁰

l

~~~~,~~~-,----.1---~1

- . . ... ^~ ¹

20 40 60 80 100 0 20 40 60 80 100

(e) (f)

Figure 2.1: A plot of (a) values of stationary mean fort = 1 (solid line), t = 2 (dashed line), t = 3 (dotted line), t = 4 (dotted dashed line); (b) values of stationary variance for t = 1 (solid line), t = 2 (dashed line), t = 3 (clotted line), t = 4 (dotted dashed line); (c) values of stationary lag 1 correlation for t= 1 (solid line), t = 2 (dashed line), t = 3 (clotted line); (d) values of stationary lag 2 correlation for t= 1 (solid line), t = 2 (dashed line); (e) Average forecast overlaid on average of longitudinal data; and (f) proportion of absolute values of forecast error that are 0 or 1 (solid line) and

>

1 (clotted line); by communities obtained from 1000 simulations with p₁ = 0.35, P2=0.20, ;3=(1,1), stationary covariates (2.23)-(2.24)

(42)

.

^'It

~

^{· -}^·^{- · -}^·^-^·

^I ^.

^"'t

~

- - - - · - - - ,

^I

...... -.-. --~ ------.-----.-.-.-.----- ....... -~.-.-.---.---.---.-.-.-.-

: • . . -. ... .. . ~._ ;~-... -,.L. ... -... ~ : '· . . . :- - j"'."4'.&.".L" ... ~

I ~ Î Î Î Î Î ^: Î Î Î Î

0 20 40 60 80 100 0 20 40 60 80 100

(a) (b)

0 20 40 60 80 100 0 20 40 60 80 100

(c) (d)

]~~~~I

I I I I I I

. ⁰^'(f'

~

^~~^{• • • •} ^•- - · · · ^I

;

^{. .}

^·

... · ... •.················~

o I I I I I I

0 20 40 60 80 100 0 20 40 60 80 100

(e) (f)

Figure 2.2: A plot of (a) values of nonstationary mean fort = 1 (solid line), t = 2 (clashed line), t = 3 (clotted line), t = 4 (clotted clashed line); (b) values of nonstationary variance fort = 1 (solid line), t = 2 (clashed line), t = 3 (clotted line), t = 4 (clotted clashed line);

(c) values of nonstationary lag 1 correlation for t

=

1 (solid line), t

=

2 (clashed line), t = 3 (clotted line); (d) values of nonstationa.ry lag 2 correlation for t = 1 (solid line), t = 2 (clashed line); (e) A vera.ge forecast overlaid on average of longi tuclina.l data.; and (f) proportion of absolute values of forecast error that are 0 or 1 (solid line) and

>

¹^(dot^ted

line); by communities obtained from 1000 simulations with p₁= 0.35, P2=0.20, ,8=(1,1), nonsta.tiona.ry cova.ria.tes (2.25)-(2.26)

Figure 2.1(a),(b),(c) and (d) show the stationary patterns in the rnean ^f..Lit,

varia

nce

(43)

the general pattern of the infections at the fifth time point. In order

to assess the

accuracy of out forecasts, we have also displayed

a

graph

showing

the average of

the

proportions of the forecast error e;t with absolute deviations 0,1

and greater

than 1. Figure 2.1(f)

s

hows that the deviations of m

agnitude

0

and 1

appear

to be

over 90% for th

e

first 50

communities and around 60% for

the r

emaining 50 communities.

The deviations of m

agnitude

for 0 & 1 is about 60% for the last 50 commu

nities

is caused by the large variation in

the number of infections

for these communities as

seen

in Figure 2.1

(e)

. For the purpose of comp

aring

the difference between

th

e stationary case and nonst

ationary case, we constructed similar plots

in Figure 2.2

for

a nonstationary case obtained from

covaria

tes generated by using (2.25)

and (2.26).

2.3 Lag 2 Mixed Binary Sum Infectious Disease Mixed Model

In Section 2.2 we have discussed the model under the

assu mption that

there is no

community effect.

In this section, w

e

will discuss th

e mode

l with

an unobservable

community effect.

Suppose

tha

t

fo

r the ·ith

community,

there exists a community

effect

li

and

li ~

N( O ,

(J2). Conditional

on this

ith

communi ty

effect /i, a

dynamic

(44)

mixed model for the number of infections at time t can be written as

Yi1l

₁ⁱ Poi(~t.;

1

⁾^,^wheTe^/^-^<l⁼^e'";^li3+r;

yil

Yi2l,; L blj(PI)I,; +

di21ri

j=l

Yi,t.-1 Yi,t-2

Yitl

_1;

L ^blj(pi)I, ^, + L

^b2j(nt,

^P2)1

1ⁱ

+

^ditl1^i, foT

t =

3, 4, ... , T, (2.27)

j=l j=l

where b1j"" Bin(p₁), b₂j"" Bin(p_{2 ),}with the following assumptions:

(2) d _it_{r v} Po,;(_" l""it -u* p l n 1./-Li,/.-* l - p 2 n t/-Li,t-2 * ) · W l1er·e f-Li* t -- ex'.J' ³+r; '

(3)

Yi,t- I I ,,&

diilri are independent fort= 2, 3, ... , T.

(4)

Yi,t- 2l

₁

i&

dit.lri are independent fort = 3, 4, ... , T.

''~

¹^• For the

''i2

model to be well-defined, we require that

J-L;

₂- Ptf-Li₁

2

0, yielding

PI :S

^!!:fl^,,:^.Similarly,

't l

the condition f-Lit - PI/.l'i,t-L - P2/-Li,t-2

2

0 leads to p1

:S

''i'-::':'·'-². Therefore, the range of P1 is

0 < . (

f-Li2 f-Lit - P2f-Li,t-2

1 )

j'

> ₃ :S PI _

m,zn * , * , , OT

t _ .

f-l·il f-Li,t-I

(45)

2.3.1 Basic Properties of the Proposed Mixed Model

2.3.1.1 Conditional Properties

This model is a generalization of model (2.1). Conditional on li, the model become exactly the same as model (2.1) with a different mean parameter f-i;t

=

exp(x~

1

^/3

+ !; ).

Therefore, all the conditional properties are the same as the properties in Section 2.3.1. That is

(2.28) (2.29) (2.30)

(2.31)

where ^aw= 0, a;1 = 1, a·ik = p1a.i,k- l

+

p2ai,k-2, for k = 2, 3, ... , T - 1.

2.3.1.2 Unconditional Properties

In order to proceed to the estimation and the forecasting, we need to find the unconditional properties of the mixed model. By using the assumption that /;

~

N(O, 0"2), we can use moment generating function or direct integration to easily find

out that

(2.32)

(46)

Then, we can find out the unconditional properties by taking expectation over 'Y.i·

The unconditional mean of 1~t is

E[Yit.]

E,, E[Yitl,i]

E[fL7 tJ

E[exp(x;t,e + /'i)]

exp(

.'E~t,B)

E ( e 'Yi) exp(x:t,B)exp(o-

²

/ 2)

e:r;p(x~t,B

+ o-

²

/ 2)

J.lil ..

The variance of Yi₁^.can be calculated in a similar way by conditioning on /'i

E,i VaT(Yiti , J + V aT

1

;E(Yiti , J

E[JJ· 7 tl +

^V

adfL7 t l

exp(x; , t,e + o-

²

/ 2) + E[(f.l7t )

²^{] -}

(E[J.L7t])

²

fLit +

exp(2x~t,B)E(exp(2!'i))

+ f-l ;t J.Lit +

e.r;p(2x~

1

^.,B

+ 2o-

²)

+

~L;t

(2.33)

Model of

Estimation and Forecasting of a Lag 2 Dynamic Model for Infectious Diseases

Abstract

ec

ed

a community,

a

er ,

contacts

given

the community. Meanwhile,

carr y the same

the community.

, we

and

ecasting of an extension to the

al

el for correla

) for

of infectious disease.

a

t

t -

at t

t.

time point t -

an

o

others at

t .

this work allows for such a

consider stationa

ext end

ty effect and t

co

para

/3 a

community effect

are estimated by generalized quasi-likelihood

and

a

each of the cases,

the estimates a

stud

A cknow le d gem e nts

est

e

encouragement.

thesis

the

courses

and Haiyan

g for their advice and en courage-

constan

tech-

d

available

express

to

,

and persistent co

ave taken

ofl my shoulder.

the

University

d

acknowledged.

e successful real-

as expressing my ap ology t

Contents

6

fJ

fJ .

20 22

35

37

46

47

fJ

57

^p) ⁺

IT n t -l ^l