Maximum likelihood estimation for stochastic differential equations with random effects


Running headline: Mixed stochastic differential equations

Maud Delattre¹, Valentine Genon-Catalot² and Adeline Samson²∗

1 Laboratoire Mathématiques, Université Paris Sud, France

2 UMR CNRS 8145, Laboratoire MAP5, Université Paris Descartes, Sorbonne Paris Cité, France

Abstract

We consider $N$ independent stochastic processes $(X_i(t), t \in [0, T_i])$, $i = 1, \ldots, N$, defined by a stochastic differential equation with drift term depending on a random variable $\phi_i$. The distribution of the random effect $\phi_i$ depends on unknown parameters which are to be estimated from the continuous observation of the processes $X_i$. We give the expression of the exact likelihood. When the drift term depends linearly on the random effect $\phi_i$ and $\phi_i$ has a Gaussian distribution, an explicit formula for the likelihood is obtained. We prove that the maximum likelihood estimator is consistent and asymptotically Gaussian when $T_i = T$ for all $i$ and $N$ tends to infinity. We discuss the case of discrete observations. Estimators are computed on simulated data for several models and show good performance even when the length of the observation time interval is not very large.

Key Words: Asymptotic normality, consistency, maximum likelihood estimator, mixed-effects models, stochastic differential equations.

1 Introduction

Statistical analysis of data collected over time on a series of subjects requires accounting for both the intra-individual variability, i.e. the variability occurring within the dynamics of each individual over time, and the variability existing between subjects. Such data are usually modeled with mixed-effects models, which are very popular in the biomedical field (Davidian and Giltinan, 1995; Pinheiro and Bates, 2000). In mixed-effects stochastic differential equations (SDEs), the model for each individual set of data is given by an SDE, thus modeling the intra-individual variability in the data, and the parameters of each individual SDE are random variables, thus handling the variability between subjects. A major area of application for mixed-effects SDEs is pharmacokinetic/pharmacodynamic modeling, where they have been introduced as an alternative to the classical ODE-based models (Ditlevsen and De Gaetano, 2005b; Overgaard et al., 2005; Donnet and Samson, 2008). SDEs with random effects have also been proposed for neuronal data (Picchini et al., 2008).

∗ Corresponding author

Maximum likelihood estimation (MLE) of the parameters of the random effects, also called population parameters, is generally not straightforward, as the likelihood function can rarely be expressed in closed form. Approximations of the likelihood have been proposed, based on linearization (Beal and Sheiner, 1982) or Laplace's approximation (Wolfinger, 1993). Alternative methods have also been developed, such as the SAEM algorithm (Kuhn and Lavielle, 2004). Maximum likelihood estimation in SDEs with random effects has been tackled in a few papers. Ditlevsen and De Gaetano (2005a) show that in the specific case of a mixed-effects Brownian motion with drift, the likelihood function can be explicitly derived, leading to explicit parameter estimators. For general mixed SDEs, approximations of the likelihood have been proposed (Picchini et al., 2010; Picchini and Ditlevsen, 2011).

For theoretical properties of the MLE in the context of mixed-effects models, the main contribution to our knowledge is due to Nie and Yang (2005) and Nie (2006, 2007). It covers the asymptotic properties of the MLE for the population parameters under several asymptotic frameworks, depending on whether the number of subjects and/or the number of observations per subject goes to infinity. Nie's results nevertheless rely on a series of technical assumptions, which may not be easy to check.

In the present work, we focus on mixed-effects SDEs with drift term depending on random effects and diffusion term without random effects. More precisely, we consider $N$ real-valued stochastic processes $(X_i(t), t \ge 0)$, $i = 1, \ldots, N$, with dynamics ruled by the following SDEs:

$$dX_i(t) = b(X_i(t), \phi_i)\,dt + \sigma(X_i(t))\,dW_i(t), \quad X_i(0) = x^i, \quad i = 1, \ldots, N, \tag{1}$$

where $(W_1, \ldots, W_N)$ are $N$ independent Wiener processes, $\phi_1, \ldots, \phi_N$ are $N$ i.i.d. $\mathbb{R}^d$-valued random variables, $(\phi_1, \ldots, \phi_N)$ and $(W_1, \ldots, W_N)$ are independent, and $x^i$, $i = 1, \ldots, N$, are known real values. The diffusion coefficient $\sigma : \mathbb{R} \to \mathbb{R}$ is a known real-valued function. The drift function $b(x, \varphi)$ is a known real-valued function defined on $\mathbb{R} \times \mathbb{R}^d$. Each process $(X_i(t))$ represents an individual and the random vector $\phi_i$ represents the random effect of individual $i$. We assume that the random variables $\phi_1, \ldots, \phi_N$ have a common distribution $g(\varphi, \theta)\,d\nu(\varphi)$ on $\mathbb{R}^d$, where $\theta$ is an unknown parameter belonging to a set $\Theta \subset \mathbb{R}^p$ and, for all $\theta$, $g(\varphi, \theta)$ is a density w.r.t. a dominating measure $\nu$ on $\mathbb{R}^d$. Below, we denote by $\theta_0$ the true value of the parameter. The process $(X_i(t))$ is continuously observed on a time interval $[0, T_i]$ with $T_i > 0$ given. Our aim is to estimate the parameter $\theta$ of the density of the random effects from the observations $\{X_i(t), 0 \le t \le T_i, i = 1, \ldots, N\}$.

We introduce assumptions ensuring that the models (1) are well defined, and we derive the exact likelihood function. Then, we focus on the special case of one-dimensional linear Gaussian random effects, i.e. $b(x, \phi_i) = \phi_i b(x)$, where $b$ is a known real function and $\phi_i$ is Gaussian. It turns out in this case that the likelihood has a simple and explicit expression depending on $\theta$ and the sufficient statistics:

$$U_i = \int_0^{T_i} \frac{b(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s), \qquad V_i = \int_0^{T_i} \frac{b^2(X_i(s))}{\sigma^2(X_i(s))}\,ds, \qquad i = 1, \ldots, N.$$

For the asymptotic study, the main difficulties lie in obtaining specific moment properties of the random variables $(U_i, V_i)$ and in proving identifiability. We prove the consistency and the asymptotic normality of the exact MLE as $N$ tends to infinity and give the expression of the Fisher information matrix. The results are extended to Gaussian multidimensional linear random effects. The present likelihood theory is derived from continuous observations of the $X_i$'s. In practice, one usually has only discrete observations on the time interval $[0, T_i]$. We therefore suggest discretizing the r.v.'s $U_i, V_i$ in the expression of the estimators, and we show that under conditions on the discretization step, and thus on the number of observations per subject, the asymptotic properties of the estimators based on continuous observations are preserved. Our simulations are presented within the framework of discretely observed stochastic processes.

The paper is organized as follows. Section 2 introduces the notations and assumptions. In Section 3, we make the likelihood function explicit. In Sections 4 and 5, we prove the strong consistency and the asymptotic normality of the MLE when the model includes a Gaussian one-dimensional and a Gaussian multidimensional random effect, respectively. The impact of discretization on the estimators is detailed in Section 6. A simulation study is presented in Section 7. Concluding remarks are given in Section 8. Proofs are gathered in the appendix.

2 Model, assumptions and notations

Consider $N$ real-valued stochastic processes $(X_i(t), t \ge 0)$, $i = 1, \ldots, N$, with dynamics ruled by (1). The processes $(W_1, \ldots, W_N)$ and the r.v.'s $\phi_1, \ldots, \phi_N$ are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We introduce assumptions ensuring that the processes (1) are well defined and allowing us to compute the exact likelihood of our observations. Consider the filtration $(\mathcal{F}_t, t \ge 0)$ defined by $\mathcal{F}_t = \sigma(\phi_i, W_i(s), s \le t, i = 1, \ldots, N)$. As $\mathcal{F}_t = \sigma(W_i(s), s \le t) \vee \mathcal{F}_t^i$, with $\mathcal{F}_t^i = \sigma(\phi_i, \phi_j, W_j(s), s \le t, j \ne i)$ independent of $W_i$, each process $W_i$ is a $(\mathcal{F}_t, t \ge 0)$-Brownian motion. Moreover, the random variables $\phi_i$ are $\mathcal{F}_0$-measurable.

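(H1) (i) The function $(x, \varphi) \mapsto b(x, \varphi)$ is $C^1$ on $\mathbb{R} \times \mathbb{R}^d$, and such that:

$$\exists K > 0, \ \forall (x, \varphi) \in \mathbb{R} \times \mathbb{R}^d, \quad b^2(x, \varphi) \le K(1 + x^2 + |\varphi|^2).$$

(ii) The function $\sigma(\cdot)$ is $C^1$ on $\mathbb{R}$ and

$$\forall x \in \mathbb{R}, \quad \sigma^2(x) \le K(1 + x^2).$$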

Under (H1), for all $\varphi$, the stochastic differential equation

$$dX_i^\varphi(t) = b(X_i^\varphi(t), \varphi)\,dt + \sigma(X_i^\varphi(t))\,dW_i(t), \quad X_i^\varphi(0) = x^i, \tag{2}$$

admits a unique strong solution process $(X_i^\varphi(t), t \ge 0)$ adapted to the filtration $(\mathcal{F}_t, t \ge 0)$. Let $C(\mathbb{R}^+, \mathbb{R})$ be the space of continuous functions on $\mathbb{R}^+$, endowed with the Borel $\sigma$-field associated with the topology of uniform convergence on compact sets. The distribution of $X_i^\varphi(\cdot)$ is uniquely defined on this space. Moreover, as $x^i$ is deterministic, for all integers $k$, all $\varphi$ and all $t \ge 0$,

$$\sup_{s \le t} E\big[(X_i^\varphi(s))^{2k}\big] < +\infty. \tag{3}$$

For the observed processes, we have the following result.

Proposition 1. Under (H1), for $i = 1, \ldots, N$, equation (1) admits a unique solution process $(X_i(t), t \ge 0)$, adapted to the filtration $(\mathcal{F}_t, t \ge 0)$. Given that $\phi_i = \varphi$, the conditional distribution of $(X_i(t), t \ge 0)$ is identical to the distribution of the process $(X_i^\varphi(t), t \ge 0)$. The processes $(X_i(t), t \ge 0)$, $i = 1, \ldots, N$, are independent.

If, for $k \ge 1$, $E|\phi_i|^{2k} < \infty$, then for all $T > 0$, $\sup_{t \in [0,T]} E\big[(X_i(t))^{2k}\big] < \infty$.

Remark 1. Assumption (H1) can be weakened, but the proof of Proposition 1 would be longer. What is needed is that, for all fixed $\varphi$, equation (2) admits a unique strong solution $X_i^\varphi$, and that the mapping $\varphi \mapsto X_i^\varphi(\cdot)$ is measurable.

3 Likelihood

We introduce the canonical model associated with the observations. Let $C_{T_i}$ denote the space of real continuous functions $(x(t), t \in [0, T_i])$ defined on $[0, T_i]$, endowed with the $\sigma$-field $\mathcal{C}_{T_i}$ associated with the topology of uniform convergence on $[0, T_i]$. Under (H1), we introduce the distribution $Q_\varphi^{x^i, T_i}$ on $(C_{T_i}, \mathcal{C}_{T_i})$ of $(X_i^\varphi(t), t \in [0, T_i])$ given by (2). On $\mathbb{R}^d \times C_{T_i}$, let $P_\theta^i = g(\varphi, \theta)\,d\nu(\varphi) \otimes Q_\varphi^{x^i, T_i}$ denote the joint distribution of $(\phi_i, X_i(\cdot))$ and let $Q_\theta^i$ denote the marginal distribution of $(X_i(t), t \in [0, T_i])$ on $(C_{T_i}, \mathcal{C}_{T_i})$. From now on, we denote by $(\phi_i, X_i(\cdot))$ the canonical process of $\mathbb{R}^d \times C_{T_i}$. Let us consider the following assumptions.

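(H2) For $i = 1, \ldots, N$, and for all $\varphi, \varphi'$,

$$Q_\varphi^{x^i, T_i}\left( \int_0^{T_i} \frac{b^2(X_i^\varphi(t), \varphi')}{\sigma^2(X_i^\varphi(t))}\,dt < +\infty \right) = 1.$$

(H3) For $f = \partial b / \partial \varphi_j$, $j = 1, \ldots, d$, there exist $c > 0$ and some $\gamma \ge 0$ such that

$$\sup_{\varphi \in \mathbb{R}^d} \frac{|f(x, \varphi)|}{\sigma^2(x)} \le c(1 + |x|^\gamma).$$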

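Proposition 2. Assume (H1)-(H3) and let $\varphi_0 \in \mathbb{R}^d$.

• The distributions $Q_\varphi^{x^i, T_i}$ are absolutely continuous w.r.t. $Q^i := Q_{\varphi_0}^{x^i, T_i}$ with density

$$\frac{dQ_\varphi^{x^i, T_i}}{dQ^i}(X_i) = L_{T_i}(X_i, \varphi) = e^{\ell_{T_i}(X_i, \varphi)}$$

with

$$\ell_{T_i}(X_i, \varphi) = \int_0^{T_i} \frac{b(X_i(s), \varphi) - b(X_i(s), \varphi_0)}{\sigma^2(X_i(s))}\,dX_i(s) - \int_0^{T_i} \frac{b^2(X_i(s), \varphi) - b^2(X_i(s), \varphi_0)}{2\sigma^2(X_i(s))}\,ds,$$

where $(X_i = X_i(s), s \le T_i)$ denotes the canonical process of $C_{T_i}$ given by $(X_i(s)(x) = x(s), s \le T_i)$.

• The function $\varphi \mapsto L_{T_i}(X_i, \varphi)$ admits a continuous version $Q^i$-a.s. and $(X_i, \varphi) \mapsto L_{T_i}(X_i, \varphi)$ is measurable on $(C_{T_i} \times \mathbb{R}^d, \mathcal{C}_{T_i} \otimes \mathcal{B}(\mathbb{R}^d))$.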

Remark 2. • For a given drift function $b(x, \varphi)$, it is often possible to check directly that $\varphi \mapsto L_{T_i}(X_i, \varphi)$ is continuous even if (H3) is not fulfilled. Then, the joint measurability follows.

• The advantage of knowing $\sigma(\cdot)$ is that we can consider continuous-time observations. This yields simple expressions of the conditional likelihood given the random effect (see equation (4) below).

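To simplify notations, we assume that there is one value $\varphi_0$ such that $b(x, \varphi_0) \equiv 0$. Thus, we can choose the dominating measure $Q^i = Q_{\varphi_0}^{x^i, T_i}$, which is the distribution of (2) with null drift. The formula for $L_{T_i}(X_i, \varphi)$ simplifies into:

$$L_{T_i}(X_i, \varphi) = \exp\left( \int_0^{T_i} \frac{b(X_i(s), \varphi)}{\sigma^2(X_i(s))}\,dX_i(s) - \frac{1}{2} \int_0^{T_i} \frac{b^2(X_i(s), \varphi)}{\sigma^2(X_i(s))}\,ds \right). \tag{4}$$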

By independence of the individuals, $P_\theta = \otimes_{i=1}^N P_\theta^i$ is the distribution of $(\phi_i, X_i(\cdot))$, $i = 1, \ldots, N$, on the product space $\prod_{i=1}^N \mathbb{R}^d \times C_{T_i}$, and $Q_\theta = \otimes_{i=1}^N Q_\theta^i$ is the distribution of the whole sample $(X_i(t), t \in [0, T_i], i = 1, \ldots, N)$ on $C = \prod_{i=1}^N C_{T_i}$. We now compute the density of $Q_\theta$ w.r.t. $Q = \otimes_{i=1}^N Q^i$. We denote by $E_\theta$ the expectation w.r.t. $P_\theta$.

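Proposition 3. Assume (H1)-(H3).

• The probability measure $Q_\theta^i$ admits a density w.r.t. $Q^i$ equal to:

$$\frac{dQ_\theta^i}{dQ^i}(X_i) = \int_{\mathbb{R}^d} L_{T_i}(X_i, \varphi)\, g(\varphi, \theta)\,d\nu(\varphi) := \lambda_i(X_i, \theta).$$

• The distribution $Q_\theta$ on $C = \prod_{i=1}^N C_{T_i}$ admits a density given by

$$\frac{dQ_\theta}{dQ}(X_1, \ldots, X_N) = \prod_{i=1}^N \lambda_i(X_i, \theta).$$

• The exact likelihood of the whole sample $(X_i(t), t \in [0, T_i], i = 1, \ldots, N)$ is

$$\Lambda_N(\theta) = \prod_{i=1}^N \lambda_i(X_i, \theta). \tag{5}$$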

Remark 3. • It is worth stressing that to obtain the likelihood function $\lambda_i(X_i, \theta)$, we just take the SDE likelihood conditional on the random effects (4), multiply it by the density of the random effects, and integrate out with respect to the random effects.

• Another point is to be stressed. If the drift term depends on an additional fixed effect $\beta$, i.e. $b(x, \varphi) = b(x, \beta, \varphi)$, then the SDE conditional likelihood given the random effect is $L_{T_i}(X_i, \beta, \varphi)$, where $b(x, \varphi)$ is replaced by $b(x, \beta, \varphi)$. Afterwards, the marginal likelihood $\lambda_i(X_i, \beta, \theta)$ is obtained analogously:

$$\lambda_i(X_i, \beta, \theta) = \int_{\mathbb{R}^d} L_{T_i}(X_i, \beta, \varphi)\, g(\varphi, \theta)\,d\nu(\varphi).$$

For this general expression, if the assumptions of Nie (2006) can be checked, then weak consistency and asymptotic normality of the MLE follow. However, these assumptions are not easy to check, even when the random effects are Gaussian.

4 Gaussian one-dimensional linear random effects

In this section, we consider model (1) with drift $b(x, \varphi) = \varphi\, b(x)$, where $\varphi \in \mathbb{R}$ and $b(\cdot), \sigma(\cdot)$ are known functions. In this case, we simplify (H1)-(H2) and assume that $b, \sigma$ are $C^1$ and have linear growth, which implies Proposition 1. We also assume that $\int_0^{T_i} b^2(X_i(s))/\sigma^2(X_i(s))\,ds < \infty$, $Q_\varphi^{x^i, T_i}$-a.s. for all $\varphi$. As $\varphi \mapsto L_{T_i}(X_i, \varphi)$ is obviously continuous, (H3) is not required. Finally, we assume that, for $i = 1, \ldots, N$, $T_i = T$ and $x^i = x$, so that the observed processes $(X_i(t), t \in [0, T])$, $i = 1, \ldots, N$, are i.i.d. Let us introduce:

$$U_i = \int_0^T \frac{b(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s), \qquad V_i = \int_0^T \frac{b^2(X_i(s))}{\sigma^2(X_i(s))}\,ds, \tag{6}$$

which are well defined under (H1)-(H2). Hence,

$$\lambda_i(X_i, \theta) = \int_{\mathbb{R}} g(\varphi, \theta) \exp\left( \varphi U_i - \frac{\varphi^2}{2} V_i \right) d\nu(\varphi). \tag{7}$$

4.1 Exact likelihood

We propose here to model the random effects distribution by a Gaussian distribution $\mathcal{N}(\mu, \omega^2)$, and set $\theta = (\mu, \omega^2) \in \mathbb{R} \times (0, +\infty)$ for the unknown parameters to be estimated. This choice leads to an explicit exact likelihood.

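Proposition 4. Assume that $g(\varphi, \theta)\,d\nu(\varphi) = \mathcal{N}(\mu, \omega^2)$. Then,

$$\lambda_i(X_i, \theta) = \frac{1}{(1 + \omega^2 V_i)^{1/2}} \exp\left[ -\frac{V_i}{2(1 + \omega^2 V_i)} \left( \mu - \frac{U_i}{V_i} \right)^2 \right] \exp\left( \frac{U_i^2}{2 V_i} \right).$$

The conditional distribution, under $P_\theta^i$, of $\phi_i$ given $X_i$ is the distribution

$$\mathcal{N}\left( \frac{\mu + \omega^2 U_i}{1 + \omega^2 V_i}, \ \frac{\omega^2}{1 + \omega^2 V_i} \right).$$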
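As an illustration that is not part of the original derivation, the closed form of Proposition 4 can be compared numerically with integral (7). The sketch below uses hypothetical values of $(U_i, V_i, \mu, \omega^2)$ and a plain trapezoidal rule on a grid of $\pm 12$ prior standard deviations around $\mu$, which is assumed wide enough to contain the mass of the integrand. The function names are ours.

```python
import math


def lambda_closed_form(u, v, mu, omega2):
    """Closed-form marginal likelihood of one subject (Proposition 4)."""
    shrink = 1.0 + omega2 * v
    quad = -v / (2.0 * shrink) * (mu - u / v) ** 2
    return math.exp(quad + u * u / (2.0 * v)) / math.sqrt(shrink)


def lambda_numeric(u, v, mu, omega2, half_width=12.0, n_grid=20001):
    """Trapezoidal approximation of integral (7) with a N(mu, omega2) density."""
    sd = math.sqrt(omega2)
    lo = mu - half_width * sd
    step = 2.0 * half_width * sd / (n_grid - 1)
    norm = math.sqrt(2.0 * math.pi * omega2)
    total = 0.0
    for k in range(n_grid):
        phi = lo + k * step
        density = math.exp(-((phi - mu) ** 2) / (2.0 * omega2)) / norm
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * density * math.exp(phi * u - 0.5 * phi * phi * v)
    return total * step


def posterior_of_phi(u, v, mu, omega2):
    """Mean and variance of phi_i given X_i (Proposition 4)."""
    shrink = 1.0 + omega2 * v
    return (mu + omega2 * u) / shrink, omega2 / shrink


# Hypothetical values, chosen only for illustration.
u_i, v_i, mu, omega2 = 0.7, 1.3, -1.0, 2.0
print(lambda_closed_form(u_i, v_i, mu, omega2), lambda_numeric(u_i, v_i, mu, omega2))
print(posterior_of_phi(u_i, v_i, mu, omega2))
```

The two printed likelihood values should agree up to quadrature error, and the posterior variance is always smaller than the prior variance $\omega^2$.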

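Therefore, the logarithm of the likelihood function (5) is explicitly given by

$$L_N(\theta) = -\frac{1}{2} \sum_{i=1}^N \log(1 + \omega^2 V_i) - \frac{1}{2} \sum_{i=1}^N \frac{V_i}{1 + \omega^2 V_i} \left( \mu - \frac{U_i}{V_i} \right)^2 + \sum_{i=1}^N \frac{U_i^2}{2 V_i}. \tag{8}$$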

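The derivatives of the log-likelihood (8) are

$$\frac{\partial}{\partial \mu} L_N(\theta) = \sum_{i=1}^N \left( \frac{U_i}{1 + \omega^2 V_i} - \mu \frac{V_i}{1 + \omega^2 V_i} \right),$$

$$\frac{\partial}{\partial \omega^2} L_N(\theta) = \frac{1}{2} \sum_{i=1}^N \left[ \left( \frac{U_i}{1 + \omega^2 V_i} - \mu \frac{V_i}{1 + \omega^2 V_i} \right)^2 - \frac{V_i}{1 + \omega^2 V_i} \right].$$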
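The log-likelihood (8) and its two derivatives translate directly into code. The following minimal sketch assumes the statistics $U_i, V_i$ are already available as plain Python lists; the function names `log_likelihood` and `score` are ours, and the numerical values are hypothetical.

```python
import math


def log_likelihood(mu, omega2, U, V):
    """Log-likelihood (8) for theta = (mu, omega2)."""
    total = 0.0
    for u, v in zip(U, V):
        shrink = 1.0 + omega2 * v
        total += (-0.5 * math.log(shrink)
                  - 0.5 * v / shrink * (mu - u / v) ** 2
                  + u * u / (2.0 * v))
    return total


def score(mu, omega2, U, V):
    """Partial derivatives of (8) with respect to mu and omega2."""
    d_mu = 0.0
    d_omega2 = 0.0
    for u, v in zip(U, V):
        shrink = 1.0 + omega2 * v
        gamma = (u - mu * v) / shrink   # U_i/(1+w2 V_i) - mu V_i/(1+w2 V_i)
        info = v / shrink               # V_i/(1+w2 V_i)
        d_mu += gamma
        d_omega2 += 0.5 * (gamma * gamma - info)
    return d_mu, d_omega2


# Hypothetical statistics for three subjects.
U = [0.5, -1.2, 2.0]
V = [1.0, 0.4, 2.5]
print(log_likelihood(0.3, 0.8, U, V), score(0.3, 0.8, U, V))
```

A central finite difference of `log_likelihood` in each argument is a quick way to check the analytic score.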

When $\omega_0^2$ is known, we obtain the explicit estimator for $\mu_0$:

$$\hat\mu_N = \frac{\sum_{i=1}^N \dfrac{U_i}{1 + \omega_0^2 V_i}}{\sum_{i=1}^N \dfrac{V_i}{1 + \omega_0^2 V_i}}. \tag{9}$$

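When both parameters are unknown, the maximum likelihood estimators of $\theta_0 = (\mu_0, \omega_0^2)$ are given by the system:

$$\hat\mu_N = \left( \sum_{i=1}^N \frac{V_i}{1 + \hat\omega_N^2 V_i} \right)^{-1} \left( \sum_{i=1}^N \frac{U_i}{1 + \hat\omega_N^2 V_i} \right),$$

$$\sum_{i=1}^N \left( \hat\mu_N - \frac{U_i}{V_i} \right)^2 \frac{V_i^2}{(1 + \hat\omega_N^2 V_i)^2} = \sum_{i=1}^N \frac{V_i}{1 + \hat\omega_N^2 V_i}.$$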
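Since the first equation gives $\hat\mu_N$ explicitly for any fixed value of $\omega^2$, one possible way to solve the system is to maximize the profile log-likelihood $\omega^2 \mapsto L_N(\hat\mu(\omega^2), \omega^2)$ in one dimension. The sketch below uses a golden-section search. It is only an illustration, not the procedure used in the paper: it assumes that the profile is unimodal on the hypothetical bracket `[lo, hi]` and that the maximizer is interior. All names are ours.

```python
import math


def mu_hat(omega2, U, V):
    """Explicit maximizer in mu for a fixed omega2 (first equation of the system)."""
    num = sum(u / (1.0 + omega2 * v) for u, v in zip(U, V))
    den = sum(v / (1.0 + omega2 * v) for v in V)
    return num / den


def log_likelihood(mu, omega2, U, V):
    """Log-likelihood (8)."""
    return sum(-0.5 * math.log(1.0 + omega2 * v)
               - 0.5 * v / (1.0 + omega2 * v) * (mu - u / v) ** 2
               + u * u / (2.0 * v)
               for u, v in zip(U, V))


def profile_mle(U, V, lo=1e-8, hi=100.0, tol=1e-10):
    """Golden-section search for the maximizer of the profile log-likelihood."""
    def profile(omega2):
        return log_likelihood(mu_hat(omega2, U, V), omega2, U, V)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:            # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = profile(c)
        else:                  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = profile(d)
    omega2 = 0.5 * (a + b)
    return mu_hat(omega2, U, V), omega2


# Hypothetical statistics with a constant V_i, for which the solution is explicit.
U = [1.0, 4.0, -2.0, 6.0, 0.5]
V = [1.0] * 5
print(profile_mle(U, V))
```

With a constant $V_i = v$, the system can be solved by hand, $\hat\mu_N = \bar U_N / v$ and $\hat\omega_N^2 = v^{-2}\big(N^{-1}\sum_i (U_i - \bar U_N)^2 - v\big)$, which gives a convenient check of the numerical search.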

Remark 4. Note that, when the effect $\phi_i$ is non-random and $\phi_i \equiv \mu_0$, the estimator of $\mu_0$ is given by the standard formula:

$$\tilde\mu_N = \frac{\sum_{i=1}^N U_i}{\sum_{i=1}^N V_i}, \tag{10}$$

which corresponds to $\omega_0^2 = 0$ in $\hat\mu_N$.

4.2 Preliminary moments properties

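For studying the maximum likelihood estimators of $\theta = (\mu, \omega^2)$, we need to investigate properties of the following random variables:

$$\gamma_i(\theta) = \frac{U_i - \mu V_i}{1 + \omega^2 V_i}, \qquad I_i(\omega^2) = \frac{V_i}{1 + \omega^2 V_i}. \tag{11}$$

Indeed, under $Q_\theta$, $(\gamma_i(\theta), I_i(\omega^2))_{i=1,\ldots,N}$ are i.i.d. and the score function is

$$\frac{\partial}{\partial \mu} L_N(\theta) = \sum_{i=1}^N \gamma_i(\theta), \qquad \frac{\partial}{\partial \omega^2} L_N(\theta) = \frac{1}{2} \sum_{i=1}^N \left( \gamma_i^2(\theta) - I_i(\omega^2) \right). \tag{12}$$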

Evidently, $I_i(\omega^2)$ is bounded, since $0 < I_i(\omega^2) \le 1/\omega^2$. The following lemma, which is crucial for the statistical study, shows that $\gamma_i(\theta)$ admits a finite Laplace transform, hence moments of any order.

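Lemma 1. For all $\theta = (\mu, \omega^2) \in \mathbb{R} \times (0, +\infty)$, and all $u \in \mathbb{R}$,

$$E_\theta\left( \exp\left( u \frac{U_1}{1 + \omega^2 V_1} \right) \right) < +\infty.$$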

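We can now compute some useful moments of functions of $\gamma_1(\theta)$, $I_1(\omega^2)$.

Proposition 5. For all $\theta \in \mathbb{R} \times (0, +\infty)$, the following relations hold:

$$E_\theta(\gamma_1(\theta)) = 0, \qquad E_\theta(\gamma_1^2(\theta)) = E_\theta(I_1(\omega^2)), \qquad E_\theta(\gamma_1^3(\theta)) = 3 E_\theta(\gamma_1(\theta) I_1(\omega^2)),$$

$$E_\theta\left( \gamma_1^2(\theta) - I_1(\omega^2) \right)^2 = 4 E_\theta(\gamma_1^2(\theta) I_1(\omega^2)) - 2 E_\theta(I_1^2(\omega^2)).$$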

4.3 Convergence in distribution of the normalized score function

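Based on Lemma 1 and Proposition 5, we can state:

Proposition 6. For all $\theta$, under $Q_\theta$, as $N$ tends to infinity, the random vector

$$\frac{1}{\sqrt{N}} \begin{pmatrix} \frac{\partial}{\partial \mu} L_N(\theta) \\ \frac{\partial}{\partial \omega^2} L_N(\theta) \end{pmatrix} = \frac{1}{\sqrt{N}} \begin{pmatrix} \sum_{i=1}^N \gamma_i(\theta) \\ \frac{1}{2} \sum_{i=1}^N \left( \gamma_i^2(\theta) - I_i(\omega^2) \right) \end{pmatrix}$$

converges in distribution to $\mathcal{N}_2(0, \mathcal{I}(\theta))$ and the matrix

$$-\frac{1}{N} \begin{pmatrix} \frac{\partial^2}{\partial \mu^2} L_N(\theta) & \frac{\partial^2}{\partial \mu\, \partial \omega^2} L_N(\theta) \\ \frac{\partial^2}{\partial \mu\, \partial \omega^2} L_N(\theta) & \frac{\partial^2}{\partial \omega^2\, \partial \omega^2} L_N(\theta) \end{pmatrix}$$

converges in probability to $\mathcal{I}(\theta)$, where

$$\mathcal{I}(\theta) = \begin{pmatrix} E_\theta(I_1(\omega^2)) & E_\theta(\gamma_1(\theta) I_1(\omega^2)) \\ E_\theta(\gamma_1(\theta) I_1(\omega^2)) & E_\theta(\gamma_1^2(\theta) I_1(\omega^2)) - \frac{1}{2} E_\theta(I_1^2(\omega^2)) \end{pmatrix} \tag{13}$$

is the covariance matrix of the vector $\begin{pmatrix} \gamma_1(\theta) \\ \frac{1}{2}\left( \gamma_1^2(\theta) - I_1(\omega^2) \right) \end{pmatrix}$.

The following corollary holds immediately.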

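Corollary 1. When $\omega^2 = \omega_0^2$ is known, the explicit estimator $\hat\mu_N$ (9) is consistent and $\sqrt{N}(\hat\mu_N - \mu_0)$ converges in distribution under $Q_{\mu_0}$ to $\mathcal{N}(0, 1/E_{\mu_0}(V_1/(1 + \omega_0^2 V_1)))$, where $1/E_{\mu_0}(V_1/(1 + \omega_0^2 V_1)) \ge \omega_0^2$.

If $\phi_1, \ldots, \phi_N$ were observed, the MLE of $\mu_0$ would be $\bar\phi = \frac{1}{N}(\phi_1 + \cdots + \phi_N)$, which satisfies $\sqrt{N}(\bar\phi - \mu_0) \sim \mathcal{N}(0, \omega_0^2)$. As $\phi_1, \ldots, \phi_N$ are not observed, the MLE $\hat\mu_N$ has a larger asymptotic variance.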
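The variance bound stated in Corollary 1 follows in one line from the boundedness of $I_1(\omega_0^2)$ noted in Section 4.2. The step is spelled out here for convenience:

```latex
\[
\frac{V_1}{1+\omega_0^2 V_1}
  = \frac{1}{\omega_0^2}\Bigl(1 - \frac{1}{1+\omega_0^2 V_1}\Bigr)
  \le \frac{1}{\omega_0^2}
\;\Longrightarrow\;
E_{\mu_0}\!\Bigl(\frac{V_1}{1+\omega_0^2 V_1}\Bigr) \le \frac{1}{\omega_0^2}
\;\Longrightarrow\;
\frac{1}{E_{\mu_0}\bigl(V_1/(1+\omega_0^2 V_1)\bigr)} \ge \omega_0^2 .
\]
```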

4.4 Consistency and asymptotic normality

When both parameters $\mu_0, \omega_0^2$ are unknown, we need to introduce additional assumptions to prove identifiability and consistency. Recall that $Q$ is the distribution on $C$ such that the canonical processes $(X_i(s), s \le T, i = 1, \ldots, N)$ are i.i.d. and $X_i$ satisfies the SDE with null drift:

$$dX_i(t) = \sigma(X_i(t))\,dW_i(t), \quad X_i(0) = x.$$

We assume that

(H4) The function $b(\cdot)/\sigma(\cdot)$ is not constant. Under $Q$, the random variable $(U_1, V_1)$ admits a density $f(u, v)$ w.r.t. the Lebesgue measure on $\mathbb{R} \times (0, +\infty)$ which is jointly continuous and positive on an open ball of $\mathbb{R} \times (0, +\infty)$.

(H5) The parameter set $\Theta$ is a compact subset of $\mathbb{R} \times (0, +\infty)$.

(H6) The true value $\theta_0$ belongs to $\mathring{\Theta}$, the interior of $\Theta$.

(H7) The matrix $\mathcal{I}(\theta_0)$ is invertible (see (13)).

Under smoothness assumptions on the functions $b, \sigma$, assumption (H4) is fulfilled by application of Malliavin calculus tools†. The case where $b(\cdot)/\sigma(\cdot)$ is constant is rather simple and is treated separately in Section 7. Assumptions (H5)-(H7) are classical. We first state an identifiability result.

Proposition 7. Let $K(Q^1_{\theta_0}, Q^1_\theta)$ denote the Kullback information of $Q^1_{\theta_0}$ w.r.t. $Q^1_\theta$.

(i) Under (H1)-(H2) and (H4), $Q^1_\theta = Q^1_{\theta_0}$ implies that $\theta = \theta_0$. Hence, $\theta \mapsto K(Q^1_{\theta_0}, Q^1_\theta)$ admits a unique minimum at $\theta = \theta_0$.

(ii) Under (H1)-(H2), the function $\theta \mapsto K(Q^1_{\theta_0}, Q^1_\theta)$ is continuous on $\mathbb{R} \times (0, +\infty)$.

We are now able to prove the consistency and asymptotic normality of $\hat\theta_N$.

Proposition 8. 1. Assume (H1)-(H2) and (H4)-(H5). Let $\hat\theta_N$ be a maximum likelihood estimator defined as any solution of $L_N(\hat\theta_N) = \sup_{\theta \in \Theta} L_N(\theta)$. Under $Q_{\theta_0}$, $\hat\theta_N$ converges in probability to $\theta_0$.

2. Assume (H1)-(H2) and (H4)-(H7). The maximum likelihood estimator satisfies, as $N$ tends to infinity,

$$\sqrt{N}(\hat\theta_N - \theta_0) \to_{\mathcal{D}} \mathcal{N}_2(0, \mathcal{I}^{-1}(\theta_0)).$$

Note that the consistency obtained here is a strong consistency in the sense that any solution of the likelihood equation is consistent.

5 Gaussian multidimensional linear random effects

In this section, we extend the previous results to multidimensional linear random effects. Let $\phi_i = (\phi_i^1, \ldots, \phi_i^d)'$ be a $d$-dimensional random vector and $b(x) = (b^1(x), \ldots, b^d(x))'$ be a function $\mathbb{R} \to \mathbb{R}^d$, where $x'$ denotes the transpose of $x$. Consider the SDE

$$dX_i(t) = \phi_i' b(X_i(t))\,dt + \sigma(X_i(t))\,dW_i(t), \quad X_i(0) = x. \tag{14}$$

† If $\sigma$ and $f = b/\sigma$ are $C^\infty$, if their derivatives of any order greater than 1 are bounded, if $\sigma$ is bounded below, and if the Lebesgue measure of the set of values $x$ such that $f(x) f'(x) = 0$ is zero, then assumption (H4) holds.

We assume that $b^1(x), \ldots, b^d(x)$ are such that $b(x, \varphi) = \sum_{j=1}^d \varphi^j b^j(x)$ satisfies (H1)-(H2) and that $(\phi_i, i = 1, \ldots, N)$ are i.i.d. Gaussian vectors, with expectation vector $\mu$ and covariance matrix $\Omega \in \mathcal{S}_d(\mathbb{R})$, where $\mathcal{S}_d(\mathbb{R})$ is the set of positive definite symmetric matrices. The parameter to be estimated is $\theta = (\mu, \Omega) \in \mathbb{R}^d \times \mathcal{S}_d(\mathbb{R})$. To compute the likelihood, we introduce the random vectors

$$U_i = \int_0^T \frac{b(X_i(s))}{\sigma^2(X_i(s))}\,dX_i(s),$$

and the $d \times d$ random matrices

$$V_i = \int_0^T \frac{b(X_i(s))\, b'(X_i(s))}{\sigma^2(X_i(s))}\,ds.$$

The following assumption is now required.

(H8) For $i = 1, \ldots, N$, the matrix $V_i$ is positive definite $Q^i$-a.s. and $Q^i_\theta$-a.s. for all $\theta$.

If the functions $(b^j/\sigma^2)$ are not linearly independent, (H8) is not true. Thus, (H8) can be interpreted as ensuring a well-defined dimension of the vector $\phi_i$. We deduce the invertibility of matrices involved in the likelihood computation.

Lemma 2. Under (H8), the matrices $V_i + \Omega^{-1}$, $I_d + V_i \Omega$, $I_d + \Omega V_i$ are invertible $Q^i$-a.s. and $Q^i_\theta$-a.s. for all $\theta$.

Then we can compute the likelihood.

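Proposition 9. Under (H8), setting $R_i^{-1} = (I_d + V_i \Omega)^{-1} V_i$, we have

$$\lambda_i(X_i, \theta) = \frac{1}{\sqrt{\det(I_d + V_i \Omega)}} \exp\left( -\frac{1}{2} (\mu - V_i^{-1} U_i)' R_i^{-1} (\mu - V_i^{-1} U_i) \right) \exp\left( \frac{1}{2} U_i' V_i^{-1} U_i \right).$$

The conditional distribution of $\phi_i$ given $X_i$ is the Gaussian distribution

$$\mathcal{N}_d\left( (I_d + \Omega V_i)^{-1} \mu + (\Omega^{-1} + V_i)^{-1} U_i, \ (I_d + \Omega V_i)^{-1} \Omega \right).$$

The likelihood is $\Lambda_N(\theta) = \prod_{i=1}^N \lambda_i(X_i, \theta)$.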
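The conditional distribution of $\phi_i$ given $X_i$ can be recovered by completing the square in the exponent of $L_T(X_i, \varphi)\, g(\varphi, \theta)$. The following is a sketch under the assumption that $\Omega$ is invertible; the shorthand $A_i$ and $m_i$ is introduced here for illustration only and is not part of the original notation.

```latex
\begin{align*}
\varphi' U_i - \tfrac12 \varphi' V_i \varphi
  - \tfrac12 (\varphi-\mu)'\Omega^{-1}(\varphi-\mu)
  &= -\tfrac12 (\varphi - m_i)' A_i (\varphi - m_i)
     + \tfrac12 m_i' A_i m_i - \tfrac12 \mu'\Omega^{-1}\mu, \\
A_i &= \Omega^{-1} + V_i, \\
m_i &= A_i^{-1}(\Omega^{-1}\mu + U_i)
     = (I_d + \Omega V_i)^{-1}\mu + (\Omega^{-1}+V_i)^{-1}U_i, \\
A_i^{-1} &= (\Omega^{-1}+V_i)^{-1} = (I_d + \Omega V_i)^{-1}\Omega .
\end{align*}
```

The conditional mean is $m_i$ and the conditional covariance is $A_i^{-1}$. In dimension $d = 1$ this reduces to the mean $(\mu + \omega^2 U_i)/(1 + \omega^2 V_i)$ and the variance $\omega^2/(1 + \omega^2 V_i)$ of Proposition 4.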

Remark 5. To compute the likelihood $\lambda_i(X_i, \theta)$, we do not need $\Omega$ to be invertible. We only have to integrate $L_{T_i}(X_i, \varphi)$ w.r.t. the distribution of $\phi_i$. This includes the case where this distribution admits no density w.r.t. the Lebesgue measure of $\mathbb{R}^d$. In particular, some components of $\phi_i$ may be deterministic (the corresponding row of $\Omega$ being a row of zeros). Hence, our method covers the case of both fixed and random effects parameters.

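The score function (respectively a $d$-vector and a $d \times d$ matrix) is given by:

$$\frac{\partial}{\partial \mu} L_N(\theta) = \sum_{i=1}^N \gamma_i(\theta), \qquad \frac{\partial}{\partial \Omega} L_N(\theta) = \frac{1}{2} \sum_{i=1}^N \left( \gamma_i(\theta) \gamma_i'(\theta) - I_i(\Omega) \right),$$

where $\theta = (\mu, \Omega)$ and

$$\gamma_i(\theta) = (I_d + \Omega V_i)^{-1} (U_i - V_i \mu), \qquad I_i(\Omega) = (I_d + \Omega V_i)^{-1} V_i. \tag{15}$$

When $\Omega_0$ is known, the estimator for $\mu_0$ is explicit.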

Lemma 1 and Propositions 5 and 6 can be readily extended to the multidimensional case.

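Proposition 10. 1. For all $\theta = (\mu, \Omega) \in \mathbb{R}^d \times \mathcal{S}_d(\mathbb{R})$, and all $u \in \mathbb{R}^d$,

$$E_\theta\left( \exp\left( u' (I_d + \Omega V_1)^{-1} U_1 \right) \right) < +\infty.$$

2. For all $\theta \in \mathbb{R}^d \times \mathcal{S}_d(\mathbb{R})$, the following relations hold:

$$E_\theta(\gamma_1(\theta)) = 0, \qquad E_\theta(\gamma_1(\theta) \gamma_1'(\theta)) = E_\theta(I_1(\Omega)), \qquad E_\theta(\gamma_1(\theta) \gamma_1'(\theta) \gamma_1(\theta)) = 3 E_\theta(I_1(\Omega) \gamma_1(\theta)),$$

$$E_\theta\left( \gamma_1(\theta) \gamma_1'(\theta) - I_1(\Omega) \right)^2 = 4 E_\theta(I_1(\Omega) \gamma_1(\theta) \gamma_1'(\theta)) - 2 E_\theta(I_1^2(\Omega)).$$

3. For all $\theta$, under $Q_\theta$, as $N$ tends to infinity, the random vector

$$\frac{1}{\sqrt{N}} \begin{pmatrix} \frac{\partial}{\partial \mu} L_N(\theta) \\ \frac{\partial}{\partial \Omega} L_N(\theta) \end{pmatrix} = \frac{1}{\sqrt{N}} \begin{pmatrix} \sum_{i=1}^N \gamma_i(\theta) \\ \frac{1}{2} \sum_{i=1}^N \left( \gamma_i(\theta) \gamma_i'(\theta) - I_i(\Omega) \right) \end{pmatrix}$$

converges in distribution to $\mathcal{N}(0, \mathcal{I}(\theta))$, where $\mathcal{I}(\theta)$ is the covariance matrix of the vector

$$\begin{pmatrix} \gamma_1(\theta) \\ \frac{1}{2}\left( \gamma_1(\theta) \gamma_1'(\theta) - I_1(\Omega) \right) \end{pmatrix},$$

which is also the limit of the observed Fisher information matrix.

The study of $\hat\theta_N = (\hat\mu_N, \hat\Omega_N)$ can be done as above.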

6 Discrete data

In this section, we briefly discuss the case of discrete data. Let us assume that we observe synchronously the processes $X_i(t)$ at times $t_k^n = t_k = kT/n$, $k = 0, 1, \ldots, n$. To build estimators $\hat\theta_N^{(n)}$ based on these data, we simply replace the r.v.'s $U_i, V_i$, $i = 1, \ldots, N$, by their discretized versions:

$$U_i^n = \sum_{k=0}^{n-1} \frac{b(X_i(t_k))}{\sigma^2(X_i(t_k))} \left( X_i(t_{k+1}) - X_i(t_k) \right), \tag{16}$$

$$V_i^n = \sum_{k=0}^{n-1} \frac{b^2(X_i(t_k))}{\sigma^2(X_i(t_k))} (t_{k+1} - t_k). \tag{17}$$

Looking at the expression (8) and its derivatives w.r.t. $\theta$, it is enough to study the differences $U_i - U_i^n$, $V_i - V_i^n$. We can prove

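Lemma 3. Assume that $b/\sigma$ is bounded and Lipschitz, that $\sigma(\cdot) \ge \varepsilon > 0$, and that $b$ and $\sigma$ are Lipschitz. Then for all $p \ge 1$ and all $i = 1, \ldots, N$, there exists a constant $C$ such that

$$E_{\theta_0}\left( |V_i - V_i^n|^p + |U_i - U_i^n|^p \right) \le \frac{C}{n^{p/2}}.$$

We deduce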

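Proposition 11. If $n \to +\infty$, then $\hat\theta_N - \hat\theta_N^{(n)} = o_{P_{\theta_0}}(1)$. If $n = n(N) \to +\infty$ in such a way that $n/N \to +\infty$, then $\sqrt{N}(\hat\theta_N - \hat\theta_N^{(n)}) = o_{P_{\theta_0}}(1)$.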

Remark 6. The estimator $\hat\theta_N^{(n)}$ is actually obtained by discretization of the continuous-time conditional SDE likelihood (4). This would be the correct discrete-time likelihood if the conditional distribution of $X_i^\varphi(t_{k+1}^n)$ given $X_i^\varphi(t_k^n)$ were Gaussian with mean $X_i^\varphi(t_k^n) + \varphi\, b(X_i^\varphi(t_k^n))(t_{k+1}^n - t_k^n)$ and variance $\sigma^2(X_i^\varphi(t_k^n))(t_{k+1}^n - t_k^n)$. Of course, one can use the estimators $\hat\theta_N^{(n)}$ for fixed $n$, but they will have a systematic asymptotic bias (as $N \to \infty$) depending on $n$.
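The discretized statistics (16) and (17) are plain Riemann-type sums over the observation grid. A minimal sketch of their computation for one path is given below; the function name `discretized_statistics` and the sample path are hypothetical, and `b` and `sigma` stand for the known drift and diffusion functions.

```python
def discretized_statistics(path, times, b, sigma):
    """Return (U_n, V_n) of equations (16)-(17) for one observed path.

    path[k] is the observation X_i(t_k) and times[k] is t_k.
    """
    if len(path) != len(times) or len(path) < 2:
        raise ValueError("path and times must have the same length, at least 2")
    u_n = 0.0
    v_n = 0.0
    for k in range(len(path) - 1):
        x = path[k]
        s2 = sigma(x) ** 2
        u_n += b(x) / s2 * (path[k + 1] - x)               # increment of X
        v_n += b(x) ** 2 / s2 * (times[k + 1] - times[k])  # increment of time
    return u_n, v_n


# Hypothetical path observed at three times, with b(x) = x and sigma(x) = 1.
print(discretized_statistics([1.0, 2.0, 4.0], [0.0, 1.0, 2.0],
                             b=lambda x: x, sigma=lambda x: 1.0))
```

Both sums evaluate the integrand at the left end point of each interval, which matches the Itô convention for $U_i$.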

7 Simulation study

This section concerns the situation $b(x, \varphi) = \sum_{j=1}^d \varphi^j b^j(x)$ with $d = 1$ or $d = 2$. Several models are simulated. For each SDE model, 100 datasets are generated with $N$ subjects on the same time interval $[0, T]$ and three experimental designs: ($N = 20$, $T = 5$), ($N = 50$, $T = 5$) and ($N = 50$, $T = 10$). The empirical mean and variance of the MLE are computed from the 1000 datasets. When possible, $N^{-1} \mathcal{I}_N^{-1}(\theta_0)$ is also computed and compared with the empirical variance.

7.1 When b(x) = c σ(x)

Let us consider the case where $b(x) = c\,\sigma(x)$, with $c \ne 0$ known. Then we have

$$V_i = c^2 T, \qquad U_i = c \int_0^T \frac{dX_i(s)}{\sigma(X_i(s))}. \tag{18}$$

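The estimators of $\mu_0, \omega_0^2$ are simple and explicit:

$$\hat\mu_N = \frac{1}{c^2 T N} \sum_{i=1}^N U_i = \frac{1}{c^2 T} \bar U_N, \qquad \hat\omega_N^2 = \frac{1}{(c^2 T)^2} \left( \frac{1}{N} \sum_{i=1}^N (U_i - \bar U_N)^2 - c^2 T \right).$$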

Using that $U_i = c^2 T \phi_i + c W_i(T)$, an elementary study shows that $\hat\mu_N$ and $\hat\omega_N^2$ are strongly consistent, that $\sqrt{N}(\hat\mu_N - \mu_0)$ has distribution $\mathcal{N}(0, \omega_0^2 + 1/(c^2 T))$ and that $\sqrt{N}(\hat\omega_N^2 - \omega_0^2)$ converges in distribution to $\mathcal{N}(0, 2(\omega_0^2 + 1/(c^2 T))^2)$. The asymptotic variances of $\hat\mu_N$ and $\hat\omega_N^2$ are increased in comparison with the case of non-random effects.

We stress the fact that, whatever the drift function $b(\cdot)$, when $b(\cdot)/\sigma(\cdot)$ is constant, the estimators have the same distribution.
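The explicit estimators of this subsection are easy to try out, since $U_i = c^2 T \phi_i + c W_i(T)$ can be sampled directly without simulating any path. The sketch below is an illustration with hypothetical parameter values and function names of our own. Note that nothing forces $\hat\omega_N^2$ to be positive in small samples.

```python
import math
import random


def explicit_estimators(U, c, T):
    """Explicit estimators of (mu, omega^2) when b = c * sigma, so V_i = c^2 T."""
    n = len(U)
    v = c * c * T
    u_bar = sum(U) / n
    centered = sum((u - u_bar) ** 2 for u in U) / n
    return u_bar / v, (centered - v) / (v * v)


def simulate_u(n_subjects, mu, omega2, c, T, rng):
    """Draw U_i = c^2 T phi_i + c W_i(T), with phi_i ~ N(mu, omega2) and W_i(T) ~ N(0, T)."""
    v = c * c * T
    return [v * rng.gauss(mu, math.sqrt(omega2)) + c * rng.gauss(0.0, math.sqrt(T))
            for _ in range(n_subjects)]


# Hypothetical design: N = 50 subjects, T = 5, c = 1, true (mu, omega^2) = (-1, 1).
rng = random.Random(0)
U = simulate_u(50, mu=-1.0, omega2=1.0, c=1.0, T=5.0, rng=rng)
print(explicit_estimators(U, c=1.0, T=5.0))
```

Repeating the last three lines over many seeds gives an empirical variance that can be compared with the asymptotic values $(\omega_0^2 + 1/(c^2 T))/N$ and $2(\omega_0^2 + 1/(c^2 T))^2/N$.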

Example 1. Consider a mixed-effects Brownian motion with drift

$$dX_i(t) = \phi_i\,dt + \sigma\,dW_i(t), \quad X_i(0) = 0,$$

with $\phi_i \sim \mathcal{N}(\mu, \omega^2)$. This model is considered by Ditlevsen and De Gaetano (2005a) and the estimators are the same. We use two sets of population parameters: ($\mu = -1$, $\omega^2 = 1$) and ($\mu = 5$, $\omega^2 = 1$). All simulations are performed with a discretization step-size $\delta = 0.001$ on $[0, T]$ and $\sigma = 1$ ($\sigma$ known). Results, presented in Table 1, are satisfactory overall. Increasing $N$ improves the accuracy of both estimates $\hat\mu_N$ and $\hat\omega_N^2$. For $T = 5$, both estimates are less biased when the number of subjects is 50 instead of 20. The variance of the estimates also decreases with larger values of $N$. In general, the empirical variances agree with the values of the asymptotic variances. Increasing $T$ does not appear to have any significant impact on the properties of the estimates. Additional simulations with other values of $\mu_0$ and $\omega_0^2$ have shown that the properties of the estimates degrade when $\omega_0^2$ takes larger values.

[Table 1 about here.]

We have also considered simulations of the mixed-effects geometric Brownian motion, where $b(x) = x$, $\sigma(x) = \sigma x$, and of the model where $b(x) = \sqrt{1 + x^2}$, $\sigma(x) = \sigma \sqrt{1 + x^2}$. On our simulated data, the true parameter values were correctly estimated.

7.2 General case

Example 2. Consider an Ornstein-Uhlenbeck process with one random effect

$$dX_i(t) = \phi_i X_i(t)\,dt + \sigma\,dW_i(t), \quad X_i(0) = 0,$$

with $\phi_i \sim \mathcal{N}(\mu, \omega^2)$. We distinguish three situations: i) $\omega^2$ is known ($\theta = \mu$), ii) $\mu$ is known ($\theta = \omega^2$), iii) both parameters are unknown ($\theta = (\mu, \omega^2)$). When $\omega_0^2$ is known, we use the explicit expression of $\hat\mu_N$. Otherwise, numerical optimization procedures are required for maximizing the log-likelihood with respect to $\mu$ and $\omega^2$. In situations i) and ii), the asymptotic variance of the estimate has an explicit expression and is computed. Each individual diffusion is simulated with a discretization step-size $\delta = 0.001$ on $[0, T]$ and $\sigma = 1$. Two sets of parameter values are used: ($\mu = -5$, $\omega^2 = 1$) and ($\mu = 10$, $\omega^2 = 1$). Table 2 displays the results of the three inferences i), ii) and iii). Results highlight the accuracy of the estimates of both parameters $\mu$ and $\omega^2$, whatever the design and the parameter values. In the three considered inference situations, increasing $N$ leads to smaller bias of the parameter estimates and smaller variances. Moreover, we notice great similarities between $N^{-1} \mathcal{I}_N^{-1}(\theta_0)$ and the empirical variance of $\hat\theta_N$, especially when $N$ is large. Finally, we do not observe any significant impact of $T$ on either the bias or the variance of the parameter estimates.
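Situation i) of Example 2 can be reproduced in a few lines. The sketch below is an illustration and not the code used for the reported tables: it simulates each path with an Euler scheme, accumulates the discretized statistics (16)-(17) along the way, and applies the explicit estimator (9) with $\omega_0^2$ known. The step size $\delta = 0.01$ is an assumption made for speed (the study above uses $\delta = 0.001$), and all function names are ours.

```python
import math
import random


def simulate_ou_statistics(phi, T, n_steps, rng, sigma=1.0):
    """Euler scheme for dX = phi * X dt + sigma dW, X(0) = 0.

    Returns the discretized statistics (U_n, V_n) with b(x) = x.
    """
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    x = 0.0
    u_n = 0.0
    v_n = 0.0
    for _ in range(n_steps):
        dx = phi * x * dt + sigma * rng.gauss(0.0, sqrt_dt)
        u_n += x / sigma ** 2 * dx
        v_n += x * x / sigma ** 2 * dt
        x += dx
    return u_n, v_n


def mu_hat_known_omega2(U, V, omega2):
    """Explicit estimator (9) of mu when omega^2 is known."""
    num = sum(u / (1.0 + omega2 * v) for u, v in zip(U, V))
    den = sum(v / (1.0 + omega2 * v) for v in V)
    return num / den


# Hypothetical run: N = 50, T = 5, true (mu, omega^2) = (-5, 1), delta = 0.01.
rng = random.Random(2024)
mu0, omega2_0, N, T = -5.0, 1.0, 50, 5.0
stats = [simulate_ou_statistics(rng.gauss(mu0, math.sqrt(omega2_0)), T, 500, rng)
         for _ in range(N)]
U = [s[0] for s in stats]
V = [s[1] for s in stats]
print(mu_hat_known_omega2(U, V, omega2_0))
```

Because the data are generated by the Euler scheme itself, the discretized likelihood is exact for these simulated paths, so the sketch isolates the sampling variability in $N$ from the discretization bias discussed in Remark 6.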

Example 3. Consider an Ornstein-Uhlenbeck process with two random effects

$$dX_i(t) = (\phi_i^1 X_i(t) + \phi_i^2)\,dt + \sigma\,dW_i(t), \quad X_i(0) = 0,$$

with $\phi_i = (\phi_i^1, \phi_i^2)' \sim \mathcal{N}_2(\mu, \Omega)$, $\mu = (\mu_1, \mu_2)'$ and a diagonal matrix $\Omega$ with components $(\omega_1^2, \omega_2^2)$. For this model, assumption (H8) is satisfied. Indeed,

$$V_i = \frac{1}{\sigma^2} \begin{pmatrix} \int_0^T X_i^2(s)\,ds & \int_0^T X_i(s)\,ds \\ \int_0^T X_i(s)\,ds & T \end{pmatrix}.$$

Using the equality case in the Cauchy-Schwarz inequality, we see that $\det(V_i) = 0$ if and only if $X_i(t)$ is constant on $[0, T]$, which is impossible. The estimate of $\theta = (\mu_1, \mu_2, \omega_1^2, \omega_2^2)$ is obtained by numerically optimizing the log-likelihood. Each individual diffusion is simulated with a discretization step-size $\delta = 0.001$ on $[0, T]$ and $\sigma = 1$. Two sets of parameter values are used: ($\mu_1 = -0.1$, $\mu_2 = 1$, $\omega_1^2 = 0.01$, $\omega_2^2 = 1$) and ($\mu_1 = -0.1$, $\mu_2 = 1$, $\omega_1^2 = 0.001$, $\omega_2^2 = 1$). Table 3 displays the results of estimation. Parameters are well estimated, although $\mu_2$ has a larger bias when $\omega_2^2$ is greater. Biases decrease when $N$ increases. Again, the influence of $T$ is small.

Example 4. Consider the process with a single random effect

$$dX_i(t) = \phi_i X_i(t)\,dt + \sigma \sqrt{1 + X_i(t)^2}\,dW_i(t), \quad X_i(0) = 0,$$

with $\phi_i \sim \mathcal{N}(\mu, \omega^2)$. We obtain the estimate for $\theta = (\mu, \omega^2)$ by numerical optimization of the log-likelihood with respect to $\mu$ and $\omega^2$. We simulate $N$ individual diffusions with a discretization step-size $\delta = 0.001$ on $[0, T]$, and two sets of parameter values are used: ($\mu = -1$, $\omega^2 = 1$) and ($\mu = 5$, $\omega^2 = 1$). Table 4 displays the results of estimation. The results are satisfactory overall, although parameter $\omega^2$ is estimated with larger bias than parameter $\mu$. Bias and empirical variances of the estimates decrease when $N$ becomes larger. $T = 10$ leads to better results (smaller bias) for the estimation of $\omega^2$ than $T = 5$.

8 Concluding remarks and discussion

In this paper, we have studied maximum likelihood estimation for i.i.d. observations of stochastic differential equations including a random effect in the drift term. When the drift term depends linearly on the random effect, we prove that the likelihood is given by a closed-form formula and that the exact MLE is strongly consistent and asymptotically Gaussian as the number of observed processes tends to infinity.

For clarity of exposition, we have considered only one-dimensional SDEs, but the theory can be developed in the same way for multidimensional SDEs. For a drift term depending linearly on the random effects, the likelihood is still explicit, although its formulae may be much more cumbersome.

Our model contains only random effects parameters. However, in our context, the random variables $\phi_i$ can be written as $\mu + \eta_i$ with $\eta_i \sim \mathcal{N}(0, \omega^2)$. Thus $\mu$ can be interpreted here as a fixed effect parameter. If $\omega = 0$, the estimator of $\mu$ is given by the standard formula (10). One can also consider the extension to a drift term $b(x, \varphi, \beta) = \varphi\, b(x, \beta)$, where $\beta$ is an additional fixed parameter. Then the exact likelihood for $\phi_i \sim_{i.i.d.} \mathcal{N}(\mu, \omega^2)$ is computed analogously; one only needs to replace $U_i, V_i$ by

$$U_i(\beta) = \int_0^T \frac{b(X_i(s), \beta)}{\sigma^2(X_i(s))}\,dX_i(s), \qquad V_i(\beta) = \int_0^T \frac{b^2(X_i(s), \beta)}{\sigma^2(X_i(s))}\,ds$$

in Proposition 4.

Several interesting extensions of the present study are possible but require further work and other tools to adapt the estimation procedure. A first direction would be to consider that processes are observed with measurement errors (Overgaard et al., 2005; Donnet and Samson, 2008). However, this makes exact likelihood theory still more difficult, with little hope of obtaining explicit expressions for general diffusion models.

Another direction would be to consider that the diffusion coefficient contains unknown fixed or random parameters. In this case, we have to consider discrete time observations and approximate likelihood. This subject is ongoing work.

Acknowledgements. We thank Arnaud Gloter for his helpful contribution on assumption (H4).

References

Beal, S. and Sheiner, L. (1982). Estimating population kinetics.Critical Reviews in Biomedical Engineering 8, 195–222.

Davidian, M. and Giltinan, D. (1995). Nonlinear models for repeated measurement data. Chapman and Hall.

Ditlevsen, S. and De Gaetano, A. (2005a). Mixed effects in stochastic differential equation models. REVSTAT Statistical Journal 3, 137–153.

Ditlevsen, S. and De Gaetano, A. (2005b). Stochastic vs. deterministic uptake of dodecanedioic acid by isolated rat livers. Bull. Math. Biol. 67, 547–561.

Donnet, S. and Samson, A. (2008). Parametric inference for mixed models defined by stochastic differential equations. ESAIM: Probability and Statistics 12, 196–218.

Karatzas, I. and Shreve, S. (1997). Brownian Motion and Stochastic Calculus. Springer-Verlag.

Kuhn, E. and Lavielle, M. (2004). Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: Probability and Statistics 8, 115–131.

Liptser, R. and Shiryaev, A. (2001). Statistics of random processes I: general theory. Springer.

Nie, L. (2006). Strong consistency of the maximum likelihood estimator in generalized linear and nonlinear mixed-effects models. Metrika 63, 123–143.

Nie, L. (2007). Convergence rate of the MLE in generalized linear and nonlinear mixed-effects models: Theory and applications. Journal of Statistical Planning and Inference 137, 1787–1804.

Nie, L. and Yang, M. (2005). Strong consistency of the MLE in nonlinear mixed-effects models with large cluster size. Sankhya: The Indian Journal of Statistics 67, 736–763.

Overgaard, R., Jonsson, N., Tornøe, C. and Madsen, H. (2005). Non-linear mixed-effects models with stochastic differential equations: Implementation of an estimation algorithm. J. Pharmacokinet. Pharmacodyn. 32, 85–107.

Picchini, U., De Gaetano, A. and Ditlevsen, S. (2010). Stochastic differential mixed-effects models. Scand. J. Statist. 37, 67–90.

Picchini, U. and Ditlevsen, S. (2011). Practical estimation of high dimensional stochastic differential mixed-effects models. Computational Statistics & Data Analysis 55, 1426–1444.

Picchini, U., Ditlevsen, S. and De Gaetano, A. (2008). Maximum likelihood estimation of a time-inhomogeneous stochastic differential model of glucose dynamics. Math. Med. Biol. 25, 141–155.

Pinheiro, J. and Bates, D. (2000). Mixed-effects models in S and S-PLUS. Springer-Verlag.

Revuz, D. and Yor, M. (1999). Continuous martingales and Brownian motion, vol. 293. Springer-Verlag, Berlin, third edn.

van der Vaart, A. W. (1998). Asymptotic statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. ISBN 0-521-49603-9; 0-521-78450-6.

Wolfinger, R. (1993). Laplace’s approximation for nonlinear mixed models. Biometrika 80, 791–795. ISSN 0006-3444.

The corresponding author is Adeline Samson ([email protected]), Laboratoire MAP5, Université Paris Descartes, 45 rue des Saints-Pères, 75006 Paris, France.

9 Appendix: proofs

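Proof of Proposition 1. Consider the two-dimensional SDE:

$$\begin{aligned}
dX_i(t) &= b(X_i(t), \varphi_i(t))\, dt + \sigma(X_i(t))\, dW_i(t), & X_i(0) &= x^i, \\
d\varphi_i(t) &= 0, & \varphi_i(0) &= \varphi_i.
\end{aligned}$$

Under (H1), the above system admits a unique strong solution and there exists a functional $F$ such that $X_i(\cdot) = F_\cdot(x^i, \varphi_i, W_i(\cdot))$, where $F_\cdot : \mathbb{R} \times \mathbb{R}^d \times C(\mathbb{R}^+, \mathbb{R}) \to C(\mathbb{R}^+, \mathbb{R})$ is measurable (see e.g. Karatzas and Shreve, 1997, p. 310).

Moreover, $X_i^{\phi}(\cdot) = F_\cdot(x^i, \phi, W_i(\cdot))$. By the Markov property of the joint process $((X_i(t), \varphi_i(t) \equiv \varphi_i), t \ge 0)$, the conditional distribution of $X_i(\cdot)$ given $\varphi_i = \phi$ is identical to the distribution of $X_i^{\phi}(\cdot)$. As the pairs $(\varphi_i, W_i(\cdot))$, $i = 1, \ldots, N$, are independent, the processes $X_i(\cdot)$ are independent. As $(x^i, \varphi_i)$ is the initial condition, the moment result follows.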

Proof of Proposition 2. Under (H1)-(H2), the first part is classical (see e.g. Liptser and Shiryaev, 2001).

To prove the continuity in $\phi$, two kinds of terms are to be studied. The first is, for a given $X_i$, the ordinary integral

$$\phi \mapsto \int_0^{T_i} \frac{b^2(X_i(s), \phi)}{\sigma^2(X_i(s))}\, ds. \qquad (19)$$

Using (H1)-(H3) and the continuity theorem for ordinary integrals, we easily obtain the continuity of (19).

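Second, the stochastic integral

$$\phi \mapsto \int_0^{T_i} \frac{b(X_i(s), \phi)}{\sigma^2(X_i(s))}\, dX_i(s) := I_1(\phi) + I_2(\phi), \qquad (20)$$

with

$$I_1(\phi) = \int_0^{T_i} \frac{b(X_i(s), \phi)}{\sigma^2(X_i(s))}\, b(X_i(s), \phi_0)\, ds, \qquad I_2(\phi) = \int_0^{T_i} \frac{b(X_i(s), \phi)}{\sigma^2(X_i(s))}\, \big(dX_i(s) - b(X_i(s), \phi_0)\, ds\big).$$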

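The function $\phi \mapsto I_1(\phi)$ is studied using (H1)-(H3) and the continuity theorem for ordinary integrals again. For $I_2(\phi)$, the Burkholder-Davis-Gundy inequality and (H1)-(H3) yield:

$$\begin{aligned}
E_{Q^i}\big(I_2(\phi) - I_2(\phi')\big)^{2k}
&\le C_k\, E_{Q^i}\left[\int_0^{T_i} \frac{\big(b(X_i(s), \phi) - b(X_i(s), \phi')\big)^2}{\sigma^2(X_i(s))}\, ds\right]^k \\
&\le C_k\, |\phi - \phi'|^{2k}\, E_{Q^i}\left[\int_0^{T_i} c^2 K\, \big(1 + |X_i(s)|^{2\gamma}\big)\big(1 + |X_i(s)|\big)^2\, ds\right]^k \\
&\le C(k, \gamma)\, |\phi - \phi'|^{2k} \int_0^{T_i} \Big(1 + E_{Q^i}\big(|X_i(s)|^{2(\gamma+1)}\big)\Big)\, ds.
\end{aligned}$$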

Using (3) and choosing $2k > d$, the Kolmogorov criterion (see e.g. Revuz and Yor, 1999) yields that $\phi \mapsto I_2(\phi)$ admits a continuous version $Q^i$-a.s.
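As a reminder, the criterion is applied here in its usual textbook form, recalled below rather than quoted from the paper. The symbols $Y$, $p$, $\varepsilon$ and $C$ are notation introduced only for this note, and $\phi, \phi'$ denote two arbitrary parameter values.

$$E\,|Y(\phi) - Y(\phi')|^{p} \le C\, |\phi - \phi'|^{d + \varepsilon} \ \text{ for all } \phi, \phi' \text{ in a compact subset of } \mathbb{R}^d, \text{ with } p > 0,\ \varepsilon > 0 \quad \Longrightarrow \quad Y \text{ admits a continuous modification on that subset.}$$

In the present proof, $Y = I_2$ and $p = 2k$, so the exponent of the increment bound is $2k = d + \varepsilon$ with $\varepsilon = 2k - d$, which is positive exactly when $2k > d$.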

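As $X_i \mapsto L_{T_i}(X_i, \phi)$ is measurable for all $\phi$ and $\phi \mapsto L_{T_i}(X_i, \phi)$ is continuous for all $X_i$, the joint measurability can be proved as follows. For $m \in \mathbb{N}$ and $k = (k_1, \ldots, k_d) \in \mathbb{Z}^d$, set $B_{k,m} = \prod_{i=1}^{d} [k_i/2^m, (k_i + 1)/2^m[$. These sets are disjoint and, for all $m$, $\mathbb{R}^d = \cup_{k \in \mathbb{Z}^d} B_{k,m}$. Let $\phi_{k,m} = (k_i/2^m, i = 1, \ldots, d)$ and set:

$$L_m(X_i, \phi) = \sum_{k \in \mathbb{Z}^d} L_{T_i}(X_i, \phi_{k,m})\, 1_{B_{k,m}}(\phi).$$

Thus, $L_m(X_i, \phi)$ is jointly measurable. As $L_{T_i}(X_i, \phi)$ is continuous w.r.t. $\phi$, $L_m(X_i, \phi) \to L_{T_i}(X_i, \phi)$ as $m \to +\infty$. Hence the result.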
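The dyadic construction above can be illustrated numerically. The sketch below is an assumption-laden toy, not part of the proof: the function names are hypothetical, and the continuous function `f` merely stands in for the map from the parameter value to the likelihood, with $d = 2$. It evaluates `f` at the lower corner of the dyadic cell containing the parameter and shows the approximation error shrinking as $m$ grows.

```python
import numpy as np

def dyadic_corner(phi, m):
    """Lower corner phi_{k,m} of the dyadic cell B_{k,m} that contains phi."""
    phi = np.asarray(phi, dtype=float)
    return np.floor(phi * 2.0**m) / 2.0**m

def dyadic_approximation(f, phi, m):
    """Piecewise-constant approximation: f evaluated at the cell corner."""
    return f(dyadic_corner(phi, m))

# Hypothetical continuous stand-in for the likelihood as a function of phi (d = 2).
f = lambda p: np.exp(-np.sum(p**2))
phi = np.array([0.3, -0.3])

for m in (1, 5, 10, 20):
    error = abs(f(phi) - dyadic_approximation(f, phi, m))
    print(m, dyadic_corner(phi, m), error)
```

Each coordinate of the corner lies within $2^{-m}$ of the corresponding coordinate of the parameter, so continuity alone is enough to force the error to zero, which is the only property of the likelihood used in this step.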

Proof of Proposition 3. For $H$ a positive measurable function on $C_{T_i}$, we have:

$$E_{Q^i_\theta}(H(X_i)) = E_{P^i_\theta}(H(X_i)) = E_{P^i_\theta}\big[E_{P^i_\theta}(H(X_i) \mid \varphi_i)\big].$$

By Propositions 1 and 2, as $L_{T_i}(X_i, \phi)$ is the density of $Q^{x^i, T_i}_{\phi}$ w.r.t. $Q^i$, we get:

$$E_{P^i_\theta}(H(X_i) \mid \varphi_i = \phi) = E_{Q^{x^i, T_i}_{\phi}}(H(X_i)) = E_{Q^i}\big(H(X_i)\, L_{T_i}(X_i, \phi)\big).$$

Using the joint measurability of $L_{T_i}(X_i, \phi)$ w.r.t. $(X_i, \phi)$, the Fubini theorem