Asymptotic criteria for designs in nonlinear regression with model errors

(1)

HAL Id: hal-00158853

https://hal.archives-ouvertes.fr/hal-00158853

Submitted on 28 Feb 2008

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Asymptotic criteria for designs in nonlinear regression

with model errors

Andrej Pazman, Luc Pronzato

To cite this version:

Andrej Pazman, Luc Pronzato. Asymptotic criteria for designs in nonlinear regression with model

errors. Mathematica Slovaca, 2006, 56 (5), pp.543-553. �hal-00158853�

(2)

with model errors Andrej P´azman and Luc Pronzato

Department of Applied Mathematics and Statistics Faculty of Mathematics, Physics and Informatics

Comenius University

Mlynsk´a dolina 84248 Bratislava, Slovakia [email protected]

and

Laboratoire I3S, CNRS/UNSA, Bˆat. Euclide, Les Algorithmes, 2000 route des Lucioles, BP 121, 06903 Sophia-Antipolis Cedex, France

[email protected]

The research of the first author supported by the Slovak VEGA-grants No. 1/0264/03 and 1/3016/06

The work of the second author was partially supported by the IST Programme of the European Community, under the PASCAL Network of Excellence,

IST-2002-506778. This publication only reflects the authors’s view.

AMS subject classification: 62K05, 62J02

Keywords and phrases: nonlinear regression, least squares, optimum design, asymptotic normality, measures of nonlinearity, information matrix

Abstract

We derive bounds for the design optimality criteria under the assumption that the supposed regression model y (xk) = η (xk, θ) + εk; k = 1, 2, ... does not correspond

to the true one. The investigation is based on the asymptotic properties of the LSE of θ, and full proofs of these properties are presented under the assumption that the sequence of design points {xk}∞k=1 is randomly sampled according to a design

measure ξ. The bounds and the asymptotic properties are related to the intrinsic measure of nonlinearity of the model.

1. Introduction.

In optimal experimental design one often assumes that there are no errors in the considered model, a condition rarely satisfied in practice. Papers considering robustness to model errors are usually dealing with linear models, and the main stress is on bias issues, that is on the term kν (.) − η (., θ)kξ in Theorem 1, below

(cf. [6,9,11]). Here we want to show that the situation is different in a nonlinear regression model

y (xk) = η (xk, θ) + εk ; θ ∈ Θ

(1)

(3)

2

with εki.i.d. random variables, E (εk) = 0, V ar (εk) = σ2 ; k = 1, 2, .... The model

error consists in that the true description of the data is y (xk) = ν (xk) + εk

(2)

where ν (xk) are unknown, and are outside the model (1), i.e. there is no θ ∈ Θ such

that ν (xk) = η (xk, θ) ; k = 1, 2, .... We shall show that changes of the asymptotic

variance of the LS estimators are modified by this misspecification of the model. The main question to be answered here is ”how far” from the model can be the values ν (xk) , so as not to influence too much the usual criteria for the design of

the experiment.

Assumptions on models: We take assumptions usual in experimental design

the-ory. The set Θ is a compact subset of Rp_{having no isolated points, i.e. int(Θ) = Θ.}

Each xk is an element of X , the design space, which is a compact subset of an

Eu-clidean space. The function η (x, θ) is continuous on X × Θ, and its first and second order derivatives with respect to the components of θ exist, and are continuous on X × int(Θ). The function ν (x) is continuous on X .

2. The asymptotic properties of the LSE. The estimator considered is the LSE

ˆ θ(N )= arg min θ∈Θ ° ° °y (N ) − η(N )(θ)°° ° 2 where y(N ) _{= (y (x} 1) , ..., y (xN))T, η(N )(θ) = (η (x1, θ) , ..., η (xN, θ))T.The se-quence of estimatorsn ˆθ(N )o∞

N =1has no stabilized asymptotic properties, even when

model (1) is true, unless the sequences {η (xk, θ)}∞k=1,

n_∂η(x k,θ) ∂θi o∞ k=1, n_∂2_η(x k,θ) ∂θi∂θj o∞ k=1

have some regularity properties. There are different ways how to formulate these properties. The classical approach in [5] requires that these sequences has ”finite tail products”, the book [4] requires assumptions formulated as suprema and in-fima of very complex terms , the same is true for [2]. Here we start from the point of view that in the context of experimental design the stress should be on formu-lating the assumptions in terms of the design measure, which by definition is a probability measure ξ on X . The standard interpretation of ξ is that the sequence of frequencies of the design points {xk}∞k=1 must approach ξ (in a certain sense).

Here we restrict our attention to the particular case that {xk}∞k=1 is obtained by

independent random sampling from ξ. This allows to obtain complete proofs of asymptotic properties of ˆθ(N ) _{in a much simpler way than in classical papers on}

nonlinear regression, maintaining the main feature of a general approach. The proofs of the following statements are presented in Section 4.

Lemma 1_{. If {x}k}∞k=1 is randomly sampled from the probability distribution

ξ, then, with probability equal to one, for any real function a (x, ε, θ) defined and

continuous on X × R × Θ, and such that

Z R ·Z X max θ∈Θ|a (x, ε, θ)| ξ (dx) ¸ P (dε) < ∞

(4)

the uniform strong law of large numbers (USLLN) holds, i.e. lim N →∞ 1 N N X k=1 a (xk, εk, θ) = Z R ·Z X a (x, ε, θ) ξ (dx) ¸ P (dε)

uniformly over θ ∈ Θ, and a.s. with respect to P and ξ.

Let us denote

kν (.) − η (., θ)k2ξ =

Z

X

[ν (x) − η (x, θ)]2ξ (dx)

Theorem 1. _{Suppose that the sequence {x}k}∞k=1 is randomly sampled from

ξ. Then under the assumptions on the model (2), the sequence of estimators

n ˆ_θ(N )o∞

N =1 converges a.s. to the set

˜

Θξ= arg min

θ∈Θkν (.) − η (., θ)k 2 ξ

Moreover the usual estimator of σ2

£s2¤(N ) = 1 N − p N X k=1 h y (xk) − η ³ xk, ˆθ(N ) ´i2

converges a.s. to σ2_{+ min}

θ∈Θkν (.) − η (., θ)k2ξ.

Remark: According to [5], the estimator ˆθ(N ) _{is a measurable function of}

xk, εk; k = 1, ..., N, if it is defined uniquely, otherwise, there is a measurable choice

for ˆθ(N )_.

Theorem 2_{. Suppose that the sequence {x}k}∞k=1 is randomly sampled from

ξ. Suppose that ˜Θξ = n ˜θ

o

is a one-point set, and that ˜θ ∈int(Θ). Suppose that M³ξ, ˜θ´is nonsingular, where M (ξ, θ) = Z X ∂η (x, θ) ∂θ ∂η (x, θ) ∂θT ξ (dx) and that Cint ³ ξ, ˜θ´ °° °ν (.) − η ³ ., ˜θ´°° ° ξ < 1 (3) where Cint(ξ, θ) = max u∈Rp_,u6=0 ° ° °£I − Pθ¤hP i,jui∂ 2_η(.,θ) ∂θi∂θj uj i° ° °_ξ uT_{M (ξ, θ) u}

is the generalized intrinsic measure of nonlinearity (curvature) of Bates and Watts (cf. [1]) in model (1), at the point θ, and under the design ξ. Denote

Dν(ξ, θ) = Z X[η (x, θ) − ν (x)] ∂2_{η (x, θ)} ∂θ∂θT ξ (dx) Mν(ξ, θ) = Z X[η (x, θ) − ν (x)] 2∂η (x, θ) ∂θ ∂η (x, θ) ∂θT ξ (dx)

(5)

4 Then M³ξ, ˜θ´+Dν ³ ξ, ˜θ´, M³ξ, ˜θ´+σ−2_M ν ³

ξ, ˜θ´are nonsingular matrices, and the sequence n√N³ ˆθ(N )_{− ˜}_θ´o∞

N =1 converges in distribution to a random vector

distributed normally, N³0, σ2_J−1 ν ³ ξ, ˜θ´´, where Jν ³ ξ, ˜θ´ = hM³ξ, ˜θ´+ Dν ³ ξ, ˜θí ×hM³ξ, ˜θ´+ σ−2Mν ³ ξ, ˜θí−1hM³ξ, ˜θ´+ Dν ³ ξ, ˜θí

Remarks: Cf. [7] for a detailed derivation of the intrinsic curvature for the case of a finite number of design points; the generalization to L2(ξ) is straightforward.

We denoted by I the identity operator in L2(ξ), and Pθis an orthogonal projector

in L2(ξ) defined by ¡Pθ_{φ¢ (x) =} ∂η (x, θ) ∂θT M −1_{(ξ, θ)}Z X ∂η (x∗_{, θ)} ∂θ φ (x ∗_{) ξ (dx}∗₎

for any φ (.) ∈ L2(ξ). Notice also that without the assumption of nonsingularity

of M³ξ, ˜θ´we can have irregularities of the distribution of ˆθ(N )_{, even when model}

(1) is correct (cf. [8]).

3. Bounds for the criteria of optimality.

When model (1) is correct, and ˜θ is the true parameter value, then σ2h_M³_{ξ, ˜}_θ´i−1

is the asymptotic variance matrix of ˆθ(N )_{. The experimenter designing the}

experi-ment does not know that model (1) is wrong, and even if he suspects that, he does not know the true mean function ν (.). The usual approach consists in taking a point θ(o)_{∈int(Θ), which is supposed to be close to the hypothetical true value of}

θ, and applying an optimality criterion Φ on the information matrix M¡ξ, θ(o)_¢.

One then computes the optimum design in a standard way, still supposing that model (1) is true. The question is how efficient is the resulting design in model (2) Any optimality criterion should respect the ordering of information matrices, hence a standard assumption on optimality criteria is the monotonicity

M ≤ M∗_{⇒ Φ [M] ≤ Φ [M}∗_]

where M ≤ M∗ _{means u}T_{M u ≤ u}T_M∗_{u for every u ∈ R}p_{. A good design ξ}

should thus give a large value of Φ£M ¡ξ, θ(o)_{¢¤. This ordering is opposite to the}

standard one in most books and papers on optimum experimental design, but it is recommended in [10] using information arguments: the criterion should measure the amount of information contained in the experiment. From this interpretation other important properties also follow: the positivity of Φ (Φ [M ] > 0), and the homogeneity of Φ

Φ [kM ] = kΦ [M ] ; k > 0

For example Φ [M ] = det1/p[M ] is a positive homogeneous monotone form of the criterion of D-optimality.

(6)

A design ξ∗_{is called locally Φ-optimal (in the locality of θ}(o)_{) when Φ}_{£M ¡ξ}∗_{, θ}(o)_{¢¤ =}

maxξΦ£M ¡ξ, θ(o)¢¤ On the other hand, when model (2) holds, the criterion

func-tion Φ should be applied to Jν

³

ξ, ˜θ´. Suppose ˜θ = θ(o)_{and compare both criteria.}

The following theorem gives limits for ΦhJν

³

ξ, ˜θ´iin terms of ΦhM³ξ, ˜θ´i Theorem 3. When Φ is positive, monotone and homogeneous, and when the as-sumptions of Theorem 2 hold, then we have the following bounds for ΦhJν

³ ξ, ˜θí δ2 b Φ h M³ξ, ˜θí≤ ΦhJν ³ ξ, ˜θí≤ ∆2ΦhM³ξ, ˜θí where δ = 1 − Cint ³ ξ, ˜θ´ °° °ν (.) − η ³ ., ˜θ´°° ° ξ ∆ = 1 + Cint ³ ξ, ˜θ´ °° °ν (.) − η ³ ., ˜θ´°° °_ξ b = 1 + maxx∈X h ν (x) − η³x, ˜θí2 σ2 Proof. Denote M = M³ξ, ˜θ´, Mν = Mν ³ ξ, ˜θ´, Dν = Dν ³ ξ, ˜θ´, J = Jν ³ ξ, ˜θ´, J∗₌h_M³_{ξ, ˜}_θ´_{+ D} ν ³ ξ, ˜θí hM³ξ, ˜θí−1hM³ξ, ˜θ´+ Dν ³ ξ, ˜θí, ν = ν (.) , ˜η = η³., ˜θ´, C = Cint ³

ξ, ˜θ´, P = Pθ˜_{. From the definition of ˜}_{θ we have}

ν − ˜η = (I − P ) (ν − ˜η). So for every u ∈ Rp _{we have}

uT(M + Dν) u = uTM u " 1 +< ˜η − ν, (I − P ) u T ∂2η(.,θ) ∂θ∂θT |_θ˜u >ξ uT_{M u} #

where < a, b >ξ=R_Xa (x) b (x) ξ (dx). From the Schwarz inequality

uT_{(M + D} ν) u ≤ ³ 1 + kν − ˜ηkξC ´ uT_{M u = ∆ u}T_{M u} uT(M + Dν) u ≥ ³ 1 − kν − ˜ηkξC ´ uTM u = δ uTM u (4)

Hence, since uT_M−1_{u = max} α h ¡uT_α¢2 /αT_{M α}i_{(cf. [3]), we have} 1 ∆u T_M−1 u ≤ uT(M + Dν)−1u ≤ 1 δu T_M−1_u Set u = (M + Dν) v, v ∈ Rp, to obtain 1 ∆v T_J∗_{v ≤ v}T_{(M + D} ν) v ≤ 1 δv T_J∗_v

which together with (4) gives

δ2vTM v ≤ vTJ∗_{v ≤ ∆}2_vT_{M v}

Since Mν is p.s.d. we have

(7)

6

so that

δ2vTM v ≤ bvTJv ∆2vTM v ≥ vTJv Since Φ is monotone and homogeneous it follows that

δ2 b Φ h M³ξ, ˜θí≤ ΦhJν ³ ξ, ˜θí≤ ∆2ΦhM³ξ, ˜θí ✷

4. Proofs of asymptotic properties.

Proof of Lemma 1. _{Denote Z =R×X , and z = (ε, x)}T the random vector distributed P × ξ on Z. Take some fixed θ1∈ Θ, and consider the set

B¡θ1_{, δ} ¢ = ©θ ∈ Θ :° °θ − θ1 ° °≤ δ ª Define ¯aδ(z) and a

−δ(z) as the maximum and the minimum of a (z, θ) over the set

B¡θ1_{, δ¢. We have that E} ½¯ ¯ ¯ ¯ a −δ(z) ¯ ¯ ¯ ¯ ¾

and E {|¯aδ(z)|} are bounded by

E {maxθ∈Θ|a (z, θ)|} < ∞. Moreover, ¯aδ(z) − a

−δ(z) is an increasing function of δ.

Hence we can change the order of the limit and the mean to prove that lim δց0 · E {¯aδ(z)} − E ½ a −δ(z) ¾¸ = E ½ lim δց0 · ¯ aδ(z) − a −δ(z) ¸¾ = 0 which proves the continuity of E {a (z, θ)} at θ1_{, and implies}

1 N X k a −δ(β)(zk) − E ½ a −δ(β)(z) ¾ −β 2 ≤ 1 N X k a −δ(β)(zk) − E©¯aδ(β)(z) ª ≤ _N1 X k a (zk, θ) − E {a (z, θ)} ≤ _N1 X k ¯ aδ(β)(zk) − E ½ a −δ(β)(z) ¾ ≤ _N1 X k ¯ aδ(β)(zk) − E©¯aδ(β)(z)ª + β 2 By the strong law of large numbers we have that ∀γ>0∃N (β,γ)

Prob ( ∀N >N (β,γ) ¯ ¯ ¯ ¯ ¯ 1 N X k ¯ aδ(β)(zk) − E©¯aδ(β)(z)ª ¯ ¯ ¯ ¯ ¯ <β 2 ) > 1 − γ 2 Prob ( ∀N >N (β,γ) ¯ ¯ ¯ ¯ ¯ 1 N X k a −δ(β)(zk) − E ½ a −δ(β)(z) ¾¯_¯ ¯ ¯ ¯ <β 2 ) > 1 − γ₂

(8)

Combining with the previous inequality we obtain: ∀γ>0∃N (β,γ) Prob ( ∀N >N (β,γ) max θ∈B(θ1_,δ(β)) ¯ ¯ ¯ ¯ ¯ 1 N X k a (zk, θ) − E {a (z, θ)} ¯ ¯ ¯ ¯ ¯ < β ) > 1 − γ

It only remains to cover the compact set Θ by a finite numbers of open sets B¡θi_{, δ (β)¢ ; i = 1, ..., n (β). For any α > 0 take γ = α/n (β) , N (β) = max}

iNi(β, γ). We obtain Prob ( ∀N >N (β)max θ∈Θ ¯ ¯ ¯ ¯ ¯ 1 N X k a (zk, θ) − E {a (z, θ)} ¯ ¯ ¯ ¯ ¯ < β ) > 1 − α

which completes the proof. ✷

Proof of Theorem 1. _{For any θ ∈ Θ} 1 N N X k=1 [y (xk) − η (xk, θ)]2 = 1 N N X k=1 ε2k+ 2 N N X k=1 [ν (xk) − η (xk, θ)] εk +1 N N X k=1 [ν (xk) − η (xk, θ)]2

since εk = y (xk) − ν (xk) . The function (x, θ, ε) → ε2+ 2 [ν (x) − η (x, θ)] ε +

[ν (x) − η (x, θ)]2 is continuous on X × Θ × R, so the USLLN holds, and

lim N →∞ 1 N N X k=1 [y (xk) − η (xk, θ)]2= σ2+ kν (.) − η (., θ)k2_ξ (5)

a.s., and uniformly over Θ, since E (ε) = 0. Denote zk = (εk, xk) and take a fixed

sequence z = {zk}∞k=1 such that this limit holds, and denote by θ#(z) a limit

point ofn ˆθ(N )_(z)o∞

N =1. It exists, since Θ is compact, and there is a subsequence

n ˆ_θ(Nk)_(z)

o∞

k=1 converging to it. Take any ˜θ ∈ ˜Θ. By the definition of the LSE we

have 1 Nt Nt X k=1 h y (xk) − η ³ xk, ˜θ ´i2 ≥ _N1 t Nt X k=1 h y (xk) − η ³ xk, ˆθ(Nt)(z) ´i2

According to (5), the left hand side converges to σ2₊°°

°ν (.) − η ³ ., ˜θ´°° ° 2 ξ while the

right hand side to σ2₊°

°ν (.) − η¡., θ#(z) ¢° ° 2 ξ, consequently θ #_{(z) ∈ ˜}_{Θ. Hence all} limit points of n ˆθ(N )_(z)o∞

N =1, for any sequence z = {zk} ∞

k=1 which satisfies (5),

are in ˜Θ.

The limit value of£s2¤(N )

then follows from (5). ✷

(9)

8

For any v ∈ Rp _{we have by the Schwarz inequality}

vTh_M³_{ξ, ˜}_θ´_{+ D} ν ³ ξ, ˜θív = hvTM³ξ, ˜θ´vi   1 + R X h η³x, ˜θ´− ν (x)iThI − P³ ˜θí hvT ∂2η(x,θ) ∂θ∂θT |_θ˜v i ξ (dx) vT_M³_{ξ, ˜}_θ´_v    ≥ hvTM³ξ, ˜θ´vi · 1 −°° °η ³ ., ˜θ´− ν (.)°° °_ξCint ³ ξ, ˜θ´ ¸ ≥ 0 hence M³ξ, ˜θ´+ Dν ³ ξ, ˜θís p.d. Evidently Mν ³ ξ, ˜θís p.s.d. hence M³ξ, ˜θ´+ σ−2_M ν ³ ξ, ˜θís also p.d.

Take a sequence z = {zk}∞_k=1such thatn ˆθ(N )(z)

o∞

N =1converges to ˜θ. According

to Theorem 1, the probability of sampling such a sequence is equal to one. Denote JN(θ, z) = _N1 PNk=1[y (xk) − η (xk, θ)]2. By the Taylor formula we have

0 =· ∂JN(θ, z) ∂θ ¸ ˆ θ(N )_(z) =· ∂JN(θ, z) ∂θ ¸ ˜ θ +· ∂ 2_J N(θ, z) ∂θ∂θT ¸ β(N ) h ˆ_θ(N ) (z) − ˜θi (6)

with β(N ) _{a point between ˆ}_θ(N )_{(z) and ˜}_{θ. We have}

−√N· 1 2 ∂JN(θ, z) ∂θ ¸ ˜ θ = √1 N N X k=1 h y (xk) − η ³ xk, ˜θ ´i· ∂η (x_k, θ) ∂θ ¸ ˜ θ = √1 N N X k=1 h εk+ ³ ν (xk) − η ³ xk, ˜θ ´´i· ∂η (x_k, θ) ∂θ ¸ ˜ θ

According to the central limit theorem, the last term converges in distribution to a random vector distributed N³0, σ2_M³_{ξ, ˜}_θ´_{+ M}

ν ³ ξ, ˜θ´´since E ½ h ε +³ν (x) − η³x, ˜θ´í· ∂η (x, θ) ∂θ ¸ ˜ θ ¾ = 0 and V ar ½_h ε +³ν (x) − η³x, ˜θ´í· ∂η (x, θ) ∂θ ¸ ˜ θ ¾ = E ½_h ε +³ν (x) − η³x, ˜θ´í2· ∂η (x, θ) ∂θ ∂η (x, θ) ∂θT ¸ ˜ θ ¾ = σ2M³ξ, ˜θ´+ Mν ³ ξ, ˜θ´ Further we have ∂2_J N(θ, z) ∂θ∂θT = 1 N N X k=1 [ν (xk) + εk− η (xk, θ)] · ∂2_{η (x} k, θ) ∂θ∂θT ¸ +1 N N X k=1 · ∂η (xk, θ) ∂θ ∂η (xk, θ) ∂θT ¸

(10)

and according to Lemma 1 this expression converges a.s. with respect to xk and

εk, and uniformly with respect to θ to

E ½ [ν (x) + ε − η (x, θ)]· ∂ 2_{η (x, θ)} ∂θ∂θT ¸¾ + E½ ∂η (x, θ) ∂θ ∂η (x, θ) ∂θT ¾ = Dν(ξ, θ) + M (ξ, θ) Consequently h∂2JN(θ,z) ∂θ∂θT i β(N ) converges a.s. to Dν ³ ξ, ˜θ´+ M³ξ, ˜θ´ since β(N ) converges a.s. to ˜θ. The required result now follows from (6) ✷

Acknowledgment. The authors thank the referee for a very careful revision of the paper.

References

[1] Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and its Applica-tions. New York, Wiley 1988.

[2] Gallant, A. R.: Nonlinear Statistical Models. New York, Wiley 1987. [3] Harville, D.A.: Matrix Algebra from a Statistician’s Perspective. New York, Springer 1997.

[4] Ivanov, A.V.: Asymptotic Theory of Nonlinear Regression. Dordrecht, Kluwer, 1997.

[5] Jennrich, R. L.: Asymptotic properties of nonlinear least squares estimation. Ann. Math. Statist., 40 (1969), p. 633-643.

[6] Kiefer, J.: Optimum design for fitting biased multivariate surfaces. In: P.R.Krishnaian, editor: Multivariate analysis p. 287-297. North Holland, Ams-terdam 1973.

[7] P´azman A.: Nonlinear Statistical Models. Dordrecht, Kluwer, 1993.

[8] P´azman A., Pronzato, L.: On the irregular behavior of LS estimators for asymptotically singular designs. Lab. I3S, CNRS Sophia Antipolis. Rapport de recherche ISRN I3S/RR-2005-12-FR, Juin 2005 (To appear in Stat. & Prob. Let-ters)

[9] Pesotchinski, L.: Optimal robust designs: linear regression in Rk_{. Annals of}

Statistics 10 (1982), p. 511-525.

[10] Pukelsheim, F.: Optimal Experimental Design. New York, Wiley 1993. [11] Welch, W. J.: A mean squared error criterion for the design of experiments. Biometrika 70 (1983), p. 205-213.