

www.imstat.org/aihp 2015, Vol. 51, No. 4, 1620–1650

DOI: 10.1214/14-AIHP627

© Association des Publications de l’Institut Henri Poincaré, 2015

Information bounds for inverse problems with application to deconvolution and Lévy models

Mathias Trabs

Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. E-mail: trabs@math.hu-berlin.de

Received 23 August 2013; revised 16 April 2014; accepted 6 May 2014

Abstract. If a functional in a nonparametric inverse problem can be estimated with parametric rate, then the minimax rate gives no information about the ill-posedness of the problem. To obtain a more precise lower bound, we study semiparametric efficiency in the sense of Hájek–Le Cam for functional estimation in regular indirect models. These are characterized as models that can be locally approximated by a linear white noise model described by the generalized score operator. A convolution theorem for regular indirect models is proved. It applies to a large class of statistical inverse problems, which is illustrated by the prototypical white noise and deconvolution models, and it is especially useful for nonlinear models. We discuss in detail a nonlinear model of deconvolution type in which a Lévy process is observed at low frequency, and we derive an information bound for the estimation of linear functionals of the jump measure.


MSC: 60G51; 60J75; 62B15; 62G20; 62M05

Keywords: Convolution theorem; Deconvolution; Lévy process; Nonlinear ill-posed inverse problem; Semiparametric efficiency; White noise model

1. Introduction

Inverse problems are a key topic in applied mathematics, in particular in models with noisy data. Typically, the parameter that is the target of the statistical inference is not directly observable, but "hidden" behind some operator.

While upper bounds, like convergence rates for nonparametric inverse problems, are mainly properties of the estimators, lower bounds reveal the deeper information-theoretic structure. Instead of the (infinite-dimensional) parameter itself, derived quantities are often the final object of interest. On the one hand, they may allow for inference with parametric rate, circumventing typical problems in nonparametric estimation such as the choice of the bandwidth, cf. estimating the distribution function instead of the density. In this case minimax convergence rates give no information about the ill-posedness of the problem and we need the much more precise information bounds. On the other hand, many nonparametric statistical procedures rely on basis expansions and model selection strategies, see e.g. Cavalier et al. [7]. For these adaptive methods it is strictly necessary to assess the quality of the estimated coefficients in terms of confidence.

While inverse problems appear in many different shapes in the literature, information bounds have been studied only in a few linear cases: Klaassen et al. [19,20] and Khoujmane et al. [18] considered the linear indirect regression model
$$Y_i=(K\vartheta)(X_i)+\xi_i,\qquad i=1,\dots,n,$$
where the regression function depends on the unknown parameter $\vartheta$ via the linear operator $K$ and $(X_1,Y_1),\dots,(X_n,Y_n)$ are observed in the presence of the additional errors $\xi_i$. Van Rooij et al. [37] derived a convolution theorem for linear indirect density estimation, where a sample of i.i.d. random variables $Y_1,\dots,Y_n$ with distribution $K\vartheta$ is observed. If more specifically
$$Y_i=X_i+\varepsilon_i,\qquad i=1,\dots,n,$$
where $X_i$ has law $\vartheta$ and is corrupted by the noise variable $\varepsilon_i$, then $K$ is a convolution operator. Efficiency for this so-called deconvolution model was considered by Söhl and Trabs [31].

Using the polar decomposition or specific properties of the operators, all these studies are restricted to linear models. However, in many situations the operator $K$ might not be linear, see e.g. Engl et al. [10] and Bissantz et al. [3]. Hence, new mathematical methods are necessary. The aim of the present paper is twofold: (a) to provide a convolution theorem for general inverse problems that are regular in a well-specified sense and (b) to study concrete and prototypical examples of linear and nonlinear structure.

A canonical probabilistic and nonlinear inverse problem is the following. Let $Y_i$ be compound Poisson distributed,
$$Y_i\sim e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\vartheta^{*k},$$
with intensity $\lambda>0$ and jump distribution $\vartheta$, writing $\vartheta^{*k}$ for the $k$-fold convolution of the measure $\vartheta$. The distribution of $Y_i$ is a convolution exponential and therefore not linear in $\vartheta$. If $Y_i$ is more generally an increment of a Lévy process $(L_t)_{t\ge0}$, inference on the characteristic triplet of the Lévy process is a nonlinear problem since the dependence of the probability distribution of the marginals on the Lévy triplet is determined by the characteristic exponent, see the review by Reiß [29]. At the same time this model is of practical importance since Lévy processes are the main building blocks for mathematical modeling of stochastic processes. In the related context of diffusion processes, efficient estimation was recently studied by Clément et al. [8].
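For intuition, the nonlinearity of the map $\vartheta\mapsto\mathcal L(Y_i)$ can already be seen in simulation. The following is a minimal sketch (not from the paper; the function names and the choice $\vartheta=\mathrm{Exp}(1)$ are illustrative assumptions) that draws compound Poisson increments and checks the first two moments $E[Y]=\lambda E[\xi]$ and $\mathrm{Var}(Y)=\lambda E[\xi^2]$:

```python
import numpy as np

rng = np.random.default_rng(0)

def compound_poisson_increments(n, lam, jump_sampler):
    """Draw n increments Y_i of a compound Poisson process over unit time:
    Y_i = sum of N_i i.i.d. jumps with N_i ~ Poisson(lam)."""
    counts = rng.poisson(lam, size=n)
    return np.array([jump_sampler(k).sum() for k in counts])

# Illustrative jump distribution vartheta = Exp(1).
Y = compound_poisson_increments(10_000, lam=2.0,
                                jump_sampler=lambda k: rng.exponential(1.0, size=k))

# The law of Y_i mixes the convolution powers vartheta^{*k} with Poisson weights
# e^{-lam} lam^k / k!  -- an exponential of vartheta, not a linear image of it.
print(Y.mean(), 2.0 * 1.0)   # ~ lam * E[xi] = 2
print(Y.var(), 2.0 * 2.0)    # ~ lam * E[xi^2] = 4 (E[xi^2] = 2 for Exp(1))
```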

In view of the equivalence results by Brown and Low [4] and Nussbaum [28], the prototype of an inverse problem is to estimate $\vartheta\in\Theta\subseteq X$, or derived parameters, from observations $y_{\varepsilon,\vartheta}$ in the white noise model
$$y_{\varepsilon,\vartheta}=K(\vartheta)+\varepsilon\dot W\quad\text{for a continuous operator }K:X\supseteq\Theta\to Y,\qquad(1.1)$$
where $X$ and $Y$ are Hilbert spaces and $\varepsilon\dot W$ denotes white noise on $Y$ with noise level $\varepsilon>0$. For a review of estimation results in this model we refer to Cavalier [6] and the references therein. Studying minimax convergence rates when $K$ is linear, Goldenshluger and Pereverzev [13,14] have shown that the parametric rate $\varepsilon$ can be achieved for linear functionals of $\vartheta$ whose smoothness is not smaller than the ill-posedness of the operator $K$.

Inspired by the results of van der Vaart [34], we restate the classical local asymptotic normality (LAN) theory in a way that is appropriate to capture the inverse structure of the above mentioned models. Here, the linear white noise model (1.1) serves as the local limit experiment in the sense of Le Cam [22]. This leads to the notion of regular indirect models, meaning that the white noise model is the locally linear weak approximation of the statistical experiment. The asymptotic linear structure is described by the so-called generalized score operator. We derive a version of the Hájek–Le Cam convolution theorem for the estimation of derived parameters in regular inverse problems. The tangent set is determined by the range of the generalized score operator and the efficient influence function is given by the Moore–Penrose pseudoinverse of the adjoint score operator. Although we focus on linear functionals in the examples, the theory applies to any parameter which is differentiable in a pathwise sense.

We show that the white noise model with a (possibly) nonlinear operator, the deconvolution model as well as the Lévy model are regular indirect models and thus the convolution theorem applies. In many cases estimators are known that attain the optimal limit distribution, and consequently the information bound is sharp. The analysis of Lévy processes is the most challenging, and the second half of this article is devoted to this model. Here, the proofs rely on estimates of the distance of infinitely divisible distributions by Liese [23] and the Fourier multiplier approach introduced by Nickl and Reiß [27].

We put some stress on the Lévy model for three reasons: First, it is an important paradigm for nonlinear problems in indirect density estimation. Second, to understand from an efficiency perspective the auto-deconvolution structure of the Lévy model which was first reported by Belomestny and Reiß [1]. Third, to answer a conjecture by Nickl and Reiß [27]. Based on low-frequency observations of a Lévy process, they have constructed an estimator for the (generalized) distribution function of the jump measure $\nu$ and proved asymptotic normality when the parametric rate can be attained. The natural question is whether this estimator is efficient in the Hájek–Le Cam sense. Since Buchmann and Grübel [5] have constructed, for a finite and known jump activity, a decompounding estimator with smaller asymptotic variance, an information bound is of particular interest. With the general convolution theorem at hand, we can prove that both estimators are indeed efficient and thus prior knowledge of the jump intensity simplifies the statistical problem significantly. Concerning the information bound in the deconvolution setting, we can substantially relax the assumptions on the functionals and the admissible error densities by van Rooij et al. [37] as well as the assumptions on the smoothness and decay behavior of the densities of $X_i$ and $\varepsilon_i$ by Söhl and Trabs [31]. In fact, our abstract approach leads to natural assumptions in the explicit models.

This paper is organized as follows: Starting with the linear white noise model, we develop our general results in Section 2. These are illustrated in the deconvolution setup in Section 3. The theory is applied to the Lévy model in Section 4. While the previous sections are restricted to $\mathbb R^d$-valued functionals, we discuss the extension to general derived parameters in Section 5. More technical proofs are postponed to Section 6.

2. Regular indirect models

2.1. Linear white noise model

To understand the probabilistic structure of general inverse problems, we start by studying the abstract linear white noise model (1.1), where $X$ and $Y$ are separable real Hilbert spaces with scalar products $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$, respectively, and $K:X\to Y$ is a linear and bounded operator. To avoid identifiability problems, we additionally assume that $K$ is injective. That is, we observe for some unknown $\vartheta\in X$
$$\langle y_{\varepsilon,\vartheta},\varphi\rangle_Y=\langle K\vartheta,\varphi\rangle_Y+\varepsilon\dot W(\varphi)\quad\text{for all }\varphi\in Y,$$
where $(\dot W(\varphi))_{\varphi\in Y}$ is an iso-normal Gaussian process with mean zero and covariance structure $E[\dot W(\varphi_1)\dot W(\varphi_2)]=\langle\varphi_1,\varphi_2\rangle_Y$ for $\varphi_1,\varphi_2\in Y$. The law $\mu$ of the white noise $\dot W$ is defined as a symmetric (zero mean) Gaussian measure on $(E,\mathcal B(E))$ for a separable Banach space $E$ in which $Y$ can be continuously embedded, where $\mathcal B(E)$ denotes the Borel $\sigma$-algebra on $E$. In other words, $\dot W$ is an isometry from $Y$ into $L^2(E,\mathcal B(E),\mu)$. For the construction of this so-called abstract Wiener space we refer to Kuo [21], Theorem 4.1 and Lemma 4.7. We denote the law of $y_{\varepsilon,\vartheta}$ by $P_{\varepsilon,\vartheta}$.

Basically, the linear white noise model is a Gaussian shift experiment where the parameter is hidden behind the operator $K$. The inverse problem is to estimate a derived parameter $\chi(\vartheta)$ from the observation $y_{\varepsilon,\vartheta}$ as $\varepsilon\to0$. First, let us focus on a linear functional $\chi(\vartheta)=\langle\zeta,\vartheta\rangle_X$ for some $\zeta\in X$. Typically, $K$ is injective but admits no continuous inverse, leading to an ill-posed problem, cf. Goldenshluger and Pereverzev [13,14] or Cavalier [6] for a recent review of nonparametric estimation.

Following the classical semiparametric approach, we study parametric submodels by perturbing the parameter $\vartheta$ in directions $b\in X$. For any $b\in X$ we consider the submodel $t\mapsto P_{\varepsilon,\vartheta_t}$ generated by the path $[0,1)\ni t\mapsto\vartheta_t:=\vartheta+tb$. The behavior of the submodel along this path is described by the following lemma.

Lemma 2.1. Let $P_{\varepsilon,x}$ denote the law of $y_{\varepsilon,x}=K(x)+\varepsilon\dot W$ on $(E,\mathcal B(E))$ for $x\in X$ and an operator $K:X\to Y$ with $K(0)=0$. Then for all $\vartheta\in X$
$$\frac{dP_{\varepsilon,x}}{dP_{\varepsilon,\vartheta}}(y_{\varepsilon,\vartheta})=\exp\Big(\dot W\Big(\frac{K(x)-K(\vartheta)}{\varepsilon}\Big)-\frac{1}{2\varepsilon^2}\big\|K(x)-K(\vartheta)\big\|_Y^2\Big)\quad P_{\varepsilon,\vartheta}\text{-a.s.}$$


The proof of this lemma relies on the Cameron–Martin formula for Gaussian measures on Banach spaces ([9], Proposition 2.24) and is postponed to Section 6.1. Linearity of $K$ yields $\varepsilon^{-1}(K(\vartheta_\varepsilon)-K(\vartheta))=Kb$ and thus by Lemma 2.1
$$\log\frac{dP_{\varepsilon,\vartheta_\varepsilon}}{dP_{\varepsilon,\vartheta}}(y_{\varepsilon,\vartheta})=\dot W(Kb)-\frac12\|Kb\|_Y^2\quad P_{\varepsilon,\vartheta}\text{-a.s.},\qquad(2.1)$$
and therefore model (1.1) with a linear operator $K$ satisfies the classical LAN condition (even without "local" and "asymptotic") with parameter $h=Kb\in\operatorname{ran}K$. To find an information bound for the derived parameter $\chi(\vartheta)=\langle\zeta,\vartheta\rangle_X$, we express it in terms of $\eta=K\vartheta$ by
$$\psi(\eta):=\langle\zeta,K^{-1}\eta\rangle_X=\chi(\vartheta).$$

Since $K^{-1}$ is typically not continuous, $\psi$ will not be continuous along the path $t\mapsto\eta_t=\eta+tKb$ without further assumptions. Supposing however $\zeta\in\operatorname{ran}K^*$, where $\operatorname{ran}K^*$ denotes the range of the adjoint operator $K^*$, continuity and linearity of $\psi$ follow from
$$\psi(\eta)=\langle K^*y,K^{-1}\eta\rangle_X=\langle y,\eta\rangle_Y\quad\text{for any }y\in(K^*)^{-1}\big(\{\zeta\}\big).$$

In fact, we will see below that the condition $\zeta\in\operatorname{ran}K^*$ is equivalent to the regularity of $\psi$. If $K^*$ is not injective, there are many solutions $y$ of the equation $K^*y=\zeta$. The unique solution with minimal norm is given by the Moore–Penrose pseudoinverse
$$(K^*)^\dagger\zeta:=\big(K^*|_{(\ker K^*)^\perp}\big)^{-1}(\zeta)=\Pi_{(\ker K^*)^\perp}(K^*)^{-1}\big(\{\zeta\}\big)\quad\text{for }\zeta\in\operatorname{ran}K^*,\qquad(2.2)$$
where $\Pi_{(\ker K^*)^\perp}$ denotes the orthogonal projection onto the orthogonal complement $(\ker K^*)^\perp$ of the kernel of $K^*$; cf. [10], Definition 2.2 and Proposition 2.3, for the definition and fundamental properties of the pseudoinverse.
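To make (2.2) concrete, here is a small numerical sketch (not part of the paper; the matrix and vectors are arbitrary illustrations) verifying that the Moore–Penrose solution of $K^*y=\zeta$ is the minimal-norm solution and lies in $(\ker K^*)^\perp$:

```python
import numpy as np

rng = np.random.default_rng(1)

# A non-injective "adjoint" K* as a wide matrix: many y solve K* y = zeta.
Kstar = rng.standard_normal((3, 6))          # K*: R^6 -> R^3, ker K* is 3-dim
zeta = Kstar @ rng.standard_normal(6)        # ensure zeta lies in ran K*

y_dagger = np.linalg.pinv(Kstar) @ zeta      # Moore-Penrose solution (K*)^† zeta

# Any other solution differs by an element of ker K*; the pseudoinverse picks
# the unique solution in (ker K*)^⊥, i.e. the one with minimal norm.
null_basis = np.linalg.svd(Kstar)[2][3:].T   # orthonormal basis of ker K*
y_other = y_dagger + null_basis @ rng.standard_normal(3)

assert np.allclose(Kstar @ y_dagger, zeta)
assert np.allclose(Kstar @ y_other, zeta)
assert np.linalg.norm(y_dagger) <= np.linalg.norm(y_other)
assert np.allclose(null_basis.T @ y_dagger, 0)   # y_dagger ⊥ ker K*
```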

Given the regularity of the parameter, an LAN version of the Hájek–Le Cam convolution theorem (see [36], Theorem 3.11.2) yields that the variance of any regular estimator is bounded from below by
$$\big\|\Pi_{\overline{\operatorname{ran}}K}(K^*)^{-1}\big(\{\zeta\}\big)\big\|_Y^2=\big\|(K^*)^\dagger\zeta\big\|_Y^2\quad\text{if }\zeta\in\operatorname{ran}K^*,\qquad(2.3)$$
since the closure of the range of $K$ satisfies $\overline{\operatorname{ran}}K=(\ker K^*)^\perp$.

Remark 2.2. Suppose that the operator $K$ is injective and compact and denote the domain of $K$ by $\operatorname{dom}K$. Then $K$ is adapted to the Hilbert scale $(\operatorname{dom}((K^*K)^{-\alpha}))_{\alpha\ge0}$ generated by $(K^*K)^{-1}$, and its degree of ill-posedness is $\alpha=1/2$ (cf. [25]). According to [13], the parameter $\chi(\vartheta)=\langle\zeta,\vartheta\rangle_X$ can be estimated with parametric rate $\varepsilon$ if and only if $\zeta\in\operatorname{ran}((K^*K)^{1/2})$. Noting that $\operatorname{ran}K^*=\operatorname{ran}((K^*K)^{1/2})$ (see [10], Proposition 2.18), we recover the condition $\zeta\in\operatorname{ran}K^*$. Since the existence of a regular estimator implies in particular that $\chi(\vartheta)$ can be estimated with rate $\varepsilon$, this condition is natural for stating a convolution theorem.

Remark 2.3. The information bound in (2.3) is sharp. Usually, regularization methods are necessary to construct estimators in ill-posed problems because $K^\dagger$ is unbounded and the observation $y_{\varepsilon,\vartheta}$ may not be in its domain. Assuming $\zeta\in\operatorname{ran}K^*$, we can however define the estimator $\hat\chi(\vartheta):=\langle(K^\dagger)^*\zeta,y_{\varepsilon,\vartheta}\rangle_Y$, which satisfies
$$\hat\chi(\vartheta)-\chi(\vartheta)=\big\langle(K^\dagger)^*\zeta,K\vartheta\big\rangle_Y-\langle\zeta,\vartheta\rangle_X+\varepsilon\dot W\big((K^\dagger)^*\zeta\big)=\big\langle\zeta,\big(K^\dagger K-\operatorname{Id}\big)\vartheta\big\rangle_X+\varepsilon\dot W\big((K^\dagger)^*\zeta\big)\sim N\big(0,\varepsilon^2\big\|(K^\dagger)^*\zeta\big\|_Y^2\big),$$
where we used $K^\dagger K=\Pi_{(\ker K)^\perp}=\operatorname{Id}$ because $K$ is assumed to be injective. Therefore, the estimator $\hat\chi(\vartheta)$ is efficient.
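As a numerical illustration of Remark 2.3, consider a truncated diagonal (sequence-space) discretization of (1.1) — a sketch under assumptions not made in the paper, with arbitrary choices of $\kappa_k$, $\vartheta$ and $\zeta$. Monte Carlo confirms that the plug-in estimator is unbiased with variance $\varepsilon^2\|(K^*)^\dagger\zeta\|_Y^2$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Discretized sequence-space model: K diagonal with singular values kappa_k,
# y_k = kappa_k * theta_k + eps * W_k  (a truncated version of model (1.1)).
m, eps = 200, 0.01
kappa = 1.0 / np.arange(1, m + 1)            # ill-posed: kappa_k -> 0
theta = 1.0 / np.arange(1, m + 1) ** 2
zeta = 1.0 / np.arange(1, m + 1) ** 3        # zeta in ran K*: sum (zeta_k/kappa_k)^2 < inf

# For diagonal K, (K*)^† zeta has coordinates zeta_k / kappa_k.
influence = zeta / kappa

n_mc = 5000
y = kappa * theta + eps * rng.standard_normal((n_mc, m))
chi_hat = y @ influence                      # <(K^†)* zeta, y>_Y

print(chi_hat.mean(), np.dot(zeta, theta))                     # unbiased for <zeta, theta>
print(chi_hat.var(), eps**2 * np.dot(influence, influence))    # = eps^2 ||(K*)^† zeta||^2
```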

If the operator $K$ in model (1.1) is nonlinear, the situation is more involved and a naive approach may fail, as the following example illustrates. We define the Fourier transform of a function $f\in L^1(\mathbb R)\cup L^2(\mathbb R)$ as
$$\mathcal Ff(u):=\int e^{\mathrm iux}f(x)\,dx.$$


Example 2.4. For a given $\vartheta\in\Theta:=\{f\in L^2(\mathbb R)\,|\,f\ge0\}\subseteq X:=L^2(\mathbb R)$ consider the linear differential equation in $f$,
$$f'=-f+\vartheta^2\quad\text{with }\lim_{t\to-\infty}f(t)=0,\qquad(2.4)$$
which has the explicit solution $f_\vartheta(t):=\int_{-\infty}^te^{-(t-s)}\vartheta^2(s)\,ds$. The inverse problem is to estimate a linear functional $\chi(\vartheta)=\langle\vartheta,\zeta\rangle_X$, $\zeta\in X$, given an observation of the solution $f_\vartheta\in Y:=L^2(\mathbb R)$ of the previous equation corrupted by white noise. Since (2.4) is equivalent to $\mathcal F[\vartheta^2]=\mathcal F[f'+f]=(1-\mathrm iu)\mathcal Ff$, the operator $K:\Theta\to L^2(\mathbb R)$, which maps $\vartheta$ to the solution $f_\vartheta$, can be written as
$$K(\vartheta)=\mathcal F^{-1}\big[(1-\mathrm iu)^{-1}\mathcal F[\vartheta^2](u)\big].$$
Note that $K$ is well defined because $\|(1-\mathrm i\cdot)^{-1}\mathcal F[\vartheta^2]\|_{L^2}\le\|(1-\mathrm i\cdot)^{-1}\|_{L^2(\mathbb R)}\|\vartheta\|_{L^2(\mathbb R)}^2$. We see immediately that $K$ is nonlinear and injective on $\Theta$. Due to the derivative in (2.4), $\vartheta$ does not depend continuously on the data $f_\vartheta$ and thus the problem is ill-posed.
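The operator of Example 2.4 is easy to discretize, since $K(\vartheta)$ is a causal convolution of $\vartheta^2$ with the kernel $e^{-u}\mathbf 1_{[0,\infty)}(u)$. The following sketch (not from the paper; grid and test function are arbitrary choices) checks the ODE residual and exhibits the nonlinearity $K(2\vartheta)=4K(\vartheta)$:

```python
import numpy as np

# Discretize K(theta) = solution of f' = -f + theta^2, f(-inf) = 0, i.e.
# f(t) = int_{-inf}^t e^{-(t-s)} theta(s)^2 ds, as a discrete causal convolution.
dx = 0.01
x = np.arange(-10, 10, dx)

def K(theta_vals):
    kernel = np.exp(-np.arange(0, 20, dx)) * dx       # e^{-u} on u >= 0
    return np.convolve(theta_vals**2, kernel)[: len(theta_vals)]

theta = np.exp(-x**2)                                  # some nonnegative theta in L^2
f = K(theta)

# Sanity check of the ODE: f' + f = theta^2 (up to discretization error).
resid = np.gradient(f, dx) + f - theta**2
print(np.max(np.abs(resid[5:-5])))                     # small

# Nonlinearity: K(2*theta) = 4*K(theta) != 2*K(theta).
assert np.allclose(K(2 * theta), 4 * f)
```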

Following the strategy of the linear model, we introduce the direct parameter $\eta=K(\vartheta)$ and write $\psi(\eta)=\langle\zeta,K^{-1}(\eta)\rangle_X=\chi(\vartheta)$. Note that $\psi$ is nonlinear in $\eta$. To study pathwise continuity of $\psi$, we consider the path $[0,1)\ni t\mapsto\eta_t=\eta+th$ with direction $h=K(b)$, $b\in\Theta$. Note that $\eta_t\in\operatorname{ran}K$ since
$$(\eta_t'+\eta_t)^{1/2}=\big(\eta'+\eta+t(h'+h)\big)^{1/2}=\big(\vartheta^2+tb^2\big)^{1/2}\in\Theta.$$
For some intermediate point $\xi\in[0,t]$ the mean value theorem yields
$$t^{-1}\big(\psi(\eta_t)-\psi(\eta)\big)=t^{-1}\big\langle K^{-1}(\eta_t)-K^{-1}(\eta),\zeta\big\rangle_X=\Big\langle\tfrac12(\eta'+\eta)^{-1/2}(h'+h),\zeta\Big\rangle_X-t\Big\langle\tfrac14(\eta_\xi'+\eta_\xi)^{-3/2}(h'+h)^2,\zeta\Big\rangle_X.\qquad(2.5)$$
The first term is the linearization $\dot\psi_\eta(h)=\langle\tfrac12(\eta'+\eta)^{-1/2}(h'+h),\zeta\rangle_X=\langle\tfrac12\vartheta^{-1}b^2,\zeta\rangle_X$, where we have to impose suitable conditions on $\zeta$, first to compensate the potentially nonintegrable singularities of $\vartheta^{-1}$ and second to ensure continuity in $h$. But even if these conditions are satisfied, pathwise continuity of $\psi$ may fail because the integrability problems in the remainder in (2.5) are more serious: $b^4$ is not integrable for every $b\in X$ and the singularities of $(\vartheta^2+\xi b^2)^{-3/2}$ are more restrictive.

What went wrong in Example 2.4? Regularity of the parameter $\psi$ depends on two properties: (i) the choice of $\zeta$ and (ii) the directions and paths along which we want to show the regularity. In particular, the second point has to capture the inverse structure of the problem. The approach in the following section provides a solution to both problems. It gives a clear condition on $\zeta$ and it determines appropriate perturbations of the parameter, described by the tangent space.

2.2. Local linear weak approximation

Turning to a much more general model, the following definition will ensure that it behaves locally like the model (1.1) with a linear operator. Let $\Theta$ be a parameter set such that for any $\vartheta\in\Theta$ there is a tangent set $\dot\Theta_\vartheta$, a subset of a Hilbert space with scalar product $\langle\cdot,\cdot\rangle_\vartheta$, such that any element $b\in\dot\Theta_\vartheta$ is associated to a path $[0,\tau)\ni t\mapsto\vartheta_t\in\Theta$ starting at $\vartheta$, for some $\tau>0$. For the sake of brevity we suppress the dependence of the path on $b$ in the notation.

In the following, $Y_n\xrightarrow{P_n}Y$ denotes weak convergence of the law of $Y_n$ under the measure $P_n$ to the law of $Y$, for random variables $Y_1,Y_2,\dots$ and $Y$.

Definition 2.5. The sequence of statistical experiments $(\mathcal X_n,\mathcal A_n,P_{n,\vartheta}:\vartheta\in\Theta)$ is called a locally regular indirect model at $\vartheta\in\Theta$ with respect to the tangent set $\dot\Theta_\vartheta$ if there are a Hilbert space $(H_\vartheta,\langle\cdot,\cdot\rangle_H)$ and a continuous linear operator
$$A_\vartheta:\operatorname{lin}\dot\Theta_\vartheta\to H_\vartheta$$
such that for some rate $r_n\downarrow0$ and for every $b\in\dot\Theta_\vartheta$ with associated path $t\mapsto\vartheta_t$ there are random variables $(G_n(h))_{h\in\operatorname{ran}A_\vartheta}$ satisfying
$$\log\frac{dP_{n,\vartheta_{r_n}}}{dP_{n,\vartheta}}=G_n(A_\vartheta b)-\frac12\|A_\vartheta b\|_H^2\quad\text{and}\quad\big(G_n(h_1),\dots,G_n(h_k)\big)\xrightarrow{P_{n,\vartheta}}\big(G(h_1),\dots,G(h_k)\big)\ \text{for all }k\in\mathbb N,\ h_1,\dots,h_k\in\operatorname{ran}A_\vartheta,\qquad(2.6)$$
for a centered Gaussian process $(G(h))_{h\in\operatorname{ran}A_\vartheta}$ with covariance $E[G(h_1)G(h_2)]=\langle h_1,h_2\rangle_H$. The operator $A_\vartheta$ is called the generalized score operator.

In the sequel we will use the notation $\mathcal P_n:=\{P_{n,\vartheta}\,|\,\vartheta\in\Theta\}$.

The statistical interpretation of this regularity becomes clear by comparing it to the likelihood ratio (2.1) in the linear white noise model. Condition (2.6) means that locally at $\vartheta$ the model $(P_{n,\vartheta_{r_n}})$ converges to a limit experiment which is a linear inverse problem (1.1) in white noise with operator $K=A_\vartheta$ on the Hilbert space $H_\vartheta$ and with noise level $\varepsilon_n=r_n$. In other words, at $\vartheta$ the model converges weakly to the linear inverse problem in the sense of Le Cam [22]. Therefore, the classical white noise model (1.1) serves as a locally linear weak approximation of the general model $\mathcal P_n$. The difference to the classical theory is that the limit experiment is not a direct Gaussian shift experiment, but an indirect Gaussian shift, preserving the inverse structure of the problem. In that sense property (2.6) generalizes classical local asymptotic normality, which corresponds to the identity operator $A_\vartheta=\operatorname{Id}$, to local asymptotic indirect normality (LAIN).

The derived parameter $\chi:\Theta\to\mathbb R^d$, which is the aim of the statistical inference, should then be regular in the following sense.

Definition 2.6. The function $\chi:\Theta\to\mathbb R^d$, $d\in\mathbb N$, is pathwise differentiable at $\vartheta\in\Theta$ with respect to the tangent set $\dot\Theta_\vartheta$ if there is a continuous linear operator $\dot\chi_\vartheta:\operatorname{lin}\dot\Theta_\vartheta\to\mathbb R^d$ such that for every $b\in\dot\Theta_\vartheta$ with associated path $[0,\tau)\ni t\mapsto\vartheta_t\in\Theta$ it holds that
$$\frac1t\big(\chi(\vartheta_t)-\chi(\vartheta)\big)\to\dot\chi_\vartheta b\quad\text{as }t\downarrow0.$$

By the Riesz representation theorem we can write $\dot\chi_\vartheta b=\langle\tilde\chi_\vartheta,b\rangle_\vartheta$ for all $b\in\dot\Theta_\vartheta$ and some gradient $\tilde\chi_\vartheta\in\overline{\operatorname{lin}}\dot\Theta_\vartheta$. Recall that the sequence of parameter functions $\psi_n:\mathcal P_n\to\mathbb R^d$ given by $\psi_n(P_{n,\vartheta})=\chi(\vartheta)$ is called regular at $\vartheta$ relative to $A_\vartheta\dot\Theta_\vartheta$ if for any $h\in A_\vartheta\dot\Theta_\vartheta$ and any submodel $t\mapsto P_{n,\vartheta_t}$ satisfying (2.6) with $h=A_\vartheta b$ for some $b\in\dot\Theta_\vartheta$, it holds that
$$\frac{\psi_n(P_{n,\vartheta_{r_n}})-\psi_n(P_{n,\vartheta})}{r_n}\to\dot\psi_\vartheta(h)\qquad(2.7)$$
for some continuous linear map $\dot\psi_\vartheta:H\to\mathbb R^d$. Again the Riesz representation theorem determines a unique $\tilde\psi_\vartheta\in\overline{\operatorname{ran}}A_\vartheta=\overline{\operatorname{lin}}A_\vartheta\dot\Theta_\vartheta$ such that $\dot\psi_\vartheta(h)=\langle\tilde\psi_\vartheta,h\rangle_H$ for all $h\in\operatorname{ran}A_\vartheta$. In classical semiparametric theory $\tilde\psi_\vartheta$ is called the efficient influence function. As the last ingredient we recall that a sequence of estimators $T_n:\mathcal X_n\to\mathbb R^d$ is called regular at $\vartheta$ with respect to the rate $r_n$ and relative to the directions $\dot\Theta_\vartheta$ if there is a limit distribution $L$ on the Borel measurable space $(\mathbb R^d,\mathcal B(\mathbb R^d))$ such that
$$\frac{1}{r_n}\big(T_n-\chi(\vartheta_{r_n})\big)\xrightarrow{P_{n,\vartheta_{r_n}}}L$$
for every $b\in\dot\Theta_\vartheta$ and any corresponding submodel $t\mapsto P_{n,\vartheta_t}$. We recall the definition (2.2) of the Moore–Penrose pseudoinverse $K^\dagger$ of an operator $K$ on its range and obtain the following convolution theorem.


Theorem 2.7. Let $(\mathcal X_n,\mathcal A_n,P_{n,\vartheta}:\vartheta\in\Theta)$ be a locally regular indirect model at $\vartheta\in\Theta$ and let $\chi:\Theta\to\mathbb R^d$ be pathwise differentiable at $\vartheta$ with respect to $\dot\Theta_\vartheta$. Then the sequence $\psi_n:\mathcal P_n\to\mathbb R^d$ is regular at $\vartheta$ relative to $\dot\Theta_\vartheta$ if and only if each coordinate function of $\tilde\chi_\vartheta=(\tilde\chi_\vartheta^{(1)},\dots,\tilde\chi_\vartheta^{(d)})$ is contained in the range of the adjoint score operator $A_\vartheta^*:H\to\overline{\operatorname{lin}}\dot\Theta_\vartheta$.

In this case the efficient influence function is given by $\tilde\psi_\vartheta=(A_\vartheta^*)^\dagger\tilde\chi_\vartheta=((A_\vartheta^*)^\dagger\tilde\chi_\vartheta^{(1)},\dots,(A_\vartheta^*)^\dagger\tilde\chi_\vartheta^{(d)})$ and for any regular estimator sequence $T_n:\mathcal X_n\to\mathbb R^d$ the limit distribution satisfies $L=N(0,\Sigma)*M$ for some Borel probability distribution $M$ and the covariance matrix $\Sigma\in\mathbb R^{d\times d}$ with
$$\Sigma_{k,l}=\big\langle\tilde\psi_\vartheta^{(k)},\tilde\psi_\vartheta^{(l)}\big\rangle_H=\big\langle(A_\vartheta^*)^\dagger\tilde\chi_\vartheta^{(k)},(A_\vartheta^*)^\dagger\tilde\chi_\vartheta^{(l)}\big\rangle_H,\qquad k,l\in\{1,\dots,d\}.\qquad(2.8)$$

Proof. The characterization of regular parameter functions can be proved analogously to the i.i.d. setting studied in Theorem 3.1 by van der Vaart [34]. Let us first consider $d=1$. For any $b\in\dot\Theta_\vartheta$ there is a path $[0,\tau)\ni t\mapsto\vartheta_t$ in direction $b$ generating a submodel $t\mapsto P_{n,\vartheta_t}$ with $h=A_\vartheta b$. If $\psi_n$ is regular,

$$\dot\psi_\vartheta(A_\vartheta b)=\lim_{n\to\infty}\frac{\psi_n(P_{n,\vartheta_{r_n}})-\psi_n(P_{n,\vartheta})}{r_n}=\lim_{n\to\infty}\frac{\chi(\vartheta_{r_n})-\chi(\vartheta)}{r_n}=\langle\tilde\chi_\vartheta,b\rangle_\vartheta.$$

Since the equality $\langle\tilde\psi_\vartheta,A_\vartheta b\rangle_H=\dot\psi_\vartheta(A_\vartheta b)=\langle\tilde\chi_\vartheta,b\rangle_\vartheta$ holds for all $b\in\dot\Theta_\vartheta$, it follows that $A_\vartheta^*\tilde\psi_\vartheta=\tilde\chi_\vartheta$. For the converse direction, we can use the previous display as the definition of $\dot\psi_\vartheta$ and have to verify that it is indeed linear and continuous. But this follows because by assumption there is some $\psi\in H$ such that $A_\vartheta^*\psi=\tilde\chi_\vartheta$ and thus $\dot\psi_\vartheta(h)=\langle\psi,h\rangle_H$ for all $h\in H$. Because $\overline{\operatorname{ran}}A_\vartheta=(\ker A_\vartheta^*)^\perp$, there is exactly one solution of $A_\vartheta^*\psi=\tilde\chi_\vartheta$ in $\overline{\operatorname{ran}}A_\vartheta$, and this is given by $(A_\vartheta^*)^\dagger\tilde\chi_\vartheta=\Pi_{(\ker A_\vartheta^*)^\perp}(A_\vartheta^*)^{-1}(\{\tilde\chi_\vartheta\})$. For $d>1$ it is sufficient to consider the coordinate functions separately.

To conclude the second part of the theorem, we consider $A_\vartheta\dot\Theta_\vartheta$ as local parameter set and identify the local parameter $\kappa_n(A_\vartheta b):=\psi_n(P_{n,\vartheta_{r_n}})=\chi(\vartheta_{r_n})$ with $\kappa_n(0):=\psi_n(P_{n,\vartheta})$. Then $\kappa_n$ is regular relative to $A_\vartheta\dot\Theta_\vartheta$ and we can apply the convolution theorem in [36], Theorem 3.11.2. Hereby, we have to note that it is sufficient if the local parameter set $A_\vartheta\dot\Theta_\vartheta$ is only a subset of a Hilbert space, and thus (2.6) may not hold for all linear combinations of elements in $\dot\Theta_\vartheta$, as long as the weak convergence $G_n(h)\to G(h)$ under $P_{n,\vartheta}$ holds true for all $h\in\operatorname{lin}A_\vartheta\dot\Theta_\vartheta$. □

The theorem implies immediately that the asymptotic covariance of every regular estimator of $\chi(\vartheta)$ is bounded from below by (2.8) in the order of nonnegative definite matrices. If $\tilde\chi_\vartheta$ is contained in the smaller range of the information operator $A_\vartheta^*A_\vartheta$, then the efficient influence function can be obtained by
$$\tilde\psi_\vartheta=(A_\vartheta^*)^\dagger\tilde\chi_\vartheta=A_\vartheta(A_\vartheta^*A_\vartheta)^\dagger\tilde\chi_\vartheta,$$
owing to $\ker(A_\vartheta^*A_\vartheta)=\ker A_\vartheta$. Therefore, in this case the hardest parametric subproblem is given by the direction $(A_\vartheta^*A_\vartheta)^\dagger\tilde\chi_\vartheta\in\operatorname{lin}\dot\Theta_\vartheta$. In the finite-dimensional linear model this lower bound coincides with the minimal variance of the Gauß–Markov theorem; a numerical sanity check of this correspondence is given below. Let us illustrate Theorem 2.7 in several examples.
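The following minimal sketch (not part of the paper; dimensions and the random $K$, $\zeta$ are arbitrary illustrations) verifies the Gauß–Markov correspondence in the finite-dimensional linear model $y=K\vartheta+\varepsilon Z$: the bound (2.8) for $\chi(\vartheta)=\langle\zeta,\vartheta\rangle$ equals the variance of the BLUE plug-in estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

# Finite-dimensional linear model y = K theta + eps * Z, Z ~ N(0, I_m):
# the bound (2.8) for chi(theta) = <zeta, theta> is eps^2 ||K (K^T K)^{-1} zeta||^2
# = eps^2 zeta^T (K^T K)^{-1} zeta, the Gauss-Markov variance of the BLUE.
m, p, eps = 20, 5, 0.1
K = rng.standard_normal((m, p))
zeta = rng.standard_normal(p)

info = K.T @ K                               # information operator A* A
bound = eps**2 * zeta @ np.linalg.solve(info, zeta)

# Least squares (= BLUE) plug-in estimator <zeta, theta_hat> with
# theta_hat = (K^T K)^{-1} K^T y has variance eps^2 ||K (K^T K)^{-1} zeta||^2.
w = K @ np.linalg.solve(info, zeta)
print(bound, eps**2 * w @ w)                 # the two expressions coincide
```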

Example 2.8 (Indirect regression model). With $X=Y=L^2(\mathbb R)$ and a linear, bounded, injective operator $K:X\to Y$ consider the indirect regression model with deterministic design,
$$Y_i=(Kf)\Big(\frac in\Big)+\varepsilon_i,\qquad i=1,\dots,n,\quad\text{with unknown }f\in X,$$
and with i.i.d. errors $\varepsilon_1,\dots,\varepsilon_n\sim\mu$ for some law $\mu$. Therefore, $(Y_1,\dots,Y_n)\sim P_{n,f}=\bigotimes_{i=1}^n\mu\big(\cdot-(Kf)(i/n)\big)$. Khoujmane et al. [18] have proved a convolution theorem for estimating $\chi(f)=\langle\zeta,f\rangle_{L^2(\mathbb R)}$, $\zeta\in X$ with $\|\zeta\|_{L^2(\mathbb R)}=1$, where $f$ and the error distribution $\mu$ are unknown. Let us focus on the submodel with standard normal errors $\varepsilon_i\sim N(0,1)$. Under smoothness conditions it is stated in this submodel ([18], Theorem 2, with $\delta=0$) that the asymptotic distribution of any regular estimator is given by the convolution $N(0,\|K\zeta\|_{L^2(\mathbb R)}^{-2})*M$ for some Borel probability measure $M$.


To apply Theorem 2.7, we have to check that this model is a regular indirect model. Choosing the tangent space $\dot\Theta_f=X$ with linear paths $f_t=f+tb$ in directions $b\in\dot\Theta_f$, we calculate
$$\log\frac{dP_{n,f_{1/\sqrt n}}}{dP_{n,f}}(Y_1,\dots,Y_n)=-\frac{1}{2n}\sum_{i=1}^n(Kb)^2\Big(\frac in\Big)+\frac{1}{n^{1/2}}\sum_{i=1}^n(Kb)\Big(\frac in\Big)\Big(Y_i-(Kf)\Big(\frac in\Big)\Big)\xrightarrow{P_{n,f}}N\Big(-\frac12\|Kb\|_{L^2(\mathbb R)}^2,\ \|Kb\|_{L^2(\mathbb R)}^2\Big).$$
Therefore, the generalized score operator is given by $A_f=K$. Assuming for simplicity that $\zeta\in\operatorname{ran}(K^*K)$, the asymptotic distribution of any regular estimator is given by a convolution $N(0,\|K(K^*K)^\dagger\zeta\|_{L^2(\mathbb R)}^2)*M$. Since the Cauchy–Schwarz inequality yields $\|K\zeta\|_{L^2(\mathbb R)}\|K(K^*K)^\dagger\zeta\|_{L^2(\mathbb R)}\ge\|\zeta\|_{L^2(\mathbb R)}^2=1$, the bound by [18] achieves our information bound if and only if $K^*K=\lambda\operatorname{Id}$ for some $\lambda>0$. Therefore, their information bound may not be optimal. The reason is that $f$ has been perturbed in the direction $\zeta$ instead of the least favorable direction $(K^*K)^\dagger\zeta$; a numerical comparison for a diagonal operator is sketched below.
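For a diagonal operator both bounds are explicit, so the gap is easy to see numerically. A minimal sketch (not from the paper; the singular values and $\zeta$ are arbitrary choices) comparing the single-direction bound with the efficient bound, with equality in the case $K^*K=\lambda\,\mathrm{Id}$:

```python
import numpy as np

# For a diagonal K with singular values kappa_k the two variance bounds of
# Example 2.8 are explicit: the single-direction bound is ||K zeta||^{-2}, the
# efficient bound is ||K (K*K)^† zeta||^2 = sum_k zeta_k^2 / kappa_k^2 (||zeta|| = 1).
def bounds(kappa, zeta):
    zeta = zeta / np.linalg.norm(zeta)
    b_dir = 1.0 / np.sum((kappa * zeta) ** 2)
    b_eff = np.sum((zeta / kappa) ** 2)
    return b_dir, b_eff

kappa = np.array([1.0, 0.5, 0.25])
print(bounds(kappa, np.array([1.0, 1.0, 1.0])))             # strict: b_dir < b_eff
print(bounds(np.full(3, 0.7), np.array([1.0, 1.0, 1.0])))   # K*K = lambda Id: equal
```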

Example 2.9 (Nonlinear white noise model). Suppose we observe $y_{n,\vartheta}=K(\vartheta)+\varepsilon_n\dot W$ with $\varepsilon_n\to0$ as $n\to\infty$ on the Hilbert space $Y$, for some $\vartheta\in X$ and for a not necessarily linear operator $K:X\supseteq\Theta\to Y$ with $K(0)=0$ which is Gâteaux differentiable at the inner point $\vartheta\in\Theta$. That is, there is a continuous linear operator $\dot K_\vartheta:X\to Y$ with
$$\lim_{t\downarrow0}\frac1t\big(K(\vartheta+tb)-K(\vartheta)\big)=\dot K_\vartheta b\quad\text{for all }b\in X.$$
By the Hilbert space structure, the tangent space can be chosen as $\dot\Theta_\vartheta=X$ by considering the path $[0,1)\ni t\mapsto\vartheta_t:=\vartheta+tb$ for $b\in\dot\Theta_\vartheta$. Lemma 2.1 yields for any $b\in X$ with associated path $t\mapsto\vartheta_t$
$$\log\frac{dP_{n,\vartheta_{\varepsilon_n}}}{dP_{n,\vartheta}}(y_{n,\vartheta})=\dot W\Big(\frac{K(\vartheta+\varepsilon_nb)-K(\vartheta)}{\varepsilon_n}\Big)-\frac{1}{2\varepsilon_n^2}\big\|K(\vartheta+\varepsilon_nb)-K(\vartheta)\big\|_Y^2.$$
Therefore, the LAIN property (2.6) is satisfied with the generalized score operator chosen as the Gâteaux derivative $A_\vartheta=\dot K_\vartheta$ at $\vartheta$ and
$$G_n(A_\vartheta b)=\dot W\Big(\frac{K(\vartheta+\varepsilon_nb)-K(\vartheta)}{\varepsilon_n}\Big)-\frac12\Big(\Big\|\frac{K(\vartheta+\varepsilon_nb)-K(\vartheta)}{\varepsilon_n}\Big\|_Y^2-\|\dot K_\vartheta b\|_Y^2\Big),$$
where $G_n(A_\vartheta b)\to N(0,\|\dot K_\vartheta b\|_Y^2)$, since the variance of the first term of $G_n$ converges to $\|\dot K_\vartheta b\|_Y^2$ and the second term converges deterministically to zero by the Gâteaux differentiability. The convergence of the finite-dimensional distributions follows likewise.

Along the path $t\mapsto\vartheta_t$ the linear functional $\chi(\vartheta)=\langle\zeta,\vartheta\rangle_X$ possesses the derivative
$$\lim_{t\downarrow0}\frac1t\big(\chi(\vartheta_t)-\chi(\vartheta)\big)=\langle\zeta,b\rangle_X.$$
Hence, $\dot\chi_\vartheta b=\langle\tilde\chi_\vartheta,b\rangle_X$ with the gradient $\tilde\chi_\vartheta=\zeta$ for all $b\in\dot\Theta_\vartheta$. Applying Theorem 2.7 shows in particular that the asymptotic variance of every regular sequence of estimators $T_n$ (with respect to the rate $\varepsilon_n$) of the functional $\chi(\vartheta)$ is bounded from below by
$$\big\|(\dot K_\vartheta^*)^\dagger\zeta\big\|_Y^2\quad\text{whenever }\zeta\in\operatorname{ran}\dot K_\vartheta^*.$$
If $K$ is a linear bounded operator, the score operator is $A_\vartheta=\dot K_\vartheta=K$ and thus the statement of Theorem 2.7 coincides with the previous result (2.3).

In Remark 2.3 we saw that this information bound can be achieved if $K$ is linear. For nonlinear operators an upper bound is beyond the scope of this paper.


Example 2.10 (Example 2.4 continued). Let us come back to the ill-posed inverse problem in Example 2.4 related to the differential equation (2.4). The corresponding nonlinear operator $K:X\to Y$ with $\Theta=\{f\in L^2(\mathbb R)\,|\,f\ge0\}\subseteq X$ and $X=Y=L^2(\mathbb R)$ was given by $K(\vartheta)=\mathcal F^{-1}[(1-\mathrm iu)^{-1}\mathcal F[\vartheta^2]]$. For $\vartheta\in\Theta$ and any $b\in\dot\Theta_\vartheta:=\Theta$ the functional $\chi(\vartheta)=\langle\vartheta,\zeta\rangle_X$, $\zeta\in L^2(\mathbb R)$, is pathwise differentiable along the path $[0,1)\ni t\mapsto\vartheta_t=\vartheta+tb$ with gradient $\tilde\chi_\vartheta=\zeta$. $K$ is pathwise differentiable with respect to the tangent set $\dot\Theta_\vartheta$ at $\vartheta$ with derivative
$$\dot K_\vartheta b=\mathcal F^{-1}\big[(1-\mathrm iu)^{-1}\mathcal F[2b\vartheta](u)\big].$$
Since $\dot K_\vartheta$ is well defined on $\operatorname{lin}\dot\Theta_\vartheta=L^2(\mathbb R)$, the generalized score operator $A_\vartheta:\operatorname{lin}\dot\Theta_\vartheta\to H:=L^2(\mathbb R)$ is given by $A_\vartheta b=\dot K_\vartheta b$ as in the previous example. The "directions" in which we perturb the direct parameter $K(\vartheta)$ are then given by $A_\vartheta\dot\Theta_\vartheta=\{\dot K_\vartheta b\,|\,b\in X\}$. Applying Plancherel's identity twice, the adjoint of $A_\vartheta$ can be calculated via
$$\langle\dot K_\vartheta b,h\rangle=\frac{1}{2\pi}\int(1-\mathrm iu)^{-1}\mathcal F[2b\vartheta](u)\overline{\mathcal Fh(u)}\,du=\Big\langle b,2\vartheta\mathcal F^{-1}\big[(1-\mathrm iu)^{-1}\mathcal Fh(u)\big]\Big\rangle_X$$
for $b,h\in L^2(\mathbb R)$. Therefore, $A_\vartheta^*h=2\vartheta\mathcal F^{-1}[(1-\mathrm iu)^{-1}\mathcal Fh(u)]$ and regularity of the parameter function follows for any $\zeta\in\operatorname{ran}A_\vartheta^*=\{\vartheta f\,|\,f\in H^1(\mathbb R)\}$ with the Sobolev space $H^1(\mathbb R)=\{f\in L^2(\mathbb R)\,|\,\int(1+u^2)|\mathcal Ff(u)|^2\,du<\infty\}$.
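Since $\dot K_\vartheta b$ solves $h'=-h+2b\vartheta$, it can be computed with the same causal convolution as in the sketch after Example 2.4, and the Gâteaux derivative can be checked by finite differences. Again a sketch with illustrative choices of $\vartheta$ and $b$, not taken from the paper:

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
kernel = np.exp(-np.arange(0, 20, dx)) * dx

def solve(source):
    """h(t) = int_{-inf}^t e^{-(t-s)} source(s) ds, cf. Example 2.4."""
    return np.convolve(source, kernel)[: len(source)]

theta = np.exp(-x**2)
b = np.exp(-((x - 1.0) ** 2))

# K(theta) solves f' = -f + theta^2; its Gateaux derivative dot K_theta b
# solves h' = -h + 2 b theta, the time-domain version of
# F^{-1}[(1 - iu)^{-1} F[2 b theta]].
K = lambda v: solve(v**2)
dK = solve(2 * b * theta)

for t in [1e-1, 1e-2, 1e-3]:
    fd = (K(theta + t * b) - K(theta)) / t       # finite-difference quotient
    print(t, np.max(np.abs(fd - dK)))            # -> 0 linearly in t
```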

As Example 2.10 indicates, the adjoint score operator $A_\vartheta^*$ usually has no closed range. In these cases it is a difficult problem to determine the range of $A_\vartheta^*$. As the following characterization shows, it is sufficient to know $A_\vartheta^*$ on a dense subspace. This approximation argument will turn out to be very useful for more complex models.

Proposition 2.11. Let $A_\vartheta^*:H\to\overline{\operatorname{lin}}\dot\Theta_\vartheta$ be injective and let $G$ be a dense subspace of $H$. Then $\tilde\chi_\vartheta\in\overline{\operatorname{lin}}\dot\Theta_\vartheta$ is contained in $\operatorname{ran}A_\vartheta^*$ if and only if the following conditions are satisfied:

(i) there exists a sequence $\chi_n\in\operatorname{ran}A_\vartheta^*|_G$ such that $\chi_n\to\tilde\chi_\vartheta$ as $n\to\infty$, and
(ii) $(A_\vartheta^*)^{-1}\chi_n$ converges to some $\psi\in H$.

In this case $A_\vartheta^*\psi=\tilde\chi_\vartheta$ and thus $\Pi_{\overline{\operatorname{ran}}A_\vartheta}\psi=\tilde\psi_\vartheta$ is the efficient influence function.

Proof. "If": Since $A_\vartheta^*$ is a bounded operator, its graph
$$\big\{(g,A_\vartheta^*g):g\in H\big\}\subseteq H\times\overline{\operatorname{lin}}\dot\Theta_\vartheta$$
is closed. Therefore, the inverse operator $(A_\vartheta^*)^{-1}|_{\operatorname{ran}A_\vartheta^*}$ is closed, too. Consequently, (i) and (ii) imply $\tilde\chi_\vartheta\in\operatorname{dom}((A_\vartheta^*)^{-1})=\operatorname{ran}A_\vartheta^*$ with $(A_\vartheta^*)^{-1}\tilde\chi_\vartheta=\psi$.

"Only if": Since $\tilde\chi_\vartheta\in\operatorname{ran}A_\vartheta^*$ there is some $\psi\in H$ such that $A_\vartheta^*\psi=\tilde\chi_\vartheta$. Moreover, there is a sequence $(g_n)\subseteq G$ with $g_n\to\psi$ because $G$ is dense in $H$. The continuity of $A_\vartheta^*$ then yields $\chi_n:=A_\vartheta^*g_n\to A_\vartheta^*\psi=\tilde\chi_\vartheta$. □

2.3. I.i.d. observations

When the observations are given by $n$ independent and identically distributed random variables $Y_1,\dots,Y_n$, the model simplifies to the product space $(\mathcal X^n,\mathcal A^n,P_\vartheta^n:\vartheta\in\Theta)$, so that the probability measure is completely described by the family of marginal distributions $\mathcal P=\{P_\vartheta:\vartheta\in\Theta\}$. We will rephrase the conditions of the previous section in terms of the marginal measure $P_\vartheta$. This setting appears quite often in applications; in particular, the deconvolution model and the Lévy model which we study in Sections 3 and 4 will be two examples. That is why we give some details for the i.i.d. case.

Recall that a tangent set $\dot{\mathcal P}_{P_\vartheta}$ at $P_\vartheta$ is a set of score functions $g$ of submodels $[0,\tau)\ni t\mapsto P_{\vartheta_t}$ starting at $P_\vartheta$, for some $\tau>0$. In the present situation the derived parameter can be written as $\psi(P_\vartheta)=\chi(\vartheta)$, independently of $n$. The classical Hájek–Le Cam convolution theorem (cf. [2], Theorem 3.3.2) applies if $\psi$ is differentiable at $P_\vartheta$ relative to $\dot{\mathcal P}_{P_\vartheta}$, that is, if there exists a continuous linear map $\dot\psi:L^2(P_\vartheta)\to\mathbb R^d$ such that
$$\lim_{t\downarrow0}\frac{\psi(P_{\vartheta_t})-\psi(P_\vartheta)}{t}=\dot\psi g\quad\text{for all }g\in\dot{\mathcal P}_{P_\vartheta}.$$


This differentiability corresponds to the general assumption (2.7). In the i.i.d. setting, local asymptotic normality follows from Hellinger regularity of the submodel $t\mapsto P_{\vartheta_t}$. Therefore, we can reformulate the conditions in Definition 2.5 as the following

Assumption A. At $\vartheta\in\Theta$ let the parameter set give rise to a tangent set $\dot\Theta_\vartheta$. Furthermore, let there be a continuous linear operator
$$A_\vartheta:\operatorname{lin}\dot\Theta_\vartheta\to L_0^2(P_\vartheta):=\Big\{g\in L^2(P_\vartheta):\int g\,dP_\vartheta=0\Big\}$$
such that for every $b\in\dot\Theta_\vartheta$ with associated path $t\mapsto\vartheta_t$
$$\int\Big(\frac{dP_{\vartheta_t}^{1/2}-dP_\vartheta^{1/2}}{t}-\frac12A_\vartheta b\,dP_\vartheta^{1/2}\Big)^2\to0\quad\text{as }t\downarrow0.\qquad(2.9)$$

In (2.9), $dP_{\vartheta_t}$ denotes the Radon–Nikodym $\mu$-density of $P_{\vartheta_t}$ for some dominating measure $\mu$ and the integration is with respect to $\mu$. Since the integral does not depend on $\mu$, it is suppressed in the notation.
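Condition (2.9) can be checked numerically in the simplest parametric example. The following sketch is an illustration under the assumption $P_\vartheta=N(\vartheta,1)$ with path $\vartheta_t=\vartheta+t$ and score $A_\vartheta b(x)=x-\vartheta$ (not an example treated in the paper); it evaluates the Hellinger remainder on a grid and shows that it vanishes as $t\downarrow0$:

```python
import numpy as np

# Numerical check of (2.9) for the Gaussian location family P_theta = N(theta, 1).
x = np.linspace(-12, 12, 200001)
dx = x[1] - x[0]
theta = 0.0

def sqrt_density(m):
    return ((2 * np.pi) ** -0.25) * np.exp(-((x - m) ** 2) / 4)

score = x - theta
for t in [1e-1, 1e-2, 1e-3]:
    rem = (sqrt_density(theta + t) - sqrt_density(theta)) / t \
          - 0.5 * score * sqrt_density(theta)
    print(t, np.sum(rem**2) * dx)                # -> 0, as required by (2.9)
```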

Lemma 2.12. If the product model $(\mathcal X^n,\mathcal A^n,P_\vartheta^n:\vartheta\in\Theta)$ satisfies Assumption A at $\vartheta\in\Theta$, then it is a locally regular indirect model at $\vartheta\in\Theta$ with respect to the tangent set $\dot\Theta_\vartheta$, with rate $r_n=n^{-1/2}$ and (generalized) score operator $A_\vartheta$.

Proof. The Hellinger regularity in Assumption A implies local asymptotic normality, since it yields (see Bickel et al. [2], Proposition 2.1.2)
$$\sum_{j=1}^n\log\frac{dP_{\vartheta_{1/\sqrt n}}}{dP_\vartheta}(X_j)=\frac{1}{\sqrt n}\sum_{j=1}^nA_\vartheta b(X_j)-\frac12\|A_\vartheta b\|_{L^2(P_\vartheta)}^2+R_n$$
for a remainder $R_n$ that converges in $P_\vartheta^n$-probability to zero. Hence, the LAIN property (2.6) is satisfied with rate $1/\sqrt n$ and with the score operator $A_\vartheta$ mapping into the Hilbert space $H=L_0^2(P_\vartheta)$. The convergence of the finite-dimensional distributions in Definition 2.5 follows from the Cramér–Wold device and the linearity of $A_\vartheta$. Note that $L_0^2(P_\vartheta)$ is the orthogonal complement of $\operatorname{lin}\{1\}$ and thus a closed subspace of the Hilbert space $L^2(P_\vartheta)$. □
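For $P_\vartheta=N(\vartheta,1)$ the LAN expansion in the proof is exact ($R_n=0$), which makes it a convenient Monte Carlo sanity check of (2.6); a minimal sketch with illustrative parameter values (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of the LAN expansion for P_theta = N(theta, 1):
# here A_theta b(x) = b (x - theta), and the log-likelihood ratio is exactly
# (b/sqrt(n)) sum_j (X_j - theta) - b^2/2 ~ N(-b^2/2, b^2) under P_theta.
n, b, theta, n_mc = 500, 1.3, 0.0, 20000
X = rng.normal(theta, 1.0, size=(n_mc, n))

def loglik(X, m):
    return -0.5 * np.sum((X - m) ** 2, axis=1)   # constants cancel in the ratio

llr = loglik(X, theta + b / np.sqrt(n)) - loglik(X, theta)

print(llr.mean(), -b**2 / 2)
print(llr.var(), b**2)
```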

The operator $A_\vartheta$ maps directions $b\in\dot\Theta_\vartheta$ into score functions at $P_\vartheta$, which is why it is called the score operator and explains the name given in the general case. It generates the tangent set $\dot{\mathcal P}_{P_\vartheta}=A_\vartheta\dot\Theta_\vartheta$ of the model $\mathcal P$ at $P_\vartheta$. Note that the range of $A_\vartheta$ is only a subset of the maximal tangent set, as the following example shows.

Example 2.13 (Maximal tangent set). Let $\mathcal P$ be the model of all probability measures on some sample space. The maximal tangent set of the model $\mathcal P$ at some distribution $P$ is given by $L_0^2(P)$. This can be seen as follows: score functions are necessarily centered and square integrable. Conversely, for any score $g\in L_0^2(P)$ a one-dimensional submodel is $t\mapsto c(t)k(tg(x))\,dP(x)$ with a $C^2(\mathbb R)$-function $k:\mathbb R\to\mathbb R_+$ which satisfies $k(0)=k'(0)=1$ and such that $k'/k$ is bounded, and with normalization constant $c(t)=\|k(tg)\|_{L^1(P)}^{-1}$; for instance, $k(y)=2/(1+e^{-2y})$ (cf. [35], Example 25.16).
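The submodel construction of Example 2.13 can also be checked numerically. A sketch (illustrative choices: $P=N(0,1)$ and the score $g(z)=z^2-1$; Monte Carlo averages replace the integrals) verifying $k(0)=k'(0)=1$ and that the path $t\mapsto c(t)k(tg)\,dP$ has score approximately $g$ at $t=0$:

```python
import numpy as np

rng = np.random.default_rng(5)

# k(y) = 2 / (1 + e^{-2y}) satisfies k(0) = k'(0) = 1.
k = lambda y: 2.0 / (1.0 + np.exp(-2.0 * y))
eps = 1e-6
print(k(0.0), (k(eps) - k(-eps)) / (2 * eps))     # both ~ 1

# P = N(0,1) and a centered, square-integrable score g:
Z = rng.standard_normal(1_000_000)
g = Z**2 - 1.0

def log_density_ratio(t):
    c = 1.0 / np.mean(k(t * g))                   # c(t) ~ ||k(t g)||_{L^1(P)}^{-1}
    return np.log(c * k(t * g))

t = 1e-3
score_fd = (log_density_ratio(t) - log_density_ratio(-t)) / (2 * t)
print(np.max(np.abs(score_fd - g)))               # small: the submodel has score ~ g
```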

Theorem 2.7 then yields the following convolution theorem, which was already obtained by van der Vaart [34].

Corollary 2.14. Suppose the product model with marginal distributions $\mathcal P=\{P_\vartheta\,|\,\vartheta\in\Theta\}$ satisfies Assumption A and let $\chi:\Theta\to\mathbb R^d$ be pathwise differentiable with respect to $\dot\Theta_\vartheta$. The map $\psi:\mathcal P\to\mathbb R^d$ is differentiable at $P_\vartheta$ relative to the tangent set $\dot{\mathcal P}_{P_\vartheta}=A_\vartheta\dot\Theta_\vartheta$ if and only if each coordinate function of $\tilde\chi_\vartheta$ is contained in the range of the adjoint score operator $A_\vartheta^*:L_0^2(P_\vartheta)\to\overline{\operatorname{lin}}\dot\Theta_\vartheta$. In this case the efficient influence function is given by $\tilde\psi_\vartheta=(A_\vartheta^*)^\dagger\tilde\chi_\vartheta$.
