Adaptive estimation of spectral densities via wavelet thresholding and information projection


HAL Id: hal-00440424

https://hal.archives-ouvertes.fr/hal-00440424v4

Preprint submitted on 6 Jun 2011


Jérémie Bigot, Rolando Biscay Lirio, Jean-Michel Loubes, Lilian Muniz Alvarez

To cite this version:

Jérémie Bigot, Rolando Biscay Lirio, Jean-Michel Loubes, Lilian Muniz Alvarez. Adaptive estimation of spectral densities via wavelet thresholding and information projection. 2010. �hal-00440424v4�


Jérémie Bigot¹, Rolando J. Biscay Lirio³, Jean-Michel Loubes¹ & Lilian Muñiz Alvarez¹,²

¹ IMT, Université Paul Sabatier, Toulouse, France

² Facultad de Matemática y Computación, Universidad de La Habana, Cuba

³ CIMFAV-DEUV, Facultad de Ciencias, Universidad de Valparaiso, Chile

June 6, 2011

Abstract

In this paper, we study the problem of adaptive estimation of the spectral density of a stationary Gaussian process. For this purpose, we consider a wavelet-based method which combines the ideas of wavelet approximation and estimation by information projection in order to guarantee that the solution is a non-negative function. The spectral density of the process is estimated by projecting the wavelet thresholding expansion of the periodogram onto a family of exponential functions. This ensures that the spectral density estimator is a strictly positive function. The theoretical behavior of the estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes. We also show the excellent practical performance of the estimator in some numerical experiments.

Keywords: Spectral density estimation, adaptive estimation, wavelet thresholding, sequences of exponential families, Besov spaces.

AMS classifications: Primary 62G07; secondary 42C40, 41A29

1 Introduction

The estimation of spectral densities is a fundamental problem in inference for stationary stochastic processes. Many applications in fields such as weather forecasting and financial time series are closely related to this issue; see for instance Priestley [20]. It is known that the estimation of the covariance function of a stationary process is strongly related to the estimation of the corresponding spectral density. By Bochner's theorem, the covariance function is non-negative definite if and only if the corresponding spectral density is a non-negative function. Hence, in order to preserve the non-negative definiteness of a covariance function, the estimator of the corresponding spectral density must be a non-negative function. The purpose of this work is to provide a non-negative estimator of the spectral density.

Inference in the spectral domain uses the periodogram of the data, providing an inconsistent estimator which must be smoothed in order to achieve consistency. For highly regular spectral densities, linear smoothing techniques such as kernel smoothing are appropriate (see Brillinger [4]).

However, these methods cannot achieve the optimal mean-square rate of convergence for spectra whose smoothness is distributed inhomogeneously over the domain of interest. For such spectra, nonlinear methods are needed. One nonlinear method for adaptive spectral density estimation of a stationary Gaussian sequence was proposed by Comte [7]. It is based on model selection techniques. Other nonlinear smoothing procedures are the wavelet thresholding methods, first proposed by Donoho and Johnstone [12]. In this context, different thresholding rules have been proposed by Neumann [18] and by Fryzlewicz, Nason and von Sachs [13], to name but a few.

Neumann's approach [18] consists in pre-estimating the variance of the periodogram via kernel smoothing, so that it can be supplied to the wavelet estimation procedure. Kernel pre-estimation may not be appropriate in cases where the underlying spectral density is of low regularity. One way to avoid this problem is proposed in Fryzlewicz, Nason and von Sachs [13], where the empirical wavelet coefficient thresholds are built as appropriate local weighted $\ell_1$ norms of the periodogram.

These methods do not produce non-negative spectral density estimators; therefore the corresponding estimators of the covariance function are not non-negative definite.

To overcome the drawbacks of previous estimators, in this paper we propose a new wavelet-based method for the estimation of the spectral density of a Gaussian process. To ensure non-negativeness of the spectral density estimator, our method combines the ideas of wavelet thresholding and estimation by information projection. We estimate the spectral density by a projection of the nonlinear wavelet approximation of the periodogram onto a family of exponential functions. Therefore, the estimator is non-negative by construction. This technique was studied by Barron and Sheu [2] for the approximation of density functions by sequences of exponential families, by Loubes and Yan [16] for penalized maximum likelihood estimation with $\ell_1$ penalty, by Antoniadis and Bigot [1] for the study of Poisson inverse problems, and by Bigot and Van Bellegem [5] for log-density deconvolution.

The theoretical optimality of estimators of the spectral density of a stationary process is generally studied using risk bounds in $L_2$-norm. This is the case in the papers of Neumann [18], Comte [7] and Fryzlewicz, Nason and von Sachs [13] mentioned before. In this work, the behavior of the proposed estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes, which is perhaps a more natural loss function for the estimation of a spectral density function than the $L_2$-norm. Moreover, the thresholding rules that we use to derive adaptive estimators differ from previous approaches based on wavelet decomposition and are quite simple to compute. Finally, we compare the performance of our estimator with other estimators on some simulations.

The paper is organized as follows. Section 2 presents the statistical framework under which we work. We define the model, the wavelet-based exponential family and the linear and nonlinear wavelet estimators by information projection. We also recall the definition of the Kullback-Leibler divergence and some results on Besov spaces. The rates of convergence of the proposed estimators are stated in Section 3. Some numerical experiments are described in Section 4. Technical lemmas and proofs of the main theorems are gathered in the Appendix.

Throughout this paper $C$ denotes a constant that may vary from line to line. The notation $C(\cdot)$ specifies the dependency of $C$ on some quantities.

2 Statistical framework

2.1 The model

We aim at providing a nonparametric adaptive estimator of the spectral density which is non-negative, in order to guarantee that the covariance estimator is a non-negative definite function. We consider a sequence $(X_t)_{t \in \mathbb{N}}$ that satisfies the following assumptions:

Assumption 1 The sequence $(X_1, \ldots, X_n)$ is an $n$-sample drawn from a stationary sequence of Gaussian random variables.

Let $\rho$ be the covariance function of the process, i.e. $\rho(h) = \operatorname{cov}(X_t, X_{t+h})$ with $h \in \mathbb{Z}$. The spectral density $f$ is defined as:

\[ f(\omega) = \frac{1}{2\pi} \sum_{h \in \mathbb{Z}} \rho(h) e^{i 2\pi \omega h}, \quad \omega \in [0,1]. \]

We need the following standard assumption on $\rho$:

Assumption 2 The covariance function $\rho$ is non-negative definite, and there exist two constants $0 < C_1, C_2 < +\infty$ such that $\sum_{h \in \mathbb{Z}} |\rho(h)| = C_1$ and $\sum_{h \in \mathbb{Z}} |h \rho^2(h)| = C_2$.

Assumption 2 implies in particular that the spectral density $f$ is bounded by the constant $C_1$. As a consequence, it is also square integrable. As in Comte [7], the data consist of a number of observations $X_1, \ldots, X_n$ at regularly spaced points. We want to obtain a positive estimator of the spectral density function $f$ on the basis of these observations, without parametric assumptions. For this, we combine the ideas of wavelet thresholding and estimation by information projection.
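As an illustration of the definition of $f$ and of the bound implied by Assumption 2, the following minimal sketch evaluates the spectral density of an MA(1) sequence from its covariances. The MA(1) example, the coefficient `theta` and the function name `spectral_density` are assumptions made for this illustration and do not come from the paper.

```python
import numpy as np

def spectral_density(rho, omega):
    """f(omega) = (1/2pi) * sum_h rho(h) exp(i 2 pi omega h), for a symmetric
    covariance given as rho[0], rho[1], ... (rho(-h) = rho(h), zero beyond)."""
    rho = np.asarray(rho, dtype=float)
    omega = np.asarray(omega, dtype=float)
    h = np.arange(1, len(rho))
    # By symmetry of rho the complex exponentials pair up into cosines.
    series = rho[0] + 2.0 * np.cos(2 * np.pi * np.outer(omega, h)) @ rho[1:]
    return series / (2 * np.pi)

# Hypothetical example: X_t = eps_t + theta * eps_{t-1} with unit-variance noise,
# so rho(0) = 1 + theta^2, rho(1) = rho(-1) = theta and rho(h) = 0 otherwise.
theta = 0.5
omega = np.linspace(0.0, 1.0, 201)
f = spectral_density([1 + theta**2, theta], omega)

# Closed form for the same process, used as a cross-check.
closed_form = np.abs(1 + theta * np.exp(-2j * np.pi * omega)) ** 2 / (2 * np.pi)
C1 = (1 + theta**2) + 2 * abs(theta)   # sum over h of |rho(h)|
```

In this example `f` is non-negative on $[0,1]$ and stays below `C1`, in line with the remark that Assumption 2 bounds the spectral density.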

2.2 Estimation by information projection

To ensure nonnegativity of the estimator, we will look for approximations over an exponential family.

For this, we construct a sieve of exponential functions defined in a wavelet basis.

Let $\phi(\omega)$ and $\psi(\omega)$, respectively, be the scaling and the wavelet functions generated by an orthonormal multiresolution decomposition of $L_2([0,1])$; see Mallat [17] for a detailed exposition of wavelet analysis. Throughout the paper, the functions $\phi$ and $\psi$ are supposed to be compactly supported and such that $\|\phi\|_\infty < +\infty$, $\|\psi\|_\infty < +\infty$. Then, for any integer $j_0 \geq 0$, any function $g \in L_2([0,1])$ has the following representation:

\[ g(\omega) = \sum_{k=0}^{2^{j_0}-1} \langle g, \phi_{j_0,k} \rangle \phi_{j_0,k}(\omega) + \sum_{j=j_0}^{+\infty} \sum_{k=0}^{2^j-1} \langle g, \psi_{j,k} \rangle \psi_{j,k}(\omega), \]

where $\phi_{j_0,k}(\omega) = 2^{j_0/2} \phi(2^{j_0}\omega - k)$ and $\psi_{j,k}(\omega) = 2^{j/2} \psi(2^j \omega - k)$. The main idea of this paper is to expand the spectral density $f$ onto this wavelet basis and to find an estimator of this expansion that is then modified to impose the positivity property. The scaling and wavelet coefficients of the spectral density function $f$ are denoted by $a_{j_0,k} = \langle f, \phi_{j_0,k} \rangle$ and $b_{j,k} = \langle f, \psi_{j,k} \rangle$.
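The expansion above can be sketched numerically. The example below uses the Haar basis on a dyadic midpoint grid, which is an assumption made for simplicity (the experiments in the paper use Symmlet 8 wavelets); the helper names `haar_phi`, `haar_psi` and `dilate` are hypothetical. Inner products are approximated by grid averages.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def haar_psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return haar_phi(2 * x) - haar_phi(2 * x - 1)

def dilate(base, j, k, omega):
    """2^{j/2} * base(2^j * omega - k), as in the definition of phi_{j,k}, psi_{j,k}."""
    return 2.0 ** (j / 2) * base(2.0 ** j * omega - k)

N = 256                                   # grid size 2^J with J = 8
omega = (np.arange(N) + 0.5) / N          # midpoints of a dyadic grid on [0, 1]
g = np.exp(np.sin(2 * np.pi * omega))     # a test function sampled on the grid

j0, J = 2, 8
approx = np.zeros(N)
for k in range(2 ** j0):                  # scaling part at the coarse level j0
    phi_jk = dilate(haar_phi, j0, k, omega)
    approx += np.mean(g * phi_jk) * phi_jk
for j in range(j0, J):                    # wavelet part, levels j0 <= j < J
    for k in range(2 ** j):
        psi_jk = dilate(haar_psi, j, k, omega)
        approx += np.mean(g * psi_jk) * psi_jk
```

On this grid the $2^{j_0} + \sum_{j=j_0}^{J-1} 2^j = 2^J$ discretized Haar functions are orthonormal for the averaged inner product, so `approx` reproduces the sampled `g` up to rounding error.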

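To simplify the notation, we write $(\psi_{j,k})_{j=j_0-1}$ for the scaling functions $(\phi_{j,k})_{j=j_0}$. Let $j_1 \geq j_0$ and define the set

\[ \Lambda_{j_1} = \left\{ (j,k) : j_0 - 1 \leq j < j_1,\ 0 \leq k \leq 2^j - 1 \right\}. \]

Note that $\#\Lambda_{j_1} = 2^{j_1}$, where $\#\Lambda_{j_1}$ denotes the cardinality of $\Lambda_{j_1}$. Let $\theta$ denote a vector in $\mathbb{R}^{\#\Lambda_{j_1}}$. The wavelet-based exponential family $\mathcal{E}_{j_1}$ at scale $j_1$ is defined as the set of functions:

\[ \mathcal{E}_{j_1} = \left\{ f_{j_1,\theta}(\cdot) = \exp\left( \sum_{(j,k) \in \Lambda_{j_1}} \theta_{j,k} \psi_{j,k}(\cdot) \right),\ \theta = (\theta_{j,k})_{(j,k) \in \Lambda_{j_1}} \in \mathbb{R}^{\#\Lambda_{j_1}} \right\}. \tag{2.1} \]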


It is well known that Besov spaces for periodic functions in $L_2([0,1])$ can be characterized in terms of wavelet coefficients (see e.g. Mallat [17]). Assume that $\psi$ has $m$ vanishing moments, and let $0 < s < m$ denote the usual smoothness parameter. Then, for a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with $1 \leq p, q \leq \infty$, one has that for $s^* = s + 1/2 - 1/p \geq 0$:

\[ B^s_{p,q}(A) := \left\{ g \in L_2([0,1]) : \|g\|_{s,p,q} := \left( \sum_{k=0}^{2^{j_0}-1} |a_{j_0,k}|^p \right)^{\frac{1}{p}} + \left( \sum_{j=j_0}^{\infty} 2^{j s^* q} \left( \sum_{k=0}^{2^j-1} |b_{j,k}|^p \right)^{\frac{q}{p}} \right)^{\frac{1}{q}} \leq A \right\}, \]

with the respective sums above replaced by a maximum if $p = \infty$ or $q = \infty$, and where $a_{j_0,k} = \langle g, \phi_{j_0,k} \rangle$ and $b_{j,k} = \langle g, \psi_{j,k} \rangle$.

The condition $s + 1/2 - 1/p \geq 0$ is imposed to ensure that $B^s_{p,q}(A)$ is a subspace of $L_2([0,1])$, and we shall restrict ourselves to this case in this paper (although not always stated, it is clear that all our results hold for $s < m$).

Let $M > 0$ and denote by $F^s_{p,q}(M)$ the set of functions

\[ F^s_{p,q}(M) = \left\{ f = \exp(g) : \|g\|_{s,p,q} \leq M \right\}, \]

where $\|g\|_{s,p,q}$ denotes the norm in the Besov space $B^s_{p,q}$. Note that assuming that $f \in F^s_{p,q}(M)$ implies that $f$ is strictly positive. The following results hold.

Lemma 2.1 Suppose that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \leq p \leq 2$. Then, there exists a constant $M_1$ such that for all $f \in F^s_{p,q}(M)$, $0 < M_1^{-1} \leq f \leq M_1 < +\infty$.

Let $V_j$ denote the usual multiresolution space at scale $j$ spanned by the scaling functions $(\phi_{j,k})_{0 \leq k \leq 2^j - 1}$, and define $A_j < +\infty$ as the constant such that $\|\upsilon\|_\infty \leq A_j \|\upsilon\|_{L_2}$ for all $\upsilon \in V_j$. For $f \in F^s_{p,q}(M)$, let $g = \log(f)$. Then for $j \geq j_0 - 1$, define $D_j = \|g - g_j\|_{L_2}$ and $\gamma_j = \|g - g_j\|_\infty$, where $g_j = \sum_{k=0}^{2^j - 1} \theta_{j,k} \psi_{j,k}$, with $\theta_{j,k} = \langle g, \psi_{j,k} \rangle$.

The proof of the following lemma follows immediately from the arguments in the proof of Lemma A.5 in Antoniadis and Bigot [1].

Lemma 2.2 Let $j \in \mathbb{N}$. Then $A_j \leq C 2^{j/2}$. Suppose that $f \in F^s_{p,q}(M)$ with $1 \leq p \leq 2$ and $s > \frac{1}{p}$. Then, uniformly over $F^s_{p,q}(M)$, $D_j \leq C 2^{-j(s + 1/2 - 1/p)}$ and $\gamma_j \leq C 2^{-j(s - 1/p)}$, where $C$ denotes constants depending only on $M$, $s$, $p$ and $q$.

To assess the quality of the estimators, we will measure the discrepancy between an estimator $\hat{f}$ and the true function $f$ in the sense of relative entropy (Kullback-Leibler divergence) defined by:

\[ \Delta(f; \hat{f}) = \int_0^1 \left( f \log\left( \frac{f}{\hat{f}} \right) - f + \hat{f} \right) d\mu, \]

where $\mu$ denotes the Lebesgue measure on $[0,1]$. It can be shown that $\Delta(f; \hat{f})$ is non-negative and equals zero if and only if $\hat{f} = f$.
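The discrepancy $\Delta$ is straightforward to approximate on a grid. The sketch below is an illustration under the assumption that both functions are sampled at the midpoints of a uniform partition of $[0,1]$, so that the integral is replaced by an average; the function name `kl_discrepancy` and the two test functions are hypothetical.

```python
import numpy as np

def kl_discrepancy(f, f_hat):
    """Approximate Delta(f; f_hat) = int_0^1 (f log(f / f_hat) - f + f_hat) d mu
    by an average over a uniform grid; both inputs must be strictly positive."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return float(np.mean(f * np.log(f / f_hat) - f + f_hat))

omega = (np.arange(1000) + 0.5) / 1000
f = 1.0 + 0.5 * np.sin(2 * np.pi * omega)   # a strictly positive "true" function
f_flat = np.ones_like(omega)                 # a crude positive approximation

d_self = kl_discrepancy(f, f)
d_flat = kl_discrepancy(f, f_flat)
```

The integrand $a \log(a/b) - a + b$ is non-negative for all $a, b > 0$, so `d_flat` is positive while `d_self` vanishes, in agreement with the property stated above.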

We will constrain our estimator of the spectral density to belong to the family $\mathcal{E}_{j_1}$ of exponential functions, which are positive by definition. For this we will consider a notion of projection using information projection.

The estimation of density functions based on information projection was introduced by Barron and Sheu [2]. To apply this method in our context, we recall for completeness a set of results that are useful to prove the existence of our estimators. The proofs of the following lemmas follow immediately from results in Barron and Sheu [2] and Antoniadis and Bigot [1].

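Lemma 2.3 Let $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$. Assume that there exists some $\theta(\beta) \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that, for all $(j,k) \in \Lambda_{j_1}$, $\theta(\beta)$ is a solution of

\[ \left\langle f_{j,\theta(\beta)}, \psi_{j,k} \right\rangle = \beta_{j,k}. \]

Then for any function $f$ such that $\langle f, \psi_{j,k} \rangle = \beta_{j,k}$ for all $(j,k) \in \Lambda_{j_1}$, and for all $\theta \in \mathbb{R}^{\#\Lambda_{j_1}}$, the following Pythagorean-like identity holds:

\[ \Delta(f; f_{j,\theta}) = \Delta\left(f; f_{j,\theta(\beta)}\right) + \Delta\left(f_{j,\theta(\beta)}; f_{j,\theta}\right). \tag{2.2} \]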
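The identity (2.2) can be checked directly from the definition of $\Delta$. The short derivation below is a sketch added for the reader's convenience; it uses only the moment conditions stated in Lemma 2.3 and the fact that the logarithm of a ratio of two members of the exponential family is a linear combination of the $\psi_{j',k}$.

```latex
\begin{align*}
\Delta\big(f; f_{j,\theta(\beta)}\big) + \Delta\big(f_{j,\theta(\beta)}; f_{j,\theta}\big)
  &= \int_0^1 \Big( f \log\frac{f}{f_{j,\theta(\beta)}} - f + f_{j,\theta} \Big)\, d\mu
   + \int_0^1 f_{j,\theta(\beta)} \log\frac{f_{j,\theta(\beta)}}{f_{j,\theta}}\, d\mu, \\
\int_0^1 f_{j,\theta(\beta)} \log\frac{f_{j,\theta(\beta)}}{f_{j,\theta}}\, d\mu
  &= \sum_{(j',k)\in\Lambda_{j_1}} \big(\theta(\beta)_{j',k} - \theta_{j',k}\big)
     \big\langle f_{j,\theta(\beta)}, \psi_{j',k} \big\rangle
   = \sum_{(j',k)\in\Lambda_{j_1}} \big(\theta(\beta)_{j',k} - \theta_{j',k}\big)\, \beta_{j',k} \\
  &= \sum_{(j',k)\in\Lambda_{j_1}} \big(\theta(\beta)_{j',k} - \theta_{j',k}\big)
     \big\langle f, \psi_{j',k} \big\rangle
   = \int_0^1 f \log\frac{f_{j,\theta(\beta)}}{f_{j,\theta}}\, d\mu .
\end{align*}
```

Substituting the second display into the first and merging the two logarithms gives $\int_0^1 \big( f \log(f / f_{j,\theta}) - f + f_{j,\theta} \big)\, d\mu = \Delta(f; f_{j,\theta})$.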

The next lemma is a key result which gives sufficient conditions for the existence of the vector θ(β) as defined in Lemma 2.3. This lemma also relates distances between the functions in the exponential family to distances between the corresponding wavelet coefficients. Its proof relies upon a series of lemmas on bounds within exponential families for the Kullback-Leibler divergence and can be found in Barron and Sheu [2] and Antoniadis and Bigot [1].

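Lemma 2.4 Let $\theta_0 \in \mathbb{R}^{\#\Lambda_{j_1}}$, $\beta_0 = (\beta_{0,(j,k)})_{(j,k) \in \Lambda_{j_1}} \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that $\beta_{0,(j,k)} = \langle f_{j,\theta_0}, \psi_{j,k} \rangle$ for all $(j,k) \in \Lambda_{j_1}$, and $\tilde{\beta} \in \mathbb{R}^{\#\Lambda_{j_1}}$ a given vector. Let $b = \exp\left( \|\log(f_{j,\theta_0})\|_\infty \right)$ and $e = \exp(1)$. If $\|\tilde{\beta} - \beta_0\|_2 \leq \frac{1}{2 e b A_{j_1}}$, then the solution $\theta(\tilde{\beta})$ of $\langle f_{j_1,\theta}, \psi_{j,k} \rangle = \tilde{\beta}_{j,k}$ for all $(j,k) \in \Lambda_{j_1}$ exists and satisfies

\[ \left\| \theta(\tilde{\beta}) - \theta_0 \right\|_2 \leq 2 e b \left\| \tilde{\beta} - \beta_0 \right\|_2, \]

\[ \left\| \log\left( \frac{f_{j_1,\theta(\beta_0)}}{f_{j_1,\theta(\tilde{\beta})}} \right) \right\|_\infty \leq 2 e b A_{j_1} \left\| \tilde{\beta} - \beta_0 \right\|_2, \]

\[ \Delta\left( f_{j_1,\theta(\beta_0)}; f_{j_1,\theta(\tilde{\beta})} \right) \leq 2 e b \left\| \tilde{\beta} - \beta_0 \right\|_2^2, \]

where $\|\beta\|_2$ denotes the standard Euclidean norm for $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$.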

Following Csiszár [8], it is possible to define the projection of a function $f$ onto $\mathcal{E}_{j_1}$. If this projection exists, it is defined as the function $f_{j_1,\theta^*_{j_1}}$ in the exponential family $\mathcal{E}_{j_1}$ that is the closest to the true function $f$ in the Kullback-Leibler sense, and is characterized as the unique function in the family $\mathcal{E}_{j_1}$ for which

\[ \left\langle f_{j_1,\theta^*_{j_1}}, \psi_{j,k} \right\rangle = \langle f, \psi_{j,k} \rangle := \beta_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}. \]

Note that the notation $\beta_{j,k}$ is used to denote both the scaling coefficients $a_{j_0,k}$ and the wavelet coefficients $b_{j,k}$.

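Let

\[ I_n(\omega) = \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{t'=1}^{n} \left( X_t - \bar{X} \right) \left( X_{t'} - \bar{X} \right)^* e^{i 2\pi \omega (t - t')} \]

be the classical periodogram, where $(X_t - \bar{X})^*$ denotes the conjugate transpose of $(X_t - \bar{X})$ and $\bar{X} = \frac{1}{n} \sum_{t=1}^{n} X_t$. The expansion of $I_n(\omega)$ onto the wavelet basis yields estimators of $a_{j_0,k}$ and $b_{j,k}$ given by

\[ \hat{a}_{j_0,k} = \int_0^1 I_n(\omega) \phi_{j_0,k}(\omega)\, d\omega \quad \text{and} \quad \hat{b}_{j,k} = \int_0^1 I_n(\omega) \psi_{j,k}(\omega)\, d\omega. \tag{2.3} \]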
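For real-valued data the double sum defining $I_n(\omega)$ factorizes as $\frac{1}{2\pi n} \big| \sum_t (X_t - \bar{X}) e^{i 2\pi \omega t} \big|^2$, so the periodogram can be evaluated with a fast Fourier transform. The sketch below evaluates it at the frequencies $\omega_k = k/n$; restricting to this grid and the function name `periodogram` are choices made for this illustration, not something prescribed by the paper.

```python
import numpy as np

def periodogram(x):
    """Periodogram I_n(omega_k) at the frequencies omega_k = k / n, k = 0..n-1,
    using the normalisation 1 / (2 pi n) of the text."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.fft.fft(x - x.mean())           # discrete Fourier transform of centred data
    return np.arange(n) / n, np.abs(d) ** 2 / (2 * np.pi * n)

rng = np.random.default_rng(1)
x = rng.standard_normal(16)                # hypothetical sample, for illustration only
freqs, I = periodogram(x)
```

By Parseval's identity the average of `I` over these frequencies equals the empirical variance of the sample divided by $2\pi$. The coefficients (2.3) can then be approximated by averaging `I` against the sampled basis functions.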

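It therefore seems natural to estimate the function $f$ by searching for some $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that

\[ \left\langle f_{j_1,\hat{\theta}_n}, \psi_{j,k} \right\rangle = \int_0^1 I_n(\omega) \psi_{j,k}(\omega)\, d\omega := \hat{\beta}_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.4} \]

where $\hat{\beta}_{j,k}$ denotes both the estimators $\hat{a}_{j_0,k}$ of the scaling coefficients and the estimators $\hat{b}_{j,k}$ of the wavelet coefficients. The function $f_{j_1,\hat{\theta}_n}$ is the positive linear estimator of the spectral density.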

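Similarly, the positive nonlinear estimator with hard thresholding is defined as the function $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ (with $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$) such that

\[ \left\langle f^{HT}_{j_1,\hat{\theta}_n,\xi}, \psi_{j,k} \right\rangle = \delta_\xi\left( \hat{\beta}_{j,k} \right) \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.5} \]

where $\delta_\xi$ denotes the hard thresholding rule defined by

\[ \delta_\xi(x) = x\, I(|x| \geq \xi) \quad \text{for } x \in \mathbb{R}, \]

and $\xi > 0$ is an appropriate threshold whose choice is discussed later on.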
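The hard thresholding rule $\delta_\xi$ acts coefficient by coefficient. A minimal vectorized sketch follows; the function name `hard_threshold` and the sample values are assumptions made for this illustration.

```python
import numpy as np

def hard_threshold(x, xi):
    """delta_xi(x) = x * 1{|x| >= xi}: keep a coefficient only if its magnitude
    reaches the threshold, otherwise set it to zero."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= xi, x, 0.0)

coeffs = np.array([-2.0, -0.5, 0.5, 1.0, 3.0])   # illustrative coefficients
kept = hard_threshold(coeffs, xi=1.0)
```

Coefficients that are kept are left unchanged, unlike soft thresholding, which would also shrink them toward zero.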

The existence of these estimators is not guaranteed in general. Thus, in the next sections, some sufficient conditions are given for the existence of $f_{j_1,\hat{\theta}_n}$ and $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ with probability tending to one as $n \to +\infty$. Even though an explicit expression for $\hat{\theta}_n$ is not available, we use a numerical approximation of $\hat{\theta}_n$, obtained via a gradient-descent algorithm with an adaptive step.

In this section we establish the rate of convergence of our estimators in terms of the Kullback-Leibler discrepancy over Besov classes.

We make the following assumption on the wavelet basis, which guarantees that Assumption 2 holds uniformly over $F^s_{p,q}(M)$.

Assumption 3 Let $M > 0$, $1 \leq p \leq 2$ and $s > 1/p$. For $f \in F^s_{p,q}(M)$ and $h \in \mathbb{Z}$, let $\rho(h) = \int_0^1 f(\omega) e^{-i 2\pi \omega h}\, d\omega$, $C_1(f) := \sum_{h \in \mathbb{Z}} |\rho(h)|$ and $C_2(f) := \sum_{h \in \mathbb{Z}} |h \rho^2(h)|$. Then, the wavelet basis is such that there exists a constant $M_*$ such that for all $f \in F^s_{p,q}(M)$,

\[ C_1(f) \leq M_* \quad \text{and} \quad C_2(f) \leq M_*. \]

2.3 Linear estimation

The following theorem is the general result on the linear information projection estimator of the spectral density function. Note that the choice of the coarse resolution level $j_0$ is of minor importance, and without loss of generality we take $j_0 = 0$ for the linear estimator $f_{j_1,\hat{\theta}_n}$.

Theorem 2.5 Assume that $f \in F^s_{2,2}(M)$ with $s > \frac{1}{2}$ and suppose that Assumptions 1, 2 and 3 are satisfied. Define $j_1 = j_1(n)$ as the largest integer such that $2^{j_1} \leq n^{\frac{1}{2s+1}}$. Then, with probability tending to one as $n \to +\infty$, the information projection estimator (2.4) exists and satisfies:

\[ \Delta\left( f; f_{j_1(n),\hat{\theta}_n} \right) = O_p\left( n^{-\frac{2s}{2s+1}} \right). \]

Moreover, the convergence is uniform over the class $F^s_{2,2}(M)$ in the sense that

\[ \lim_{K \to +\infty} \lim_{n \to +\infty} \sup_{f \in F^s_{2,2}(M)} P\left( n^{\frac{2s}{2s+1}} \Delta\left( f; f_{j_1(n),\hat{\theta}_n} \right) > K \right) = 0. \]

This theorem establishes the existence, with probability tending to one, of a linear estimator of the spectral density $f$ given by $f_{j_1(n),\hat{\theta}_{j_1(n)}}$. This estimator is strictly positive by construction. Therefore the corresponding estimator of the covariance function $\hat{\rho}^{L}$ (which is obtained as the inverse Fourier transform of $f_{j_1(n),\hat{\theta}_n}$) is a positive definite function by Bochner's theorem. Hence $\hat{\rho}^{L}$ is a covariance function.

In the related problem of density estimation from an i.i.d. sample, Koo [15] has shown that, for the Kullback-Leibler divergence, $n^{-\frac{2s}{2s+1}}$ is the fastest rate of convergence for the problem of estimating a density $f$ such that $\log(f)$ belongs to the space $B^s_{2,2}(M)$. For spectral densities belonging to a general Besov ball $B^s_{p,q}(M)$, Neumann [18] has also shown that $n^{-\frac{2s}{2s+1}}$ is an optimal rate of convergence for the $L_2$ risk. For the Kullback-Leibler divergence, we conjecture that $n^{-\frac{2s}{2s+1}}$ is the minimax rate of convergence for spectral densities belonging to $F^s_{2,2}(M)$.

However, the result obtained in the above theorem is nonadaptive because the selection of $j_1(n)$ depends on the unknown smoothness $s$ of $f$. Moreover, the result is only suited to smooth functions (as $F^s_{2,2}(M)$ corresponds to a Sobolev space of order $s$) and does not attain an optimal rate of convergence when, for example, $g = \log(f)$ has singularities. We therefore propose in the next section an adaptive estimator derived by applying an appropriate nonlinear thresholding procedure.

2.4 Adaptive estimation

2.4.1 The bound on f is known

In adaptive estimation, we need to define an appropriate thresholding rule for the wavelet coefficients of the periodogram. This threshold is level-dependent and in this paper will take the form

\[ \xi = \xi_{j,n} = 2 \left[ 2 \|f\|_\infty \left( \sqrt{\frac{\delta \log n}{n}} + 2^{\frac{j}{2}} \|\psi\|_\infty \frac{\delta \log n}{n} \right) + \frac{C_*}{\sqrt{n}} \right], \tag{2.6} \]

where $\delta \geq 0$ is a tuning parameter whose choice will be discussed later on and $C_* = \sqrt{\frac{C_2 + 39 C_1^2}{4\pi^2}}$. The following theorem states that the relative entropy between the true $f$ and its nonlinear estimator achieves in probability the conjectured optimal rate of convergence, up to a logarithmic factor, over a wide range of Besov balls.

Theorem 2.6 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \leq p \leq 2$. Suppose also that Assumptions 1, 2 and 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} \geq \log n \geq 2^{j_0 - 1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \geq \frac{n}{\log n} \geq 2^{j_1 - 1}$. For $\delta \geq 6$, take the threshold $\xi_{j,n}$ as in (2.6). Then, the thresholding estimator (2.5) exists with probability tending to one when $n \to +\infty$ and satisfies:

\[ \Delta\left( f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}} \right) = O_p\left( \left( \frac{n}{\log n} \right)^{-\frac{2s}{2s+1}} \right). \]
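The resolution levels of Theorem 2.6 and the threshold (2.6) are simple functions of $n$. The sketch below computes them; the function names, and the numerical values passed for $\|f\|_\infty$, $\|\psi\|_\infty$ and $C_*$ (which are unknown in practice), are placeholders chosen for illustration.

```python
import math

def resolution_levels(n):
    """Integers j0, j1 with 2^j0 >= log n >= 2^(j0-1) and 2^j1 >= n / log n >= 2^(j1-1)."""
    j0 = math.ceil(math.log2(math.log(n)))
    j1 = math.ceil(math.log2(n / math.log(n)))
    return j0, j1

def threshold(j, n, f_sup, psi_sup, c_star, delta=6.0):
    """Level-dependent threshold xi_{j,n} of (2.6)."""
    x = delta * math.log(n)
    return 2.0 * (2.0 * f_sup * (math.sqrt(x / n) + 2 ** (j / 2) * psi_sup * x / n)
                  + c_star / math.sqrt(n))

n = 1024
j0, j1 = resolution_levels(n)
# Placeholder constants: f_sup, psi_sup and c_star are NOT taken from the paper.
xis = [threshold(j, n, f_sup=1.0, psi_sup=1.0, c_star=1.0) for j in range(j0, j1 + 1)]
```

Neither the levels nor the form of the threshold involve the smoothness $s$, and the threshold grows with the level $j$ through the factor $2^{j/2}$.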

Note that the choices of $j_0$, $j_1$ and $\xi_{j,n}$ are independent of the parameter $s$; hence the estimator $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is an adaptive estimator which attains in probability what we claim is the optimal rate of convergence, up to a logarithmic factor. In particular, $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is adaptive on $F^s_{2,2}(M)$.

This theorem establishes the existence, with probability tending to one, of a nonlinear estimator of the spectral density. This estimator is strictly positive by construction. Therefore the corresponding estimator of the covariance function $\hat{\rho}^{NL}$ (which is obtained as the inverse Fourier transform of $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$) is a positive definite function by Bochner's theorem. Hence $\hat{\rho}^{NL}$ is a covariance function.

2.4.2 Estimating the bound on f

Although the results of Theorem 2.6 are certainly of some theoretical interest, they are not helpful for practical applications. The (deterministic) threshold $\xi_{j,n}$ depends on the unknown quantities $\|f\|_\infty$ and $C_* := C(C_1, C_2)$, where $C_1$ and $C_2$ are unknown constants. To make the method applicable, it is necessary to find a completely data-driven rule for the threshold which works well over as wide a range of smoothness classes as possible. In this subsection, we give an extension that leads to a random threshold which no longer depends on the bound on $f$ nor on $C_*$. For this, let us consider the dyadic partitions of $[0,1]$ given by $\mathcal{I}_n = \left\{ \left[ j/2^{J_n}, (j+1)/2^{J_n} \right),\ j = 0, \ldots, 2^{J_n} - 1 \right\}$. Given some positive integer $r$, we define $\mathcal{P}_n$ as the space of piecewise polynomials of degree $r$ on the dyadic partition $\mathcal{I}_n$ of step $2^{-J_n}$. The dimension of $\mathcal{P}_n$ depends on $n$ and is denoted by $N_n$. Note that $N_n = (r+1) 2^{J_n}$. This family is regular in the sense that the partition $\mathcal{I}_n$ has equispaced knots.

An estimator of $\|f\|_\infty$ is constructed as proposed by Birgé and Massart [6] in the following way. We take the sup norm of $\hat{f}_n$, where $\hat{f}_n$ denotes the (empirical) orthogonal projection of the periodogram $I_n$ on $\mathcal{P}_n$. We denote by $f_n$ the $L_2$-orthogonal projection of $f$ on the same space. Then the following theorem holds.

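Theorem 2.7 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \leq p \leq 2$. Suppose also that Assumptions 1, 2 and 3 hold. For any $n > 1$, let $j_0 = j_0(n)$ be the integer such that $2^{j_0} \geq \log n \geq 2^{j_0 - 1}$, and let $j_1 = j_1(n)$ be the integer such that $2^{j_1} \geq \frac{n}{\log n} \geq 2^{j_1 - 1}$. Take the constants $\delta = 6$ and $b \in \left( \frac{3}{4}, 1 \right)$, and define the threshold

\[ \hat{\xi}_{j,n} = 2 \left[ 2 \left\| \hat{f}_n \right\|_\infty \left( \sqrt{\frac{\delta}{(1-b)^2} \frac{\log n}{n}} + 2^{\frac{j}{2}} \|\psi\|_\infty \frac{\delta}{(1-b)^2} \frac{\log n}{n} \right) + \sqrt{\frac{\log n}{n}} \right]. \tag{2.7} \]

Then, if $\|f - f_n\|_\infty \leq \frac{1}{4} \|f\|_\infty$ and $N_n \leq \frac{\kappa}{(r+1)^2} \frac{n}{\log n}$, where $\kappa$ is a numerical constant and $r$ is the degree of the polynomials, the thresholding estimator (2.5) exists with probability tending to one as $n \to +\infty$ and satisfies

\[ \Delta\left( f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\hat{\xi}_{j,n}} \right) = O_p\left( \left( \frac{n}{\log n} \right)^{-\frac{2s}{2s+1}} \right). \]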
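A possible way to evaluate the data-driven threshold (2.7) is sketched below. As a simplifying assumption, $\hat{f}_n$ is taken to be the projection of the periodogram on piecewise constant functions (degree $0$) over $2^{J_n}$ dyadic cells, computed as cell averages of periodogram values on a regular grid; the paper allows piecewise polynomials of higher degree. The function names, the default `b = 0.8` and the value used for $\|\psi\|_\infty$ are hypothetical.

```python
import math
import numpy as np

def sup_norm_estimate(periodogram_values, J_n):
    """Sup norm of the piecewise-constant projection of the periodogram on 2^J_n
    dyadic cells; the number of grid values must be a multiple of 2^J_n."""
    bins = np.asarray(periodogram_values, dtype=float).reshape(2 ** J_n, -1)
    return float(np.abs(bins.mean(axis=1)).max())

def data_driven_threshold(j, n, f_hat_sup, psi_sup, delta=6.0, b=0.8):
    """Random threshold of (2.7), with b in (3/4, 1)."""
    x = delta / (1.0 - b) ** 2 * math.log(n) / n
    return 2.0 * (2.0 * f_hat_sup * (math.sqrt(x) + 2 ** (j / 2) * psi_sup * x)
                  + math.sqrt(math.log(n) / n))

I_demo = np.arange(16.0)                       # stand-in for periodogram values
f_hat_sup = sup_norm_estimate(I_demo, J_n=2)   # cell means 1.5, 5.5, 9.5, 13.5
xi_hat = data_driven_threshold(j=3, n=1024, f_hat_sup=f_hat_sup, psi_sup=1.0)
```

The threshold depends on the data only through the estimated sup norm, and it increases as `b` approaches one because of the factor $(1-b)^{-2}$.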

Note that we finally obtain a fully tractable estimator of $f$ which reaches the optimal rate of convergence without prior knowledge of the regularity of the spectral density, and which also gives rise to a genuine covariance estimator.

Remark 2.8 We point out that the condition $\|f - f_n\|_\infty \leq \frac{1}{4} \|f\|_\infty$ is assumed in Comte [7]. Under some regularity conditions on $f$, results from approximation theory imply that this condition is met. Indeed, for $f \in B^s_{p,\infty}$ with $s > \frac{1}{p}$, we know from DeVore and Lorentz [11] that

\[ \|f - f_n\|_\infty \leq C(s) |f|_{s,p} N_n^{-\left(s - \frac{1}{p}\right)}, \]

with $|f|_{s,p} = \sup_{y > 0} y^{-s} w_d(f, y)_p < +\infty$, where $w_d(f, y)_p$ is the modulus of smoothness and $d = [s] + 1$. Therefore $\|f - f_n\|_\infty \leq \frac{1}{4} \|f\|_\infty$ if $N_n \geq \left( 4 C(s) \frac{|f|_{s,p}}{\|f\|_\infty} \right)^{\frac{1}{s - \frac{1}{p}}} := C(f, s, p)$, where $C(f, s, p)$ is a constant depending on $f$, $s$ and $p$.

3 Numerical experiments

In this section we present some numerical experiments which support the claims made in the theoretical part of this paper. The programs for our simulations were implemented using the MATLAB programming environment. We simulate a time series which is a superposition of an ARMA(2,2) process and a Gaussian white noise:

\[ X_t = Y_t + c_0 Z_t, \tag{3.1} \]

where $Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2}$, and $\{\varepsilon_t\}$, $\{Z_t\}$ are independent Gaussian white noise processes with unit variance. The constants were chosen as $a_1 = 0.2$, $a_2 = 0.9$, $b_0 = 1$, $b_1 = 0$, $b_2 = 1$ and $c_0 = 0.5$. We generated a sample of size $n = 1024$ according to (3.1). The spectral density $f$ of $(X_t)$ is shown in Figure 1. It has two moderately sharp peaks and is smooth in the rest of the domain.
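The simulation design (3.1) can be reproduced along the following lines. This is a Python sketch, not the authors' MATLAB code; the burn-in length, the random seed and the function names are assumptions. The true spectral density is written with the normalisation of Section 2.1, using the standard rational form of an ARMA spectral density plus the flat contribution of the white noise.

```python
import numpy as np

a1, a2 = 0.2, 0.9            # autoregressive coefficients from the text
b0, b1, b2 = 1.0, 0.0, 1.0   # moving-average coefficients from the text
c0 = 0.5                     # weight of the additive white noise

def simulate(n, rng, burn=500):
    """Draw X_t = Y_t + c0 * Z_t, with Y_t the ARMA(2,2) recursion of (3.1).
    The burn-in length is an arbitrary choice made for this sketch."""
    eps = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(2, n + burn):
        y[t] = (-a1 * y[t - 1] - a2 * y[t - 2]
                + b0 * eps[t] + b1 * eps[t - 1] + b2 * eps[t - 2])
    return y[burn:] + c0 * rng.standard_normal(n)

def true_spectral_density(omega):
    """f(omega) = (1/2pi) * (|b(z)|^2 / |a(z)|^2 + c0^2), with z = exp(-i 2 pi omega)."""
    z = np.exp(-2j * np.pi * np.asarray(omega, dtype=float))
    ma = np.abs(b0 + b1 * z + b2 * z ** 2) ** 2
    ar = np.abs(1.0 + a1 * z + a2 * z ** 2) ** 2
    return (ma / ar + c0 ** 2) / (2 * np.pi)

rng = np.random.default_rng(2011)
x = simulate(1024, rng)
omega = np.linspace(0.0, 1.0, 513)
f_true = true_spectral_density(omega)
```

With this convention `f_true` is symmetric about $\omega = 1/2$ and bounded below by the white-noise level $c_0^2 / (2\pi)$.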

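Starting from the periodogram, we considered the Symmlet 8 basis, i.e. the least asymmetric, compactly supported wavelets described in Daubechies [9]. We chose $j_0$ and $j_1$ as in the hypotheses of Theorem 2.7 and left the coefficients assigned to the father wavelets unthresholded. Hard thresholding is performed using the threshold $\hat{\xi}_{j,n}$ as in (2.7) for the levels $j = j_0, \ldots, j_1$, and the empirical coefficients from the higher resolution scales $j > j_1$ are set to zero. This gives the estimate

\[ f^{HT}_{j_0,j_1,\xi_{j,n}} = \sum_{k=0}^{2^{j_0}-1} \hat{a}_{j_0,k} \phi_{j_0,k} + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} \hat{b}_{j,k}\, I\left( \left| \hat{b}_{j,k} \right| > \xi_{j,n} \right) \psi_{j,k}, \tag{3.2} \]

which is obtained by simply thresholding the wavelet coefficients (2.3) of the periodogram. Note that such an estimator is not guaranteed to be strictly positive on the interval $[0,1]$. However, we use it to build our strictly positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ (see (2.5) to recall its definition). We want to find $\hat{\theta}_n$ such that

\[ \left\langle f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}, \psi_{j,k} \right\rangle = \delta_{\hat{\xi}_{j,n}}\left( \hat{\beta}_{j,k} \right) \quad \text{for all } (j,k) \in \Lambda_{j_1}. \]

For this, we take

\[ \hat{\theta}_n = \arg\min_{\theta \in \mathbb{R}^{\#\Lambda_{j_1}}} \sum_{(j,k) \in \Lambda_{j_1}} \left( \langle f_{j_0,j_1,\theta}, \psi_{j,k} \rangle - \delta_{\hat{\xi}_{j,n}}\left( \hat{\beta}_{j,k} \right) \right)^2, \]

where $f_{j_0,j_1,\theta}(\cdot) = \exp\left( \sum_{(j,k) \in \Lambda_{j_1}} \theta_{j,k} \psi_{j,k}(\cdot) \right) \in \mathcal{E}_{j_1}$ and $\mathcal{E}_{j_1}$ is the family (2.1). To solve this optimization problem we used a gradient descent method with an adaptive step, taking as initial value

\[ \theta_0 = \left\langle \log\left( \left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}} \right)_+ \right), \psi_{j,k} \right\rangle, \]

where $\left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega) \right)_+ := \max\left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega), \eta \right)$ for all $\omega \in [0,1]$ and $\eta > 0$ is a small constant.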
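The least-squares formulation above can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: it uses the Haar basis on a grid instead of Symmlet 8, approximates inner products by grid averages, starts from $\theta = 0$ rather than from the initial value described above (so that the iterations are visible), and uses a simple rule that enlarges the step after a successful move and halves it otherwise. All function names are hypothetical.

```python
import numpy as np

def haar_basis(j0, j1, omega):
    """Columns: scaling functions at level j0, then Haar wavelets for j0 <= j < j1."""
    phi = lambda x: ((x >= 0) & (x < 1)).astype(float)
    psi = lambda x: phi(2 * x) - phi(2 * x - 1)
    cols = [2 ** (j0 / 2) * phi(2 ** j0 * omega - k) for k in range(2 ** j0)]
    for j in range(j0, j1):
        cols += [2 ** (j / 2) * psi(2 ** j * omega - k) for k in range(2 ** j)]
    return np.column_stack(cols)

def information_projection(beta, Psi, theta0, n_iter=5000, tol=1e-18):
    """Minimise Q(theta) = sum_k (<exp(Psi theta), psi_k> - beta_k)^2 by gradient
    descent with an adaptive step (grow after success, halve after failure)."""
    N = Psi.shape[0]

    def moments(theta):
        f = np.exp(Psi @ theta)
        return f, Psi.T @ f / N

    theta, step = theta0.copy(), 1.0
    f, m = moments(theta)
    Q = np.sum((m - beta) ** 2)
    for _ in range(n_iter):
        if Q < tol:
            break
        G = Psi.T @ (f[:, None] * Psi) / N     # Jacobian of the moments in theta
        grad = 2.0 * G @ (m - beta)            # gradient of Q
        cand = theta - step * grad
        f_c, m_c = moments(cand)
        Q_c = np.sum((m_c - beta) ** 2)
        if Q_c < Q:                            # accept the move and enlarge the step
            theta, f, m, Q = cand, f_c, m_c, Q_c
            step *= 1.2
        else:                                  # reject the move and shrink the step
            step *= 0.5
    return theta

N = 64
omega = (np.arange(N) + 0.5) / N
Psi = haar_basis(1, 3, omega)
target = 1.0 + 0.5 * np.sin(2 * np.pi * omega)   # stand-in for a positive target
beta = Psi.T @ target / N                        # coefficients to be matched
theta_hat = information_projection(beta, Psi, np.zeros(Psi.shape[1]))
f_hat = np.exp(Psi @ theta_hat)
```

With the Haar basis the solution is known in closed form (the target averaged over each dyadic cell), which gives a convenient check of the iterations; for smoother wavelets such as Symmlet 8 no closed form is available and the numerical solution is needed.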

In Figure 1 we display the unconstrained estimator $f^{HT}_{j_0,j_1,\xi_{j,n}}$ as in (3.2), obtained by thresholding the wavelet coefficients of the periodogram, together with the estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$, which is strictly positive by construction. Note that these wavelet estimators capture the peaks well and look fairly good on the smooth part too.

We compared our method with the spectral density estimator proposed by Comte [7], which is based on a model selection procedure. As an example, in Comte [7], the author studies the behavior of such estimators using a collection of nested models $(S_m)$, with $m = 1, \ldots, 100$, where $S_m$ is the space of piecewise constant functions, generated by a histogram basis on $[0,1]$ of dimension $m$ with equispaced knots (see Comte [7] for further details). In Figure 2 we show the result of this comparison. Note that our method better captures the peaks of the true spectral density.

Acknowledgments

This work was supported in part by Egide, under the Program of Eiffel excellency Phd grants, as well as by the BDI CNRS grant.

4 Appendix

Throughout all the proofs, C denotes a generic constant whose value may change from line to line.

4.1 Technical results for the empirical estimators of the wavelet coefficients

Lemma 4.1 Let $n \geq 1$, $\beta_{j,k} := \langle f, \psi_{j,k} \rangle$ and $\hat{\beta}_{j,k} := \langle I_n, \psi_{j,k} \rangle$ for $j \geq j_0 - 1$ and $0 \leq k \leq 2^j - 1$. Suppose that Assumptions 1, 2 and 3 hold. Then, $\mathrm{Bias}^2\left( \hat{\beta}_{j,k} \right) := \left( E\left( \hat{\beta}_{j,k} \right) - \beta_{j,k} \right)^2 \leq \frac{C_*^2}{n}$ with $C_* = \sqrt{\frac{C_2 + 39 C_1^2}{4\pi^2}}$, and $\mathrm{Var}\left( \hat{\beta}_{j,k} \right) := E\left( \hat{\beta}_{j,k} - E\left( \hat{\beta}_{j,k} \right) \right)^2 \leq \frac{C}{n}$ for some constant $C > 0$. Moreover, there exists a constant $M_2 > 0$ such that for all $f \in F^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \leq p \leq 2$,

\[ E\left( \hat{\beta}_{j,k} - \beta_{j,k} \right)^2 = \mathrm{Bias}^2\left( \hat{\beta}_{j,k} \right) + \mathrm{Var}\left( \hat{\beta}_{j,k} \right) \leq \frac{M_2}{n}. \]

Proof. Note that $\mathrm{Bias}^2\left( \hat{\beta}_{j,k} \right) \leq \|f - E(I_n)\|^2_{L_2}$. Using Proposition 1 in Comte [7], Assumptions 1 and 2 imply that $\|f - E(I_n)\|^2_{L_2} \leq \frac{C_2 + 39 C_1^2}{4\pi^2 n}$, which gives the result for the bias term. To bound the variance term, remark that

\[ \mathrm{Var}\left( \hat{\beta}_{j,k} \right) = E \langle I_n - E(I_n), \psi_{j,k} \rangle^2 \leq E \|I_n - E(I_n)\|^2_{L_2} \|\psi_{j,k}\|^2_{L_2} = \int_0^1 E |I_n(\omega) - E(I_n(\omega))|^2\, d\omega. \]

Then, under Assumptions 1 and 2, it follows that there exists an absolute constant $C > 0$ such that for all $\omega \in [0,1]$, $E |I_n(\omega) - E(I_n(\omega))|^2 \leq \frac{C}{n}$. To complete the proof it remains to remark that Assumption 3 implies that these bounds for the bias and the variance hold uniformly over $F^s_{p,q}(M)$.

Figure 1: True spectral density $f$, wavelet thresholding estimator $f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}$ and final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$.

Figure 2: True spectral density $f$, final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ and estimator via model selection using regular histograms.

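Lemma 4.2 Let $n \geq 1$, $b_{j,k} := \langle f, \psi_{j,k} \rangle$ and $\hat{b}_{j,k} := \langle I_n, \psi_{j,k} \rangle$ for $j \geq j_0$ and $0 \leq k \leq 2^j - 1$. Suppose that Assumptions 1 and 2 hold. Then for any $x > 0$,

\[ P\left( |\hat{b}_{j,k} - b_{j,k}| > 2 \|f\|_\infty \left( \sqrt{\frac{x}{n}} + 2^{j/2} \|\psi\|_\infty \frac{x}{n} \right) + \frac{C_*}{\sqrt{n}} \right) \leq 2 e^{-x}, \]

where $C_* = \sqrt{\frac{C_2 + 39 C_1^2}{4\pi^2}}$.

Proof. Note that

\[ \hat{b}_{j,k} = \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{t'=1}^{n} \left( X_t - \bar{X} \right) \left( X_{t'} - \bar{X} \right)^* \int_0^1 e^{i 2\pi \omega (t - t')} \psi_{j,k}(\omega)\, d\omega = \frac{1}{2\pi n} \mathbf{X}^T T_n(\psi_{j,k}) \mathbf{X}^*, \]

where $\mathbf{X} = \left( X_1 - \bar{X}, \ldots, X_n - \bar{X} \right)^T$, $\mathbf{X}^T$ denotes the transpose of $\mathbf{X}$ and $T_n(\psi_{j,k})$ is the Toeplitz matrix with entries $[T_n(\psi_{j,k})]_{t,t'} = \int_0^1 e^{i 2\pi \omega (t - t')} \psi_{j,k}(\omega)\, d\omega$, $1 \leq t, t' \leq n$. We can assume without loss of generality that $E(X_t) = 0$, and then under Assumptions 1 and 2, $\mathbf{X}$ is a centered Gaussian vector in $\mathbb{R}^n$ with covariance matrix $\Sigma = T_n(f)$. Using the decomposition $\mathbf{X} = \Sigma^{\frac{1}{2}} \varepsilon$, where $\varepsilon \sim N(0, I_n)$, it follows that $\hat{b}_{j,k} = \frac{1}{2\pi n} \varepsilon^T A_{j,k} \varepsilon$, with $A_{j,k} = \Sigma^{\frac{1}{2}} T_n(\psi_{j,k}) \Sigma^{\frac{1}{2}}$. Note also that $E\left( \hat{b}_{j,k} \right) = \frac{1}{2\pi n} \mathrm{tr}(A_{j,k})$, where $\mathrm{tr}(A)$ denotes the trace of a matrix $A$.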

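Now let $s_1, \ldots, s_n$ be the eigenvalues of the Hermitian matrix $A_{j,k}$ with $|s_1| \geq |s_2| \geq \ldots \geq |s_n|$ and let $Z = 2\pi n \left( \hat{b}_{j,k} - E\left( \hat{b}_{j,k} \right) \right) = \varepsilon^T A_{j,k} \varepsilon - \mathrm{tr}(A_{j,k})$. Then, for $0 < \lambda < (2|s_1|)^{-1}$ one has that

\[ \log E\left( e^{\lambda Z} \right) = \sum_{i=1}^{n} \left( -\lambda s_i - \frac{1}{2} \log(1 - 2\lambda s_i) \right) = \sum_{i=1}^{n} \sum_{\ell=2}^{+\infty} \frac{1}{2\ell} (2 s_i \lambda)^\ell \leq \sum_{i=1}^{n} \sum_{\ell=2}^{+\infty} \frac{1}{2\ell} (2 |s_i| \lambda)^\ell \leq \sum_{i=1}^{n} \left( -\lambda |s_i| - \frac{1}{2} \log(1 - 2\lambda |s_i|) \right), \]

where we have used the fact that $-\log(1 - x) = \sum_{\ell=1}^{+\infty} \frac{x^\ell}{\ell}$ for $x < 1$. Then, using the inequality $-u - \frac{1}{2} \log(1 - 2u) \leq \frac{u^2}{1 - 2u}$ that holds for all $0 < u < \frac{1}{2}$, the above inequality implies that

\[ \log E\left( e^{\lambda Z} \right) \leq \sum_{i=1}^{n} \frac{\lambda^2 |s_i|^2}{1 - 2\lambda |s_i|} \leq \frac{\lambda^2 \|s\|^2}{1 - 2\lambda |s_1|}, \]

where $\|s\|^2 = \sum_{i=1}^{n} |s_i|^2$. Arguing as in Birgé and Massart [6], the above inequality implies that for any $x > 0$, $P\left( |Z| > 2|s_1| x + 2\|s\| \sqrt{x} \right) \leq 2 e^{-x}$, which implies

\[ P\left( \left| \hat{b}_{j,k} - E\left( \hat{b}_{j,k} \right) \right| > 2|s_1| \frac{x}{n} + 2 \frac{\|s\|}{n} \sqrt{x} \right) \leq 2 e^{-x}. \tag{4.1} \]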
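The elementary inequality $-u - \frac{1}{2} \log(1 - 2u) \leq \frac{u^2}{1 - 2u}$ used in the proof can be verified by comparing power series term by term. The following short derivation is added as a reading aid and relies only on the expansion of $-\log(1-x)$ recalled above.

```latex
\begin{align*}
-u - \tfrac{1}{2}\log(1 - 2u)
  &= \sum_{\ell=2}^{+\infty} \frac{(2u)^{\ell}}{2\ell},
  \qquad
\frac{u^{2}}{1 - 2u}
   = u^{2} \sum_{m=0}^{+\infty} (2u)^{m}
   = \sum_{\ell=2}^{+\infty} \frac{(2u)^{\ell}}{4},
  \qquad 0 < u < \tfrac{1}{2}, \\
\frac{(2u)^{\ell}}{2\ell} &\leq \frac{(2u)^{\ell}}{4}
  \quad \text{for every } \ell \geq 2 .
\end{align*}
```

Since each term of the first series is bounded by the corresponding term of the second, the inequality follows by summation.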

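Let $\tau(A)$ denote the spectral radius of a matrix $A$. For the Toeplitz matrices $\Sigma = T_n(f)$ and $T_n(\psi_{j,k})$ one has that $\tau(\Sigma) \leq \|f\|_\infty$ and $\tau(T_n(\psi_{j,k})) \leq \|\psi_{j,k}\|_\infty = 2^{j/2} \|\psi\|_\infty$. These inequalities imply that

\[ |s_1| = \tau\left( \Sigma^{\frac{1}{2}} T_n(\psi_{j,k}) \Sigma^{\frac{1}{2}} \right) \leq \tau(\Sigma)\, \tau(T_n(\psi_{j,k})) \leq \|f\|_\infty 2^{j/2} \|\psi\|_\infty. \tag{4.2} \]

Let $\lambda_i$, $i = 1, \ldots, n$, be the eigenvalues of $T_n(\psi_{j,k})$. From Lemma 3.1 in Davies [10], we have that

\[ \limsup_{n \to +\infty} \frac{1}{n} \mathrm{tr}\left( T_n(\psi_{j,k})^2 \right) = \limsup_{n \to +\infty} \frac{1}{n} \sum_{i=1}^{n} \lambda_i^2 = \int_0^1 \psi_{j,k}^2(\omega)\, d\omega = 1, \]

which implies that

\[ \|s\|^2 = \sum_{i=1}^{n} |s_i|^2 = \mathrm{tr}\left( A_{j,k}^2 \right) = \mathrm{tr}\left( (\Sigma T_n(\psi_{j,k}))^2 \right) \leq \tau(\Sigma)^2\, \mathrm{tr}\left( T_n(\psi_{j,k})^2 \right) \leq \|f\|_\infty^2 n, \tag{4.3} \]

where we have used the inequality $\mathrm{tr}\left( (AB)^2 \right) \leq \tau(A)^2\, \mathrm{tr}\left( B^2 \right)$ that holds for any pair of Hermitian matrices $A$, $B$. Combining (4.1), (4.2) and (4.3), we finally obtain that for any $x > 0$

\[ P\left( \left| \hat{b}_{j,k} - E\left( \hat{b}_{j,k} \right) \right| > 2 \|f\|_\infty \left( \sqrt{\frac{x}{n}} + 2^{j/2} \|\psi\|_\infty \frac{x}{n} \right) \right) \leq 2 e^{-x}. \tag{4.4} \]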

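Now, let $\xi_{j,n} = 2 \|f\|_\infty \left( \sqrt{\frac{x}{n}} + 2^{j/2} \|\psi\|_\infty \frac{x}{n} \right) + \frac{C_*}{\sqrt{n}}$, and note that

\[ P\left( \left| \hat{b}_{j,k} - b_{j,k} \right| > \xi_{j,n} \right) \leq P\left( \left| \hat{b}_{j,k} - E\left( \hat{b}_{j,k} \right) \right| > \xi_{j,n} - \left| E\left( \hat{b}_{j,k} \right) - b_{j,k} \right| \right). \]

By Lemma 4.1, one has that $\left| E\left( \hat{b}_{j,k} \right) - b_{j,k} \right| \leq \frac{C_*}{\sqrt{n}}$, and thus $\xi_{j,n} - \left| E\left( \hat{b}_{j,k} \right) - b_{j,k} \right| \geq \xi_{j,n} - \frac{C_*}{\sqrt{n}}$, which implies using (4.4) that

\[ P\left( \left| \hat{b}_{j,k} - b_{j,k} \right| > \xi_{j,n} \right) \leq P\left( \left| \hat{b}_{j,k} - E\left( \hat{b}_{j,k} \right) \right| > \xi_{j,n} - \frac{C_*}{\sqrt{n}} \right) \leq 2 e^{-x}, \]

which completes the proof of Lemma 4.2.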

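Lemma 4.3 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \leq p \leq 2$. Suppose that Assumptions 1, 2 and 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} > \log n \geq 2^{j_0 - 1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \geq \frac{n}{\log n} \geq 2^{j_1 - 1}$. For $\delta \geq 6$, take the threshold

\[ \xi_{j,n} = 2 \left[ 2 \|f\|_\infty \left( \sqrt{\frac{\delta \log n}{n}} + 2^{\frac{j}{2}} \|\psi\|_\infty \frac{\delta \log n}{n} \right) + \frac{C_*}{\sqrt{n}} \right] \]

as in (2.6), where $C_* = \sqrt{\frac{C_2 + 39 C_1^2}{4\pi^2}}$. Let $\beta_{j,k} := \langle f, \psi_{j,k} \rangle$ and $\hat{\beta}_{\xi_{j,n},(j,k)} := \delta_{\xi_{j,n}}\left( \hat{\beta}_{j,k} \right)$ with $(j,k) \in \Lambda_{j_1}$ as in (2.5). Take $\beta = (\beta_{j,k})_{(j,k) \in \Lambda_{j_1}}$ and $\hat{\beta}_{\xi_{j,n}} = \left( \hat{\beta}_{\xi_{j,n},(j,k)} \right)_{(j,k) \in \Lambda_{j_1}}$. Then there exists a constant $M_3 > 0$ such that for all sufficiently large $n$:

\[ E \left\| \beta - \hat{\beta}_{\xi_{j,n}} \right\|_2^2 := E\left( \sum_{(j,k) \in \Lambda_{j_1}} \left| \beta_{j,k} - \delta_{\xi_{j,n}}\left( \hat{\beta}_{j,k} \right) \right|^2 \right) \leq M_3 \left( \frac{n}{\log n} \right)^{-\frac{2s}{2s+1}} \]

uniformly over $F^s_{p,q}(M)$.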

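Proof. Taking into account that

\[ E \left\| \beta - \hat{\beta}_{\xi_{j,n}} \right\|_2^2 = \sum_{k=0}^{2^{j_0}-1} E\left( a_{j_0,k} - \hat{a}_{j_0,k} \right)^2 + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} E\left[ \left( b_{j,k} - \hat{b}_{j,k} \right)^2 I\left( \left| \hat{b}_{j,k} \right| > \xi_{j,n} \right) \right] + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} b_{j,k}^2\, P\left( \left| \hat{b}_{j,k} \right| \leq \xi_{j,n} \right) := T_1 + T_2 + T_3, \tag{4.5} \]

we are interested in bounding these three terms. The bound for $T_1$ follows from Lemma 4.1 and the fact that $j_0 = \log_2(\log n) \leq \frac{1}{2s+1} \log_2(n)$:

\[ T_1 = \sum_{k=0}^{2^{j_0}-1} E\left( a_{j_0,k} - \hat{a}_{j_0,k} \right)^2 = O\left( \frac{2^{j_0}}{n} \right) \leq O\left( n^{-\frac{2s}{2s+1}} \right). \tag{4.6} \]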

To bound T₂ and T₃ we proceed as follows. Write T2 =

j1

X

j=j0

2X^j−1

k=0

E

b_j,k−bb_j,k2

Ibb_j,k> ξj,n,|b_j,k|> ξ_j,n 2

+Ibb_j,k> ξj,n,|b_j,k| ≤ ξ_j,n 2