5.2.1 The wavelet setting
The wavelet setting involves two functionsφandψ in L2(R) and their Fourier transforms φ(ξ)b def=
Z ∞
−∞
φ(t)e−iξtdt and ψ(ξ)b def= Z ∞
−∞
ψ(t)e−iξtdt . (5.2) Let us assume that conditions (W-1)-(W-4) hold. Condition (W-2) ensures that the Fourier transformψbdecreases quickly to zero. Condition (W-3) ensures that ψ oscillates and that its scalar product with continuous-time polynomials up to degreeM−1 vanishes.
It is equivalent to asserting that the firstM−1 derivatives of ψbvanish at the origin and hence
|ψ(λ)b |=O(|λ|M), asλ→0. (5.3) Daubechies wavelets (withM ≥2) and the Coiflets satisfy these conditions, see Moulines et al. [2007a]. Viewing the wavelet ψ(t) as a basic template, define the family {ψj,k, j ∈ Z, k∈Z}of translated and dilated functions
ψj,k(t) = 2−j/2ψ(2−jt−k), j∈Z, k∈Z. (5.4)
Positive values ofktranslateψ to the right, negative values to the left. Thescale index j dilatesψso that large values ofjcorrespond to coarse scales and hence to low frequencies.
We suppose throughout the paper that
(1 +β)/2−α < d≤M . (5.5)
We now describe how the wavelet coefficients are defined in discrete time, that is for a real-valued sequence {xk, k ∈ Z} and for a finite sample {xk, k = 1, . . . , n}. Using the scaling function φ, we first interpolate these discrete values to construct the following continuous-time functions
xn(t)def= Xn k=1
xkφ(t−k) and x(t)def= X
k∈Z
xkφ(t−k), t∈R. (5.6) Without loss of generality we may suppose that the support of the scaling function φis included in [−T,0] for some integer T≥1. Then
xn(t) =x(t) for all t∈[0, n−T + 1].
We may also suppose that the support of the wavelet function ψ is included in [0,T].
With these conventions, the support of ψj,k is included in the interval [2jk,2j(k+ T)].
The wavelet coefficient Wj,k at scale j ≥0 and locationk ∈Z is formally defined as the scalar product in L2(R) of the function t7→x(t) and the wavelet t7→ψj,k(t):
Wj,k def= Z ∞
−∞
x(t)ψj,k(t) dt= Z ∞
−∞
xn(t)ψj,k(t) dt, j≥0, k∈Z, (5.7) when [2jk,2jk+ T]⊆[0, n−T + 1], that is, for all (j, k)∈ In, where whereIn is defined in (2.32)
If∆MXis stationary, then from (2.43) the process{Wj,k}k∈Z of wavelet coefficients at scalej ≥0 is stationary but the two–dimensional process {[Wj,k, Wj′,k]T}k∈Z of wavelet coefficients at scalesj and j′, withj≥j′, is not stationary. Here T denotes the transpo-sition. This is why we consider instead the stationary between-scale process
{[Wj,k,Wj,k(j−j′)T]T}k∈Z, whereWj,k(j−j′) is defined as follows:
Wj,k(j−j′)def= h
Wj′,2j−j′k, Wj′,2j−j′k+1, . . . , Wj′,2j−j′k+2j−j′−1
iT
. For allj, j′ ≥1, the covariance function of the between scale process is given by
Cov(Wj,k′(j−j′), Wj,k) = Z π
−π
eiλ(k−k′)Dj,j−j′(λ;f)dλ , (5.8) whereDj,j−j′(λ;f) stands for the cross-spectral density function of this process. The case j=j′ corresponds to the spectral density function of the within-scale process{Wj,k}k∈Z.
In the sequel, we shall use that the within- and between-scale spectral densities Dj,j−j′(λ;d) of the processX with memory parameterd∈Rcan be approximated by the corresponding spectral density of the generalized fractional Brownian motionB(d) defined in (2.84).
5.2.2 Definition of the robust estimators of d
Let us now define robust estimators of the memory parameter d of the M(d) process X from the observationsX1, . . . , Xn. These estimators are derived from the Abry and Veitch [1998] construction, and consists in regressing estimators of the scale spectrum
σj2 def= Var(Wj,0) (5.9)
with respect to the scale index j. More precisely, if bσj2 is an estimator of σ2j based on Wj,0:nj−1 = (Wj,0, . . . , Wj,nj−1) then an estimator of the memory parameterdis obtained by regressing log(bσj2) for a finite number of scale indices j ∈ {J0, . . . , J0 +ℓ} where J0 =J0(n)≥0 is the lower scale and 1 +ℓ≥2 is the number of scales in the regression.
The regression estimator can be expressed formally as dbn(J0,w)def=
JX0+ℓ j=J0
wj−J0log bσj2
, (5.10)
where the vectorwdef= [w0, . . . , wℓ]T of weights satisfiesPℓ
i=0wi= 0 and 2 log(2)Pℓ
i=0iwi= 1, see Abry and Veitch [1998] and Moulines et al. [2007b]. For J0 ≥ 1 and ℓ > 1, one may choose for examplew corresponding to the least squares regression matrix, defined byw=DB(BTDB)−1b where
bdef= h
0 (2 log(2))−1i
, B def=
"
1 1 . . . 1 0 1 . . . ℓ
#T
is the design matrix and D is an arbitrary positive definite matrix. The best choice of D depends on the memory parameterd. However a good approximation of this optimal matrix D is the diagonal matrix with diagonal entries Di,i = 2−i, i = 0. . . , ℓ; see Fa¨y et al. [2009] and the references therein. We will use this choice of the design matrix in the numerical experiments. A heuristic justification for this choice is that by [Moulines et al., 2007a, Eq. (28)],
σ2j ∼C22jd , asj→ ∞, (5.11)
whereC is a positive constant.
In the sequel, we shall consider three different estimators ofdbased on three different estimators of the scale spectrum σj2 with respect to the scale index j which are defined below.
Classical scale estimator
This estimator has been considered in the original contribution of Abry and Veitch [1998]
and consists in estimating the scale spectrumσj2 with respect to the scale index j by the empirical variance
b
σCL,j2 = 1 nj
nj
X
i=1
Wj,i2 , (5.12)
where for any j, nj denotes the number of available wavelet coefficients at scale index j defined in (2.32).
Median absolute deviation
This estimator is well-known to be a robust estimator of the scale and as mentioned by Rousseeuw and Croux [1993] it has several appealing properties: it is easy to compute and has the best possible breakdown point (50%). Since the wavelet coefficients Wj,i are centered Gaussian observations, the square of the median absolute deviation ofWj,0:nj−1 is defined by
b
σMAD,j2 =
m(Φ) med
0≤i≤nj−1Wj,i
2
, (5.13)
where Φ denotes the c.d.f of a standard Gaussian random variable and
m(Φ) = 1/Φ−1(3/4) = 1.4826 . (5.14) The use of the median estimator to estimate the scalogram has been suggested to estimate the memory parameter in Stoev et al. [2005] (see also [Percival and Walden, 2006, p. 420]).
A closely related technique is considered in Coeurjolly [2008a] and Coeurjolly [2008b] to estimate the Hurst coefficient of locally self-similar Gaussian processes. Note that the use of the median of the squared wavelet coefficients has been advocated to estimate the variance at a given scale in wavelet denoising applications; this technique is mentioned in Donoho and Johnstone [1994] to estimate the scalogram of the noise in the i.i.d. context;
Johnstone and Silverman [1997] proposed to use this method in the long-range dependent context; the use of these estimators has not been however rigorously justified.
The Croux and Rousseeuw estimator
This estimator is another robust scale estimator introduced in Rousseeuw and Croux [1993]. Its asymptotic properties in several dependence contexts have been further studied in L´evy-Leduc et al. [2009] and the square of this estimator is defined by
b
σCR,j2 =
c(Φ){|Wj,i−Wj,k|; 0≤i, k≤nj−1}(knj)
2
, (5.15)
wherec(Φ) = 2.21914 and knj =⌊n2j/4⌋. That is, up to the multiplicative constantc(Φ), b
σCR,j is the knjth order statistics of the n2j distances |Wj,i−Wj,k| between all the pairs of observations.