
4.4 Application

$$\int_{-\pi}^{\pi} \frac{\partial f(\lambda;\beta^{(M)})}{\partial \beta_r}\, \mathrm{e}^{\mathrm{i}\lambda k}\,\mathrm{d}\lambda = \int_{-\pi}^{\pi} \frac{\partial \log f(\lambda;\beta^{(M)})}{\partial \beta_r}\, \mathrm{e}^{\mathrm{i}\lambda k}\, f(\lambda;\beta^{(M)})\,\mathrm{d}\lambda\;, \qquad (4.26)$$

where $\beta^{(M)}$ denotes the parameter $(d-M, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q, \sigma^2)$ when $\beta = (d, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q, \sigma^2)$.

The integral (4.26) is not known in closed form except for $p = q = 0$ (see Gradshteyn and Ryzhik [2000]). However, for $M$ large enough that $f(\lambda;\beta^{(M)})$ and its derivative with respect to $\beta$ are smooth functions of $\lambda$, it can be approximated by Riemann sums with arbitrary precision.
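As a concrete illustration of this Riemann-sum approximation, the following sketch evaluates the right-hand side of (4.26) in the simplest case $p = q = 0$, where $\log f$ is linear in the memory parameter. The function names and the midpoint rule are our choices, not the implementation used in this work; $\delta$ plays the role of $d - M$.

```python
import numpy as np

def arfima00_f_and_dlogf(lam, delta, sigma2=1.0):
    """ARFIMA(0, delta, 0) spectral density f and d(log f)/d(delta).
    f(lam) = sigma2/(2*pi) * |1 - exp(-i*lam)|^(-2*delta), so log f is
    linear in delta and d(log f)/d(delta) = -2*log|1 - exp(-i*lam)|."""
    a = 2.0 * np.abs(np.sin(lam / 2.0))           # |1 - exp(-i*lam)|
    f = sigma2 / (2.0 * np.pi) * a ** (-2.0 * delta)
    return f, -2.0 * np.log(a)

def rhs_426(k, delta, sigma2=1.0, n_grid=2 ** 16):
    """Midpoint Riemann sum for the right-hand side of (4.26).
    An even n_grid keeps every midpoint away from lam = 0, where the
    integrand is defined by a limit when delta < 0."""
    h = 2.0 * np.pi / n_grid
    lam = -np.pi + (np.arange(n_grid) + 0.5) * h  # midpoints of the grid
    f, dlogf = arfima00_f_and_dlogf(lam, delta, sigma2)
    return h * np.sum(dlogf * np.exp(1j * lam * k) * f)

# delta stands for d - M, taken negative so that f is bounded:
print(rhs_426(k=3, delta=-0.7).real)
```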

4.4.2 Local likelihood wavelet (LLW) estimator of d

As shown in Moulines et al. [2007b], the wavelet coefficients of any $M(d)$ process have second-order properties that, at large scales, depend only on $d$ up to a multiplicative constant. In particular, as $j \to \infty$, $\operatorname{Var}(W_{j,0}) \sim \sigma^2 2^{2dj}$. The local Whittle wavelet (LWW) estimator of $d$ introduced in Moulines et al. [2008] is based on approximating the wavelet coefficients by independent Gaussian variables with exact variance $\sigma^2 2^{2dj}$ over a given set of time-scale indices $I$, which yields the following pseudo negative log-likelihood

$$\widehat{L}_I(\sigma^2, d) = \frac{1}{2\sigma^2} \sum_{(j,k)\in I} 2^{-2dj}\, W_{j,k}^2 + \frac{|I|}{2} \log\bigl(\sigma^2\, 2^{2\langle I\rangle d}\bigr),$$

where $|I|$ denotes the cardinality of $I$ and $\langle I\rangle$ is the average scale, $\langle I\rangle \stackrel{\mathrm{def}}{=} |I|^{-1}\sum_{(j,k)\in I} j$.

Define $\widehat{\sigma}_I^2(d) \stackrel{\mathrm{def}}{=} \operatorname{Argmin}_{\sigma^2>0} \widehat{L}_I(\sigma^2, d) = |I|^{-1}\sum_{(j,k)\in I} 2^{-2dj}\, W_{j,k}^2$. The pseudo maximum likelihood estimator of the memory parameter is then obtained by minimizing the negated profile log-likelihood,

$$\widehat{d}_{\mathrm{LWW}}(I) \stackrel{\mathrm{def}}{=} \operatorname{Argmin}_{d\in[\Delta_1,\Delta_2]} \widehat{L}_I\bigl(\widehat{\sigma}_I^2(d), d\bigr) = \operatorname{Argmin}_{d\in[\Delta_1,\Delta_2]} \widetilde{L}_I(d), \qquad (4.27)$$

where $[\Delta_1,\Delta_2]$ is an interval of admissible values for $d$ (depending only on the wavelet) and

$$\widetilde{L}_I(d) \stackrel{\mathrm{def}}{=} \log\Biggl(\,\sum_{(j,k)\in I} 2^{2d(\langle I\rangle - j)}\, W_{j,k}^2 \Biggr). \qquad (4.28)$$

Since the equivalence $\operatorname{Var}(W_{j,0}) \sim \sigma^2 2^{2dj}$ holds at large scales ($j \to \infty$), the set of scale indices $I$ is chosen of the form
$$I_n(\ell) = \{(j,k) : \ell \le j \le J_n,\ 0 \le k < n_j\}$$

(recall that $n_j$ is the number of available wavelet coefficients at scale $j$, defined in (2.53), and $J_n$ is the largest observed scale index). Henceforth we simply denote the corresponding estimator by $\widehat{d}_{\mathrm{LWW}}(\ell)$. The asymptotic properties (consistency and central limit theorem as $\ell, n \to \infty$) of the LWW estimator have been studied in Moulines et al. [2008] in the Gaussian case and in Roueff and Taqqu [2009] in the linear case; see also Faÿ et al. [2009] for a comparison with other wavelet and Fourier estimators of the memory parameter $d$.
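For concreteness, a minimal sketch of $\widehat{d}_{\mathrm{LWW}}(\ell)$ based on (4.27)-(4.28) might look as follows. Here pywt's discrete wavelet transform and its boundary handling stand in for the transform and the coefficient counts $n_j$ of (2.53), and `d_range` stands in for $[\Delta_1, \Delta_2]$, whose actual value depends on the wavelet; these are assumptions of the sketch, not the thesis's implementation.

```python
import numpy as np
import pywt
from scipy.optimize import minimize_scalar

def lww_estimate(x, ell, wavelet="db4", d_range=(-1.0, 3.0)):
    """Local Whittle wavelet estimate of d: minimize the profile
    criterion (4.28) over the scales ell <= j <= J_n."""
    J_n = pywt.dwt_max_level(len(x), wavelet)
    coeffs = pywt.wavedec(x, wavelet, level=J_n)
    # wavedec returns [approx_Jn, detail_Jn, detail_{Jn-1}, ..., detail_1]
    j_list, w2_list = [], []
    for i, c in enumerate(coeffs[1:]):
        j = J_n - i                               # scale index of this block
        if j >= ell:
            j_list.append(np.full(c.size, j, dtype=float))
            w2_list.append(np.asarray(c) ** 2)
    j_arr = np.concatenate(j_list)
    w2 = np.concatenate(w2_list)
    mean_scale = j_arr.mean()                     # <I>, the average scale

    def profile(d):                               # tilde{L}_I(d) of (4.28)
        return np.log(np.sum(2.0 ** (2.0 * d * (mean_scale - j_arr)) * w2))

    return minimize_scalar(profile, bounds=d_range, method="bounded").x
```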

However, as already mentioned, it is shown in Moulines et al. [2007b] that the wavelet coefficients of an $M(d)$ process cannot be approximated by independent coefficients, even at large scales. Instead, their second-order properties are equivalent to those of the wavelet coefficients of a generalized continuous-time fractional Brownian motion, up to a multiplicative constant, see Moulines et al. [2007b]. In particular, although the wavelet coefficients are not independent at large scales, their second-order properties still depend only on the unknown parameter $d$ and on a multiplicative constant (in fact, this constant depends only on the wavelet and on $f(0)$). As a consequence, at large scales, the wavelet coefficients of any $M(d)$ process have a distribution well approximated by that of the wavelet coefficients of an ARFIMA(0, d, 0). Hence we propose to define a local likelihood wavelet (LLW) estimator by maximizing the likelihood associated with the wavelet coefficients of an ARFIMA(0, d, 0) with indices in $I_n(\ell)$ for some lower scale index $\ell$. That is, we define $\widehat{d}_{\mathrm{LLW}}(\ell)$ as the minimizer of

$$\mathcal{L}(d) = \log\Bigl( W^{(\ell,n)T}\,\bigl(M^{(\ell,n)}(d)\bigr)^{-1}\, W^{(\ell,n)} \Bigr) + \frac{1}{\widetilde{n}(\ell)} \log \det M^{(\ell,n)}(d)\;,$$

where $W^{(\ell,n)}$ contains all the available wavelet coefficients with scale indices between $\ell$ and $J_n$, $\widetilde{n}(\ell)$ denotes the number of such coefficients, and $M^{(\ell,n)}(d)$ is the exact covariance matrix of these wavelet coefficients for the ARFIMA(0, d, 0) process. Note that this matrix can be computed using the iterative algorithm derived in Proposition 4.3.2.

Since the likelihood used to define $\widehat{d}_{\mathrm{LLW}}(\ell)$ relies on a finer approximation of the second-order properties than the pseudo likelihood used to define $\widehat{d}_{\mathrm{LWW}}(\ell)$, one expects the former estimator to outperform the latter. This is in fact only partially true. The finer approximation yields an estimator based on more complete information about the asymptotic model that appears at large scales, since not only the variances of the wavelet coefficients but also their cross-correlations are used. This should clearly yield a smaller variance of the estimator. On the other hand, the bias introduced by the fact that the true model may not be an ARFIMA(0, d, 0) should not improve: both the variances and the cross-correlations are well approximated only for $\ell$ large. Indeed, by approximating not only the variances but also the cross-correlations by an asymptotic model, a larger bias may be introduced; in other words, the likelihood can be more model-dependent than the pseudo likelihood. These two phenomena are observed in our Monte Carlo simulations, which we now present. We study the finite-sample properties of the two estimators $\widehat{d}_{\mathrm{LWW}}(\ell)$ and $\widehat{d}_{\mathrm{LLW}}(\ell)$ for two different models.

1. A Gaussian ARFIMA(1, d, 0) model with $d \in \{-0.8, -0.4, 0, 0.2, 0.6, 1, 1.6, 2, 2.6\}$ and AR coefficient equal to 0.7.

2. A Gaussian DARFIMA model, as defined in Andrews and Sun [2004]. The spectral density of the DARFIMA(1, d, 0) process is equal to that of an ARFIMA(1, d, 0) process on the interval $[-\lambda_0, \lambda_0]$ and vanishes on $(\lambda_0, \pi]$. A trajectory is obtained by low-pass filtering an ARFIMA(1, d, 0) trajectory with a truncated sinc function in the time domain, as sketched below. We chose $\lambda_0 = \pi/2$ and the same parameters as above for the ARFIMA(1, d, 0) part.
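The filtering step can be sketched as follows; the truncation length `K` and the boundary handling of the convolution are our illustrative choices (Andrews and Sun [2004] specify the construction precisely).

```python
import numpy as np

def darfima_from_arfima(x, lam0=np.pi / 2, K=256):
    """Low-pass filter an ARFIMA(1, d, 0) trajectory x with a truncated
    sinc so that the spectral density (approximately, since K is finite)
    vanishes on (lam0, pi].  The ideal impulse response is
    h[k] = sin(lam0*k)/(pi*k) for |k| <= K, with h[0] = lam0/pi."""
    k = np.arange(-K, K + 1)
    h = (lam0 / np.pi) * np.sinc(lam0 * k / np.pi)  # np.sinc(t) = sin(pi t)/(pi t)
    return np.convolve(x, h, mode="same")           # keep the input length
```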

In all the simulations, we use Daubechies wavelets with 4 vanishing moments. In this study, we consider samples of length $n = 2^{12} = 4096$, which, with the chosen wavelet, gives $J_n = 8$.
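As a quick sanity check of this setup, one can inspect the per-scale coefficient counts produced by a standard DWT; note that pywt's counts include boundary coefficients, so they only approximate the $n_j$ of (2.53).

```python
import numpy as np
import pywt

n = 2 ** 12                                # n = 4096
rng = np.random.default_rng(0)
x = rng.standard_normal(n)                 # placeholder trajectory
coeffs = pywt.wavedec(x, "db4", level=8)   # db4: 4 vanishing moments, J_n = 8
for i, c in enumerate(coeffs[1:]):         # detail blocks, coarsest first
    print(f"scale j = {8 - i}: {c.size} coefficients")
```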

n = 4096, J_n = 8    ARFIMA(1, d, 0)    σ² = 1

d            -0.8     -0.4     0        0.2      0.6      1        1.6      2        2.6
-----------------------------------------------------------------------------------------
ℓ = 3
Bias LLW     -0.057   -0.04    -0.03    -0.03    -0.02    -0.02     0.01    -0.01    -0.02
S.E. LLW     (0.026)  (0.029)  (0.026)  (0.026)  (0.027)  (0.029)  (0.025)  (0.027)  (0.024)
RMSE LLW      0.063    0.051    0.043    0.054    0.043    0.033    0.036    0.032    0.029
Bias LWW     -0.029   -0.02    -0.02    -0.026   -0.02    -0.02    -0.01    -0.02    -0.03
S.E. LWW     (0.042)  (0.031)  (0.027)  (0.036)  (0.046)  (0.039)  (0.032)  (0.040)  (0.046)
RMSE LWW      0.032    0.041    0.037    0.036    0.049    0.037    0.045    0.044    0.037
ℓ = 4
Bias LLW     -0.006   -0.009   -0.01    -0.02    -0.01    -0.007   -0.009   -0.009   -0.004
S.E. LLW     (0.045)  (0.047)  (0.045)  (0.045)  (0.047)  (0.043)  (0.044)  (0.041)  (0.038)
RMSE LLW      0.046    0.065    0.047    0.046    0.050    0.041    0.036    0.042    0.037
Bias LWW      0.03     0.01    -0.004   -0.005   -0.01    -0.008   -0.009   -0.01    -0.08
S.E. LWW     (0.059)  (0.054)  (0.051)  (0.047)  (0.053)  (0.054)  (0.068)  (0.11)   (0.071)
RMSE LWW      0.063    0.061    0.049    0.046    0.054    0.055    0.056    0.066    0.069
ℓ = 5
Bias LLW      0.004    0.008   -0.006   -0.01    -0.009   -0.006   -0.01    -0.016   -0.01
S.E. LLW     (0.070)  (0.071)  (0.072)  (0.075)  (0.074)  (0.073)  (0.076)  (0.064)  (0.037)
RMSE LLW      0.069    0.091    0.08     0.077    0.074    0.073    0.072    0.066    0.091
Bias LWW      0.051   -0.009   -0.01    -0.007   -0.009   -0.006   -0.01    -0.03    -0.009
S.E. LWW     (0.091)  (0.097)  (0.09)   (0.082)  (0.088)  (0.092)  (0.11)   (0.10)   (0.07)
RMSE LWW      0.1      0.091    0.092    0.083    0.087    0.095    0.106    0.12     0.085

Table 4.1: Bias, standard deviation and root mean-square error of $\widehat{d}_{\mathrm{LLW}}(\ell)$ and $\widehat{d}_{\mathrm{LWW}}(\ell)$ over 1000 replications for a time series generated from the Gaussian ARFIMA(1, d, 0) model. The lowest RMSE among the two methods and all considered scales appears in boldface.

In view of Table 4.1, Table 4.2 and Figure 4.1, the two methods appear to work well, with similar performances at the optimal finest scale index $\ell$. An important property of $\widehat{d}_{\mathrm{LLW}}$ is that its standard deviation remains more stable than that of $\widehat{d}_{\mathrm{LWW}}$ as the unknown memory parameter $d$ varies, for a fixed finest scale $\ell$. This is of interest for computing confidence intervals: their size depends mainly on (the known) $\ell$, not on (the unknown) $d$.

n = 4096, J_n = 8    DARFIMA(1, d, 0)    σ² = 1

d            -0.8     -0.4     0        0.2      0.6      1        1.6      2        2.6
-----------------------------------------------------------------------------------------
ℓ = 3
Bias LLW     -0.02    -0.02    -0.009   -0.028   -0.023   -0.020   -0.021   -0.02    -0.02
S.E. LLW     (0.030)  (0.03)   (0.031)  (0.029)  (0.027)  (0.027)  (0.030)  (0.028)  (0.026)
RMSE LLW      0.036    0.039    0.08     0.041    0.035    0.032    0.036    0.037    0.032
Bias LWW      0.009   -0.01     0.004   -0.022   -0.02    -0.02    -0.02    -0.022   -0.01
S.E. LWW     (0.032)  (0.035)  (0.051)  (0.033)  (0.031)  (0.034)  (0.038)  (0.040)  (0.041)
RMSE LWW      0.034    0.035    0.049    0.039    0.036    0.039    0.043    0.046    0.043
ℓ = 4
Bias LLW     -0.01    -0.01    -0.01    -0.002   -0.01    -0.002   -0.006   -0.006    0.004
S.E. LLW     (0.04)   (0.042)  (0.043)  (0.047)  (0.047)  (0.037)  (0.041)  (0.041)  (0.038)
RMSE LLW      0.055    0.044    0.034    0.047    0.049    0.037    0.041    0.041    0.036
Bias LWW      0.02     0.007   -0.006    0.004   -0.007   -0.006   -0.007   -0.009   -0.008
S.E. LWW     (0.056)  (0.050)  (0.049)  (0.053)  (0.054)  (0.050)  (0.056)  (0.061)  (0.067)
RMSE LWW      0.069    0.051    0.032    0.054    0.055    0.051    0.06     0.062    0.069
ℓ = 5
Bias LLW     -0.008   -0.004    0.008    0.001   -0.002   -0.01    -0.01    -0.02    -0.01
S.E. LLW     (0.067)  (0.072)  (0.072)  (0.077)  (0.075)  (0.07)   (0.069)  (0.074)  (0.061)
RMSE LLW      0.12     0.072    0.079    0.077    0.08     0.067    0.07     0.077    0.058
Bias LWW      0.03     0.02     0.005    0.007   -0.002   -0.03    -0.02    -0.03    -0.03
S.E. LWW     (0.09)   (0.088)  (0.083)  (0.088)  (0.089)  (0.084)  (0.099)  (0.11)   (0.12)
RMSE LWW      0.44     0.09     0.07     0.089    0.088    0.085    0.10     0.12     0.11

Table 4.2: Bias, standard deviation and root mean-square error of $\widehat{d}_{\mathrm{LLW}}(\ell)$ and $\widehat{d}_{\mathrm{LWW}}(\ell)$ over 1000 replications for a time series generated from the Gaussian DARFIMA(1, d, 0) model. The lowest RMSE among the two methods and all considered scales appears in boldface.

[Figure 4.1 here: four panels plotting the two estimators, labelled ML and LWW, against the finest scale index J1 = 1, ..., 7.]

Figure 4.1: Monte Carlo simulation to compare $\widehat{d}_{\mathrm{LLW}}(\ell)$ and $\widehat{d}_{\mathrm{LWW}}(\ell)$ in a semiparametric setting. The top left panel is an ARFIMA(1, -0.8, 0), the top right an ARFIMA(1, 0.6, 0), the bottom left an ARFIMA(1, 1, 0) and the bottom right an ARFIMA(1, 2, 0).

Finally, in most cases, the RMSE is smaller for $\widehat{d}_{\mathrm{LLW}}$, especially when $d > 1/2$ and $\ell \ge 4$. However, as explained above, $\widehat{d}_{\mathrm{LWW}}$ may enjoy a lower absolute bias, resulting here in a smaller RMSE for many small values of $d$ when $\ell = 3$, in spite of a larger variance. As could be expected, this occurs for the smallest $\ell$ considered in the tables, since it corresponds to a larger bias and a smaller variance, that is, to a situation where the bias influences the RMSE more than the variance.

[Figure 4.2 here: four panels plotting the two estimators, labelled ML and LWW, against the finest scale index J1 = 1, ..., 7.]

Figure 4.2: Monte Carlo simulation to compare $\widehat{d}_{\mathrm{LLW}}(\ell)$ and $\widehat{d}_{\mathrm{LWW}}(\ell)$ in a semiparametric setting. The top left panel is a DARFIMA(1, -0.4, 0), the top right a DARFIMA(1, 0.2, 0), the bottom left a DARFIMA(1, 2, 0) and the bottom right a DARFIMA(1, 2.6, 0).