Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models


HAL Id: hal-02898909

https://hal.archives-ouvertes.fr/hal-02898909

Preprint submitted on 14 Jul 2020



Christian Francq, Jean-Michel Zakoïan

To cite this version:

Christian Francq, Jean-Michel Zakoïan. Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models. 2020. ⟨hal-02898909⟩


Christian Francq and Jean-Michel Zakoïan

CREST and University of Lille, 5 Avenue Henri Le Chatelier, 91120 Palaiseau, France.

E-mail: [email protected], [email protected]

Summary. This paper addresses the problem of deriving the asymptotic distribution of the empirical distribution function $\widehat{F}_n$ of the residuals in a general class of time series models, including conditional mean and conditional heteroscedasticity, whose independent and identically distributed errors have unknown distribution $F$. We show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\widehat{F}_n(\cdot)-F(\cdot)\}$ is impacted by the estimation but does not depend on the model parameters. It is thus neither asymptotically estimation free, as is the case for purely linear models, nor asymptotically model dependent, as is the case for some nonlinear models. The asymptotic stochastic equicontinuity is also established. We consider an application to the estimation of the conditional Value-at-Risk.


1. Introduction

For independent variables $\eta_1,\dots,\eta_n$ with common distribution $F$, the celebrated Glivenko-Cantelli theorem states the uniform almost sure convergence of the empirical distribution,

$$\sup_{x\in\mathbb{R}}|F_n(x)-F(x)|\to 0,\quad\text{a.s.,}$$

where $F_n(x)=\frac{1}{n}\sum_{t=1}^{n}\mathbb{1}_{\{\eta_t\le x\}}$ for all $x$. The usual Central Limit Theorem (CLT) shows that $\sqrt{n}\{F_n(x)-F(x)\}$ is asymptotically $\mathcal{N}(0,F(x)\{1-F(x)\})$ distributed without any further assumption. Moreover, under the assumption that $F$ is continuous, the sequence $\{\sqrt{n}[F_n(\cdot)-F(\cdot)]\}$ is stochastically equicontinuous, in the sense that $\sqrt{n}\{F_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}(0,F(x)\{1-F(x)\})$ for any sequence $(x_n)$ converging to $x$ in probability, for any $x$ (see Billingsley (1968, Section 22) and Andrews (1994)). The latter convergence in distribution has important statistical applications, for instance the derivation of the asymptotic distribution of the empirical quantiles (see below).
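The uniform convergence in the Glivenko-Cantelli theorem can be illustrated numerically. The following is a minimal sketch, assuming standard Gaussian variables (an illustrative choice; the theorem itself requires no such assumption). The function names `ecdf` and `sup_distance` and the seed are hypothetical and introduced only for this illustration.

```python
import bisect
import random
from statistics import NormalDist


def ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of sample points <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def sup_distance(sample, cdf):
    """Compute sup_x |F_n(x) - F(x)| for a continuous cdf F.

    For continuous F the supremum is reached at, or just before, a jump of
    F_n, so it suffices to inspect both sides of each order statistic.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return max(
        max((i + 1) / n - cdf(v), cdf(v) - i / n)
        for i, v in enumerate(ordered)
    )


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
F = NormalDist().cdf
for n in (100, 10_000):
    etas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(etas, F), 4))
```

The printed distance is expected to shrink roughly at the rate $n^{-1/2}$ as the sample size grows, in line with the CLT stated above.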

Time series observations are rarely independent and identically distributed (iid), but time series models often involve iid innovations. Modern time series analysis often relies on semi-parametric models in which the distribution $F$ of the innovations $\eta_t$ is not fully specified. Many such models are of the location-scale form $y_t=m_t(\theta_0)+\sigma_t(\theta_0)\eta_t$, where $m_t$ and $\sigma_t$ are functions of the past observations $y_{t-i}$, $i>0$, and the sequence $(\eta_t)$ is iid. Consistent estimation of the parameter $\theta_0$ can be achieved by several methods. In particular, the Quasi-Maximum Likelihood (QML) estimation method generally relies on a Gaussian criterion, written as if the distribution $F$ were Gaussian, but provides consistent estimators while remaining agnostic concerning the true distribution.

Even if no distributional assumption is made during the estimation phase, using semi-parametric time series models may require information about the error distribution $F$. For instance, in the location-scale model the conditional Value-at-Risk (VaR) at level $\alpha\in(0,1)$ is equal to $m_t+\sigma_t\xi_\alpha$, where $\xi_\alpha$ is the $\alpha$-quantile of $F$. Information about $F$ can be obtained via the empirical distribution function of the residuals (e.d.f.r.) $\widehat{F}_n$. Consistent estimation of the parameters produces residuals which are obviously not iid but converge to the innovations. One question of importance is whether the previous properties of $F_n$ apply to $\widehat{F}_n$, i.e. when innovations are replaced by residuals.
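To make the role of the residuals in the VaR formula concrete, the sketch below evaluates $m_t+\sigma_t\xi_\alpha$ with $\xi_\alpha$ replaced by an empirical quantile of the residuals. The quantile convention (generalized inverse of the empirical distribution function) and the names `empirical_quantile` and `conditional_var` are assumptions made for this illustration; the inputs are made-up numbers.

```python
import math


def empirical_quantile(residuals, alpha):
    """Smallest x such that the empirical cdf of the residuals is >= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(residuals)
    k = max(math.ceil(alpha * len(ordered)), 1)
    return ordered[k - 1]


def conditional_var(m_t, sigma_t, residuals, alpha):
    """Plug-in version of m_t + sigma_t * xi_alpha, following the text's convention."""
    return m_t + sigma_t * empirical_quantile(residuals, alpha)


# Hypothetical fitted values and residuals, for illustration only.
example = conditional_var(m_t=0.5, sigma_t=2.0,
                          residuals=[3.0, -1.0, 1.0, -3.0], alpha=0.5)
print(example)
```

The quality of such a plug-in estimate depends on how well the empirical distribution of the residuals approximates $F$, which is the question studied in the sequel.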

For many applications, the asymptotic distribution of the sequence $\{\sqrt{n}[\widehat{F}_n(\cdot)-F(\cdot)]\}$ is required. In the statistical literature, two kinds of results have been established. In some situations, the asymptotic distribution is asymptotically estimation free (AEF), in the sense that, asymptotically, $\sqrt{n}\{\widehat{F}_n(\cdot)-F(\cdot)\}\stackrel{d}{=}\sqrt{n}\{F_n(\cdot)-F(\cdot)\}$. Such results go back to Boldin (1982), for the AR(p) process, and have been generalized to linear processes by Kreiss (1991) and Bai (1994). On the other hand, existing results for nonlinear processes show that the distribution of $\sqrt{n}\{\widehat{F}_n(\cdot)-F(\cdot)\}$ can be much more complicated. It can be asymptotically model dependent (AMD), in the sense that the asymptotic variance depends on both the law of the innovations $F$ and the parameter $\theta_0$. For different classes of conditionally heteroscedastic time series, the asymptotic distribution was investigated by Boldin (1998) and Lee and Taniguchi (2005), among others, and for the squared residuals by Horváth, Kokoszka and Teyssière (2001) and Berkes and Horváth (2003). In all these works, the distribution of the statistics involving the residuals is AMD. To overcome the AMD, Koul and Ling (2006) proposed a modification of the Kolmogorov-Smirnov test based on residuals. See Berkes and Horváth (2002) for a review of the asymptotic behaviour of e.d.f.r.

The main contribution of this paper is to show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\widehat{F}_n(\cdot)-F(\cdot)\}$ is neither AEF nor AMD. In some cases, which we are able to characterize, the distribution is asymptotically model free (AMF); however, it is impacted by the estimation, i.e. it is asymptotically estimation dependent (AED), though only through the error distribution. The AMF property dramatically simplifies the use of such asymptotic results in practical applications based on potentially complex time series models. On the other hand, the AED property tells us that usual statistics available for observed iid processes cannot be directly employed.

We start by considering volatility modelling and QML estimation, which constitutes the most widely used framework for financial returns. The absence of a conditional mean and the choice of a specific estimation method allow us to derive finer results. However, from an extended perspective, we will also consider conditional location-scale models in which the first two conditional moments are involved, and more general estimation methods. We will also investigate an application to the VaR estimation.

The rest of the paper is organized as follows. In the next section we introduce a general parametric volatility model and the main assumptions ensuring the consistency of the e.d.f.r. Next, we state our first main result, establishing the asymptotic distribution of the e.d.f.r. Section 3 considers the extension to conditional location-scale models. The application to VaR is developed in Section 4. In particular, we provide conditions for the strong consistency and asymptotic normality of the residuals quantiles. Section 5 concludes. Most proofs are postponed to Section 6 or to an Appendix.

2. Empirical distribution of the residuals for volatility models

In this section, we focus on conditional scale models of the multiplicative form

$$\epsilon_t=\sigma_t\eta_t,\qquad \sigma_t=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots;\theta_0),\tag{1}$$

where $(\eta_t)$ is a sequence of iid random variables, $E(\eta_t^2)=1$, $\theta_0$ is a vector of unknown coefficients which belongs to a compact parameter set $\Theta\subset\mathbb{R}^d$, and $\sigma$ is a positive function. Note that we do not assume that $E(\eta_t)=0$.

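For instance, the standard GARCH(p, q) model writes

$$\epsilon_t=\sigma_t\eta_t,\qquad \sigma_t^2=\omega_0+\sum_{i=1}^{q}\alpha_{0i}\epsilon_{t-i}^2+\sum_{j=1}^{p}\beta_{0j}\sigma_{t-j}^2,\tag{2}$$

where $\theta_0=(\omega_0,\alpha_{01},\dots,\beta_{0p})'$ satisfies $\omega_0>0$, $\alpha_{0i}\ge 0$, $\beta_{0j}\ge 0$. Under strict stationarity, invertibility of the polynomial $B(z)=1-\sum_{j=1}^{p}\beta_{0j}z^j$ holds and the volatility $\sigma_t^2$ can indeed be written as a linear function of the past values of $\epsilon_t^2$.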
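As an illustration of the recursion in the GARCH model, the following sketch simulates a GARCH(1,1) path. It is a minimal example under stated assumptions: Gaussian innovations, a burn-in period to forget the arbitrary starting values, and hypothetical parameter values. The function name `simulate_garch11` is introduced here for illustration only.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, rng, burn_in=500):
    """Simulate eps_t = sigma_t * eta_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2.

    Returns (eps, sigma2), each of length n, after discarding `burn_in` draws.
    Gaussian eta_t is an illustrative choice; the model only needs E(eta^2) = 1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0")
    sigma2_prev, eps_prev = omega, 0.0  # arbitrary start, forgotten after burn-in
    eps, sigma2 = [], []
    for t in range(n + burn_in):
        s2 = omega + alpha * eps_prev ** 2 + beta * sigma2_prev
        e = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            eps.append(e)
            sigma2.append(s2)
        sigma2_prev, eps_prev = s2, e
    return eps, sigma2


# Hypothetical parameter values, chosen with alpha + beta < 1.
eps, sigma2 = simulate_garch11(1000, omega=0.1, alpha=0.1, beta=0.8,
                               rng=random.Random(0))
```

The returned volatilities are bounded below by `omega`, which reflects the lower bound on the volatility required in the assumptions below.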

We start by considering the Gaussian QML estimation method for which explicit conditions for Consistency and Asymptotic Normality (CAN) can be displayed. More general estimators will be considered in Section 3. The setup can be described as follows. Given observations $\epsilon_1,\dots,\epsilon_n$, and using arbitrary initial values $\tilde\epsilon_i$ for $i\le 0$, we define for any $\theta\in\Theta$,

$$\tilde\sigma_t(\theta)=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots,\epsilon_1,\tilde\epsilon_0,\tilde\epsilon_{-1},\dots;\theta),$$

which will be used as a proxy of $\sigma_t(\theta)=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots,\epsilon_1,\epsilon_0,\epsilon_{-1},\dots;\theta)$.

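A Gaussian QML estimator (QMLE) of $\theta_0$ is defined as

$$\widehat\theta_n=\arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{t=1}^{n}\tilde\ell_t(\theta),\qquad \tilde\ell_t(\theta)=\frac{\epsilon_t^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta).\tag{3}$$

Let $K>0$ be a generic constant or random variable measurable with respect to $\mathcal{F}_0$, where $\mathcal{F}_t$ denotes the $\sigma$-algebra generated by $\{\eta_s, s\le t\}$. Let $\rho\in(0,1)$. We shall assume the following.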

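A1: $(\epsilon_t)$ is a strictly stationary, non-anticipative (i.e. $\epsilon_t\in\mathcal{F}_t$) and ergodic solution of Model (1). Moreover, $E|\sigma_t|^r<\infty$ for some $r>0$.

A2: For any real sequence $(x_i)$, the function $\theta\mapsto\sigma(x_1,x_2,\dots;\theta)$ is continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$. Moreover, $\sigma_t(\theta_0)/\sigma_t(\theta)=1$ a.s. iff $\theta=\theta_0$.

A3: $\sup_{\theta\in\Theta}|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\le K\rho^t$.

A4: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that $E\sup_{\theta\in V(\theta_0)}\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^r<\infty$ and $E\sup_{\theta\in V(\theta_0)}\|D_t(\theta)\|^r<\infty$, where $D_t(\theta)=\sigma_t^{-1}(\theta)\,\partial\sigma_t(\theta)/\partial\theta$.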

Assumptions A1-A3 are a set of conditions ensuring the strong consistency of $\widehat\theta_n$ (see Francq and Zakoian (2004, 2015)). Assumption A4 is introduced to control the difference between the innovations and the residuals.¹

¹ For the classical GARCH(p, q) model, under invertibility conditions on the lag polynomial $B_\theta(z)=1-\sum_{j=1}^{p}\beta_j z^j$ and assuming that the law of $\eta_t^2$ is nondegenerate, A1-A4 reduce to the first part of A1.
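The Gaussian QML criterion (3) can be sketched for the GARCH(1,1) case as follows. This is an illustration under stated assumptions, not the authors' implementation: the proxy volatility is computed recursively, and both initial values $\tilde\epsilon_0^2$ and $\tilde\sigma_0^2$ are set to a single user-supplied number `init` (a hypothetical convention). The minimization over $\Theta$ is left to any numerical optimizer and is not shown.

```python
import math


def garch11_sigma2_tilde(eps, theta, init=None):
    """Return [sigma~_1^2(theta), ..., sigma~_n^2(theta)] for a GARCH(1,1).

    `init` plays the role of the arbitrary initial values: both eps~_0^2 and
    sigma~_0^2 are set to it (default: omega, an illustrative choice).
    """
    omega, alpha, beta = theta
    start = omega if init is None else init
    eps2_prev, s2_prev = start, start
    out = []
    for e in eps:
        s2 = omega + alpha * eps2_prev + beta * s2_prev
        out.append(s2)
        eps2_prev, s2_prev = e * e, s2
    return out


def qml_criterion(eps, theta, init=None):
    """Average of l~_t(theta) = eps_t^2 / sigma~_t^2(theta) + log sigma~_t^2(theta)."""
    s2 = garch11_sigma2_tilde(eps, theta, init)
    return sum(e * e / v + math.log(v) for e, v in zip(eps, s2)) / len(eps)


# Tiny hand-checkable example: both proxy variances equal 1.5.
print(garch11_sigma2_tilde([1.0, 2.0], (1.0, 0.5, 0.0), init=1.0))
```

Under A3, the effect of the arbitrary initial values on the proxy volatility vanishes at an exponential rate, so the choice of `init` should not matter asymptotically.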

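Define the residuals by $\widehat\eta_t=\epsilon_t/\tilde\sigma_t(\widehat\theta_n)$. We note that, at least for $t$ large enough, $\tilde\sigma_t(\widehat\theta_n)\ge\omega$ by A2-A3. We start by establishing a Glivenko-Cantelli result for the e.d.f.r. $\widehat{F}_n(x)=\frac{1}{n}\sum_{t=1}^{n}\mathbb{1}_{\{\widehat\eta_t\le x\}}$.² The following assumption is simply denoted A5 when it holds for all $x\in\mathbb{R}$.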

A5(x): For $x\in\mathbb{R}$, the cdf $F$ of $\eta$ is Lipschitz continuous in a neighborhood of $x$.

Theorem 1. Under A1-A4,

(a) If A5(x) holds for $x\in\mathbb{R}$, we have $|\widehat{F}_n(x)-F(x)|\to 0$ a.s.

(b) If A5 holds we have $\sup_{x\in\mathbb{R}}|\widehat{F}_n(x)-F(x)|\to 0$ a.s.
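The uniform consistency in Theorem 1(b) can be checked numerically in the simplest scale model $\epsilon_t=\sigma\eta_t$, for which the Gaussian QMLE of $\sigma^2$ is the sample mean of the $\epsilon_t^2$. The sketch below assumes Gaussian innovations (so that A5 holds) and a hypothetical true scale; the helper name `edfr_sup_gap` is introduced for this illustration only.

```python
import math
import random
from statistics import NormalDist


def edfr_sup_gap(eps, sigma_hat, cdf):
    """sup_x |F^_n(x) - F(x)| for the residuals eta^_t = eps_t / sigma_hat.

    F is assumed continuous, so only the jump points of F^_n need checking.
    """
    res = sorted(e / sigma_hat for e in eps)
    n = len(res)
    return max(
        max((i + 1) / n - cdf(r), cdf(r) - i / n) for i, r in enumerate(res)
    )


rng = random.Random(0)
sigma = 2.0  # hypothetical true scale
for n in (200, 20_000):
    eps = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    sigma_hat = math.sqrt(sum(e * e for e in eps) / n)  # Gaussian QMLE of sigma
    print(n, round(edfr_sup_gap(eps, sigma_hat, NormalDist().cdf), 4))
```

The gap is expected to decrease with the sample size even though the residuals, unlike the innovations, all depend on the same estimate.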

Notice that the previous uniform convergence was established by Stute (2001) in the case of ARCH(q) processes. It is known (see for instance Stute and Schumann (1980)) that in the case of a stationary ergodic sequence, the Glivenko-Cantelli theorem is valid without any assumption on $F$. The following example shows that A5 is required for the consistency of the e.d.f.r.

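Example 1 (Failure of Glivenko-Cantelli). Consider the scale model $\epsilon_t=\sigma\eta_t$, $\sigma>0$, where $\eta_t$ is distributed over $\{-\sqrt{2},0,\sqrt{2}\}$ with $P(\eta_t=-\sqrt{2})=P(\eta_t=\sqrt{2})=1/4$. Let $\widehat\sigma_n$ be a consistent estimator of $\sigma$. The residuals $\widehat\eta_t=\frac{\sigma}{\widehat\sigma_n}\eta_t$ have the e.d.f.r.

$$\widehat{F}_n(x)=\left(\frac{1}{n}\sum_{t=1}^{n}\mathbb{1}_{\epsilon_t<0}\right)\mathbb{1}_{-\sqrt{2}\frac{\sigma}{\widehat\sigma_n}\le x<0}+\left(\frac{1}{n}\sum_{t=1}^{n}\mathbb{1}_{\epsilon_t\le 0}\right)\mathbb{1}_{0\le x<\sqrt{2}\frac{\sigma}{\widehat\sigma_n}}+\mathbb{1}_{x\ge\sqrt{2}\frac{\sigma}{\widehat\sigma_n}}.$$

It follows that $\sup_{x\in\mathbb{R}}|\widehat{F}_n(x)-F(x)|\ge|\widehat{F}_n(-\sqrt{2})-F(-\sqrt{2})|=1/4$ whenever $\sigma<\widehat\sigma_n$, since in that case $-\sqrt{2}\sigma/\widehat\sigma_n>-\sqrt{2}$, so that $\widehat{F}_n(-\sqrt{2})=0$ while $F(-\sqrt{2})=1/4$. This event has a non-vanishing probability for the QML. Indeed, we have $\widehat\sigma_n^2=\frac{1}{n}\sum_{t=1}^{n}\epsilon_t^2$ with $\epsilon_t^2=2\sigma^2\mathbb{1}_{\epsilon_t^2\neq 0}$, thus $P(\sigma<\widehat\sigma_n)=P\left(\frac{1}{n}\sum_{t=1}^{n}\mathbb{1}_{\epsilon_t^2\neq 0}>1/2\right)=P(X_n>n/2)$ where $X_n\sim\mathcal{B}(n,1/2)$.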
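The failure described in Example 1 can be reproduced by simulation. The sketch below draws the three-point innovations, computes the QMLE of $\sigma$, and records the gap between the e.d.f.r. and $F$ at $-\sqrt{2}$. The sample size, number of replications and seed are arbitrary choices made for this illustration; an odd sample size is used so that the tie $\widehat\sigma_n=\sigma$ cannot occur.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def one_replication(n, sigma, rng):
    """Return (sigma_hat, |F^_n(-sqrt2) - F(-sqrt2)|) for one sample of size n."""
    etas = rng.choices([-SQRT2, 0.0, SQRT2], weights=[1, 2, 1], k=n)
    eps = [sigma * e for e in etas]
    sigma_hat = math.sqrt(sum(e * e for e in eps) / n)  # QMLE of sigma
    res = [e / sigma_hat for e in eps]
    edfr_at_point = sum(r <= -SQRT2 for r in res) / n
    return sigma_hat, abs(edfr_at_point - 0.25)  # F(-sqrt2) = 1/4


rng = random.Random(0)
sigma, n, reps = 1.0, 1001, 400  # odd n rules out the tie sigma_hat == sigma
draws = [one_replication(n, sigma, rng) for _ in range(reps)]
share_failing = sum(s_hat > sigma and gap == 0.25 for s_hat, gap in draws) / reps
print(share_failing)  # expected to be near P(X_n > n/2) = 1/2 for odd n
```

In every replication with $\widehat\sigma_n>\sigma$ the gap equals exactly $1/4$, whatever the sample size, which is why the uniform distance cannot converge to zero.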

We now derive the e.d.f.r. asymptotic distribution under the following additional assumptions.

A6: θ₀ belongs to the interior of Θ.

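A7: There exists no non-zero $x\in\mathbb{R}^d$ such that $x'\frac{\partial\sigma_t(\theta_0)}{\partial\theta}=0$, a.s.

A8: The function $\theta\mapsto\sigma(x_1,x_2,\dots;\theta)$ has continuous second-order derivatives, and

$$\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K\rho^t.$$

A9: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial^2\sigma_t(\theta)}{\partial\theta\partial\theta'}\right\|^2+\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^4+\left|\frac{\sigma_t(\theta)}{\sigma_t(\theta_0)}\right|^4\right\}<\infty.$$

Moreover, $\kappa_4:=E|\eta_t|^4<\infty$.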

² For the classical GARCH(p, q) model, Berkes and Horváth (2003) established a Glivenko-Cantelli theorem for the empirical cumulative distribution function (cdf) of the squared residuals.


A10: All the coordinates of $\frac{\partial\sigma_t(\theta_0)}{\partial\theta}$ are (strictly) positive.

A11: $\eta_1$ admits a density $f$ which is continuous on $\mathbb{R}$.

A12: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\dots;\theta)=\sigma(x_1,x_2,\dots;\theta_c)$.

Assumption A6 is required for the asymptotic normal distribution of $\widehat\theta_n$. Assumptions A7-A10 and A12 are satisfied for the standard GARCH(p, q) model, under the already mentioned regularity assumptions. Assumption A12 is a stability-by-scaling property, which seems a desirable assumption for any volatility model (examples are provided in Francq and Zakoian (2015)).

We now state our first main result showing that the law of $\sqrt{n}(\widehat{F}_n-F)$ is AED but AMF.

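Theorem 2. Let A1-A4, A6-A11 hold. Then, for any sequence $(x_n)$ of random variables converging in probability to $x\in\mathbb{R}$,

$$\sqrt{n}\{\widehat{F}_n(x_n)-F(x_n)\}=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\{\mathbb{1}_{\eta_t<x}-F(x)\}-\frac{xf(x)}{2\sqrt{n}}\,\Omega'J^{-1}\sum_{t=1}^{n}(1-\eta_t^2)D_t+o_P(1),$$

where $\Omega=E(D_t)$, $J=E(D_tD_t')$ with $D_t=D_t(\theta_0)$.

If in addition A12 holds,

$$\sqrt{n}\{\widehat{F}_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+\frac{\{xf(x)\}^2}{4}(\kappa_4-1)+xf(x)\varrho(x)\right)$$

where $\varrho(x)=E(\eta_0^2\mathbb{1}_{\eta_0<x})-F(x)$.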
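The limiting variance of Theorem 2 is straightforward to evaluate once $F$, $f$, $\kappa_4$ and $\varrho$ are known. The sketch below does so for standard Gaussian innovations, an illustrative choice not made in the text. It uses $\kappa_4=3$ and the identity $E(\eta_0^2\mathbb{1}_{\eta_0<x})=\Phi(x)-x\phi(x)$, which follows from an integration by parts and gives $\varrho(x)=-x\phi(x)$; the function names are hypothetical.

```python
from statistics import NormalDist


def theorem2_variance(x, F_x, f_x, kappa4, rho_x):
    """F(x){1-F(x)} + {x f(x)}^2 (kappa4 - 1)/4 + x f(x) rho(x)."""
    xf = x * f_x
    return F_x * (1.0 - F_x) + xf ** 2 * (kappa4 - 1.0) / 4.0 + xf * rho_x


def gaussian_case(x):
    """Evaluate the variance for standard Gaussian innovations.

    Uses kappa4 = 3 and E(eta^2 1{eta < x}) = Phi(x) - x phi(x), which gives
    rho(x) = -x phi(x).
    """
    nd = NormalDist()
    return theorem2_variance(x, nd.cdf(x), nd.pdf(x), 3.0, -x * nd.pdf(x))


for x in (-2.0, -1.0, 0.0, 1.0):
    Phi = NormalDist().cdf(x)
    # Second column: variance with estimated parameters; third: iid benchmark.
    print(x, round(gaussian_case(x), 4), round(Phi * (1.0 - Phi), 4))
```

In this Gaussian case the expression simplifies to $\Phi(x)\{1-\Phi(x)\}-\{x\phi(x)\}^2/2$, so parameter estimation lowers the variance at every $x\neq 0$ and leaves it unchanged at $x=0$.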

Remark 1. A noticeable outcome of this theorem is that, under A12, the asymptotic distribution of the e.d.f.r. only depends on the underlying distribution of the innovations. It is completely independent of the model from which these residuals are derived. In the sense of Robinson (1987), it means that the e.d.f.r. is adaptive to the unknown value of $\theta_0$, and even to the unknown form of the function $\sigma$. This does not mean that we retrieve the asymptotic distribution of $F_n$ derived for iid data. The usual asymptotic variance, $F(x)\{1-F(x)\}$, is only valid when $xf(x)=0$. This is the case when $x=0$ because the innovations and residuals at a given date have the same sign. The effect of estimation also vanishes when $|x|$ increases to $\infty$ (under A9). Parameter estimation may reduce or increase the asymptotic variance (see Appendix).

Remark 2. One practical interest of adaptiveness is that estimation of the asymptotic variance reduces to estimation of characteristics of the innovations distribution. In particular, $f(x)$ can be estimated by a kernel density estimator based on the residuals. Theorem 2.1 in Kulperger and Yu (2005) shows that, in the standard GARCH case, the use of residuals instead of innovations has no asymptotic impact on the estimation of the density.
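A plug-in estimator along the lines of Remark 2 might be sketched as follows. Everything beyond the variance formula itself is an assumption made for illustration: the Gaussian kernel, the rule-of-thumb bandwidth $1.06\,\widehat{s}\,n^{-1/5}$, the empirical moments used for $\kappa_4$ and $\varrho(x)$, and the function names. Simulated Gaussian draws stand in for actual residuals.

```python
import math
import random
import statistics


def kernel_density(sample, x, bandwidth=None):
    """Gaussian-kernel estimate of the density at x.

    The default bandwidth 1.06 * sd * n^(-1/5) is a common rule of thumb,
    used here as an assumption rather than a recommendation from the paper.
    """
    n = len(sample)
    h = bandwidth or 1.06 * statistics.pstdev(sample) * n ** (-0.2)
    z = math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / (n * h * z)


def plug_in_variance(res, x):
    """Residual-based estimate of the Theorem 2 limiting variance at x."""
    n = len(res)
    F_hat = sum(r <= x for r in res) / n
    f_hat = kernel_density(res, x)
    kappa4_hat = sum(r ** 4 for r in res) / n
    rho_hat = sum(r * r for r in res if r < x) / n - F_hat
    xf = x * f_hat
    return F_hat * (1.0 - F_hat) + xf ** 2 * (kappa4_hat - 1.0) / 4.0 + xf * rho_hat


rng = random.Random(0)
stand_in = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # placeholder for real residuals
print(round(plug_in_variance(stand_in, 1.0), 3))
```

No model quantity such as $\theta_0$ or $D_t$ enters the computation, which is the practical content of the adaptiveness property.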


3. Including a conditional mean

In ARMA-GARCH models, the conditional variance specification is completed by a linear model for the conditional mean. In what follows, we consider a more general framework. Nor will we restrict our investigation to the QML estimator. Suppose the model writes

$$\begin{cases} y_t=m_t+\epsilon_t,\qquad \epsilon_t=\sigma_t\eta_t,\\ m_t=m(y_{t-1},y_{t-2},\dots;\theta_0),\qquad \sigma_t=\sigma(y_{t-1},y_{t-2},\dots;\theta_0),\end{cases}\tag{4}$$

under the same assumptions on $(\eta_t)$, $\theta_0$ and $\Theta$ as before. Model (4) includes the double-AR(p) of Ling (2007) in which $\theta_0=(\phi_{01},\dots,\phi_{0p},\omega_0,\alpha_{01},\dots,\alpha_{0p})'\in\mathbb{R}^p\times(0,\infty)\times[0,\infty)^p$ and

$$m(y_{t-1},y_{t-2},\dots;\theta_0)=\sum_{i=1}^{p}\phi_{0i}y_{t-i},\qquad \sigma(y_{t-1},y_{t-2},\dots;\theta_0)=\sqrt{\omega_0+\sum_{i=1}^{p}\alpha_{0i}y_{t-i}^2},$$

but also more traditional ARMA-GARCH-type models, in which the volatility initially defined in terms of the $\epsilon_{t-i}$'s can be rewritten as a function of the $y_{t-i}$'s.
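For concreteness, the double-AR specification with $p=1$ can be simulated as in the sketch below. Gaussian innovations, the burn-in length and the parameter values are illustrative assumptions, and the function name `simulate_double_ar1` is hypothetical.

```python
import math
import random


def simulate_double_ar1(n, phi, omega, alpha, rng, burn_in=500):
    """Simulate y_t = phi * y_{t-1} + eta_t * sqrt(omega + alpha * y_{t-1}^2).

    Gaussian eta_t and the burn-in length are illustrative choices. The
    caller is responsible for picking (phi, alpha) in the stationarity region.
    Returns a list of triples (y_t, m_t, sigma_t).
    """
    y_prev, path = 0.0, []
    for t in range(n + burn_in):
        m_t = phi * y_prev                                # conditional mean
        sigma_t = math.sqrt(omega + alpha * y_prev ** 2)  # conditional scale
        y_t = m_t + sigma_t * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append((y_t, m_t, sigma_t))
        y_prev = y_t
    return path


# Hypothetical parameter values for illustration.
path = simulate_double_ar1(1000, phi=0.3, omega=1.0, alpha=0.2,
                           rng=random.Random(0))
```

Both the conditional mean and the conditional scale depend on the same lagged observation, which is what distinguishes this class from the pure volatility models of Section 2.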

Let, for arbitrary initial values $\tilde y_0,\tilde y_{-1},\dots$, for any $\theta\in\Theta$, and for $1\le t\le n$,

$$\tilde m_t(\theta)=m(y_{t-1},y_{t-2},\dots,y_1,\tilde y_0,\tilde y_{-1},\dots;\theta),\qquad \tilde\sigma_t(\theta)=\sigma(y_{t-1},y_{t-2},\dots,y_1,\tilde y_0,\tilde y_{-1},\dots;\theta).$$

As in the previous section, we denote without tilde the functions $m_t(\theta)$ and $\sigma_t(\theta)$ in which the initial values are replaced by the variables at times $t\le 0$. In the following assumptions, $r>0$ denotes a real number which can be chosen arbitrarily small.

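B1: $(y_t)$ is a strictly stationary, non-anticipative and ergodic solution of Model (4). Moreover, $E|m_t|^r<\infty$ and $E|\sigma_t|^r<\infty$.

B2: $\widehat\theta_n$ denotes any consistent estimator of $\theta_0$. The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ are continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$.

B3: $\sup_{\theta\in\Theta}\left(|m_t(\theta)-\tilde m_t(\theta)|+|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\right)\le K_t\rho^t$ where $K_t\in\mathcal{F}_{t-1}$ and $\sup_t E(K_t^r)<\infty$.

B4: For any neighborhood $V(\theta_0)$ of $\theta_0$, we have

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{\partial m_t(\theta)}{\partial\theta}\right\|^r+|m_t(\theta_0)-m_t(\theta)|^r+\|D_t(\theta)\|^r\right\}<\infty\quad\text{where}\quad D_t(\theta)=\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}.$$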

Note that the condition $K\in\mathcal{F}_0$ in A3 has been weakened in B3, in particular to be able to handle the ARMA-GARCH under standard conditions.

We start by extending the Glivenko-Cantelli-type Theorem 1. Define the residuals by $\widehat\eta_t=\{y_t-\tilde m_t(\widehat\theta_n)\}/\tilde\sigma_t(\widehat\theta_n)$.


Theorem 3. Under B1-B4 (instead of A1-A4), the conclusions of Theorem 1 hold.

Hence, extending the framework of the previous section does not alter the consistency results. We will now see that the same conclusion cannot be drawn concerning the asymptotic distribution of $\sqrt{n}(\widehat{F}_n-F)$, which will no longer be model-free in general. Another difference, which constitutes a major difficulty, is that Assumption A10 can no longer be made when a conditional mean is included.³ We now assume that $\widehat\theta_n$ admits a Bahadur representation.

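B5: The following expansion holds

$$\sqrt{n}(\widehat\theta_n-\theta_0)=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\Delta_{t-1}V(\eta_t)+o_P(1),$$

where $V(\cdot)$ is a measurable function, $V:\mathbb{R}\to\mathbb{R}^k$ for some positive integer $k$, and $\Delta_{t-1}$ is an $\mathcal{F}_{t-1}$-measurable $d\times k$ matrix. The variables $\Delta_t$ and $V(\eta_t)$ belong to $L^2$ with $EV(\eta_t)=0$, $\mathrm{var}\{V(\eta_t)\}=\Upsilon$ is nonsingular and $E\Delta_t=\Lambda$ is full row rank.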

Under this assumption, the CLT for stationary second-order martingale differences of Billingsley (1961) can be applied. It follows that the asymptotic law of $\sqrt{n}(\widehat\theta_n-\theta_0)$ is a centered Gaussian with variance $\Sigma:=E(\Delta_t\Upsilon\Delta_t')$. Assumption B5 has to be verified on a case by case basis, given a specific model and estimator.

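Remark 3. For the Gaussian QMLE $\widehat\theta_n$ of $\theta_0$,

$$\widehat\theta_n=\arg\min_{\theta\in\Theta}Q_n(\theta),\qquad Q_n(\theta)=\frac{1}{n}\sum_{t=1}^{n}\tilde\ell_t(\theta),\qquad \tilde\ell_t(\theta)=\frac{\{y_t-\tilde m_t(\theta)\}^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta),$$

it can be shown that

$$\sqrt{n}(\widehat\theta_n-\theta_0)=2J^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\left\{\eta_t\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}+(\eta_t^2-1)\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right\}+o_P(1),\tag{5}$$

where

$$J=E\left(\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\partial\theta'}\right)=2J_m+4J_\sigma,\qquad J_m=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial m_t(\theta_0)}{\partial\theta'}\right)$$

and $J_\sigma=E\,D_t(\theta_0)D_t'(\theta_0)$. The Bahadur expansion in B5 is thus satisfied, if $E(\eta_t)=0$, with $k=2$ and

$$V(\eta_t)=(\eta_t,\eta_t^2-1)',\qquad \Delta_{t-1}=2J^{-1}\left[\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\quad\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right],\qquad \Lambda=2J^{-1}[\Omega_m\;\;\Omega_\sigma].$$

We also have

$$\mathrm{Var}_{as}\{\sqrt{n}(\widehat\theta_n-\theta_0)\}=4J^{-1}\{J_m+(\kappa_4-1)J_\sigma+\mu_3(J_{m\sigma}+J_{m\sigma}')\}J^{-1}=\Sigma,$$

where $J_{m\sigma}=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial\sigma_t(\theta_0)}{\partial\theta'}\right)$ and $\mu_3=E\eta_t^3$.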

³ For instance, in the model $y_t=m_0+\sigma_t\eta_t$ where $\sigma_t^2=\omega+\alpha(y_{t-1}-m)^2$, the derivative of $\sigma_t$ with respect to $m$ is not positive, and not even of constant sign.


The assumptions of Section 2 have to be modified as follows.

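B6: The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ have continuous second-order derivatives, and

$$\sup_{\theta\in\Theta}\left\|\frac{\partial m_t(\theta)}{\partial\theta}-\frac{\partial\tilde m_t(\theta)}{\partial\theta}\right\|+\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K_t\rho^t,$$

where $K_t$ is as in B3.

B7: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial m_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial^2 m_t(\theta)}{\partial\theta\partial\theta'}\right\|^2\right\}<\infty.$$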

We also introduce the following additional assumption.

B8: For $n$ large enough, $n>n_0$ say, the conditional distribution of $\eta_t$ given $\widehat\theta_n$ and $\mathcal{F}_{t-1}$ admits a density which is a.s. bounded, uniformly in $n>n_0$ and $t<n-t(n)$ where $t(n)=o(\sqrt{n})$.

Intuitively, the variables $\widehat\theta_n$ and $\eta_t$ should be asymptotically independent in most standard situations. For instance, consider the simple location model $y_t=\theta_0+\sigma\eta_t$, with Gaussian innovations $\eta_t$. The conditional distribution of $\eta_t$ given $\widehat\theta_n$ and $\mathcal{F}_{t-1}$, where $\widehat\theta_n$ is the sample mean, is Gaussian with variance $1-\{n-(t-1)\}^{-1}$. Thus B8 is satisfied for $n_0>1$ and $t(n)=1$, with bound $1/\sqrt{\pi}$. The next theorem uses the notion of discrete estimator, which was introduced by Le Cam (1960) and used by many authors (e.g. Kreiss (1987)).

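Theorem 4. Under B1-B8 and A9, A11, if i) $(x_n)$ is a non-random sequence converging to $x\in\mathbb{R}$; or ii) $x_n$ is a discrete $\sqrt{n}$-consistent estimator of $x$, then

$$\sqrt{n}\{\widehat{F}_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+H'(x)\Sigma H(x)+2H'(x)\Lambda\varrho(x)\right)$$

where $\varrho(x)=E(\mathbb{1}_{\eta_0<x}V(\eta_0))$ and $H(x)=f(x)(\Omega_m+x\Omega_\sigma)$, $\Omega_m=E\left(\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\right)$, $\Omega_\sigma=E\,D_t(\theta_0)$.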

Every sequence of $\sqrt{n}$-consistent estimators can be truncated to get a discrete sequence. The discretization can also be avoided, at the price of either reinforcing Assumption B8, or increasing the speed of convergence of the sequence $(x_n)$ (see Appendix).
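The variance expression of Theorem 4 can be evaluated with elementary matrix arithmetic. The sketch below implements the generic formula and, as a consistency check, specializes it to the pure scale model $\epsilon_t=\theta\eta_t$ with Gaussian innovations and Gaussian QMLE. The closed-form inputs for that special case are derived here for illustration (they are not given in the text), and the function names are hypothetical.

```python
from statistics import NormalDist


def bilinear(u, a, v):
    """Return u' A v for list-vectors u, v and a list-of-rows matrix A."""
    return sum(u[i] * a[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def theorem4_variance(F_x, H, Sigma, Lam, rho):
    """F(x){1-F(x)} + H' Sigma H + 2 H' Lambda rho.

    H has length d, Sigma is d x d, Lam is d x k and rho has length k.
    """
    return F_x * (1.0 - F_x) + bilinear(H, Sigma, H) + 2.0 * bilinear(H, Lam, rho)


def gaussian_scale_model(x, theta=2.0):
    """Pure scale model eps_t = theta * eta_t, Gaussian eta_t, Gaussian QMLE.

    Here d = 1, k = 2, m_t = 0 and D_t = 1/theta, so Omega_m = 0,
    Omega_sigma = 1/theta, J = 4/theta^2, Sigma = (kappa4 - 1) theta^2 / 4
    and Lambda = 2 J^{-1} [Omega_m, Omega_sigma] = [0, theta/2].
    """
    nd = NormalDist()
    F_x, f_x, kappa4 = nd.cdf(x), nd.pdf(x), 3.0
    H = [f_x * x / theta]
    Sigma = [[(kappa4 - 1.0) * theta ** 2 / 4.0]]
    Lam = [[0.0, theta / 2.0]]
    # rho(x) = E(1{eta < x} V(eta)) with V(eta) = (eta, eta^2 - 1)'
    rho = [-f_x, -x * f_x]
    return theorem4_variance(F_x, H, Sigma, Lam, rho)


print(round(gaussian_scale_model(1.0), 4))
```

In this special case the result does not depend on `theta` and coincides with the Theorem 2 variance $\Phi(x)\{1-\Phi(x)\}-\{x\phi(x)\}^2/2$, as expected when no conditional mean is present and A12 holds.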

More explicit forms of the asymptotic variance can be derived for the QMLE under the following extension of Assumption A12 which will be illustrated below.

A12*: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\dots;\theta)=\sigma(x_1,x_2,\dots;\theta_c)$ and $m(x_1,x_2,\dots;\theta)=m(x_1,x_2,\dots;\theta_c)$.

The next result provides an explicit form for the asymptotic variance when the parameters of the conditional mean and variance are independent and are estimated by QML.