HAL Id: hal-02898909
https://hal.archives-ouvertes.fr/hal-02898909
Preprint submitted on 14 Jul 2020
Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models
Christian Francq, Jean-Michel Zakoïan
To cite this version:
Christian Francq, Jean-Michel Zakoïan. Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models. 2020. ⟨hal-02898909⟩
CREST and University of Lille, 5 Avenue Henri Le Chatelier, 91120 Palaiseau, France.
E-mail: [email protected], [email protected]
Summary. This paper addresses the problem of deriving the asymptotic distribution of the empirical distribution function $\hat F_n$ of the residuals in a general class of time series models, including conditional mean and conditional heteroscedasticity, whose independent and identically distributed errors have unknown distribution $F$. We show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\hat F_n(\cdot)-F(\cdot)\}$ is impacted by the estimation but does not depend on the model parameters. It is thus neither asymptotically estimation free, as is the case for purely linear models, nor asymptotically model dependent, as is the case for some nonlinear models. The asymptotic stochastic equicontinuity is also established. We consider an application to the estimation of the conditional Value-at-Risk.
1. Introduction
For independent variables $\eta_1, \ldots, \eta_n$ with common distribution $F$, the celebrated Glivenko-Cantelli theorem states the uniform almost sure convergence of the empirical distribution,
$$\sup_{x\in\mathbb{R}} |F_n(x)-F(x)| \to 0, \quad \text{a.s.},$$
where $F_n(x) = \frac{1}{n}\sum_{t=1}^n \mathbf{1}_{\{\eta_t\le x\}}$ for all $x$. The usual Central Limit Theorem (CLT) shows that $\sqrt{n}\{F_n(x)-F(x)\}$ is asymptotically $\mathcal{N}(0, F(x)(1-F(x)))$ distributed without any further assumption. Moreover, under the assumption that $F$ is continuous, the sequence $\{\sqrt{n}[F_n(\cdot)-F(\cdot)]\}$ is stochastically equicontinuous, in the sense that $\sqrt{n}\{F_n(x_n)-F(x_n)\} \overset{\mathcal{L}}{\to} \mathcal{N}(0, F(x)(1-F(x)))$ for any sequence $(x_n)$ converging to $x$ in probability, for any $x$ (see Billingsley (1968, Section 22) and Andrews (1994)). The latter convergence in distribution has important statistical applications, for instance the derivation of the asymptotic distribution of the empirical quantiles (see below).
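The pointwise CLT for the empirical distribution function of iid data can be checked numerically. The following is a minimal sketch, assuming standard Gaussian innovations purely for illustration (the paper leaves the innovation distribution unspecified); the function names `empirical_cdf`, `normal_cdf` and `scaled_error_variance` are hypothetical helpers, not taken from the paper.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Return the function x -> (1/n) * #{observations <= x}."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts the observations that are <= x
        return bisect.bisect_right(ordered, x) / n

    return cdf


def normal_cdf(x):
    """Standard Gaussian cdf, used here as the assumed true distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def scaled_error_variance(x, n=500, replications=2000, seed=0):
    """Monte Carlo variance of sqrt(n) * (F_n(x) - F(x)) for iid N(0,1) data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(replications):
        f_n = empirical_cdf([rng.gauss(0.0, 1.0) for _ in range(n)])
        draws.append(math.sqrt(n) * (f_n(x) - normal_cdf(x)))
    mean = sum(draws) / replications
    return sum((d - mean) ** 2 for d in draws) / (replications - 1)
```

Comparing `scaled_error_variance(x)` with `normal_cdf(x) * (1 - normal_cdf(x))` for a few values of `x` gives a quick sanity check of the variance appearing in the CLT above.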
Time series observations are rarely independent and identically distributed (iid), but time series models often involve iid innovations. Modern time series analysis often relies on semi-parametric models in which the distribution $F$ of the innovations $\eta_t$ is not fully specified. Many such models are of the location-scale form $y_t = m_t(\theta_0) + \sigma_t(\theta_0)\eta_t$, where $m_t$ and $\sigma_t$ are functions of the past observations $y_{t-i}$, $i>0$, and the sequence $(\eta_t)$ is iid. Consistent estimation of the parameter $\theta_0$ can be achieved by several methods. In particular, the Quasi-Maximum Likelihood (QML) estimation method generally relies on a Gaussian criterion - written as if the distribution $F$ was Gaussian - but provides consistent estimators while remaining agnostic concerning the true distribution.
Even if no distributional assumption is made during the estimation phase, using semi-parametric time series models may require information about the error distribution $F$. For instance, in the location-scale model the conditional Value-at-Risk (VaR) at level $\alpha\in(0,1)$ is equal to $m_t+\sigma_t\xi_\alpha$ where $\xi_\alpha$ is the $\alpha$-quantile of $F$. Information about $F$ can be obtained via the empirical distribution function of the residuals (e.d.f.r.) $\hat F_n$. Consistent estimation of the parameters produces residuals which are obviously not iid but converge to the innovations. One question of importance is whether the previous properties of $F_n$ apply to $\hat F_n$, i.e. when innovations are replaced by residuals.
For many applications, the asymptotic distribution of the sequence $\{\sqrt{n}[\hat F_n(\cdot)-F(\cdot)]\}$ is required. In the statistical literature, two kinds of results have been established. In some situations, the asymptotic distribution is asymptotically estimation free (AEF), in the sense that, asymptotically, $\sqrt{n}\{\hat F_n(\cdot)-F(\cdot)\} \overset{d}{=} \sqrt{n}\{F_n(\cdot)-F(\cdot)\}$. Such results go back to Boldin (1982), for the AR($p$) process, and have been generalized to linear processes by Kreiss (1991) and Bai (1994). On the other hand, existing results for nonlinear processes show that the distribution of $\sqrt{n}\{\hat F_n(\cdot)-F(\cdot)\}$ can be much more complicated. It can be asymptotically model dependent (AMD), in the sense that the asymptotic variance depends on both the law of the innovations $F$ and the parameter $\theta_0$. For different classes of conditionally heteroscedastic time series, the asymptotic distribution was investigated by Boldin (1998), and Lee and Taniguchi (2005) among others, and for the squared residuals by Horváth, Kokoszka and Teyssière (2001) and Berkes and Horváth (2003). In all these works, the distribution of the statistics involving the residuals is AMD. To overcome the AMD, Koul and Ling (2006) proposed a modification of the Kolmogorov-Smirnov test based on residuals. See Berkes and Horváth (2002) for a review of the asymptotic behaviour of e.d.f.r.
The main contribution of this paper is to show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\hat F_n(\cdot)-F(\cdot)\}$ is neither AEF nor AMD. In some cases, which we are able to characterize, the distribution is asymptotically model free (AMF), however it is impacted by the estimation, i.e. it is asymptotically estimation dependent (AED), though only through the errors distribution. The AMF property simplifies dramatically the use of such asymptotic results in practical applications based on potentially complex time series models. On the other hand, the AED property tells us that usual statistics available for observed iid processes cannot be directly employed.
We start by considering volatility modelling and QML estimation, which constitutes the most widely used framework for financial returns. The absence of a conditional mean and the choice of a specific estimation method allow us to derive finer results. However, from an extended perspective, we will also consider conditional location-scale models in which both first conditional moments are involved, and more general estimation methods. We will also investigate an application to the VaR estimation.
The rest of the paper is organized as follows. In the next section we introduce a general parametric volatility model and the main assumptions ensuring the consistency of the e.d.f.r. Next, we state our first main result, establishing the asymptotic distribution of the e.d.f.r. Section 3 considers the extension to conditional location-scale models. The application to VaR is developed in Section 4. In particular, we provide conditions for the strong consistency and asymptotic normality of the residuals quantiles. Section 5 concludes. Most proofs are postponed to Section 6 or to an Appendix.
2. Empirical distribution of the residuals for volatility models
In this section, we focus on conditional scale models of the multiplicative form
$$\epsilon_t=\sigma_t\eta_t, \qquad \sigma_t=\sigma(\epsilon_{t-1}, \epsilon_{t-2}, \ldots;\theta_0), \tag{1}$$
where $(\eta_t)$ is a sequence of iid random variables, $E(\eta_t^2) = 1$, $\theta_0$ is a vector of unknown coefficients which belongs to a compact parameter set $\Theta\subset\mathbb{R}^d$, and $\sigma$ is a positive function. Note that we do not assume that $E(\eta_t) = 0$.
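For instance, the standard GARCH($p, q$) model is written
$$\epsilon_t=\sigma_t\eta_t, \qquad \sigma_t^2=\omega_0+\sum_{i=1}^q \alpha_{0i}\epsilon_{t-i}^2+\sum_{j=1}^p \beta_{0j}\sigma_{t-j}^2, \tag{2}$$
where $\theta_0 = (\omega_0, \alpha_{01}, \ldots, \beta_{0p})'$ satisfies $\omega_0>0$, $\alpha_{0i}\ge 0$, $\beta_{0j}\ge 0$. Under strict stationarity, invertibility of the polynomial $B(z) = 1-\sum_{j=1}^p\beta_{0j}z^j$ holds and the volatility $\sigma_t^2$ can indeed be written as a linear function of the past values of $\epsilon_t^2$.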
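The GARCH recursion can be illustrated by simulation. The following is a minimal sketch for the GARCH(1,1) special case, assuming standard Gaussian innovations (so that the unit-variance normalization holds) and a burn-in period to approximate the stationary solution; the function name `simulate_garch11` and its arguments are hypothetical, chosen only for this illustration.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, burn_in=500, seed=0):
    """Simulate eps_t = sigma_t * eta_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2.

    Returns a list of (eps_t, sigma_t, eta_t) triples of length n.
    The innovations eta_t are N(0, 1): an assumption made for this sketch.
    """
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0")
    rng = random.Random(seed)
    sigma2 = omega  # arbitrary positive start, forgotten after the burn-in
    eps = 0.0
    path = []
    for t in range(n + burn_in):
        # volatility depends only on the past, as in the recursion above
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
        eta = rng.gauss(0.0, 1.0)
        eps = math.sqrt(sigma2) * eta
        if t >= burn_in:
            path.append((eps, math.sqrt(sigma2), eta))
    return path
```

For example, `simulate_garch11(1000, 0.1, 0.1, 0.8)` produces a path whose volatility never falls below the square root of `omega`, which is the kind of lower bound on the volatility used later in the assumptions.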
We start by considering the Gaussian QML estimation method for which explicit conditions for Consistency and Asymptotic Normality (CAN) can be displayed. More general estimators will be considered in Section 3. The setup can be described as follows. Given observations $\epsilon_1, \ldots, \epsilon_n$, and using arbitrary initial values $\tilde\epsilon_i$ for $i\le 0$, we define for any $\theta\in\Theta$,
$$\tilde\sigma_t(\theta)=\sigma(\epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_1, \tilde\epsilon_0, \tilde\epsilon_{-1}, \ldots;\theta),$$
which will be used as a proxy of $\sigma_t(\theta)=\sigma(\epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_1, \epsilon_0, \epsilon_{-1}, \ldots;\theta)$.
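A Gaussian QML estimator (QMLE) of $\theta_0$ is defined as
$$\hat\theta_n=\arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{t=1}^n\tilde\ell_t(\theta), \qquad \tilde\ell_t(\theta)=\frac{\epsilon_t^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta). \tag{3}$$
Let $K > 0$ be a generic constant or random variable measurable with respect to $\mathcal{F}_0$, where $\mathcal{F}_t$ denotes the $\sigma$-algebra generated by $\{\eta_s, s\le t\}$. Let $\rho\in(0,1)$. We shall assume the following.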
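To make the Gaussian QML criterion concrete, here is a minimal sketch for the ARCH(1) special case, in which the squared volatility is `omega + alpha * eps_{t-1}^2`. The Gaussian innovations, the initial value `eps_init`, and the crude grid search standing in for a numerical optimizer are all assumptions of this illustration; the names `simulate_arch1`, `qml_criterion` and `grid_qmle` are hypothetical.

```python
import math
import random


def simulate_arch1(n, omega, alpha, seed=0, burn_in=200):
    """Simulate an ARCH(1) path with N(0, 1) innovations (illustrative choice)."""
    rng = random.Random(seed)
    eps_prev = 0.0
    path = []
    for t in range(n + burn_in):
        sigma2 = omega + alpha * eps_prev ** 2
        eps_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(eps_prev)
    return path


def qml_criterion(eps, omega, alpha, eps_init=0.0):
    """Average of eps_t^2 / sigma_t^2 + log sigma_t^2 over the sample.

    The unobserved value before the first observation is replaced by the
    arbitrary initial value eps_init.
    """
    total = 0.0
    previous = eps_init
    for e in eps:
        sigma2 = omega + alpha * previous ** 2
        total += e * e / sigma2 + math.log(sigma2)
        previous = e
    return total / len(eps)


def grid_qmle(eps, omegas, alphas):
    """Minimize the criterion over a finite grid of (omega, alpha) pairs."""
    candidates = ((w, a) for w in omegas for a in alphas)
    return min(candidates, key=lambda p: qml_criterion(eps, p[0], p[1]))
```

In practice a numerical optimizer over the compact parameter set would replace the grid; the grid is used here only to keep the sketch dependency-free.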
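A1: $(\epsilon_t)$ is a strictly stationary, non-anticipative (i.e. $\epsilon_t\in\mathcal{F}_t$) and ergodic solution of Model (1). Moreover, $E|\sigma_t|^r<\infty$ for some $r>0$.
A2: For any real sequence $(x_i)$, the function $\theta\mapsto\sigma(x_1,x_2,\ldots;\theta)$ is continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$. Moreover, $\sigma_t(\theta_0)/\sigma_t(\theta)=1$ a.s. iff $\theta=\theta_0$.
A3: $\sup_{\theta\in\Theta}|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\le K\rho^t$.
A4: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that $E\sup_{\theta\in V(\theta_0)}\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^r<\infty$ and $E\sup_{\theta\in V(\theta_0)}\|D_t(\theta)\|^r<\infty$, where $D_t(\theta)=\sigma_t^{-1}(\theta)\,\partial\sigma_t(\theta)/\partial\theta$.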
Assumptions A1-A3 are a set of conditions ensuring the strong consistency of $\hat\theta_n$ (see Francq and Zakoian (2004, 2015)). Assumption A4 is introduced to control the difference between the innovations and the residuals.¹
¹ For the classical GARCH($p, q$) model, under invertibility conditions on the lag polynomial $B_\theta(z) = 1-\sum_{j=1}^p\beta_jz^j$ and assuming that the law of $\eta_t^2$ is nondegenerate, A1-A4 reduce to the first part of A1.
Define the residuals $\hat\eta_t=\epsilon_t/\tilde\sigma_t(\hat\theta_n)$. We note that, at least for $t$ large enough, $\tilde\sigma_t(\hat\theta_n)\ge\omega$ by A2-A3. We start by establishing a Glivenko-Cantelli result for the e.d.f.r. $\hat F_n(x)=\frac{1}{n}\sum_{t=1}^n\mathbf{1}_{\{\hat\eta_t\le x\}}$.² The following assumption is simply denoted A5 when it holds for all $x\in\mathbb{R}$.
A5(x): For $x\in\mathbb{R}$, the cdf $F$ of $\eta$ is Lipschitz continuous in a neighborhood of $x$.
Theorem 1. Under A1-A4,
(a) If A5(x) holds for $x\in\mathbb{R}$, we have $|\hat F_n(x)-F(x)|\to 0$ a.s.
(b) If A5 holds we have $\sup_{x\in\mathbb{R}}|\hat F_n(x)-F(x)|\to 0$ a.s.
Notice that the previous uniform convergence was established by Stute (2001) in the case of ARCH($q$) processes. It is known (see for instance Stute and Schumann (1980)) that in the case of a stationary ergodic sequence, the Glivenko-Cantelli theorem is valid without any assumption on $F$. The following example shows that A5 is required for the consistency of the e.d.f.r.
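Example 1 (Glivenko-Cantelli in failure). Consider the scale model $\epsilon_t=\sigma\eta_t$, $\sigma>0$, where $\eta_t$ is distributed over $\{-\sqrt{2},0,\sqrt{2}\}$ with $P(\eta_t=-\sqrt{2})=P(\eta_t=\sqrt{2})=1/4$. Let $\hat\sigma_n$ be a consistent estimator of $\sigma$. The residuals $\hat\eta_t=\frac{\sigma}{\hat\sigma_n}\eta_t$ have the e.d.f.r.
$$\hat F_n(x)=\left(\frac{1}{n}\sum_{t=1}^n\mathbf{1}_{\{\epsilon_t<0\}}\right)\mathbf{1}_{\left\{-\sqrt{2}\frac{\sigma}{\hat\sigma_n}\le x<0\right\}}+\left(\frac{1}{n}\sum_{t=1}^n\mathbf{1}_{\{\epsilon_t\le 0\}}\right)\mathbf{1}_{\left\{0\le x<\sqrt{2}\frac{\sigma}{\hat\sigma_n}\right\}}+\mathbf{1}_{\left\{x\ge\sqrt{2}\frac{\sigma}{\hat\sigma_n}\right\}}.$$
It follows that $\sup_{x\in\mathbb{R}}|\hat F_n(x)-F(x)|\ge|\hat F_n(-\sqrt{2})-F(-\sqrt{2})|=1/4$ whenever $\sigma<\hat\sigma_n$, since then $\hat F_n(-\sqrt{2})=0$ while $F(-\sqrt{2})=1/4$. This event has a non-vanishing probability for the QML. Indeed, we have $\hat\sigma_n^2=\frac{1}{n}\sum_{t=1}^n\epsilon_t^2$, thus $P(\sigma<\hat\sigma_n)=P\left(\frac{1}{n}\sum_{t=1}^n\mathbf{1}_{\{\epsilon_t^2\neq 0\}}>1/2\right)=P(X_n>n/2)$ where $X_n\sim\mathcal{B}(n,1/2)$.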
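The failure described in Example 1 is easy to reproduce by simulation. The sketch below draws the three-point innovations, computes the QML scale estimator as the root mean square of the observations, and evaluates the e.d.f.r. at minus the square root of 2. The helper names (`draw_innovations`, `edf_of_residuals_at`, `simulate_failure`, `failure_frequency`) are hypothetical and not from the paper.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def draw_innovations(n, rng):
    """eta_t equals -sqrt(2), 0, sqrt(2) with probabilities 1/4, 1/2, 1/4."""
    return [rng.choice((-SQRT2, 0.0, 0.0, SQRT2)) for _ in range(n)]


def edf_of_residuals_at(x, eps):
    """Return (sigma_hat, value at x of the e.d.f. of eps_t / sigma_hat)."""
    n = len(eps)
    sigma_hat = math.sqrt(sum(e * e for e in eps) / n)
    if sigma_hat == 0.0:
        raise ValueError("all observations are zero; residuals undefined")
    residuals = [e / sigma_hat for e in eps]
    return sigma_hat, sum(r <= x for r in residuals) / n


def simulate_failure(n, sigma, rng):
    """One replication: scale estimate and e.d.f.r. evaluated at -sqrt(2)."""
    eps = [sigma * eta for eta in draw_innovations(n, rng)]
    return edf_of_residuals_at(-SQRT2, eps)


def failure_frequency(n, sigma, replications, seed=0):
    """Share of replications in which sigma_hat exceeds the true sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        sigma_hat, _ = simulate_failure(n, sigma, rng)
        hits += sigma_hat > sigma
    return hits / replications
```

Whenever the scale is overestimated, the smallest residual lies strictly above minus the square root of 2, so the e.d.f.r. at that point is 0 instead of 1/4. For odd `n` the binomial argument of the example gives a failure probability of exactly 1/2, which `failure_frequency` should approximately reproduce.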
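We now derive the e.d.f.r. asymptotic distribution under the following additional assumptions.
A6: $\theta_0$ belongs to the interior of $\Theta$.
A7: There exists no non-zero $x\in\mathbb{R}^d$ such that $x'\frac{\partial\sigma_t(\theta_0)}{\partial\theta}=0$, a.s.
A8: The function $\theta\mapsto\sigma(x_1,x_2,\ldots;\theta)$ has continuous second-order derivatives, and
$$\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K\rho^t.$$
A9: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that
$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial^2\sigma_t(\theta)}{\partial\theta\partial\theta'}\right\|^2+\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^4+\left|\frac{\sigma_t(\theta)}{\sigma_t(\theta_0)}\right|^4\right\}<\infty.$$
Moreover, $\kappa_4:=E|\eta_t|^4<\infty$.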
² For the classical GARCH($p, q$) model, Berkes and Horváth (2003) established a Glivenko-Cantelli theorem for the empirical cumulative distribution function (cdf) of the squared residuals.
A10: All the coordinates of $\frac{\partial\sigma_t(\theta_0)}{\partial\theta}$ are (strictly) positive.
A11: $\eta_1$ admits a density $f$ which is continuous on $\mathbb{R}$.
A12: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\ldots;\theta)=\sigma(x_1,x_2,\ldots;\theta_c)$.
Assumption A6 is required for the asymptotic normal distribution of $\hat\theta_n$. Assumptions A7-A10 and A12 are satisfied for the standard GARCH($p, q$) model, under the already mentioned regularity assumptions. Assumption A12 is a stability-by-scaling property, which seems a desirable assumption for any volatility model (examples are provided in Francq and Zakoian (2015)).
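We now state our first main result showing that the law of $\sqrt{n}(\hat F_n-F)$ is AED but AMF.
Theorem 2. Let A1-A4, A6-A11 hold. Then, for any sequence $(x_n)$ of random variables converging in probability to $x\in\mathbb{R}$,
$$\sqrt{n}\left\{\hat F_n(x_n)-F(x_n)\right\}=\frac{1}{\sqrt{n}}\sum_{t=1}^n\left\{\mathbf{1}_{\{\eta_t<x\}}-F(x)\right\}-\frac{xf(x)}{2\sqrt{n}}\,\Omega'J^{-1}\sum_{t=1}^n(1-\eta_t^2)D_t+o_P(1),$$
where $\Omega=E(D_t)$, $J=E(D_tD_t')$ with $D_t=D_t(\theta_0)$.
If in addition A12 holds,
$$\sqrt{n}\left\{\hat F_n(x_n)-F(x_n)\right\}\overset{\mathcal{L}}{\to}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+\frac{\{xf(x)\}^2}{4}(\kappa_4-1)+xf(x)\varrho(x)\right),$$
where $\varrho(x)=E(\eta_0^2\mathbf{1}_{\{\eta_0<x\}})-F(x)$.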
Remark 1. A noticeable outcome of this theorem is that, under A12, the asymptotic distribution of the e.d.f.r. only depends on the underlying distribution of the innovations. It is completely independent of the model from which these residuals are derived. In the sense of Robinson (1987), it means that the e.d.f.r. is adaptive to the unknown value of $\theta_0$, and even to the unknown form of the function $\sigma$. This does not mean that we retrieve the asymptotic distribution of $F_n$ derived for iid data. The usual asymptotic variance, $F(x)\{1-F(x)\}$, is only valid when $xf(x) = 0$. This is the case when $x= 0$ because the innovations and residuals at a given date have the same sign. The effect of estimation also vanishes when $|x|$ increases to $\infty$ (under A9). Parameter estimation may reduce or increase the asymptotic variance (see Appendix).
Remark 2. One practical interest of adaptiveness is that estimation of the asymptotic variance reduces to estimation of characteristics of the innovations distribution. In particular, f(x) can be estimated by a Kernel density estimator based on the residuals. Theorem 2.1 in Kulperger and Yu (2005) shows that, in the standard GARCH case, the use of residuals instead of innovations has no asymptotic impact on the estimation of the density.
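The plug-in estimation described in Remark 2 can be sketched as follows. The asymptotic variance of Theorem 2 under A12 is evaluated by replacing the cdf, the fourth moment, the truncated second moment and the density by sample counterparts computed from (standardized) residuals. The Gaussian kernel, the rule-of-thumb bandwidth, and the function names are assumptions of this sketch, not choices made in the paper. As a benchmark, for standard Gaussian innovations the fourth moment equals 3 and the truncated second moment minus the cdf equals minus x times the density, so a direct computation reduces the variance to the cdf times one minus the cdf, minus half the squared product of x and the density.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_asymptotic_variance(x):
    """Closed form of the Theorem 2 variance when the innovations are N(0, 1)."""
    big_f = normal_cdf(x)
    return big_f * (1.0 - big_f) - 0.5 * (x * normal_pdf(x)) ** 2


def kernel_density(sample, x, bandwidth=None):
    """Gaussian kernel density estimate at x (rule-of-thumb bandwidth by default)."""
    n = len(sample)
    if bandwidth is None:
        mean = sum(sample) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)
    return sum(normal_pdf((x - s) / bandwidth) for s in sample) / (n * bandwidth)


def plugin_asymptotic_variance(sample, x):
    """Plug-in estimate of F(1-F) + (x f)^2 (kappa4 - 1) / 4 + x f rho.

    The sample is assumed to have unit second moment, as the innovations do.
    """
    n = len(sample)
    big_f = sum(s < x for s in sample) / n
    kappa4 = sum(s ** 4 for s in sample) / n
    rho = sum(s * s for s in sample if s < x) / n - big_f
    xf = x * kernel_density(sample, x)
    return big_f * (1.0 - big_f) + 0.25 * xf ** 2 * (kappa4 - 1.0) + xf * rho
```

Applied to a large Gaussian sample, `plugin_asymptotic_variance(sample, 1.0)` should be close to `gaussian_asymptotic_variance(1.0)`, which is smaller than the iid value obtained without estimation: in the Gaussian case the estimation step reduces the asymptotic variance.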
3. Including a conditional mean
In ARMA-GARCH models, the conditional variance specification is completed by a linear model for the conditional mean. In what follows, we consider a more general framework. Nor will we restrict our investigations to the QML estimator. Suppose the model is written
$$\begin{aligned} y_t&=m_t+\epsilon_t, \qquad \epsilon_t=\sigma_t\eta_t,\\ m_t&=m(y_{t-1},y_{t-2},\ldots;\theta_0), \qquad \sigma_t=\sigma(y_{t-1},y_{t-2},\ldots;\theta_0), \end{aligned} \tag{4}$$
under the same assumptions on $(\eta_t)$, $\theta_0$ and $\Theta$ as before. Model (4) includes the double-AR($p$) of Ling (2007) in which $\theta_0=(\phi_{01},\ldots,\phi_{0p},\omega_0,\alpha_{01},\ldots,\alpha_{0p})'\in\mathbb{R}^p\times(0,\infty)\times[0,\infty)^p$ and
$$m(y_{t-1},y_{t-2},\ldots;\theta_0)=\sum_{i=1}^p\phi_{0i}y_{t-i}, \qquad \sigma(y_{t-1},y_{t-2},\ldots;\theta_0)=\sqrt{\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i}^2},$$
but also more traditional ARMA-GARCH-type models, in which the volatility initially defined in terms of the $\epsilon_{t-i}$'s can be rewritten as a function of the $y_{t-i}$'s.
Let, for arbitrary initial values $\tilde y_0,\tilde y_{-1},\ldots$, for any $\theta\in\Theta$, and for $1\le t\le n$,
$$\tilde m_t(\theta)=m(y_{t-1},y_{t-2},\ldots,y_1,\tilde y_0,\tilde y_{-1},\ldots;\theta), \qquad \tilde\sigma_t(\theta)=\sigma(y_{t-1},y_{t-2},\ldots,y_1,\tilde y_0,\tilde y_{-1},\ldots;\theta).$$
As in the previous section we denote without "tilde" the functions $m_t(\theta)$ and $\sigma_t(\theta)$ in which the initial values are replaced by variables at times anterior to 0. In the following assumptions, $r>0$ denotes a real number which can be chosen arbitrarily small.
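B1: $(y_t)$ is a strictly stationary, non-anticipative and ergodic solution of Model (4). Moreover, $E|m_t|^r<\infty$ and $E|\sigma_t|^r<\infty$.
B2: $\hat\theta_n$ denotes any consistent estimator of $\theta_0$. The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ are continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$.
B3: $\sup_{\theta\in\Theta}\left(|m_t(\theta)-\tilde m_t(\theta)|+|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\right)\le K_t\rho^t$ where $K_t\in\mathcal{F}_{t-1}$ and $\sup_tE(K_t^r)<\infty$.
B4: For any neighborhood $V(\theta_0)$ of $\theta_0$, we have
$$E\sup_{\theta\in V(\theta_0)}\left(\left\|\frac{\partial m_t(\theta)}{\partial\theta}\right\|^r+|m_t(\theta_0)-m_t(\theta)|^r+\|D_t(\theta)\|^r\right)<\infty, \qquad\text{where}\quad D_t(\theta)=\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}.$$
Note that the condition $K\in\mathcal{F}_0$ in A3 has been weakened in B3, in particular to be able to handle the ARMA-GARCH under standard conditions.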
We start by extending the Glivenko-Cantelli-type Theorem 1. Define the residuals $\hat\eta_t=\{y_t-\tilde m_t(\hat\theta_n)\}/\tilde\sigma_t(\hat\theta_n)$.
Theorem 3. Under B1-B4 (instead of A1-A4), the conclusions of Theorem 1 hold.
Hence, extending the framework of the previous section does not alter the consistency results. We will now see that the same conclusion cannot be drawn concerning the asymptotic distribution of $\sqrt{n}(\hat F_n-F)$, which will no longer be model-free in general. Another difference, which constitutes a major difficulty, is that Assumption A10 can no longer be made when a conditional mean is included.³ We now assume that $\hat\theta_n$ admits a Bahadur representation.
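B5: The following expansion holds
$$\sqrt{n}\left(\hat\theta_n-\theta_0\right)=\frac{1}{\sqrt{n}}\sum_{t=1}^n\Delta_{t-1}V(\eta_t)+o_P(1),$$
where $V(\cdot)$ is a measurable function, $V:\mathbb{R}\to\mathbb{R}^k$ for some positive integer $k$, and $\Delta_{t-1}$ is an $\mathcal{F}_{t-1}$-measurable $d\times k$ matrix. The variables $\Delta_t$ and $V(\eta_t)$ belong to $L^2$ with $EV(\eta_t)=0$, $\mathrm{var}\{V(\eta_t)\}=\Upsilon$ is nonsingular and $E\Delta_t=\Lambda$ is full row rank.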
Under this assumption, the CLT for stationary second-order martingale differences of Billingsley (1961) can be applied. It follows that the asymptotic law of $\sqrt{n}(\hat\theta_n-\theta_0)$ is a centered Gaussian with variance $\Sigma:=E(\Delta_t\Upsilon\Delta_t')$. Assumption B5 has to be verified on a case-by-case basis, given a specific model and estimator.
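Remark 3. For the Gaussian QMLE $\hat\theta_n$ of $\theta_0$,
$$\hat\theta_n=\arg\min_{\theta\in\Theta}Q_n(\theta), \qquad Q_n(\theta)=\frac{1}{n}\sum_{t=1}^n\tilde\ell_t(\theta), \qquad \tilde\ell_t(\theta)=\frac{\{y_t-\tilde m_t(\theta)\}^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta),$$
it can be shown that
$$\sqrt{n}(\hat\theta_n-\theta_0)=2J^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^n\left\{\eta_t\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}+(\eta_t^2-1)\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right\}+o_P(1), \tag{5}$$
where
$$J=E\left(\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\partial\theta'}\right)=2J_m+4J_\sigma, \qquad J_m=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial m_t(\theta_0)}{\partial\theta'}\right)$$
and $J_\sigma=E\,D_t(\theta_0)D_t'(\theta_0)$. The Bahadur expansion in B5 is thus satisfied, if $E(\eta_t)=0$, with $k=2$ and
$$V(\eta_t)=(\eta_t,\eta_t^2-1)', \qquad \Delta_{t-1}=2J^{-1}\left[\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\quad\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right], \qquad \Lambda=2J^{-1}[\Omega_m\;\;\Omega_\sigma].$$
We also have
$$\mathrm{Var}_{as}\{\sqrt{n}(\hat\theta_n-\theta_0)\}=4J^{-1}\{J_m+(\kappa_4-1)J_\sigma+\mu_3(J_{m\sigma}+J_{m\sigma}')\}J^{-1}=\Sigma,$$
where $J_{m\sigma}=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial\sigma_t(\theta_0)}{\partial\theta'}\right)$ and $\mu_3=E\eta_t^3$.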
³ For instance in the model $y_t=m_0+\sigma_t\eta_t$ where $\sigma_t^2=\omega+\alpha(y_{t-1}-m)^2$, the derivative of $\sigma_t$ with respect to $m$ is not positive, and not even of constant sign.
The assumptions of Section 2 have to be modied as follows.
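B6: The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ have continuous second-order derivatives, and
$$\sup_{\theta\in\Theta}\left\|\frac{\partial m_t(\theta)}{\partial\theta}-\frac{\partial\tilde m_t(\theta)}{\partial\theta}\right\|+\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K_t\rho^t,$$
where $K_t$ is as in B3.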
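B7: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that
$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial m_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial^2m_t(\theta)}{\partial\theta\partial\theta'}\right\|^2\right\}<\infty.$$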
Consider the additional assumption
B8: For $n$ large enough, $n>n_0$ say, the conditional distribution of $\eta_t$ given $\hat\theta_n$ and $\mathcal{F}_{t-1}$ admits a density which is a.s. bounded, uniformly in $n>n_0$ and $t<n-t(n)$ where $t(n)=o(\sqrt{n})$.
Intuitively, the variables $\hat\theta_n$ and $\eta_t$ should be asymptotically independent in most standard situations. For instance, consider the simple location model $y_t=\theta_0+\sigma\eta_t$, with Gaussian innovations $\eta_t$. The conditional distribution of $\eta_t$ given $\hat\theta_n$ and $\mathcal{F}_{t-1}$, where $\hat\theta_n$ is the sample mean, is Gaussian with variance $1-\{n-(t-1)\}^{-1}$. Thus B8 is satisfied for $n_0>1$ and $t(n)=1$, with bound $1/\sqrt{\pi}$.
The next theorem uses the notion of discrete estimator, which has been introduced by Le Cam (1960) and used by many authors (e.g. Kreiss (1987)).
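Theorem 4. Under B1-B8 and A9, A11, if i) $(x_n)$ is a non-random sequence converging to $x\in\mathbb{R}$; or ii) $x_n$ is a discrete $\sqrt{n}$-consistent estimator of $x$, then
$$\sqrt{n}\left\{\hat F_n(x_n)-F(x_n)\right\}\overset{\mathcal{L}}{\to}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+H'(x)\Sigma H(x)+2H'(x)\Lambda\varrho(x)\right),$$
where $\varrho(x)=E(\mathbf{1}_{\{\eta_0<x\}}V(\eta_0))$ and $H(x)=f(x)(\Omega_m+x\Omega_\sigma)$, $\Omega_m=E\left(\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\right)$, $\Omega_\sigma=E\,D_t(\theta_0)$.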
Every sequence of $\sqrt{n}$-consistent estimators can be truncated to get a discrete sequence. The discretization can also be avoided, at the price of either reinforcing Assumption B8, or increasing the speed of convergence of the sequence $(x_n)$ (see Appendix).
More explicit forms of the asymptotic variance can be derived for the QMLE under the following extension of Assumption A12 which will be illustrated below.
A12∗: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\ldots;\theta)=\sigma(x_1,x_2,\ldots;\theta_c)$ and $m(x_1,x_2,\ldots;\theta)=m(x_1,x_2,\ldots;\theta_c)$.
The next result provides an explicit form for the asymptotic variance when the parameters of the conditional mean and variance are independent and are estimated by QML.