
Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models


HAL Id: hal-02898909

https://hal.archives-ouvertes.fr/hal-02898909

Preprint submitted on 14 Jul 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Christian Francq, Jean-Michel Zakoïan

To cite this version:

Christian Francq, Jean-Michel Zakoïan. Adaptiveness of the empirical distribution of residuals in semi-parametric conditional location scale models. 2020. ⟨hal-02898909⟩


CREST and University of Lille, 5 Avenue Henri Le Chatelier, 91120 Palaiseau, France.

E-mail: [email protected], [email protected]

Summary. This paper addresses the problem of deriving the asymptotic distribution of the empirical distribution function $\widehat F_n$ of the residuals in a general class of time series models, including conditional mean and conditional heteroscedasticity, whose independent and identically distributed errors have unknown distribution $F$. We show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\widehat F_n(\cdot)-F(\cdot)\}$ is impacted by the estimation but does not depend on the model parameters. It is thus neither asymptotically estimation free, as is the case for purely linear models, nor asymptotically model dependent, as is the case for some nonlinear models. The asymptotic stochastic equicontinuity is also established. We consider an application to the estimation of the conditional Value-at-Risk.


1. Introduction

For independent variables $\eta_1,\dots,\eta_n$ with common distribution $F$, the celebrated Glivenko-Cantelli theorem states the uniform almost sure convergence of the empirical distribution,

$$\sup_{x\in\mathbb{R}}|F_n(x)-F(x)|\to 0,\quad \text{a.s.}$$

where $F_n(x)=n^{-1}\sum_{t=1}^{n}\mathbf{1}_{\{\eta_t\le x\}}$ for all $x$. The usual Central Limit Theorem (CLT) shows that $\sqrt{n}\{F_n(x)-F(x)\}$ is asymptotically $\mathcal{N}(0,F(x)(1-F(x)))$ distributed without any further assumption. Moreover, under the assumption that $F$ is continuous, the sequence $\{\sqrt{n}[F_n(\cdot)-F(\cdot)]\}$ is stochastically equicontinuous, in the sense that

$$\sqrt{n}\{F_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}(0,F(x)(1-F(x)))$$

for any sequence $(x_n)$ converging to $x$ in probability, for any $x$ (see Billingsley (1968, Section 22) and Andrews (1994)). The latter convergence in distribution has important statistical applications, for instance the derivation of the asymptotic distribution of the empirical quantiles (see below).

Time series observations are rarely independent and identically distributed (iid), but time series models often involve iid innovations. Modern time series analysis often relies on semi-parametric models in which the distribution $F$ of the innovations $\eta_t$ is not fully specified. Many such models are of the location-scale form $y_t=m_t(\theta_0)+\sigma_t(\theta_0)\eta_t$, where $m_t$ and $\sigma_t$ are functions of the past observations $y_{t-i}$, $i>0$, and the sequence $(\eta_t)$ is iid. Consistent estimation of the parameter $\theta_0$ can be achieved by several methods. In particular, the Quasi-Maximum Likelihood (QML) estimation method generally relies on a Gaussian criterion, written as if the distribution $F$ were Gaussian, but provides consistent estimators while remaining agnostic concerning the true distribution.

Even if no distributional assumption is made during the estimation phase, using semi-parametric time series models may require information about the error distribution $F$. For instance, in the location-scale model the conditional Value-at-Risk (VaR) at level $\alpha\in(0,1)$ is equal to $m_t+\sigma_t\xi_\alpha$, where $\xi_\alpha$ is the $\alpha$-quantile of $F$. Information about $F$ can be obtained via the empirical distribution function of the residuals (e.d.f.r.) $\widehat F_n$. Consistent estimation of the parameters produces residuals which are obviously not iid but converge to the innovations. One question of importance is whether the previous properties of $F_n$ apply to $\widehat F_n$, i.e. when innovations are replaced by residuals.
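To make the VaR formula above concrete, here is a minimal Python sketch of the plug-in quantity $m_t+\sigma_t\hat\xi_\alpha$, where $\hat\xi_\alpha$ is the empirical $\alpha$-quantile of a set of residuals. The function names `empirical_quantile` and `conditional_var`, as well as all numeric inputs, are illustrative assumptions and do not come from the paper.

```python
import math


def empirical_quantile(residuals, alpha):
    """Generalized inverse of the empirical cdf: the smallest order statistic x
    such that the fraction of residuals <= x is at least alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(residuals)
    rank = max(math.ceil(alpha * len(ordered)), 1)  # 1-based rank
    return ordered[rank - 1]


def conditional_var(m_t, sigma_t, residuals, alpha):
    """Plug-in version of m_t + sigma_t * xi_alpha (sign convention of the text)."""
    return m_t + sigma_t * empirical_quantile(residuals, alpha)


# Hypothetical residuals and conditional moments, for illustration only.
residuals = [-2.1, -1.3, -0.7, -0.2, 0.0, 0.1, 0.4, 0.9, 1.2, 1.8]
xi_hat = empirical_quantile(residuals, 0.25)
var_hat = conditional_var(m_t=0.05, sigma_t=1.5, residuals=residuals, alpha=0.25)
```

The quantile is taken as an order statistic rather than an interpolated value, which matches the definition of a quantile as a generalized inverse of the e.d.f.r.; other conventions are possible.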

For many applications, the asymptotic distribution of the sequence $\{\sqrt{n}[\widehat F_n(\cdot)-F(\cdot)]\}$ is required.

In the statistical literature, two kinds of results have been established. In some situations, the asymptotic distribution is asymptotically estimation free (AEF), in the sense that, asymptotically, $\sqrt{n}\{\widehat F_n(\cdot)-F(\cdot)\}\overset{d}{=}\sqrt{n}\{F_n(\cdot)-F(\cdot)\}$. Such results go back to Boldin (1982), for the AR(p) process, and have been generalized to linear processes by Kreiss (1991) and Bai (1994). On the other hand, existing results for nonlinear processes show that the distribution of $\sqrt{n}\{\widehat F_n(\cdot)-F(\cdot)\}$ can be much more complicated. It can be asymptotically model dependent (AMD), in the sense that the asymptotic variance depends on both the law of the innovations $F$ and the parameter $\theta_0$. For different classes of conditionally heteroscedastic time series, the asymptotic distribution was investigated by Boldin (1998), and Lee and Taniguchi (2005) among others, and for the squared residuals by Horváth, Kokoszka and Teyssière (2001) and Berkes and Horváth (2003). In all these works, the distribution of the statistics involving the residuals is AMD. To overcome the AMD, Koul and Ling (2006) proposed a modification of the Kolmogorov-Smirnov test based on residuals.

See Berkes and Horváth (2002) for a review of the asymptotic behaviour of e.d.f.r.

The main contribution of this paper is to show that, for a large class of time series models (including the standard ARMA-GARCH), the asymptotic distribution of $\sqrt{n}\{\widehat F_n(\cdot)-F(\cdot)\}$ is neither AEF nor AMD. In some cases, which we are able to characterize, the distribution is asymptotically model free (AMF), however it is impacted by the estimation, i.e. it is asymptotically estimation dependent (AED), though only through the errors distribution. The AMF property simplifies dramatically the use of such asymptotic results in practical applications based on potentially complex time series models. On the other hand, the AED property tells us that usual statistics available for observed iid processes cannot be directly employed.

We start by considering volatility modelling and QML estimation, which constitutes the most widely used framework for financial returns. The absence of a conditional mean and the choice of a specific estimation method allow us to derive finer results. However, from an extended perspective, we will also consider conditional location-scale models in which the first two conditional moments are involved, and more general estimation methods. We will also investigate an application to VaR estimation.

The rest of the paper is organized as follows. In the next section we introduce a general parametric volatility model and the main assumptions ensuring the consistency of the e.d.f.r. Next, we state our first main result, establishing the asymptotic distribution of the e.d.f.r. Section 3 considers the extension to conditional location-scale models. The application to VaR is developed in Section 4. In particular, we provide conditions for the strong consistency and asymptotic normality of the residuals quantiles. Section 5 concludes. Most proofs are postponed to Section 6 or to an Appendix.

2. Empirical distribution of the residuals for volatility models

In this section, we focus on conditional scale models of the multiplicative form

$$\epsilon_t=\sigma_t\eta_t,\qquad \sigma_t=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots;\theta_0),\qquad (1)$$

where $(\eta_t)$ is a sequence of iid random variables, $E(\eta_t^2)=1$, $\theta_0$ is a vector of unknown coefficients which belongs to a compact parameter set $\Theta\subset\mathbb{R}^d$, and $\sigma$ is a positive function. Note that we do not assume that $E(\eta_t)=0$.

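For instance, the standard GARCH(p, q) model writes

$$\epsilon_t=\sigma_t\eta_t,\qquad \sigma_t^2=\omega_0+\sum_{i=1}^{q}\alpha_{0i}\epsilon_{t-i}^2+\sum_{j=1}^{p}\beta_{0j}\sigma_{t-j}^2,\qquad (2)$$

where $\theta_0=(\omega_0,\alpha_{01},\dots,\beta_{0p})'$ satisfies $\omega_0>0$, $\alpha_{0i}\ge 0$, $\beta_{0j}\ge 0$. Under strict stationarity, invertibility of the polynomial $B(z)=1-\sum_{j=1}^{p}\beta_{0j}z^j$ holds and the volatility $\sigma_t^2$ can indeed be written as a linear function of the past values of $\epsilon_t^2$.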
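As an illustration of recursion (2), the following Python sketch simulates a GARCH(1,1) path. The Gaussian innovations, the burn-in length, the start from the unconditional variance and the function name `simulate_garch11` are all assumptions made for this example; the paper itself leaves the innovation distribution $F$ unspecified.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, burn_in=500, seed=0):
    """Simulate eps_t = sigma_t * eta_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    using iid N(0, 1) innovations (an illustrative choice)."""
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0")
    if alpha + beta >= 1:
        raise ValueError("this sketch starts at the unconditional variance, "
                         "so it needs alpha + beta < 1")
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start value for the recursion
    eps, sigma, eta = [], [], []
    for t in range(burn_in + n):
        e = rng.gauss(0.0, 1.0)
        x = math.sqrt(sigma2) * e
        if t >= burn_in:  # keep only the post burn-in part of the path
            eps.append(x)
            sigma.append(math.sqrt(sigma2))
            eta.append(e)
        sigma2 = omega + alpha * x * x + beta * sigma2
    return eps, sigma, eta


eps, sigma, eta = simulate_garch11(n=200, omega=0.1, alpha=0.1, beta=0.8)
```

The function returns the innovations alongside the observations so that residuals computed from an estimated model can later be compared with the true $\eta_t$.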

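We start by considering the Gaussian QML estimation method for which explicit conditions for Consistency and Asymptotic Normality (CAN) can be displayed. More general estimators will be considered in Section 3. The setup can be described as follows. Given observations $\epsilon_1,\dots,\epsilon_n$, and using arbitrary initial values $\tilde\epsilon_i$ for $i\le 0$, we define for any $\theta\in\Theta$,

$$\tilde\sigma_t(\theta)=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots,\epsilon_1,\tilde\epsilon_0,\tilde\epsilon_{-1},\dots;\theta),$$

which will be used as a proxy of $\sigma_t(\theta)=\sigma(\epsilon_{t-1},\epsilon_{t-2},\dots,\epsilon_1,\epsilon_0,\epsilon_{-1},\dots;\theta)$.

A Gaussian QML estimator (QMLE) of $\theta_0$ is defined as

$$\hat\theta_n=\arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{t=1}^{n}\tilde\ell_t(\theta),\qquad \tilde\ell_t(\theta)=\frac{\epsilon_t^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta).\qquad (3)$$

Let $K>0$ be a generic constant or random variable measurable with respect to $\mathcal{F}_0$, where $\mathcal{F}_t$ denotes the σ-algebra generated by $\{\eta_s, s\le t\}$. Let $\rho\in(0,1)$. We shall assume the following.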

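A1: $(\epsilon_t)$ is a strictly stationary, non-anticipative (i.e. $\epsilon_t\in\mathcal{F}_t$) and ergodic solution of Model (1). Moreover, $E|\sigma_t|^r<\infty$ for some $r>0$.

A2: For any real sequence $(x_i)$, the function $\theta\mapsto\sigma(x_1,x_2,\dots;\theta)$ is continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$. Moreover, $\sigma_t(\theta_0)/\sigma_t(\theta)=1$ a.s. iff $\theta=\theta_0$.

A3: $\sup_{\theta\in\Theta}|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\le K\rho^t$.

A4: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$E\sup_{\theta\in V(\theta_0)}\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^r<\infty\quad\text{and}\quad E\sup_{\theta\in V(\theta_0)}\|D_t(\theta)\|^r<\infty,$$

where $D_t(\theta)=\sigma_t^{-1}(\theta)\,\partial\sigma_t(\theta)/\partial\theta$.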

Assumptions A1-A3 are a set of conditions ensuring the strong consistency of $\hat\theta_n$ (see Francq and Zakoian (2004, 2015)). Assumption A4 is introduced to control the difference between the innovations and the residuals.¹

¹ For the classical GARCH(p, q) model, under invertibility conditions on the lag polynomial $B_\theta(z)=1-\sum_{j=1}^{p}\beta_jz^j$ and assuming that the law of $\eta_t^2$ is nondegenerate, A1-A4 reduce to the first part of A1.
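The Gaussian QML criterion (3) can be sketched in Python for the GARCH(1,1) special case as follows. The function name `garch11_qml_criterion` and the choice of the sample second moment as arbitrary initial value for both $\tilde\epsilon_0^2$ and $\tilde\sigma_0^2$ are assumptions of this example; the text only requires that some arbitrary initial values be used.

```python
import math


def garch11_qml_criterion(eps, omega, alpha, beta, init=None):
    """Average over t = 1..n of eps_t^2 / s_t + log(s_t), where s_t is the
    proxy conditional variance computed recursively from initial values."""
    n = len(eps)
    if init is None:
        init = sum(e * e for e in eps) / n  # arbitrary initial value (assumption)
    eps_prev_sq = init   # stands in for the unobserved eps_0^2
    sigma2 = init        # stands in for the unobserved sigma_0^2
    total = 0.0
    for e in eps:
        sigma2 = omega + alpha * eps_prev_sq + beta * sigma2
        total += e * e / sigma2 + math.log(sigma2)
        eps_prev_sq = e * e
    return total / n


# Toy data, for illustration only.
toy_eps = [1.0, -2.0, 0.5]
value = garch11_qml_criterion(toy_eps, omega=1.0, alpha=0.0, beta=0.0)
```

In practice the criterion would be passed to a numerical optimizer over a compact parameter set; with `alpha = beta = 0` it reduces to the constant-variance case, which is minimized at the sample second moment.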


Let the residuals $\hat\eta_t=\epsilon_t/\tilde\sigma_t(\hat\theta_n)$. We note that, at least for $t$ large enough, $\tilde\sigma_t(\hat\theta_n)\ge\omega$ by A2-A3. We start by establishing a Glivenko-Cantelli result for the e.d.f.r. $\widehat F_n(x)=n^{-1}\sum_{t=1}^{n}\mathbf{1}_{\{\hat\eta_t\le x\}}$.² The following assumption is simply denoted A5 when it holds for all $x\in\mathbb{R}$.

A5(x): For $x\in\mathbb{R}$, the cdf $F$ of $\eta$ is Lipschitz continuous in a neighborhood of $x$.

Theorem 1. Under A1-A4,

(a) If A5(x) holds for $x\in\mathbb{R}$, we have $|\widehat F_n(x)-F(x)|\to 0$ a.s.

(b) If A5 holds we have $\sup_{x\in\mathbb{R}}|\widehat F_n(x)-F(x)|\to 0$ a.s.

Notice that the previous uniform convergence was established by Stute (2001) in the case of ARCH(q) processes. It is known (see for instance Stute and Schumann (1980)) that in the case of a stationary ergodic sequence, the Glivenko-Cantelli theorem is valid without any assumption on $F$. The following example shows that A5 is required for the consistency of the e.d.f.r.

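Example 1 (Glivenko-Cantelli in failure). Consider the scale model $\epsilon_t=\sigma\eta_t$, $\sigma>0$, where $\eta_t$ is distributed over $\{-\sqrt{2},0,\sqrt{2}\}$ with $P(\eta_t=-\sqrt{2})=P(\eta_t=\sqrt{2})=1/4$. Let $\hat\sigma_n$ be a consistent estimator of $\sigma$. The residuals $\hat\eta_t=\frac{\sigma}{\hat\sigma_n}\eta_t$ have the e.d.f.r.

$$\widehat F_n(x)=\left(\frac{1}{n}\sum_{t=1}^{n}\mathbf{1}_{\eta_t<0}\right)\mathbf{1}_{-\sqrt{2}\frac{\sigma}{\hat\sigma_n}\le x<0}+\left(\frac{1}{n}\sum_{t=1}^{n}\mathbf{1}_{\eta_t\le 0}\right)\mathbf{1}_{0\le x<\sqrt{2}\frac{\sigma}{\hat\sigma_n}}+\mathbf{1}_{x\ge\sqrt{2}\frac{\sigma}{\hat\sigma_n}}.$$

It follows that $\sup_{x\in\mathbb{R}}|\widehat F_n(x)-F(x)|\ge|\widehat F_n(-\sqrt{2})-F(-\sqrt{2})|=1/4$ whenever $\sigma<\hat\sigma_n$, which has a non-vanishing probability for the QML. Indeed, we have $\hat\sigma_n^2=\frac{1}{n}\sum_{t=1}^{n}\epsilon_t^2$, thus

$$P(\sigma<\hat\sigma_n)=P\left(\frac{1}{n}\sum_{t=1}^{n}\mathbf{1}_{\eta_t^2\ne 0}>1/2\right)=P(X_n>n/2),$$

where $X_n\sim\mathcal{B}(n,1/2)$.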
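The failure described in Example 1 can be checked numerically. The Python sketch below computes the gap $|\widehat F_n(-\sqrt{2})-F(-\sqrt{2})|$ for a given innovation sample and then estimates, by simulation, how often $\hat\sigma_n>\sigma$. The helper name `gap_at_minus_sqrt2`, the sample size, the number of replications and the seed are illustrative assumptions.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def gap_at_minus_sqrt2(eta, sigma=1.0):
    """Return (sigma_hat, |F_hat_n(-sqrt 2) - F(-sqrt 2)|) for eps_t = sigma * eta_t,
    where sigma_hat^2 is the sample mean of eps_t^2 and F(-sqrt 2) = 1/4."""
    n = len(eta)
    eps = [sigma * e for e in eta]
    sigma_hat = math.sqrt(sum(x * x for x in eps) / n)
    if sigma_hat == 0.0:
        raise ValueError("all observations are zero, residuals are undefined")
    residuals = [x / sigma_hat for x in eps]
    f_hat = sum(1 for r in residuals if r <= -SQRT2) / n
    return sigma_hat, abs(f_hat - 0.25)


# Monte Carlo share of samples in which sigma is over-estimated.
# An odd n rules out the tie in which exactly half of the eta_t are zero.
rng = random.Random(0)
n, replications = 201, 200
over_estimated = 0
for _ in range(replications):
    draw = rng.choices([-SQRT2, 0.0, SQRT2], weights=[1, 2, 1], k=n)
    sigma_hat, gap = gap_at_minus_sqrt2(draw)
    if sigma_hat > 1.0:
        over_estimated += 1
share = over_estimated / replications
```

Whenever `sigma_hat` exceeds the true scale, every negative residual lies strictly above $-\sqrt{2}$, so the e.d.f.r. at $-\sqrt{2}$ is zero and the gap equals 1/4 regardless of the sample size.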

We now derive the e.d.f.r. asymptotic distribution under the following additional assumptions.

A6: θ0 belongs to the interior of Θ.

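A7: There exists no non-zero $x\in\mathbb{R}^d$ such that $x'\frac{\partial\sigma_t(\theta_0)}{\partial\theta}=0$, a.s.

A8: The function $\theta\mapsto\sigma(x_1,x_2,\dots;\theta)$ has continuous second-order derivatives, and

$$\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K\rho^t.$$

A9: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta)}\frac{\partial^2\sigma_t(\theta)}{\partial\theta\,\partial\theta'}\right\|^2+\left|\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\right|^4+\left|\frac{\sigma_t(\theta)}{\sigma_t(\theta_0)}\right|^4\right\}<\infty.$$

Moreover, $\kappa_4:=E|\eta_t|^4<\infty$.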

² For the classical GARCH(p, q) model, Berkes and Horváth (2003) established a Glivenko-Cantelli theorem for the empirical cumulative distribution function (cdf) of the squared residuals.

A10: All the coordinates of $\partial\sigma_t(\theta_0)/\partial\theta$ are (strictly) positive.

A11: $\eta_1$ admits a density $f$ which is continuous on $\mathbb{R}$.

A12: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\dots;\theta)=\sigma(x_1,x_2,\dots;\theta_c)$.

Assumption A6 is required for the asymptotic normal distribution of $\hat\theta_n$. Assumptions A7-A10 and A12 are satisfied for the standard GARCH(p, q) model, under the already mentioned regularity assumptions. Assumption A12 is a stability-by-scaling property, which seems a desirable assumption for any volatility model (examples are provided in Francq and Zakoian (2015)).

We now state our first main result showing that the law of $\sqrt{n}(\widehat F_n-F)$ is AED but AMF.

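Theorem 2. Let A1-A4, A6-A11 hold. Then, for any sequence $(x_n)$ of random variables converging in probability to $x\in\mathbb{R}$,

$$\sqrt{n}\{\widehat F_n(x_n)-F(x_n)\}=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\{\mathbf{1}_{\eta_t<x}-F(x)\}-\frac{xf(x)}{2\sqrt{n}}\,\Omega'J^{-1}\sum_{t=1}^{n}(1-\eta_t^2)D_t+o_P(1),$$

where $\Omega=E(D_t)$, $J=E(D_tD_t')$ with $D_t=D_t(\theta_0)$.

If in addition A12 holds,

$$\sqrt{n}\{\widehat F_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+\frac{\{xf(x)\}^2}{4}(\kappa_4-1)+xf(x)\varrho(x)\right),$$

where $\varrho(x)=E(\eta_0^2\mathbf{1}_{\eta_0<x})-F(x)$.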

Remark 1. A noticeable outcome of this theorem is that, under A12, the asymptotic distribution of the e.d.f.r. only depends on the underlying distribution of the innovations. It is completely independent of the model from which these residuals are derived. In the sense of Robinson (1987), it means that the e.d.f.r. is adaptive to the unknown value of $\theta_0$, and even to the unknown form of the function $\sigma$. This does not mean that we retrieve the asymptotic distribution of $F_n$ derived for iid data. The usual asymptotic variance, $F(x)\{1-F(x)\}$, is only valid when $xf(x)=0$. This is the case when $x=0$ because the innovations and residuals at a given date have the same sign. The effect of estimation also vanishes when $|x|$ increases to $\infty$ (under A9). Parameter estimation may reduce or increase the asymptotic variance (see Appendix).
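To see the estimation effect of Remark 1 on a concrete case, the Python sketch below evaluates the asymptotic variance of Theorem 2 when the innovations are standard Gaussian. That distributional choice is an assumption made only for this illustration. For the $\mathcal{N}(0,1)$ law, $\kappa_4=3$ and integration by parts gives $E(\eta_0^2\mathbf{1}_{\eta_0<x})=\Phi(x)-x\phi(x)$, so that $\varrho(x)=-x\phi(x)$ and the variance simplifies to $\Phi(x)\{1-\Phi(x)\}-\{x\phi(x)\}^2/2$. The function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def iid_variance_gaussian(x):
    """Asymptotic variance F(x){1 - F(x)} of the e.d.f. of iid N(0, 1) data."""
    F = std_normal_cdf(x)
    return F * (1.0 - F)


def residual_variance_gaussian(x):
    """Asymptotic variance of Theorem 2 (under A12) for N(0, 1) innovations."""
    F, f = std_normal_cdf(x), std_normal_pdf(x)
    kappa4 = 3.0       # fourth moment of N(0, 1)
    rho = -x * f       # E(eta^2 1{eta < x}) - F(x) for N(0, 1)
    return F * (1.0 - F) + (x * f) ** 2 / 4.0 * (kappa4 - 1.0) + x * f * rho


grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
table = [(x, iid_variance_gaussian(x), residual_variance_gaussian(x)) for x in grid]
```

In this Gaussian case the two variances coincide at $x=0$ and the residual-based one is smaller elsewhere, which is one instance of the statement that estimation may reduce the asymptotic variance; other innovation laws can behave differently.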

Remark 2. One practical interest of adaptiveness is that estimation of the asymptotic variance reduces to estimation of characteristics of the innovations distribution. In particular, f(x) can be estimated by a Kernel density estimator based on the residuals. Theorem 2.1 in Kulperger and Yu (2005) shows that, in the standard GARCH case, the use of residuals instead of innovations has no asymptotic impact on the estimation of the density.
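The plug-in strategy of Remark 2 can be sketched as follows: each ingredient of the asymptotic variance of Theorem 2 is replaced by its residual-based counterpart, with $f(x)$ estimated by a Gaussian kernel density estimator. The function name `plug_in_variance`, the Gaussian kernel and the rule-of-thumb bandwidth $1.06\,\hat s\,n^{-1/5}$ are assumptions of this example, not choices made in the paper.

```python
import math


def plug_in_variance(residuals, x, bandwidth=None):
    """Residual-based estimate of
    F(x){1 - F(x)} + {x f(x)}^2 (kappa_4 - 1) / 4 + x f(x) rho(x)."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    if bandwidth is None:
        bandwidth = 1.06 * sd * n ** (-0.2)  # rule of thumb (assumption)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    F_hat = sum(1 for r in residuals if r <= x) / n
    # Gaussian kernel density estimate of f at x
    f_hat = sum(math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in residuals)
    f_hat /= n * bandwidth * math.sqrt(2.0 * math.pi)
    kappa4_hat = sum(r ** 4 for r in residuals) / n
    rho_hat = sum(r * r for r in residuals if r < x) / n - F_hat
    return (F_hat * (1.0 - F_hat)
            + (x * f_hat) ** 2 / 4.0 * (kappa4_hat - 1.0)
            + x * f_hat * rho_hat)


# Hypothetical residuals, for illustration only.
toy_residuals = [-1.0, -0.5, 0.5, 1.0]
v0 = plug_in_variance(toy_residuals, 0.0)
```

At $x=0$ the estimate reduces to $\widehat F_n(0)\{1-\widehat F_n(0)\}$, in line with the observation that the estimation effect vanishes when $xf(x)=0$.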


3. Including a conditional mean

In ARMA-GARCH models, the conditional variance specification is completed by a linear model for the conditional mean. In what follows, we consider a more general framework. Nor will we restrict our investigations to the QML estimator. Suppose the model writes

$$y_t=m_t+\epsilon_t,\quad \epsilon_t=\sigma_t\eta_t,\quad m_t=m(y_{t-1},y_{t-2},\dots;\theta_0),\quad \sigma_t=\sigma(y_{t-1},y_{t-2},\dots;\theta_0)\qquad (4)$$

under the same assumptions on $(\eta_t)$, $\theta_0$ and $\Theta$ as before. Model (4) includes the double-AR(p) of Ling (2007) in which $\theta_0=(\phi_{01},\dots,\phi_{0p},\omega_0,\alpha_{01},\dots,\alpha_{0p})'\in\mathbb{R}^p\times(0,\infty)\times[0,\infty)^p$ and

$$m(y_{t-1},y_{t-2},\dots;\theta_0)=\sum_{i=1}^{p}\phi_{0i}y_{t-i},\qquad \sigma(y_{t-1},y_{t-2},\dots;\theta_0)=\sqrt{\omega_0+\sum_{i=1}^{p}\alpha_{0i}y_{t-i}^2},$$

but also more traditional ARMA-GARCH-type models, in which the volatility initially defined in terms of the $\epsilon_{t-i}$'s can be rewritten as a function of the $y_{t-i}$'s.
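For the double-AR(p) case, the functions $m$ and $\sigma$ of Model (4) can be written down directly. The Python sketch below does so and inverts $y_t=m_t+\sigma_t\eta_t$ to recover the innovation; the function names and the numeric values are illustrative assumptions.

```python
import math


def double_ar_mean(past, phi):
    """m = sum_i phi_i * y_{t-i}; past[0] is y_{t-1}, past[1] is y_{t-2}, ..."""
    return sum(p * y for p, y in zip(phi, past))


def double_ar_scale(past, omega, alpha):
    """sigma = sqrt(omega + sum_i alpha_i * y_{t-i}^2), with omega > 0, alpha_i >= 0."""
    return math.sqrt(omega + sum(a * y * y for a, y in zip(alpha, past)))


def double_ar_innovation(y_t, past, phi, omega, alpha):
    """Solve y_t = m_t + sigma_t * eta_t for eta_t."""
    return (y_t - double_ar_mean(past, phi)) / double_ar_scale(past, omega, alpha)


# Hypothetical double-AR(2) parameters and past values.
past = [2.0, -1.0]                 # y_{t-1}, y_{t-2}
phi, omega, alpha = [0.5, 0.25], 1.0, [0.5, 1.0]
m_t = double_ar_mean(past, phi)
sigma_t = double_ar_scale(past, omega, alpha)
eta_t = double_ar_innovation(3.75, past, phi, omega, alpha)
```

Evaluating the same functions at an estimate of the parameter instead of the true value gives residuals rather than innovations.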

Let, for arbitrary initial values $\tilde y_0,\tilde y_{-1},\dots$, for any $\theta\in\Theta$, and for $1\le t\le n$,

$$\tilde m_t(\theta)=m(y_{t-1},y_{t-2},\dots,y_1,\tilde y_0,\tilde y_{-1},\dots;\theta),\qquad \tilde\sigma_t(\theta)=\sigma(y_{t-1},y_{t-2},\dots,y_1,\tilde y_0,\tilde y_{-1},\dots;\theta).$$

As in the previous section, we denote without tilde the functions $m_t(\theta)$ and $\sigma_t(\theta)$ in which the initial values are replaced by variables at times anterior to 0. In the following assumptions, $r>0$ denotes a real number which can be chosen arbitrarily small.

B1: $(y_t)$ is a strictly stationary, non-anticipative and ergodic solution of Model (4). Moreover, $E|m_t|^r<\infty$ and $E|\sigma_t|^r<\infty$.

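B2: $\hat\theta_n$ denotes any consistent estimator of $\theta_0$. The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ are continuously differentiable. Almost surely, $\sigma_t(\theta)\in(\omega,\infty]$ for any $\theta\in\Theta$ and for some $\omega>0$.

B3: $\sup_{\theta\in\Theta}\left(|m_t(\theta)-\tilde m_t(\theta)|+|\sigma_t(\theta)-\tilde\sigma_t(\theta)|\right)\le K_t\rho^t$ where $K_t\in\mathcal{F}_{t-1}$ and $\sup_tE(K_t^r)<\infty$.

B4: For any neighborhood $V(\theta_0)$ of $\theta_0$, we have

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{\partial m_t(\theta)}{\partial\theta}\right\|^r+|m_t(\theta_0)-m_t(\theta)|^r+\|D_t(\theta)\|^r\right\}<\infty,$$

where

$$D_t(\theta)=\frac{1}{\sigma_t(\theta)}\frac{\partial\sigma_t(\theta)}{\partial\theta}.$$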

Note that the condition $K\in\mathcal{F}_0$ in A3 has been weakened in B3, in particular to be able to handle the ARMA-GARCH under standard conditions.

We start by extending the Glivenko-Cantelli-type Theorem 1. Let the residuals $\hat\eta_t=\{y_t-\tilde m_t(\hat\theta_n)\}/\tilde\sigma_t(\hat\theta_n)$.

Theorem 3. Under B1-B4 (instead of A1-A4), the conclusions of Theorem 1 hold.

Hence, extending the framework of the previous section does not alter the consistency results. We will now see that the same conclusion cannot be drawn concerning the asymptotic distribution of $\sqrt{n}(\widehat F_n-F)$, which will no longer be model-free in general. Another difference, which constitutes a major difficulty, is that Assumption A10 can no longer be made when a conditional mean is included.³ We now assume that $\hat\theta_n$ admits a Bahadur representation.

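B5: The following expansion holds

$$\sqrt{n}(\hat\theta_n-\theta_0)=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\Delta_{t-1}V(\eta_t)+o_P(1),$$

where $V(\cdot)$ is a measurable function, $V:\mathbb{R}\to\mathbb{R}^k$ for some positive integer $k$, and $\Delta_{t-1}$ is a $\mathcal{F}_{t-1}$-measurable $d\times k$ matrix. The variables $\Delta_t$ and $V(\eta_t)$ belong to $L^2$ with $EV(\eta_t)=0$, $\mathrm{var}\{V(\eta_t)\}=\Upsilon$ is nonsingular and $E\Delta_t=\Lambda$ is full row rank.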

Under this assumption, the CLT for stationary second-order martingale differences of Billingsley (1961) can be applied. It follows that the asymptotic law of $\sqrt{n}(\hat\theta_n-\theta_0)$ is a centered Gaussian with variance $\Sigma:=E(\Delta_t\Upsilon\Delta_t')$. Assumption B5 has to be verified on a case by case basis, given a specific model and estimator.

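Remark 3. For the Gaussian QMLE $\hat\theta_n$ of $\theta_0$,

$$\hat\theta_n=\arg\min_{\theta\in\Theta}Q_n(\theta),\qquad Q_n(\theta)=\frac{1}{n}\sum_{t=1}^{n}\tilde\ell_t(\theta),\qquad \tilde\ell_t(\theta)=\frac{\{y_t-\tilde m_t(\theta)\}^2}{\tilde\sigma_t^2(\theta)}+\log\tilde\sigma_t^2(\theta),$$

it can be shown that

$$\sqrt{n}(\hat\theta_n-\theta_0)=2J^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\left\{\eta_t\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}+(\eta_t^2-1)\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right\}+o_P(1),\qquad (5)$$

where

$$J=E\left(\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right)=2J_m+4J_\sigma,\qquad J_m=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial m_t(\theta_0)}{\partial\theta'}\right)$$

and $J_\sigma=ED_t(\theta_0)D_t'(\theta_0)$. The Bahadur expansion in B5 is thus satisfied, if $E(\eta_t)=0$, with $k=2$ and

$$V(\eta_t)=(\eta_t,\eta_t^2-1)',\qquad \Delta_{t-1}=2J^{-1}\left[\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\;\;\frac{1}{\sigma_t}\frac{\partial\sigma_t(\theta_0)}{\partial\theta}\right],\qquad \Lambda=2J^{-1}[\Omega_m\;\;\Omega_\sigma].$$

We also have

$$\mathrm{Var}_{as}\{\sqrt{n}(\hat\theta_n-\theta_0)\}=4J^{-1}\{J_m+(\kappa_4-1)J_\sigma+\mu_3(J_{m\sigma}+J_{m\sigma}')\}J^{-1}=\Sigma,$$

where

$$J_{m\sigma}=E\left(\frac{1}{\sigma_t^2}\frac{\partial m_t(\theta_0)}{\partial\theta}\frac{\partial\sigma_t(\theta_0)}{\partial\theta'}\right)\quad\text{and}\quad\mu_3=E\eta_t^3.$$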

³ For instance in the model $y_t=m_0+\sigma_t\eta_t$ where $\sigma_t^2=\omega+\alpha(y_{t-1}-m)^2$, the derivative of $\sigma_t$ with respect to $m$ is not positive, and even not of constant sign.

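The assumptions of Section 2 have to be modified as follows.

B6: The functions $\theta\mapsto m_t(\theta)$ and $\theta\mapsto\sigma_t(\theta)$ have continuous second-order derivatives, and

$$\sup_{\theta\in\Theta}\left\|\frac{\partial m_t(\theta)}{\partial\theta}-\frac{\partial\tilde m_t(\theta)}{\partial\theta}\right\|+\sup_{\theta\in\Theta}\left\|\frac{\partial\sigma_t(\theta)}{\partial\theta}-\frac{\partial\tilde\sigma_t(\theta)}{\partial\theta}\right\|\le K_t\rho^t,$$

where $K_t$ is as in B3.

B7: There exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$E\sup_{\theta\in V(\theta_0)}\left\{\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial m_t(\theta)}{\partial\theta}\right\|^4+\left\|\frac{1}{\sigma_t(\theta_0)}\frac{\partial^2m_t(\theta)}{\partial\theta\,\partial\theta'}\right\|^2\right\}<\infty.$$

We also introduce the following additional assumption.

B8: For $n$ large enough, $n>n_0$ say, the conditional distribution of $\eta_t$ given $\hat\theta_n$ and $\mathcal{F}_{t-1}$ admits a density which is a.s. bounded, uniformly in $n>n_0$ and $t<n-t(n)$ where $t(n)=o(\sqrt{n})$.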

Intuitively, the variables $\hat\theta_n$ and $\eta_t$ should be asymptotically independent in most standard situations. For instance, consider the simple location model $y_t=\theta_0+\sigma\eta_t$, with Gaussian innovations $\eta_t$. The conditional distribution of $\eta_t$ given $\hat\theta_n$ and $\mathcal{F}_{t-1}$, where $\hat\theta_n$ is the sample mean, is Gaussian with variance $1-\{n-(t-1)\}^{-1}$. Thus B8 is satisfied for $n_0>1$ and $t(n)=1$, with bound $1/\sqrt{\pi}$.

The next theorem uses the notion of discrete estimator, which has been introduced by Le Cam (1960) and used by many authors (e.g. Kreiss (1987)).

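Theorem 4. Under B1-B8 and A9, A11, if i) $(x_n)$ is a non-random sequence converging to $x\in\mathbb{R}$; or ii) $x_n$ is a discrete $\sqrt{n}$-consistent estimator of $x$, then

$$\sqrt{n}\{\widehat F_n(x_n)-F(x_n)\}\xrightarrow{\mathcal{L}}\mathcal{N}\left(0,\;F(x)\{1-F(x)\}+H'(x)\Sigma H(x)+2H'(x)\Lambda\varrho(x)\right),$$

where $\varrho(x)=E(\mathbf{1}_{\eta_0<x}V(\eta_0))$ and $H(x)=f(x)(\Omega_m+x\Omega_\sigma)$, $\Omega_m=E\left(\frac{1}{\sigma_t}\frac{\partial m_t(\theta_0)}{\partial\theta}\right)$, $\Omega_\sigma=ED_t(\theta_0)$.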

Every sequence of $\sqrt{n}$-consistent estimators can be truncated to get a discrete sequence. The discretization can also be avoided, at the price of either reinforcing Assumption B8, or increasing the speed of convergence of the sequence $(x_n)$ (see Appendix).

More explicit forms of the asymptotic variance can be derived for the QMLE under the following extension of Assumption A12 which will be illustrated below.

A12: For any $\theta\in\Theta$, for any $c>0$, and any sequence $(x_i)$, there exists $\theta_c\in\Theta$ such that $c\,\sigma(x_1,x_2,\dots;\theta)=\sigma(x_1,x_2,\dots;\theta_c)$ and $m(x_1,x_2,\dots;\theta)=m(x_1,x_2,\dots;\theta_c)$.

The next result provides an explicit form for the asymptotic variance when the parameters of the conditional mean and variance are independent and are estimated by QML.
