Report
Reference
Multivariate wavelet-based shape preserving estimation for dependent observations
COSMA, Antonio, SCAILLET, Olivier, SACHS, Rainervon
Abstract
We present a new approach on shape preserving estimation of probability distribution and density functions using wavelet methodology for multivariate dependent data. Our estimators preserve shape constraints such as monotonicity, positivity and integration to one, and allow for low spatial regularity of the underlying functions. As important application, we discuss conditional quantile estimation for nancial time series data. We show that our methodology can be easily implemented with B-splines, and performs well in a nite sample situation, through Monte Carlo simulations.
COSMA, Antonio, SCAILLET, Olivier, SACHS, Rainervon. Multivariate wavelet-based shape preserving estimation for dependent observations. 2005
Available at:
http://archive-ouverte.unige.ch/unige:5760
Disclaimer: layout of this document may differ from the published version.
2005.04
FACULTE DES SCIENCES
ECONOMIQUES ET SOCIALES
HAUTES ETUDES COMMERCIALES
Multivariate wavelet-based shape preserving estimation for dependent observations
A. Cosma
O. Scaillet
R. Von Sachs
Multivariate wavelet-based shape preserving estimation for dependent observations
Antonio Cosma1 Olivier Scaillet2 Rainer von Sachs3∗
July 13, 2006
Abstract
We introduce a new approach on shape preserving estimation of cumulative distribution functions and probability density functions using the wavelet methodology for multivariate de- pendent data. Our estimators preserve shape constraints such as monotonicity, positivity and integration to one, and allow for low spatial regularity of the underlying functions. We discuss conditional quantile estimation for financial time series data as an application. Our methodology can be implemented with B-splines. We show with Monte Carlo simulations that it performs well in finite samples and for a data-driven choice of the resolution level.
Keywords: Conditional quantile, time series, shape preserving wavelet estimation, B-splines, mul- tivariate process.
Abbreviated title: Shape preserving wavelet estimation
AMS 2000 classification: 62G05, 62G07, 42C40, 41A15. JEL classification: C14, C15, C32.
1Luxembourg School of Finance, Universit´e du Luxembourg, Luxembourg. [email protected]
2HEC Gen`eve and Swiss Finance Institute, Gen`eve, Suisse. [email protected]
3Institut de statistique, Universit´e catholique de Louvain, Louvain-la-Neuve Belgique. [email protected]
∗Corresponding author
1 Introduction
The construction of shape-preserving estimators of probabilistic functions, such as probability den- sity functions (pdfs) and cumulative distribution functions (cdfs), has attracted recent interest, see for instance Cheng et al. (1999). When estimating a pdff(x), we expect an estimator which fulfills nonnegativity and integration to one. When estimating a cdf F(x), we expect an estimator which fulfills monotonicity and right-continuity. Shape preserving means building functional estimators fˆ(x) or ˆF(x) which display such properties. In contrast to most nonparametric approaches, we aim at dealing with probabilistic functions with low spatial regularity, i.e. with occasional jumps or other discontinuities. In this set-up wavelet methods are known to be of relevance (Vidakovic (1999), Ogden (1996)), but meeting shape constraints is not clear in wavelet estimation.
In this paper we study shape-preserving wavelet estimation of pdfs and cdfs. We model the observed data as serially dependent. We do not need post-processors to implement the shape con- straints. Our construction is shape-preserving but not shape-imposing. We can also deal with non-monotone or non-positive functions. We start from the construction of Dechevsky and Penev (1997, 1998), hereafter DP. Their analysis of the estimation of a univariate probabilistic function with low spatial regularity with non-orthogonal wavelets is a pure theoretical one and for an identi- cally and independently distributed (i.i.d.) model of the data. DP do not suggest an algorithm to bring their method to the data. Our goal is to estimate amultivariatefunction under less stringent conditions than the usual ones in nonparametric approaches. Each component of the multivariate function can be either of a pdf type or a cdf type. Note that a direct transfer of the DP-construction to a multivariate framework is not possible. One of the main motivations for this extension is con- ditional quantile estimation, for dependent data in financial time series. There we need a general methodology which can estimate ad–dimensional function which is a cdf in one component, and a multivariate pdf in the remaining d−1 components.
To summarize, the three main contributions of this paper are,
• the definition of appropriate norms of convergence for an estimator of a multivariate function which is a cdf in one of its components;
• the generalization of the univariate results of DP for ani.i.d.model to the case of multivariate time series data subject to mixing conditions;
• the design of fast algorithms to implement our method.
Doukhan (1988) and Doukhan and L´eon (1990) provide early work on wavelet density esti- mation with univariate independent data. Masry provides a generalization to dependent data using orthonormal bases in the univariate setting (Masry (1994)) and in the multivariate setting (Masry (1997)), but the latter analysis is limited to the case of uniform convergence on compact sets. Kerkyacharian and Picard (1992) are the first authors to derive optimality results for lin- ear wavelet density estimation in function spaces, such as Besov spaces. Tribouley (1995) studies linear wavelet methods for multivariate density estimation. Nonlinear wavelet methodologies are applied to univariate density estimation in Donoho et al. (1996) and Kerkyacharian et al. (1996), and in Tribouley and Viennet (1998) for β−mixing data. In nonlinear methods, orthogonality or bi-orthogonality of the underlying wavelet bases has to be imposed. Other recent studies on density estimation with wavelets aim at dealing with shape preserving properties, see for instance Penev and Dechevsky (1997) and Pinheiro and Vidakovic (1997). These approaches use devices such as pre- or post-processing.
Choosing the DP shape preserving wavelets does not only overcome the need of pre- or post- processors but also yields the following advantages: 1) Since orthogonality has to be given up, we rely on a simple construction using B-splines. It allows us to have analytic expressions of our
basis functions in the time domain. This is essential for deriving the reconstruction of a cdf by integration; 2) The proof technique applies simultaneously to the pdf case and the cdf case; 3) Our approach gives general results for linear wavelet density estimation without a restriction to the Besov space framework of Kerkyacharian and Picard (1992).
Shape-preserving estimation of probabilistic functions turns out to be interesting for a variety of nonparametric estimation problems. To give only a few examples beyond multivariate density estimation, we state hazard rate estimation (Hall and Van Keilegom (2004)), and logistic regres- sion, see for instance McFadden and Train (2000) for an application to Mixed Multinomial Logit Models (MMNL). D. McFadden in his Nobel Prize lecture (McFadden (2003)) states explicitly that an appropriate multivariate extension of the DP set-up is required in an MMNL framework. This ensures that the multivariate indirect utility functions determining the choice probabilities display the required shape restrictions. Shape preserving estimators have also been designed for functional estimation in the context of diffusion processes (Chen et al. (1998)). Another application is quantile regression. It provides robust estimators such as median estimators, and allows to characterize the heterogeneous impact of variables on different points of a distribution. It finds important applica- tions in finance and insurance (quantiles of loss distributions), and in labor economics (measures of income inequality).
We briefly discuss quantile regression when we develop our main application, namely nonpara- metric estimation of conditional quantiles for time series, the conditioning information being the past observations of the time series. As pointed out by Hall et al. (1999) and Cai (2002), the shape preserving property of a cdf estimator is particularly important for quantile estimation. In our approach we strongly benefit from a direct wavelet estimate which is monotone and constrained to lie between 0 and 1. This is in contrast to other popular methods for quantile regression - see for instance the modified local linear quantile estimators of Yu and Jones (1998). Our time series framework calls for a particular care in proving consistency of our estimator. We provide such a result under less stringent assumptions on the smoothness of the conditional and marginal distribution functions of the random process.
Our paper is organized in the following way.
Section 2 gives an introduction to shape preserving wavelets and estimation of univariate prob- abilistic functions developed by DP. We present the relevant terminology and concepts such as moduli of smoothness, seminorms, as well as appropriate risk definitions. In Section 3 we briefly recall the main results of DP which are essential for our work. For details summarizing their work we refer to Appendix A in Cosma et al. (2005). Section 4 presents our theoretical contributions:
Theorem 1 states the most general result of this article. It is an extension of the univariate results of Section 3 to higher dimensions and the case of time series data. In Section 5 we examine quantile regression and, in particular, conditional quantile estimation for financial time series data. We dis- cuss numerical implementation via B-splines, and present a simulation study. A data-driven choice of the resolution level is investigated numerically. In a short conclusion we discuss some ideas for future research. The outline of the proofs is deferred to an appendix. The detailed proofs can be found in the technical report Cosma et al. (2005).
2 Preliminaries on shape-perserving wavelets
We start this section by introducing the concept of Multiresolution Analysis (MRA). Let L2(R) be the space of square integrable functions defined on the real line, that is L2(R) = ©
f : R → R¯
¯R+∞
−∞ dxf(x)2 < ∞ª
. An MRA is a sequence of closed subspaces Vj ⊂ L2(R), j ∈Z, with the following properties (Meyer (1992)): Vj ⊂Vj+1, ∩Vj ={0}, ∪Vj =L2(R),and for allv(x)∈L2(R) and j, k ∈ Z, v(x) ∈ Vj ⇔ v(2x) ∈ Vj+1 and v(x) ∈ V0 ⇔ v(x−k) ∈ V0. Moreover, a scaling
function ϕ∈V0 exists such that{ϕ(x−l)|l∈Z}is a Riesz basis of V0. It follows that in general ϕ(2jx)∈Vj, and©
ϕjk(x)ª
k∈Z
=. {2j/2ϕ(2jx−k)|k∈Z} is a Riesz basis inVj.
Orthogonal projection was the first idea exploited in wavelet analysis. It leads to a construc- tion of orthonormal bases of scaling functions, such thatR+∞
−∞ ϕ(x−l)ϕ(x−k)dx=δkl. However, orthogonality poses a number of constraints on the construction of the scaling functions. For in- stance it is not possible to construct{ϕ(· −l)}l∈Zthat are at the same time continuous, orthogonal, nonnegative and symmetric. Moreover we want to build shape preserving operators, that is projec- tion operators that map nonnegative functions to nonnegative functions and monotone functions to monotone functions. To achieve this, we need extra freedom in building the proper scaling func- tions. Hence we introduce a non-orthogonal projection operator starting from two different families of scaling functions{ϕ(2jx−k)}k∈Z and {ϕ(2˜ jx−k)}k∈Z. The projector on the space Vj is given by:
Aj¡ f¢
(x) =X
k∈Z
2jhf,ϕ(2˜ j · −k)i ϕ(2jx−k). (2.1) The two families of functions are called the primal basis and the dual basis. The primal basis is generated from the scaling functionϕ, and the dual basis is generated from the scaling function ˜ϕ.
We require that the two families display the following properties. Letϕbe such that
ϕ(x)>0, x∈R; (2.2)
ϕ(x) bounded, right continuous; (2.3)
suppϕ⊂[−a, a), a>1/2; (2.4)
X∞
k=−∞
ϕ(x−k)≡1,onR; (2.5)
there existsb∈(−a, a) such that
ϕis not decreasing for x6b and non-increasing for x>b. (2.6) Then let the dual scaling function be such that:
˜
ϕsatisfies (2.2), (2.4), ϕ˜∈L1, and Z +∞
−∞
dtϕ(t) = 1.˜ (2.7)
As for the primal basis, we define the scaled versions of ˜ϕ such that ©
˜ ϕjk(x)ª
k∈Z
=. {2j/2ϕ(2˜ jx− k) |k∈Z}.
The conditions given on ˜ϕ are weaker than those on ϕ. Condition P∞
−∞ϕ(x−k) = 1 implies R+∞
−∞ dt ϕ(x) = 1 (see Anastassiou and Yu (1992) for a proof). In particular, this means that both ϕ(x) and ˜ϕ(x) are normalized in the L1 norm.
The following notation is also used:
ςϕ,ϕ˜(t)=. Z +∞
−∞
dτ τϕ(τ˜ )− X+∞
k=−∞
(t−k)ϕ(t−k). (2.8)
Let us examine why these assumptions guarantee a shape-preserving approximation (2.1) of a pdf or a cdf. Assumption (2.2) on ϕ and ˜ϕ ensures that the reconstruction (2.1) of a pdf and a cdf is nonnegative. If assumption (2.5) is also satisfied by ˜ϕ, then the approximation of a pdf integrates to 1. Assumptions (2.3) and (2.6) are specific to shape preserving approximation of a cdf. Assumption (2.3) guarantees that the reconstruction has the minimum regularity conditions of a cdf (boundedness and right continuity), while assumption (2.6), jointly with the nonnegativity of ˜ϕ, guarantees the monotonicity of the reconstruction.
Assumption (2.4) is a usual compact support assumption. It boils down to the use of finite length filters in the implementation of the discrete wavelet transform and implies that νa= #©
ϕjk ¯
¯ x∈ Supp (ϕjk)ª
is independent of scalejand locationk. Assumptions (2.5) and (2.8) together with the condition ςϕ,˜ϕ = 0 almost everywhere are equivalent to the usual moment conditions in wavelet approximation theory. Define the moments of the dual scaling function: M˜0 =R+∞
−∞ dtϕ(t),˜ M˜1= R+∞
−∞ dt tϕ(t).˜ Then (2.5) and the condition ςϕ,˜ϕ = 0 can be rewritten in the following way:
+∞X
k=−∞
(t−k)p ϕ(t−k) = ˜Mp, p= 0,1. (2.9) These two conditions ensure that the multiresolution analysis Vj reproduces exactly polynomials of degree less or equal to 1. We say that the multiresolution analysis fulfills aStrang-Fix condition of order 1.
We finish this second section by recalling the useful concept of modulus of smoothness. This concept is used later on to derive the approximation properties of the projection operator (2.1) in general function spaces, such as Sobolev and Besov spaces (see Nikol’ski˘ı (1975)). For functions defined on a region Ω∈Rd, we introduce the increment of the functionf in the directioniand the corresponding modulus of smoothness. Let ∆1itf(x) =f(x+it)−f(x), ∆µitf(x) = ∆1(∆µ−1it f(x)), then, for h >0,µ∈N and 16p6∞, the integral p-modulus of smoothness in thei direction is given by
ωµi(f, h)p = sup
0<t6h
°°∆µitf(x)°°
p, (2.10)
with the usual convention of the sup
x -norm forp=∞, which is the classical modulus of continuity.
3 Shape preserving estimation of univariate probabilistic func- tions
First we briefly summarize the main concepts of DP in thei.i.d.case for the estimation ofunivariate probabilistic functions (pdfs and cdfs) by means of shape preserving wavelets. We recall these results since they have inspired our own work for multivariate time series data. Note that we need to define an estimation risk in a function norm which is appropriate for treating simultaneously the error when estimating a pdf or a cdf nonparametrically. See equation (3.5). In the two cases the inner products in equation (2.1) can be estimated from the observed data (X1, . . . , Xn) in the following way:
hf,\ϕ˜jki=hf,ϕ˜jk(X)i= 1 n
Xn i=1
˜
ϕjk(Xi), iff is a pdf, (3.1)
hF,\ϕ˜jki=hF,ϕ˜jk(X)i= 1 n
Xn i=1
2−j2 ©
1−Φ(2˜ jXi−k)ª
, ifF is a cdf, (3.2) with ˜Φ(x) =Rx
−∞ϕ(t)˜ dt. Sincef is a pdf,hf,ϕi˜ =E[ ˜ϕ]. Then (3.1) is an estimator of the expected value of ˜ϕjk. In a similar way we can obtain (3.2) by integration by parts
hF,ϕ˜jki = 2−j/2−2−j/2E[Φ(2jX−k)],
using the boundedness of the support and the normalization properties of ˜ϕjk. It then follows that the estimators for a univariate pdf and a univariate cdf are given by:
f(x) = ˆˆ A(n)j ¡ f¢
(x) = 1 n
X
k∈Z
Xn i=1
˜
ϕjk(Xi)ϕjk(x), iff is a pdf; (3.3) F(x) = ˆˆ A(n)j ¡
F¢
(x) = 1 n
X
k∈Z
Xn i=1
2−j2©
1−Φ(2˜ jXi−k)ª
ϕjk(x), ifF is a cdf. (3.4) Lemma 1. Let f be either a pdf or a cdf. Let ϕ,ϕ˜fulfill assumptions (2.2)to (2.7) and, iff is a pdf, let ϕ˜ fulfill also (2.5). Then the estimator Aˆ(n)j ¡
f¢
derived from the operator (2.1)using (3.3) or (3.4)is shape preserving.
By shape preserving, we mean that if f is a pdf, then ˆA(n)j ¡ f¢
is a nonnegative function that integrates to 1, and if F is a cdf, then ˆA(n)j ¡
F¢
is a monotone, right-continuous function and limx→±∞Aˆ(n)j ¡
F¢
(x) = 0,1. For the proofs we refer to Lemma 2.2.1 in DP (1997) for the pdf case, and to Lemma 2.1.1 in DP (1997) and Lemma 3 in Anastassiou and Yu (1992) for the cdf case.
The shape preserving properties of estimators (3.3) and (3.4) come from the approximation results derived in DP (1997).
To assess the behavior of the estimators we define a risk using the following quasi-norm for a functiong(x) defined on R, whose random values depend on the realization of (X1, . . . , Xn) :
°°g¯
¯Lp(Lq)°
°=
½ Z +∞
−∞
dx¡
E|g(x)|q¢p/q¾1/p ,
with 0< p, q 6∞. Recall that for a quasi-norm the triangular inequality holds with°
°g+h°
°A6 cA(°
°g°
°A+°
°h°
°A), cA >1. In order to be able to work with the usual triangular inequality, i.e.
cA= 1, we move to the space Lp(Lq)ρ with an appropriately chosen ρ >0 (see the definition and the discussion in Appendix D).
In theLp(Lq) quasi-norm thepparameter takes into account the smoothness of the function via (2.10), while the q parameter gives an additional degree of freedom to ensure that the estimation risk stays finite through a control of the tails. In the original work of DP (1998) the notation of the two parameters is inverted; in our setting we prefer to call p the “smoothness” parameter as is done for Besov spaces. Note that in contrast to usual Besov spaces, q is here connected to the stochastic dimension of the problem. Our change is notational only and does not alter the essence of the quasi-norm used in the original reference. For p=q, we get the usual Lp−risk, i.e. Ek · kp. The notationk · kp remains associated with the usualLp(Rd)-norm.
From now on let ˆf be an estimator for a pdf or a cdf. Then,
°°fˆ−f¯
¯Lp(Lq)°
°ρ=°
°fˆ−E( ˆf) +E( ˆf)−f¯
¯Lp(Lq)°
°ρ 6c(p, q, ρ)n°
°fˆ−E( ˆf)¯
¯Lp(Lq)°
°ρ+°
°E( ˆf)−f¯
¯Lp(Lq)°
°ρo
. (3.5) Hereafter we restrict ourselves to the ranges 1 6 p 6∞, 0 < q 62, and we always work with a choice ofρ such that c(p, q, ρ) = 1.
From equations (3.1) and (3.2), we can see that the estimators (3.1) and (3.2) are unbiased estimators of the inner products hf,ϕi, and that ˆ˜ Aj is also an unbiased estimator of Aj. The triangular inequality (3.5) can thus be rewritten as:
°°fˆ−f¯
¯Lp(Lq)°
°ρ6n°
°Aj¡ f¢
(·)−f(·)°
°ρ
p+°
°Aˆ(n)j ¡ f¢
(·)−Aj¡ f¢
(·)¯
¯Lp(Lq)°
°ρo
. (3.6)
The first result (DP (1997), Theorem 2.1.1) concerns the first part of equation (3.6), i.e., the bias term:
°°Aj¡ f¢
(·)−f(·)°
°p6c1kςϕ,ϕ˜k∞·ω1(f,21−ja)p+c2kϕk˜ p0 · kϕk∞·ω2(f,21−ja)p, (3.7) where p0 ∈ [1,∞] is such that 1p +p10 = 1, a is the length of the support of the scaling functions, and c1 >0 andc2 >0 are absolute constants that do not depend on the resolution levelj.
Note that, in contrast to pdfs, cdfs are not inLp with 16p <∞, but only inL∞. Yet by (3.7) theLp distance betweenAj(F) and F is bounded. This means that the approximation properties of Aj(F) can be studied in an appropriately chosenLp−norm.
It is possible to obtain how the bias depends explicitly on the resolution level j. To this end we would need to specify the function space to which f belongs and exploit the properties of the modulus of smoothness in this specific function space. For now, we just give a qualitative characterization of this dependence. Since ωµ(f,21−ja)p is increasing in its second argument by definition, the bias can be bounded by a decreasing function of j, so that equation (3.7) can be
rewritten as: °
°Aj¡ f¢
(·)−f(·)°
°p 6B(j), (3.8)
whereB(j) is a decreasing function ofj.
For the second term of (3.5), which is rewritten as
°°
°Aˆ(n)j ¡ f¢
(·)−E¡Aˆ(n)j ¡ f¢
(·)¢¯¯
¯Lp(Lq)
°°
°=
½ Z +∞
−∞
dx
³ E¯
¯Aˆ(n)j ¡ f¢
(x)−E( ˆA(n)j ¡ f¢
(x)¯
¯q´p
q
¾1
p , (3.9) we have two different behaviors depending onf being a cdf or a pdf.
For a cdf. (DP (1998), Theorem 2.1.1) The variance term (3.9) has a parametric decay O ¡
(1n)ρ/2¢
to zero. In this case the bias rate can be adapted by appropriately choosing the increasing functionj∗ =j∗(n) such that B(j∗) = O(n−ρ/2). The total risk (3.6) then decays at a parametric rate:
°°Aˆ(n)j ¡ F¢
(·)−F(·)¯
¯Lp(Lq)°
°ρ=O õ1
n
¶ρ/2!
, n→ ∞ . (3.10)
It can be easily seen that the above convergence rate can be achieved by choosingj>(p/2) log2n.
For a pdf. (DP (1998), Theorem 2.2.1) The variance term is an increasing function of j, that is (3.9) is bounded by a functionV(2j/n)¡
(2j/n)ρ¢
. When choosing the functionj∗ =j∗(n), we face the typical nonparametric trade-off between the competing behaviors of B and V as functions of j. In particular,j∗ =j∗(n) has to be an increasing function such that V(2j∗/n) =O¡
B(j∗)¢ .The convergence rate for the estimator of a pdf is of order
°°Aˆ(n)j ¡ f¢
(·)−F(·)¯
¯Lp(Lq)°
°ρ=O (2−j∗(n)ρ), n→ ∞ . (3.11) which is typically slower than the parametric one. These findings are the same as in classical wavelet estimation (H¨ardle et al. (1998), Kerkyacharian and Picard (1992)). For detailed results, we refer to Cosma et al. (2005), Appendix A, Corollaries 10 - 14.
4 Shape preserving estimation of multivariate probabilistic func- tions
This is the main section of our paper. Here we provide a multivariate extension of the results of DP for time series data. Our work is motivated by an application to quantile estimation (see Section 5).
We analyze a multivariate function F ∈ Rd which is a cdf in the last argument, and a pdf in the d−1 remaining arguments, i.e.,
FY(x, y) = Z y
−∞
dt f(x, t). (4.1)
However, our set-up allows for constructing estimators of multivariate densities f(x)∈Rd as well.
For ease of notation we present only the bivariate case. From now on the argumenty will always denote the variable with respect to which FY(x, y) is cumulated, andx the argument with respect to whichFY(x, y) is a density. The extension of the results to a bivariate pdf can be obtained with minor changes that will be given in the sequel.
Our constructions are based on tensor product wavelets. The primal and dual wavelet bases ϕ,ϕ˜ introduced in equations (2.2) - (2.7) are functions defined from R to R so that functions f :R→Rcan be approximated. It is straightforward to build scaling functions defined onRd, so that multivariate functions can be approximated by wavelet series, using tensor product wavelets.
It is known (see, e.g., Meyer (1992) Section 3.3) that, if {Vj}j∈Z is a multiresolution analysis of L2(R), then L2(R2) =
[∞
j1,j2=0
Vj1⊗Vj2.
Now we definej= (j1, j2) andVj =Vj1⊗Vj2, from which we can derive the 2-dimensional basis for each approximation space Vj:
{ϕjk(x, y)}k∈R2 ={ϕj1k1(x)}k1∈Z⊗ {ϕj2k2(y)}k2∈Z. (4.2) This construction can obviously be extended to any dimensiond >2. The use of independent scales ji, i= 1, . . . , d for each dimension allows for adaptation to possibly different regularity (over the different dimensions) of the d−dimensional function (see, e.g., Neumann and von Sachs (1997)).
Hence the cdf part can benefit from a higher j2 than thej1 of the pdf part.
In the sequel we provide approximation and estimation results of bivariate probability func- tions treating both the i.i.d. case and the serially dependent case. To emphasize the latter, we suppose that we have real-valued bivariate time series observations (Y1, X1), . . . ,(YT, XT),gener- ated from a stationary stochastic process {(Yt, Xt)}t∈Z. Its dependence structure is controlled via mixing conditions. In particular we could have that Yt = Zt and Xt = Zt−1, where {Zt}t∈Z is a univariate stationary process. Let Fik be the sigma-field of events generated by the random variables {(Yt, Xt), i6t6k}. The stationary process{(Yt, Xt)}t∈Z is called strongly or α- mixing if supA∈F0
−∞
B∈F∞ p
¯¯P[AB]−P[A]P[B]¯
¯ = α(p) −−−→
p→∞ 0. Below, if not differently stated, let f(·) be the “design” density fXt(x), which is the marginal distribution of the stationary process in the univariate time series case.
Our results are then derived under the following assumptions on the stochastic process.
Assumption 1: For every integer s > 0 the joint distribution F(X0,Y0),(Xs,Ys) exists and there is a positive constant M such that for every bounded zero-mean random variableT(Xt, Yt):
E[|T(X0, Y0)T(Xs, Ys)|]6M E[|T(X0, Y0)|]E[|T(Xs, Ys)|]. (4.3) Assumption 2: The process{(Xt, Yt)} isα-mixing and the coefficientsα(p) are such that:
X∞
p=N
£α(p)¤1−2/r
=O(N−1), (4.4)
forr >2.
Many processes verify the condition given on the mixing coefficients. Gaussian processes, non Gaussian autoregressive moving average processes (see Pham and Tran (1980)), many nonlinear
functionals of these processes, and various GARCH and stochastic volatility models, see Carrasco and Chen (2002).
We construct our estimators by mimicking the univariate constructions (3.3) and (3.4). We use the shape preserving scaling functions ϕ and ˜ϕthat fulfill assumptions (2.2) to (2.7). Recall that j= (j1, j2). Let{ϕjk(x, y)}k∈R2,{ϕ˜jk(x, y)}k∈R2 be the bivariate primal and dual bases (see (4.2)), then the estimators ofFY(x, y) and f(x, y) are given by:
Aˆ(Tj )¡ f¢
(x, y) = X
k∈Z2
n1 T
XT t=1
˜
ϕj1k1(Xt) ˜ϕj2k2(Yt) o
ϕjk(x, y), (x, y)∈R2, (4.5) Aˆ(Tj )¡
F¢
(x, y) = X
k∈Z2
XT t=1
2−j22
Ãϕ˜j1k1(Xt)
T −ϕ˜j1k1(Xt) ˜Φ(2j2Yt−k2) T
!
ϕjk(x, y), (x, y)∈R2. (4.6) Let us further introduce the multivariate approximator of the probabilistic functionf:
Aj¡ f¢
(x, y) = X
k∈Zd
hf,ϕ˜jkiϕjk(x, y). (4.7)
Note that the properties of the functions and the projectorsAj¡ f¢
(x, y) that allow us to approximate the cdf of one variable, cannot play the same role when we move to a multivariate analysis. In particular (3.7) does not continue to hold in the multivariate Lp−norm because it is not possible to bound the modulus of smoothness of a function which is in L∞ in the y-argument. We could use anL∞-norm but this would require continuity of the density part of FY(x, y). Instead we take advantage of the different convergence rates for a pdf and a cdf as discussed in Section 3, and we work with the following risk:
d©f(x, y),ˆ ˆg(x, y)ª
p= sup
y∈R
°°fˆ(·, y)−ˆg(·, y)¯
¯Lp(Lq)°
°. (4.8)
In order to use this norm, we have to assume continuity of FY(x, y) as a function ofy. We will see that the results on the approximation and estimation of the bivariate FY(x, y) look like the usual results on estimation of a univariate pdf. No additional effort is needed to interpret them.
Moreover, these results can be adapted with minor changes to a bivariate pdf, where the norm (4.8) becomes the usual Lp(R2)-norm. Then we do not need to make a continuity assumption on the bivariate pdf. Further details on the changes needed to adapt the pdf-cdf results to the pure pdf case will be given at the end of each of the following sub-sections.
4.1 Bias in multivariate cdf-pdf approximation
Here we bound the deterministic bias made by approximatingFY(x, y). A sketch of the proof can be found in Appendix A. The proof technique is largely inspired by the one in DP (1997), but with changes requested by the use of the norm (4.8). The proof is based on approximations by Steklov means. It allows us to relate the approximation error to the modulus of smoothness. We refer to Appendix C for a definition and relevant properties of Steklov means. Recall the definitions ofςϕϕ˜ in (2.8), and letex and ey be the unit vectors in thex and y directions, respectively.
Lemma 2. Let assumptions (2.2)to (2.7)hold. LetF(x, y)be, for a fixedx, a continuous function in y. Then, for16p6∞:
sup
y kAj¡ F¢
(·, y)−F(·, y)kp 6c1sup
y ωe1x¡
F(·, y),21−j1a¢
p+c2sup
y sup
0<δ<1
°°∆eyδ(21−j2a)F(·, y)°
°p
+c3 sup
y ωe2x¡
F(·, y),21−j1a¢
p+ c4sup
y sup
0<δ<1
°°∆2eyδ(21−j2a)F(·, y)°
°p.
(4.9)
The constants c1 and c2 include the value of the norms kςϕϕ˜(2j1x)k∞ and kςϕϕ˜(2j2y)k∞. If the latter twoL∞-norms are0, then the constants c1 andc2 will be smaller but they cannot be equal to 0.
The first order terms, i.e. the terms in which appear the constantsc1 andc2, apparently do not vanish in (4.9) even if the conditions kςϕϕ˜(2j1x)k∞ = 0 and kςϕϕ˜(2j2y)k∞ = 0 are fulfilled. This differs from equation (3.7) of the univariate setting. It is related to the proof technique and not to a fundamental reason.
Remark 1. For a bivariate pdf, we do not need to assume continuity off(x, y) with respect toy in Lemma 2. In (3.7) theLp-norm is taken with respect to both arguments, and the bound is given by the usual moduli of smoothness in the two directions. This amounts to the bivariate equivalent of (3.7).
4.2 Estimation of multivariate cdf-pdf from dependent data
We come now to our main results where we study the estimation of the bivariate FY(x, y) from time series observations{(X1, Y1), . . . ,(XT, YT)} generated under assumptions 1 and 2. The i.i.d.
case is a subcase of this one.
Again we make use of the triangular inequality to split the risk into a stochastic term and a bias term:
sup
y∈R
°°Aˆ(Tj )¡ F¢
(·, y)−F(·, y)¯
¯Lp(Lq)°
°ρ 6n
sup
y
°°Aj
¡F¢
(·, y)−F(·, y)°
°ρ
p+ sup
y
°°Aˆ(Tj )¡ F¢
(·, y)−Aj
¡F¢ (·, y)¯
¯Lp(Lq)°
°ρo . The bias part is bounded in Lemma 2. We now give a bound for the stochastic part.
Lemma 3. Let ϕ,ϕ˜ be as in (2.2) - (2.7). Let {(Xt, Yt)}t=0,...,T be realizations of a stationary process fulfilling assumptions 1 and 2. Let p>1, 0< q62 and ρ = min{2, p}. Assume, for fixed y, that FY(x, y)∈Lp/2∩L1∩Lp/r for r >2 . Let j2 >(p/2) log2T. Then
sup
y
°°
°Aˆ(Tj )¡ F¢
(·, y)−E£Aˆ(Tj )¡ F¢
(·, y)¤¯¯¯Lp(Lq)
°°
°ρ6V¯0+ ¯V1, (4.10) where
V¯0 = µ2j1
T
¶ρ/2· d1max
n sup
y
°°F(·, y)°
°1,sup
y
°°F(·, y)°
°p/2
oρ/2
+ d2(a) max n
sup
y ω1ex¡
F(·, y),21−j1a¢
1,sup
y ωe1x¡
F(·, y),21−j1a¢
p/2
oρ/2¸
+O
³ T−ρ/2
´
;
(4.11)
V¯1
V¯0 −→0, as T → ∞ . (4.12)
Moreover, we have max©
sup
y ω1(F(·, y),21−j1a)1,sup
y ω1(F(·, y),21−j1a)p/2ª
=o(1), j1 → ∞, if 26p <∞ ,
or if p=∞ and F is continuous also with respect to the argument x, or if 1< p <2 and sup
y sup
06t6h
Z +∞
−∞
dx
³ Z 1 0
dα F(x+αt, y)
´p/2
<∞.
Here d1 and d2(a) are absolute constants that do not depend on the resolution levelsj = (j1, j2), and a is the support of ϕ,ϕ. As it can be seen from equation (4.10) the stochastic component of˜ the risk has two contributions ¯V0 and ¯V1. We refer to the proof for an explicit form of the latter.
V¯0 is the only variance term if ˆFY(x, y) is estimated under ani.i.d. model.
In Cosma et al. (2005) Appendix F, we provide a slightly more general expression for the variance term in the i.i.d. set-up with a parameter range 0< q <∞.
A close inspection of (4.11) indicates that only the resolution parameterj1 appears. Formally the variance term obtained in Lemma 3 is equivalent to the one we would obtain for a univariate pdf, see Appendix A in Cosma et al. (2005). This is because, as remarked in the discussion following equations (3.10) and (3.11), the cdf like component of the variance has a faster convergence, and with the choice j2 >(p/2) log2T the convergence of the entire y component of the stochastic error is taken into account by theO(T−ρ/2) term . We can finally put together the results of Lemmas 2 and 3 to obtain the convergence rates of ˆA(Tj ).
Theorem 1. Let the assumptions of Lemmas 2 and 3 hold. Then the total risk of the estimator Aˆ(Tj ) of FY(x, y) is:
sup
y∈R
°°Aˆ(Tj )¡ F¢
(·, y)−F(·, y)¯¯Lp(Lq)°°ρ 6c1sup
y ω1ex¡
F(·, y),21−j1a¢ρ
p+c3 sup
y ω2ex¡
F(·, y),21−j1a¢ρ
p+ µ2j1
T
¶ρ/2n
C1+o(1) o
+O(T−ρ/2) +o(1).
(4.13)
The constants c1, c3 and C1, can be made explicit by comparing (4.13) with (4.9) and (4.11).
Again we see that only thej1 parameter appears. The incrementsk∆ieyfkof the functionFY(x, y) in the directiony that can be found in (4.9) are missing in (4.13), since they are taken into account in the O(T−ρ/2) term once we choose j2 to fulfill j2 > (p/2) log2T. The o(1) term takes into account the ¯V1 component of the variance. The risk given in Theorem 1 is formally equivalent to the risk in estimating a univariate pdf. Hence the explicit convergence rates of ˆA(Tj )¡
F¢
can be obtained again by choosing a resolution levelj1 in thexdirection that balances bias and variance.
They are equivalent to the convergence rates of the estimator of a univariate pdf having the same smoothness as the pdf part of FY (see Corollaries 10 - 14 in Cosma et al. (2005), Appendix A).
Remark 2. Lemma 3 can be extended to a bivariate pdf. The continuity assumption on the y variable is no more required, and we assume that f(x, y) ∈ Lp/2 ∩L1 ∩Lp/r. Then in (4.11) a (2j1+j2/T) factor appears instead of (2j1/T), and theO(T−ρ/2) term disappears.
Lemma 3 and the proofs deal with the bivariate distribution functionFY(x, y), but a close look at the proof reveals that the results can be immediately extended to a dimension d > 2. As a multivariate density has a finite Lp-module of smoothness forp <∞, we can extend Theorem 1 to functions FY(x, y), by looking at the uniform convergence in y of
sup
y
°°
°Aˆ(Tj )(f)(·, y)−EAˆ(Tj )(F)(·, y)
¯¯
¯Lp(Lq)
°°
° ,
where the above Lp-norm, made explicit in equation (3.9), is now defined onRd−1.
Let us briefly comment on the results obtained for the convergence of the distribution function FY(x, y). We have derived the upper bound and the asymptotic behavior of the stochastic term of the risk of the estimator (4.6) when the data come from a weakly dependent process. The risk is computed in the Lp(Lq)-norm. The advantage of separating the pointwise expectation from the global norm is probably more evident here than in other contexts. This can be seen by comparing our results with the ones in Masry (1994). First of all Masry computes the risk in the SobolevW2s
norm. We, however, start from the Lp(Lq)-risk and can specify the risk in a variety of different norms, especially in the norm of the Besov spaces built fromLp. But the main difference concerns the conditions that have to be imposed on the tails of the density for the risk not to explode. While we simply need to impose thatf ∈Lp/r, r >2, in Masry (1994) the decay of the density tails has to be related to the smoothnesssof the density, asking for a decay of order x−β, withβ >0.5 +s.
5 Statistical applications: quantile regression and estimation of conditional quantiles of financial data
We address the estimation of conditional quantiles as an important application of our methodology.
Let us consider the stationary process{(Xt, Yt)}introduced in Section 4. We wish to estimate the conditional distribution functionFY(y|x) =P(Yt6y|Xt=x) and, from this, the p-th conditional quantile, that is the value Q(x, p) = inf{y ∈ R|F(y|x) > p}, for a probability level 0 < p < 1.
The conditional medianQ(x, p= 0.5) has often been the main object of interest. It is used as an alternative to the conditional mean to deliver a robust estimate of the effect of the variableX on the response variable Y. In general we can build confidence intervals for the variable Y from a collection of conditional quantiles. In the case of a stationary process {Zt}t∈Z, whenYt=Zt and Xt=Zt−1, it is possible to build prediction intervals for Zt having observedZt−1. The estimation of conditional quantiles of financial time series motivates this work to a large extent. Conditional quantiles are used as measures of risk, and are known as conditional Value-at-Risk in the financial literature.
In this section we start by linking the problem of the estimation of conditional quantiles to the one of the estimation of the bivariate FY(x, y) studied in the previous sections. In order to achieve this, we show that a modification of the norm (4.8) is needed. We then provide a couple of scaling functions ϕ,ϕ˜ fulfilling assumptions (2.2) to (2.7) and (2.9), and we explicitly build shape preserving estimators. Hence we deliver an algorithm for the implementation of the shape preserving set-up. Finally we carry out a Monte Carlo experiment to check our methodology.
5.1 Set-up and conditional quantiles
We keep the same assumptions on the stochastic process as in Section 4. In particular we consider observations{Y1, . . . , YT}of a stationary process{Yt}t∈Zsuch that the couples{(Yt, Xt=Yt−1)}t∈Z form a Markovian process of order one fulfilling assumption 1 and the mixing conditions of assump- tion 2. Moreover we assume that the conditional distributions FYt(y|x) and fYt(y|x) of Yt given Xt = x exist. Here we are interested in the p-th conditional quantile, which is assumed to be unique. We define the estimator ˆQ(x, p) to be such that
Q(p, x) = infˆ © y∈R¯
¯Fˆ(y|x)>pª
. (5.1)
The solution of (5.1) always exists since ˆF(y|x) is monotone and bounded between 0 and 1.
Now, since we have assumed the existence of the conditional pdff(y|x), and sinceF(y|x) is its integral by definition, we can write a Taylor expansion with integral remainder:
F( ˆQ(p, x)|x)−F(Q(p, x)|x)
=¡Q(p, x)ˆ −Q(p, x)¢Z 1
0
dθf¡
Q(p, x) +θ( ˆQ(p, x)−Q(p, x))¯
¯x¢
. (5.2)
Denote ˜fQ,Qˆ (x)=. R1
0 dθf¡
Q(p, x) +θ( ˆQ(p, x)−Q(p, x))¯
¯x¢ .
In order to invert (5.2) we assume a local lower bound on the conditional densities, namely the existence of a positive c such that f(y|x) ≥c > 0,∀y ∈ |y−Q(p, x)|< εδ. Here εδ is chosen