Multivariate wavelet-based shape preserving estimation for dependent observations

(1)

Report

Reference

Multivariate wavelet-based shape preserving estimation for dependent observations

COSMA, Antonio, SCAILLET, Olivier, SACHS, Rainervon

Abstract

We present a new approach on shape preserving estimation of probability distribution and density functions using wavelet methodology for multivariate dependent data. Our estimators preserve shape constraints such as monotonicity, positivity and integration to one, and allow for low spatial regularity of the underlying functions. As important application, we discuss conditional quantile estimation for nancial time series data. We show that our methodology can be easily implemented with B-splines, and performs well in a nite sample situation, through Monte Carlo simulations.

COSMA, Antonio, SCAILLET, Olivier, SACHS, Rainervon. Multivariate wavelet-based shape preserving estimation for dependent observations. 2005

Available at:

http://archive-ouverte.unige.ch/unige:5760

Disclaimer: layout of this document may differ from the published version.

(2)

2005.04

FACULTE DES SCIENCES

ECONOMIQUES ET SOCIALES

HAUTES ETUDES COMMERCIALES

Multivariate wavelet-based shape preserving estimation for dependent observations

A. Cosma

O. Scaillet

R. Von Sachs

(3)

Multivariate wavelet-based shape preserving estimation for dependent observations

Antonio Cosma¹ Olivier Scaillet² Rainer von Sachs^3∗

July 13, 2006

Abstract

We introduce a new approach on shape preserving estimation of cumulative distribution functions and probability density functions using the wavelet methodology for multivariate dependent data. Our estimators preserve shape constraints such as monotonicity, positivity and integration to one, and allow for low spatial regularity of the underlying functions. We discuss conditional quantile estimation for financial time series data as an application. Our methodology can be implemented with B-splines. We show with Monte Carlo simulations that it performs well in finite samples and for a data-driven choice of the resolution level.

Keywords: Conditional quantile, time series, shape preserving wavelet estimation, B-splines, multivariate process.

Abbreviated title: Shape preserving wavelet estimation

AMS 2000 classification: 62G05, 62G07, 42C40, 41A15. JEL classification: C14, C15, C32.

1Luxembourg School of Finance, Universit´e du Luxembourg, Luxembourg. [email protected]

2HEC Gen`eve and Swiss Finance Institute, Gen`eve, Suisse. [email protected]

3Institut de statistique, Universit´e catholique de Louvain, Louvain-la-Neuve Belgique. [email protected]

∗Corresponding author

(4)

1 Introduction

The construction of shape-preserving estimators of probabilistic functions, such as probability density functions (pdfs) and cumulative distribution functions (cdfs), has attracted recent interest, see for instance Cheng et al. (1999). When estimating a pdff(x), we expect an estimator which fulfills nonnegativity and integration to one. When estimating a cdf F(x), we expect an estimator which fulfills monotonicity and right-continuity. Shape preserving means building functional estimators fˆ(x) or ˆF(x) which display such properties. In contrast to most nonparametric approaches, we aim at dealing with probabilistic functions with low spatial regularity, i.e. with occasional jumps or other discontinuities. In this set-up wavelet methods are known to be of relevance (Vidakovic (1999), Ogden (1996)), but meeting shape constraints is not clear in wavelet estimation.

In this paper we study shape-preserving wavelet estimation of pdfs and cdfs. We model the observed data as serially dependent. We do not need post-processors to implement the shape constraints. Our construction is shape-preserving but not shape-imposing. We can also deal with non-monotone or non-positive functions. We start from the construction of Dechevsky and Penev (1997, 1998), hereafter DP. Their analysis of the estimation of a univariate probabilistic function with low spatial regularity with non-orthogonal wavelets is a pure theoretical one and for an identi- cally and independently distributed (i.i.d.) model of the data. DP do not suggest an algorithm to bring their method to the data. Our goal is to estimate amultivariatefunction under less stringent conditions than the usual ones in nonparametric approaches. Each component of the multivariate function can be either of a pdf type or a cdf type. Note that a direct transfer of the DP-construction to a multivariate framework is not possible. One of the main motivations for this extension is conditional quantile estimation, for dependent data in financial time series. There we need a general methodology which can estimate ad–dimensional function which is a cdf in one component, and a multivariate pdf in the remaining d−1 components.

To summarize, the three main contributions of this paper are,

• the definition of appropriate norms of convergence for an estimator of a multivariate function which is a cdf in one of its components;

• the generalization of the univariate results of DP for ani.i.d.model to the case of multivariate time series data subject to mixing conditions;

• the design of fast algorithms to implement our method.

Doukhan (1988) and Doukhan and L´eon (1990) provide early work on wavelet density estimation with univariate independent data. Masry provides a generalization to dependent data using orthonormal bases in the univariate setting (Masry (1994)) and in the multivariate setting (Masry (1997)), but the latter analysis is limited to the case of uniform convergence on compact sets. Kerkyacharian and Picard (1992) are the first authors to derive optimality results for linear wavelet density estimation in function spaces, such as Besov spaces. Tribouley (1995) studies linear wavelet methods for multivariate density estimation. Nonlinear wavelet methodologies are applied to univariate density estimation in Donoho et al. (1996) and Kerkyacharian et al. (1996), and in Tribouley and Viennet (1998) for β−mixing data. In nonlinear methods, orthogonality or bi-orthogonality of the underlying wavelet bases has to be imposed. Other recent studies on density estimation with wavelets aim at dealing with shape preserving properties, see for instance Penev and Dechevsky (1997) and Pinheiro and Vidakovic (1997). These approaches use devices such as pre- or post-processing.

Choosing the DP shape preserving wavelets does not only overcome the need of pre- or post- processors but also yields the following advantages: 1) Since orthogonality has to be given up, we rely on a simple construction using B-splines. It allows us to have analytic expressions of our

(5)

basis functions in the time domain. This is essential for deriving the reconstruction of a cdf by integration; 2) The proof technique applies simultaneously to the pdf case and the cdf case; 3) Our approach gives general results for linear wavelet density estimation without a restriction to the Besov space framework of Kerkyacharian and Picard (1992).

Shape-preserving estimation of probabilistic functions turns out to be interesting for a variety of nonparametric estimation problems. To give only a few examples beyond multivariate density estimation, we state hazard rate estimation (Hall and Van Keilegom (2004)), and logistic regression, see for instance McFadden and Train (2000) for an application to Mixed Multinomial Logit Models (MMNL). D. McFadden in his Nobel Prize lecture (McFadden (2003)) states explicitly that an appropriate multivariate extension of the DP set-up is required in an MMNL framework. This ensures that the multivariate indirect utility functions determining the choice probabilities display the required shape restrictions. Shape preserving estimators have also been designed for functional estimation in the context of diffusion processes (Chen et al. (1998)). Another application is quantile regression. It provides robust estimators such as median estimators, and allows to characterize the heterogeneous impact of variables on different points of a distribution. It finds important applications in finance and insurance (quantiles of loss distributions), and in labor economics (measures of income inequality).

We briefly discuss quantile regression when we develop our main application, namely nonparametric estimation of conditional quantiles for time series, the conditioning information being the past observations of the time series. As pointed out by Hall et al. (1999) and Cai (2002), the shape preserving property of a cdf estimator is particularly important for quantile estimation. In our approach we strongly benefit from a direct wavelet estimate which is monotone and constrained to lie between 0 and 1. This is in contrast to other popular methods for quantile regression - see for instance the modified local linear quantile estimators of Yu and Jones (1998). Our time series framework calls for a particular care in proving consistency of our estimator. We provide such a result under less stringent assumptions on the smoothness of the conditional and marginal distribution functions of the random process.

Our paper is organized in the following way.

Section 2 gives an introduction to shape preserving wavelets and estimation of univariate probabilistic functions developed by DP. We present the relevant terminology and concepts such as moduli of smoothness, seminorms, as well as appropriate risk definitions. In Section 3 we briefly recall the main results of DP which are essential for our work. For details summarizing their work we refer to Appendix A in Cosma et al. (2005). Section 4 presents our theoretical contributions:

Theorem 1 states the most general result of this article. It is an extension of the univariate results of Section 3 to higher dimensions and the case of time series data. In Section 5 we examine quantile regression and, in particular, conditional quantile estimation for financial time series data. We discuss numerical implementation via B-splines, and present a simulation study. A data-driven choice of the resolution level is investigated numerically. In a short conclusion we discuss some ideas for future research. The outline of the proofs is deferred to an appendix. The detailed proofs can be found in the technical report Cosma et al. (2005).

2 Preliminaries on shape-perserving wavelets

We start this section by introducing the concept of Multiresolution Analysis (MRA). Let L₂(R) be the space of square integrable functions defined on the real line, that is L₂(R) = ©

f : R → R¯

¯R_+∞

−∞ dxf(x)² < ∞ª

. An MRA is a sequence of closed subspaces V_j ⊂ L₂(R), j ∈Z, with the following properties (Meyer (1992)): V_j ⊂V_j+1, ∩V_j ={0}, ∪V_j =L₂(R),and for allv(x)∈L₂(R) and j, k ∈ Z, v(x) ∈ V_j ⇔ v(2x) ∈ V_j+1 and v(x) ∈ V₀ ⇔ v(x−k) ∈ V₀. Moreover, a scaling

(6)

function ϕ∈V₀ exists such that{ϕ(x−l)|l∈Z}is a Riesz basis of V₀. It follows that in general ϕ(2^jx)∈V_j, and©

ϕ_jk(x)ª

k∈Z

=. {2^j/2ϕ(2^jx−k)|k∈Z} is a Riesz basis inV_j.

Orthogonal projection was the first idea exploited in wavelet analysis. It leads to a construction of orthonormal bases of scaling functions, such thatR_+∞

−∞ ϕ(x−l)ϕ(x−k)dx=δ_kl. However, orthogonality poses a number of constraints on the construction of the scaling functions. For instance it is not possible to construct{ϕ(· −l)}_l∈Zthat are at the same time continuous, orthogonal, nonnegative and symmetric. Moreover we want to build shape preserving operators, that is projection operators that map nonnegative functions to nonnegative functions and monotone functions to monotone functions. To achieve this, we need extra freedom in building the proper scaling functions. Hence we introduce a non-orthogonal projection operator starting from two different families of scaling functions{ϕ(2^jx−k)}_k∈Z and {ϕ(2˜ ^jx−k)}_k∈Z. The projector on the space V_j is given by:

A_j¡ f¢

(x) =X

k∈Z

2^jhf,ϕ(2˜ ^j · −k)i ϕ(2^jx−k). (2.1) The two families of functions are called the primal basis and the dual basis. The primal basis is generated from the scaling functionϕ, and the dual basis is generated from the scaling function ˜ϕ.

We require that the two families display the following properties. Letϕbe such that

ϕ(x)>0, x∈R; (2.2)

ϕ(x) bounded, right continuous; (2.3)

suppϕ⊂[−a, a), a>1/2; (2.4)

X∞

k=−∞

ϕ(x−k)≡1,onR; (2.5)

there existsb∈(−a, a) such that

ϕis not decreasing for x6b and non-increasing for x>b. (2.6) Then let the dual scaling function be such that:

˜

ϕsatisfies (2.2), (2.4), ϕ˜∈L₁, and Z _+∞

−∞

dtϕ(t) = 1.˜ (2.7)

As for the primal basis, we define the scaled versions of ˜ϕ such that ©

˜ ϕ_jk(x)ª

k∈Z

=. {2^j/2ϕ(2˜ ^jx− k) |k∈Z}.

The conditions given on ˜ϕ are weaker than those on ϕ. Condition P_∞

−∞ϕ(x−k) = 1 implies R_+∞

−∞ dt ϕ(x) = 1 (see Anastassiou and Yu (1992) for a proof). In particular, this means that both ϕ(x) and ˜ϕ(x) are normalized in the L₁ norm.

The following notation is also used:

ς_ϕ,_ϕ_˜(t)=. Z _+∞

−∞

dτ τϕ(τ˜ )− X+∞

k=−∞

(t−k)ϕ(t−k). (2.8)

Let us examine why these assumptions guarantee a shape-preserving approximation (2.1) of a pdf or a cdf. Assumption (2.2) on ϕ and ˜ϕ ensures that the reconstruction (2.1) of a pdf and a cdf is nonnegative. If assumption (2.5) is also satisfied by ˜ϕ, then the approximation of a pdf integrates to 1. Assumptions (2.3) and (2.6) are specific to shape preserving approximation of a cdf. Assumption (2.3) guarantees that the reconstruction has the minimum regularity conditions of a cdf (boundedness and right continuity), while assumption (2.6), jointly with the nonnegativity of ˜ϕ, guarantees the monotonicity of the reconstruction.

(7)

Assumption (2.4) is a usual compact support assumption. It boils down to the use of finite length filters in the implementation of the discrete wavelet transform and implies that ν_a= #©

ϕ_jk ¯

¯ x∈ Supp (ϕ_jk)ª

is independent of scalejand locationk. Assumptions (2.5) and (2.8) together with the condition ς_ϕ,˜_ϕ = 0 almost everywhere are equivalent to the usual moment conditions in wavelet approximation theory. Define the moments of the dual scaling function: M˜₀ =R_+∞

−∞ dtϕ(t),˜ M˜₁= R_+∞

−∞ dt tϕ(t).˜ Then (2.5) and the condition ς_ϕ,˜_ϕ = 0 can be rewritten in the following way:

+∞X

k=−∞

(t−k)^p ϕ(t−k) = ˜M_p, p= 0,1. (2.9) These two conditions ensure that the multiresolution analysis V_j reproduces exactly polynomials of degree less or equal to 1. We say that the multiresolution analysis fulfills aStrang-Fix condition of order 1.

We finish this second section by recalling the useful concept of modulus of smoothness. This concept is used later on to derive the approximation properties of the projection operator (2.1) in general function spaces, such as Sobolev and Besov spaces (see Nikol’ski˘ı (1975)). For functions defined on a region Ω∈R^d, we introduce the increment of the functionf in the directioniand the corresponding modulus of smoothness. Let ∆¹_itf(x) =f(x+it)−f(x), ∆^µ_itf(x) = ∆¹(∆^µ−1_it f(x)), then, for h >0,µ∈N and 16p6∞, the integral p-modulus of smoothness in thei direction is given by

ω^µ_i(f, h)_p = sup

0<t6h

°°∆^µ_itf(x)°°

p, (2.10)

with the usual convention of the sup

x -norm forp=∞, which is the classical modulus of continuity.

3 Shape preserving estimation of univariate probabilistic func- tions

First we briefly summarize the main concepts of DP in thei.i.d.case for the estimation ofunivariate probabilistic functions (pdfs and cdfs) by means of shape preserving wavelets. We recall these results since they have inspired our own work for multivariate time series data. Note that we need to define an estimation risk in a function norm which is appropriate for treating simultaneously the error when estimating a pdf or a cdf nonparametrically. See equation (3.5). In the two cases the inner products in equation (2.1) can be estimated from the observed data (X₁, . . . , X_n) in the following way:

hf,\ϕ˜_jki=hf,ϕ˜_jk(X)i= 1 n

Xn i=1

˜

ϕ_jk(X_i), iff is a pdf, (3.1)

hF,\ϕ˜_jki=hF,ϕ˜_jk(X)i= 1 n

Xn i=1

2⁻^j² ©

1−Φ(2˜ ^jX_i−k)ª

, ifF is a cdf, (3.2) with ˜Φ(x) =R_x

−∞ϕ(t)˜ dt. Sincef is a pdf,hf,ϕi˜ =E[ ˜ϕ]. Then (3.1) is an estimator of the expected value of ˜ϕ_jk. In a similar way we can obtain (3.2) by integration by parts

hF,ϕ˜_jki = 2^−j/2−2^−j/2E[Φ(2^jX−k)],

(8)

using the boundedness of the support and the normalization properties of ˜ϕ_jk. It then follows that the estimators for a univariate pdf and a univariate cdf are given by:

f(x) = ˆˆ A⁽ⁿ⁾_j ¡ f¢

(x) = 1 n

X

k∈Z

Xn i=1

˜

ϕ_jk(X_i)ϕ_jk(x), iff is a pdf; (3.3) F(x) = ˆˆ A⁽ⁿ⁾_j ¡

F¢

(x) = 1 n

X

k∈Z

Xn i=1

2⁻^j²©

1−Φ(2˜ ^jX_i−k)ª

ϕ_jk(x), ifF is a cdf. (3.4) Lemma 1. Let f be either a pdf or a cdf. Let ϕ,ϕ˜fulfill assumptions (2.2)to (2.7) and, iff is a pdf, let ϕ˜ fulfill also (2.5). Then the estimator Aˆ⁽ⁿ⁾_j ¡

f¢

derived from the operator (2.1)using (3.3) or (3.4)is shape preserving.

By shape preserving, we mean that if f is a pdf, then ˆA⁽ⁿ⁾_j ¡ f¢

is a nonnegative function that integrates to 1, and if F is a cdf, then ˆA⁽ⁿ⁾_j ¡

F¢

is a monotone, right-continuous function and lim_x→±∞Aˆ⁽ⁿ⁾_j ¡

F¢

(x) = 0,1. For the proofs we refer to Lemma 2.2.1 in DP (1997) for the pdf case, and to Lemma 2.1.1 in DP (1997) and Lemma 3 in Anastassiou and Yu (1992) for the cdf case.

The shape preserving properties of estimators (3.3) and (3.4) come from the approximation results derived in DP (1997).

To assess the behavior of the estimators we define a risk using the following quasi-norm for a functiong(x) defined on R, whose random values depend on the realization of (X₁, . . . , X_n) :

°°g¯

¯L_p(L_q)°

°=

½ Z _+∞

−∞

dx¡

E|g(x)|^q¢_p/q¾_1/p ,

with 0< p, q 6∞. Recall that for a quasi-norm the triangular inequality holds with°

°g+h°

°A6 c_A(°

°g°

°A+°

°h°

°A), c_A >1. In order to be able to work with the usual triangular inequality, i.e.

c_A= 1, we move to the space L_p(L_q)^ρ with an appropriately chosen ρ >0 (see the definition and the discussion in Appendix D).

In theL_p(L_q) quasi-norm thepparameter takes into account the smoothness of the function via (2.10), while the q parameter gives an additional degree of freedom to ensure that the estimation risk stays finite through a control of the tails. In the original work of DP (1998) the notation of the two parameters is inverted; in our setting we prefer to call p the “smoothness” parameter as is done for Besov spaces. Note that in contrast to usual Besov spaces, q is here connected to the stochastic dimension of the problem. Our change is notational only and does not alter the essence of the quasi-norm used in the original reference. For p=q, we get the usual L_p−risk, i.e. Ek · k_p. The notationk · k_p remains associated with the usualL_p(R^d)-norm.

From now on let ˆf be an estimator for a pdf or a cdf. Then,

°°fˆ−f¯

¯L_p(L_q)°

°^ρ=°

°fˆ−E( ˆf) +E( ˆf)−f¯

¯L_p(L_q)°

°^ρ 6c(p, q, ρ)n°

°fˆ−E( ˆf)¯

¯L_p(L_q)°

°^ρ+°

°E( ˆf)−f¯

¯L_p(L_q)°

°^ρo

. (3.5) Hereafter we restrict ourselves to the ranges 1 6 p 6∞, 0 < q 62, and we always work with a choice ofρ such that c(p, q, ρ) = 1.

From equations (3.1) and (3.2), we can see that the estimators (3.1) and (3.2) are unbiased estimators of the inner products hf,ϕi, and that ˆ˜ A_j is also an unbiased estimator of A_j. The triangular inequality (3.5) can thus be rewritten as:

°°fˆ−f¯

¯L_p(L_q)°

°^ρ6n°

°A_j¡ f¢

(·)−f(·)°

°^ρ

p+°

°Aˆ⁽ⁿ⁾_j ¡ f¢

(·)−A_j¡ f¢

(·)¯

¯L_p(L_q)°

°^ρo

. (3.6)

(9)

The first result (DP (1997), Theorem 2.1.1) concerns the first part of equation (3.6), i.e., the bias term:

°°A_j¡ f¢

(·)−f(·)°

°p6c₁kς_ϕ,_ϕ_˜k_∞·ω¹(f,2^1−ja)_p+c₂kϕk˜ _p⁰ · kϕk_∞·ω²(f,2^1−ja)_p, (3.7) where p⁰ ∈ [1,∞] is such that ¹_p +_p¹0 = 1, a is the length of the support of the scaling functions, and c₁ >0 andc₂ >0 are absolute constants that do not depend on the resolution levelj.

Note that, in contrast to pdfs, cdfs are not inL_p with 16p <∞, but only inL_∞. Yet by (3.7) theL_p distance betweenA_j(F) and F is bounded. This means that the approximation properties of A_j(F) can be studied in an appropriately chosenL_p−norm.

It is possible to obtain how the bias depends explicitly on the resolution level j. To this end we would need to specify the function space to which f belongs and exploit the properties of the modulus of smoothness in this specific function space. For now, we just give a qualitative characterization of this dependence. Since ω^µ(f,2^1−ja)_p is increasing in its second argument by definition, the bias can be bounded by a decreasing function of j, so that equation (3.7) can be

rewritten as: °

°A_j¡ f¢

(·)−f(·)°

°p 6B(j), (3.8)

whereB(j) is a decreasing function ofj.

For the second term of (3.5), which is rewritten as

°°

°Aˆ⁽ⁿ⁾_j ¡ f¢

(·)−E¡Aˆ⁽ⁿ⁾_j ¡ f¢

(·)¢¯¯

¯L_p(L_q)

°°

°=

½ Z _+∞

−∞

dx

³ E¯

¯Aˆ⁽ⁿ⁾_j ¡ f¢

(x)−E( ˆA⁽ⁿ⁾_j ¡ f¢

(x)¯

¯^q´^p

q

¾¹

p , (3.9) we have two different behaviors depending onf being a cdf or a pdf.

For a cdf. (DP (1998), Theorem 2.1.1) The variance term (3.9) has a parametric decay O ¡

(¹_n)^ρ/2¢

to zero. In this case the bias rate can be adapted by appropriately choosing the increasing functionj^∗ =j^∗(n) such that B(j^∗) = O(n^−ρ/2). The total risk (3.6) then decays at a parametric rate:

°°Aˆ⁽ⁿ⁾_j ¡ F¢

(·)−F(·)¯

¯L_p(L_q)°

°^ρ=O Ãµ1

n

¶_ρ/2!

, n→ ∞ . (3.10)

It can be easily seen that the above convergence rate can be achieved by choosingj>(p/2) log₂n.

For a pdf. (DP (1998), Theorem 2.2.1) The variance term is an increasing function of j, that is (3.9) is bounded by a functionV(2^j/n)¡

(2^j/n)^ρ¢

. When choosing the functionj^∗ =j^∗(n), we face the typical nonparametric trade-off between the competing behaviors of B and V as functions of j. In particular,j^∗ =j^∗(n) has to be an increasing function such that V(2^j^∗/n) =O¡

B(j^∗)¢ .The convergence rate for the estimator of a pdf is of order

°°Aˆ⁽ⁿ⁾_j ¡ f¢

(·)−F(·)¯

¯L_p(L_q)°

°^ρ=O (2^−j^∗^(n)ρ), n→ ∞ . (3.11) which is typically slower than the parametric one. These findings are the same as in classical wavelet estimation (H¨ardle et al. (1998), Kerkyacharian and Picard (1992)). For detailed results, we refer to Cosma et al. (2005), Appendix A, Corollaries 10 - 14.

4 Shape preserving estimation of multivariate probabilistic func- tions

This is the main section of our paper. Here we provide a multivariate extension of the results of DP for time series data. Our work is motivated by an application to quantile estimation (see Section 5).

(10)

We analyze a multivariate function F ∈ R^d which is a cdf in the last argument, and a pdf in the d−1 remaining arguments, i.e.,

F_Y(x, y) = Z _y

−∞

dt f(x, t). (4.1)

However, our set-up allows for constructing estimators of multivariate densities f(x)∈R^d as well.

For ease of notation we present only the bivariate case. From now on the argumenty will always denote the variable with respect to which F_Y(x, y) is cumulated, andx the argument with respect to whichF_Y(x, y) is a density. The extension of the results to a bivariate pdf can be obtained with minor changes that will be given in the sequel.

Our constructions are based on tensor product wavelets. The primal and dual wavelet bases ϕ,ϕ˜ introduced in equations (2.2) - (2.7) are functions defined from R to R so that functions f :R→Rcan be approximated. It is straightforward to build scaling functions defined onR^d, so that multivariate functions can be approximated by wavelet series, using tensor product wavelets.

It is known (see, e.g., Meyer (1992) Section 3.3) that, if {V_j}_j∈Z is a multiresolution analysis of L₂(R), then L₂(R²) =

[∞

j1,j2=0

V_j₁⊗V_j₂.

Now we definej= (j₁, j₂) andV_j =V_j₁⊗V_j₂, from which we can derive the 2-dimensional basis for each approximation space V_j:

{ϕ_jk(x, y)}_k∈R2 ={ϕ_j₁_k₁(x)}_k₁_∈Z⊗ {ϕ_j₂_k₂(y)}_k₂_∈Z. (4.2) This construction can obviously be extended to any dimensiond >2. The use of independent scales j_i, i= 1, . . . , d for each dimension allows for adaptation to possibly different regularity (over the different dimensions) of the d−dimensional function (see, e.g., Neumann and von Sachs (1997)).

Hence the cdf part can benefit from a higher j₂ than thej₁ of the pdf part.

In the sequel we provide approximation and estimation results of bivariate probability functions treating both the i.i.d. case and the serially dependent case. To emphasize the latter, we suppose that we have real-valued bivariate time series observations (Y₁, X₁), . . . ,(Y_T, X_T),generated from a stationary stochastic process {(Y_t, X_t)}_t∈Z. Its dependence structure is controlled via mixing conditions. In particular we could have that Y_t = Z_t and X_t = Z_t−1, where {Z_t}_t∈Z is a univariate stationary process. Let F_i^k be the sigma-field of events generated by the random variables {(Y_t, X_t), i6t6k}. The stationary process{(Y_t, X_t)}_t∈Z is called strongly or α- mixing if supA∈F0

−∞

B∈F∞ p

¯¯P[AB]−P[A]P[B]¯

¯ = α(p) −−−→

p→∞ 0. Below, if not differently stated, let f(·) be the “design” density f_X_t(x), which is the marginal distribution of the stationary process in the univariate time series case.

Our results are then derived under the following assumptions on the stochastic process.

Assumption 1: For every integer s > 0 the joint distribution F_(X₀_,Y₀_),(X_s_,Y_s₎ exists and there is a positive constant M such that for every bounded zero-mean random variableT(X_t, Y_t):

E[|T(X₀, Y₀)T(X_s, Y_s)|]6M E[|T(X₀, Y₀)|]E[|T(X_s, Y_s)|]. (4.3) Assumption 2: The process{(X_t, Y_t)} isα-mixing and the coefficientsα(p) are such that:

X∞

p=N

£α(p)¤_1−2/r

=O(N⁻¹), (4.4)

forr >2.

Many processes verify the condition given on the mixing coefficients. Gaussian processes, non Gaussian autoregressive moving average processes (see Pham and Tran (1980)), many nonlinear

(11)

functionals of these processes, and various GARCH and stochastic volatility models, see Carrasco and Chen (2002).

We construct our estimators by mimicking the univariate constructions (3.3) and (3.4). We use the shape preserving scaling functions ϕ and ˜ϕthat fulfill assumptions (2.2) to (2.7). Recall that j= (j₁, j₂). Let{ϕ_jk(x, y)}_k∈R²,{ϕ˜_jk(x, y)}_k∈R² be the bivariate primal and dual bases (see (4.2)), then the estimators ofF_Y(x, y) and f(x, y) are given by:

Aˆ^(T_j ⁾¡ f¢

(x, y) = X

k∈Z²

n1 T

XT t=1

˜

ϕ_j₁_k₁(X_t) ˜ϕ_j₂_k₂(Y_t) o

ϕ_jk(x, y), (x, y)∈R², (4.5) Aˆ^(T_j ⁾¡

F¢

(x, y) = X

k∈Z²

XT t=1

2⁻^j²²

Ãϕ˜_j₁_k₁(X_t)

T −ϕ˜_j₁_k₁(X_t) ˜Φ(2^j²Y_t−k₂) T

!

ϕ_jk(x, y), (x, y)∈R². (4.6) Let us further introduce the multivariate approximator of the probabilistic functionf:

A_j¡ f¢

(x, y) = X

k∈Z^d

hf,ϕ˜_jkiϕ_jk(x, y). (4.7)

Note that the properties of the functions and the projectorsA_j¡ f¢

(x, y) that allow us to approximate the cdf of one variable, cannot play the same role when we move to a multivariate analysis. In particular (3.7) does not continue to hold in the multivariate L_p−norm because it is not possible to bound the modulus of smoothness of a function which is in L_∞ in the y-argument. We could use anL_∞-norm but this would require continuity of the density part of F_Y(x, y). Instead we take advantage of the different convergence rates for a pdf and a cdf as discussed in Section 3, and we work with the following risk:

d©f(x, y),ˆ ˆg(x, y)ª

p= sup

y∈R

°°fˆ(·, y)−ˆg(·, y)¯

¯L_p(L_q)°

°. (4.8)

In order to use this norm, we have to assume continuity of F_Y(x, y) as a function ofy. We will see that the results on the approximation and estimation of the bivariate F_Y(x, y) look like the usual results on estimation of a univariate pdf. No additional effort is needed to interpret them.

Moreover, these results can be adapted with minor changes to a bivariate pdf, where the norm (4.8) becomes the usual L_p(R²)-norm. Then we do not need to make a continuity assumption on the bivariate pdf. Further details on the changes needed to adapt the pdf-cdf results to the pure pdf case will be given at the end of each of the following sub-sections.

4.1 Bias in multivariate cdf-pdf approximation

Here we bound the deterministic bias made by approximatingF_Y(x, y). A sketch of the proof can be found in Appendix A. The proof technique is largely inspired by the one in DP (1997), but with changes requested by the use of the norm (4.8). The proof is based on approximations by Steklov means. It allows us to relate the approximation error to the modulus of smoothness. We refer to Appendix C for a definition and relevant properties of Steklov means. Recall the definitions ofς_ϕ_ϕ_˜ in (2.8), and lete_x and e_y be the unit vectors in thex and y directions, respectively.

Lemma 2. Let assumptions (2.2)to (2.7)hold. LetF(x, y)be, for a fixedx, a continuous function in y. Then, for16p6∞:

sup

y kA_j¡ F¢

(·, y)−F(·, y)k_p 6c₁sup

y ω_e¹_x¡

F(·, y),2^1−j¹a¢

p+c₂sup

y sup

0<δ<1

°°∆_e_y_δ(21−j2a)F(·, y)°

°p

+c₃ sup

y ω_e²_x¡

F(·, y),2^1−j¹a¢

p+ c₄sup

y sup

0<δ<1

°°∆²_e_y_δ(21−j2a)F(·, y)°

°p.

(4.9)

(12)

The constants c₁ and c₂ include the value of the norms kς_ϕ_ϕ_˜(2^j¹x)k_∞ and kς_ϕ_ϕ_˜(2^j²y)k_∞. If the latter twoL_∞-norms are0, then the constants c₁ andc₂ will be smaller but they cannot be equal to 0.

The first order terms, i.e. the terms in which appear the constantsc₁ andc₂, apparently do not vanish in (4.9) even if the conditions kς_ϕ_ϕ_˜(2^j¹x)k_∞ = 0 and kς_ϕ_ϕ_˜(2^j²y)k_∞ = 0 are fulfilled. This differs from equation (3.7) of the univariate setting. It is related to the proof technique and not to a fundamental reason.

Remark 1. For a bivariate pdf, we do not need to assume continuity off(x, y) with respect toy in Lemma 2. In (3.7) theL_p-norm is taken with respect to both arguments, and the bound is given by the usual moduli of smoothness in the two directions. This amounts to the bivariate equivalent of (3.7).

4.2 Estimation of multivariate cdf-pdf from dependent data

We come now to our main results where we study the estimation of the bivariate F_Y(x, y) from time series observations{(X₁, Y₁), . . . ,(X_T, Y_T)} generated under assumptions 1 and 2. The i.i.d.

case is a subcase of this one.

Again we make use of the triangular inequality to split the risk into a stochastic term and a bias term:

sup

y∈R

°°Aˆ^(T_j ⁾¡ F¢

(·, y)−F(·, y)¯

¯Lp(Lq)°

°^ρ 6n

sup

y

°°Aj

¡F¢

(·, y)−F(·, y)°

°^ρ

p+ sup

y

°°Aˆ^(T_j ⁾¡ F¢

(·, y)−Aj

¡F¢ (·, y)¯

¯Lp(Lq)°

°^ρo . The bias part is bounded in Lemma 2. We now give a bound for the stochastic part.

Lemma 3. Let ϕ,ϕ˜ be as in (2.2) - (2.7). Let {(X_t, Y_t)}_t=0,...,T be realizations of a stationary process fulfilling assumptions 1 and 2. Let p>1, 0< q62 and ρ = min{2, p}. Assume, for fixed y, that F_Y(x, y)∈L_p/2∩L₁∩L_p/r for r >2 . Let j₂ >(p/2) log₂T. Then

sup

y

°°

°Aˆ^(T_j ⁾¡ F¢

(·, y)−E£Aˆ^(T_j ⁾¡ F¢

(·, y)¤¯¯¯L_p(L_q)

°°

°^ρ6V¯₀+ ¯V₁, (4.10) where

V¯₀ = µ2^j¹

T

¶_ρ/2· d₁max

n sup

y

°°F(·, y)°

°1,sup

y

°°F(·, y)°

°p/2

o_ρ/2

+ d₂(a) max n

sup

y ω¹_e_x¡

F(·, y),2^1−j¹a¢

1,sup

y ω_e¹_x¡

F(·, y),2^1−j¹a¢

p/2

o_ρ/2¸

+O

³ T^−ρ/2

´

;

(4.11)

V¯₁

V¯₀ −→0, as T → ∞ . (4.12)

sup

y ω¹(F(·, y),2^1−j¹a)₁,sup

y ω¹(F(·, y),2^1−j¹a)_p/2ª

=o(1), j₁ → ∞, if 26p <∞ ,

or if p=∞ and F is continuous also with respect to the argument x, or if 1< p <2 and sup

y sup

06t6h

Z _+∞

−∞

dx

³ Z 1 0

dα F(x+αt, y)

´_p/2

<∞.

(13)

Here d₁ and d₂(a) are absolute constants that do not depend on the resolution levelsj = (j₁, j₂), and a is the support of ϕ,ϕ. As it can be seen from equation (4.10) the stochastic component of˜ the risk has two contributions ¯V₀ and ¯V₁. We refer to the proof for an explicit form of the latter.

V¯₀ is the only variance term if ˆF_Y(x, y) is estimated under ani.i.d. model.

In Cosma et al. (2005) Appendix F, we provide a slightly more general expression for the variance term in the i.i.d. set-up with a parameter range 0< q <∞.

A close inspection of (4.11) indicates that only the resolution parameterj₁ appears. Formally the variance term obtained in Lemma 3 is equivalent to the one we would obtain for a univariate pdf, see Appendix A in Cosma et al. (2005). This is because, as remarked in the discussion following equations (3.10) and (3.11), the cdf like component of the variance has a faster convergence, and with the choice j₂ >(p/2) log₂T the convergence of the entire y component of the stochastic error is taken into account by theO(T^−ρ/2) term . We can finally put together the results of Lemmas 2 and 3 to obtain the convergence rates of ˆA^(T_j ⁾.

Theorem 1. Let the assumptions of Lemmas 2 and 3 hold. Then the total risk of the estimator Aˆ^(T_j ⁾ of F_Y(x, y) is:

sup

y∈R

°°Aˆ^(T_j ⁾¡ F¢

(·, y)−F(·, y)¯¯L_p(L_q)°°^ρ 6c₁sup

y ω¹_e_x¡

F(·, y),2^1−j¹a¢_ρ

p+c₃ sup

y ω²_e_x¡

F(·, y),2^1−j¹a¢_ρ

p+ µ2^j¹

T

¶_ρ/2n

C₁+o(1) o

+O(T^−ρ/2) +o(1).

(4.13)

The constants c₁, c₃ and C₁, can be made explicit by comparing (4.13) with (4.9) and (4.11).

Again we see that only thej₁ parameter appears. The incrementsk∆ⁱ_e_yfkof the functionF_Y(x, y) in the directiony that can be found in (4.9) are missing in (4.13), since they are taken into account in the O(T^−ρ/2) term once we choose j₂ to fulfill j₂ > (p/2) log₂T. The o(1) term takes into account the ¯V₁ component of the variance. The risk given in Theorem 1 is formally equivalent to the risk in estimating a univariate pdf. Hence the explicit convergence rates of ˆA^(T_j ⁾¡

F¢

can be obtained again by choosing a resolution levelj₁ in thexdirection that balances bias and variance.

They are equivalent to the convergence rates of the estimator of a univariate pdf having the same smoothness as the pdf part of F_Y (see Corollaries 10 - 14 in Cosma et al. (2005), Appendix A).

Remark 2. Lemma 3 can be extended to a bivariate pdf. The continuity assumption on the y variable is no more required, and we assume that f(x, y) ∈ L_p/2 ∩L₁ ∩L_p/r. Then in (4.11) a (2^j¹^+j²/T) factor appears instead of (2^j¹/T), and theO(T^−ρ/2) term disappears.

Lemma 3 and the proofs deal with the bivariate distribution functionF_Y(x, y), but a close look at the proof reveals that the results can be immediately extended to a dimension d > 2. As a multivariate density has a finite L_p-module of smoothness forp <∞, we can extend Theorem 1 to functions F_Y(x, y), by looking at the uniform convergence in y of

sup

y

°°

°Aˆ^(T_j ⁾(f)(·, y)−EAˆ^(T_j ⁾(F)(·, y)

¯¯

¯L_p(L_q)

°°

° ,

where the above L_p-norm, made explicit in equation (3.9), is now defined onR^d−1.

Let us briefly comment on the results obtained for the convergence of the distribution function F_Y(x, y). We have derived the upper bound and the asymptotic behavior of the stochastic term of the risk of the estimator (4.6) when the data come from a weakly dependent process. The risk is computed in the L_p(L_q)-norm. The advantage of separating the pointwise expectation from the global norm is probably more evident here than in other contexts. This can be seen by comparing our results with the ones in Masry (1994). First of all Masry computes the risk in the SobolevW₂^s

(14)

norm. We, however, start from the L_p(L_q)-risk and can specify the risk in a variety of different norms, especially in the norm of the Besov spaces built fromL_p. But the main difference concerns the conditions that have to be imposed on the tails of the density for the risk not to explode. While we simply need to impose thatf ∈L_p/r, r >2, in Masry (1994) the decay of the density tails has to be related to the smoothnesssof the density, asking for a decay of order x^−β, withβ >0.5 +s.

5 Statistical applications: quantile regression and estimation of conditional quantiles of financial data

We address the estimation of conditional quantiles as an important application of our methodology.

Let us consider the stationary process{(X_t, Y_t)}introduced in Section 4. We wish to estimate the conditional distribution functionF_Y(y|x) =P(Y_t6y|X_t=x) and, from this, the p-th conditional quantile, that is the value Q(x, p) = inf{y ∈ R|F(y|x) > p}, for a probability level 0 < p < 1.

The conditional medianQ(x, p= 0.5) has often been the main object of interest. It is used as an alternative to the conditional mean to deliver a robust estimate of the effect of the variableX on the response variable Y. In general we can build confidence intervals for the variable Y from a collection of conditional quantiles. In the case of a stationary process {Z_t}_t∈Z, whenY_t=Z_t and X_t=Z_t−1, it is possible to build prediction intervals for Z_t having observedZ_t−1. The estimation of conditional quantiles of financial time series motivates this work to a large extent. Conditional quantiles are used as measures of risk, and are known as conditional Value-at-Risk in the financial literature.

In this section we start by linking the problem of the estimation of conditional quantiles to the one of the estimation of the bivariate F_Y(x, y) studied in the previous sections. In order to achieve this, we show that a modification of the norm (4.8) is needed. We then provide a couple of scaling functions ϕ,ϕ˜ fulfilling assumptions (2.2) to (2.7) and (2.9), and we explicitly build shape preserving estimators. Hence we deliver an algorithm for the implementation of the shape preserving set-up. Finally we carry out a Monte Carlo experiment to check our methodology.

5.1 Set-up and conditional quantiles

We keep the same assumptions on the stochastic process as in Section 4. In particular we consider observations{Y₁, . . . , Y_T}of a stationary process{Y_t}_t∈Zsuch that the couples{(Y_t, X_t=Y_t−1)}_t∈Z form a Markovian process of order one fulfilling assumption 1 and the mixing conditions of assumption 2. Moreover we assume that the conditional distributions F_Y_t(y|x) and f_Y_t(y|x) of Y_t given X_t = x exist. Here we are interested in the p-th conditional quantile, which is assumed to be unique. We define the estimator ˆQ(x, p) to be such that

¯Fˆ(y|x)>pª

. (5.1)

The solution of (5.1) always exists since ˆF(y|x) is monotone and bounded between 0 and 1.

Now, since we have assumed the existence of the conditional pdff(y|x), and sinceF(y|x) is its integral by definition, we can write a Taylor expansion with integral remainder:

F( ˆQ(p, x)|x)−F(Q(p, x)|x)

=¡Q(p, x)ˆ −Q(p, x)¢Z ₁

0

dθf¡

Q(p, x) +θ( ˆQ(p, x)−Q(p, x))¯

¯x¢

. (5.2)

Denote ˜f_Q,Q_ˆ (x)=. R₁

0 dθf¡

Q(p, x) +θ( ˆQ(p, x)−Q(p, x))¯

¯x¢ .

In order to invert (5.2) we assume a local lower bound on the conditional densities, namely the existence of a positive c such that f(y|x) ≥c > 0,∀y ∈ |y−Q(p, x)|< ε_δ. Here ε_δ is chosen