Nonparametric Instrumental Variable Estimation of Structural Quantile effects

CHERNOZHUKOV, V., GAGLIARDINI, P., SCAILLET, Olivier

Abstract

We study the asymptotic distribution of Tikhonov Regularized estimation of quantile structural effects implied by a nonseparable model. The nonparametric instrumental variable estimator is based on a minimum distance principle. We show that the minimum distance problem without regularization is locally ill-posed, and consider penalization by the norms of the parameter and its derivatives. We derive pointwise asymptotic normality and develop a consistent estimator of the asymptotic variance. We study the small sample properties via simulation results, and provide an empirical illustration to estimation of nonlinear pricing curves for telecommunications services in the U.S.

CHERNOZHUKOV, V., GAGLIARDINI, P., SCAILLET, Olivier. Nonparametric Instrumental Variable Estimation of Structural Quantile effects. 2008

Available at:

http://archive-ouverte.unige.ch/unige:5721

Disclaimer: layout of this document may differ from the published version.

2008.05

FACULTE DES SCIENCES

ECONOMIQUES ET SOCIALES

HAUTES ETUDES COMMERCIALES

VICTOR CHERNOZHUKOV PATRICK GAGLIARDINI OLIVIER SCAILLET

Key words: Nonparametric Quantile Regression, Instrumental Variable, Ill-Posed Inverse Problems, Tikhonov Regularization, Nonlinear Pricing Curve.

JEL classification: C13, C14, D12. AMS 2000 classification: 62G08, 62G20, 62P20.

Date: 31 August 2009.

We thank the editor and three referees for helpful suggestions. We thank J. Hausman and G. Sidak for generously providing us with the data on telecommunications services in the U.S. We are especially grateful to X. Chen for comments that have led to major improvements in the paper. We also thank G. Dhaene, R. Koenker, O. Linton, E. Mammen, M. Wolf, and seminar participants at Athens University, Zurich University, Bern University, Boston University, Queen Mary, Greqam Marseille, Mannheim University, Leuven University, Toulouse University, ESEM 2008 and SITE 2009 workshop for helpful comments. The last two authors acknowledge the support provided by the Swiss National Science Foundation through the National Center of Competence in Research: Financial Valuation and Risk Management (NCCR FINRISK).

1. Introduction

We propose and analyze estimators of the nonseparable model $Y = g(X, U)$, where the error $U$ is independent of the instrument $Z$, and has a uniform distribution $U \sim U(0,1)$ (Chernozhukov and Hansen (2005), Chernozhukov, Imbens and Newey (2007)). The function $g(x, u)$ is strictly monotonic increasing w.r.t. $u \in [0,1]$. The variable $X$ has compact support $\mathcal{X} = [0,1]$ and is potentially endogenous. The variables $Y$ and $Z$ have compact supports $\mathcal{Y} \subset [0,1]$ and $\mathcal{Z} = [0,1]^{d_Z}$. The parameter of interest is the quantile structural effect $\varphi_0(x) = g(x, \tau)$ on $\mathcal{X}$ for a given $\tau \in (0,1)$. The function $\varphi_0$ satisfies the conditional quantile restriction

$$P[Y \le \varphi_0(X) \mid Z] = P[g(X, U) \le g(X, \tau) \mid Z] = P[U \le \tau \mid Z] = \tau, \tag{1.1}$$

which yields the endogenous quantile regression representation (Horowitz and Lee (2007)):

$$Y = \varphi_0(X) + V, \qquad P[V \le 0 \mid Z] = \tau. \tag{1.2}$$

The main contribution of this paper is the derivation of the large sample distribution of a Tikhonov regularized estimator of $\varphi_0$. This is the first distributional result in the literature on nonlinear problems, and it is noteworthy because of a fundamental difficulty of linearization of a nonlinear ill-posed problem such as (1.2), as pointed out in Horowitz and Lee (2007). Even though this paper focuses on a particular case (1.2) of a nonlinear ill-posed problem, the results of the paper are conceptually amenable to other problems. Indeed, the nonsmooth case (1.2) analyzed here is in some sense the hardest; so our analysis could be applied to other problems along similar lines.

We build on a series of fundamental papers on ill-posed endogenous mean regressions (Ai and Chen (2003), Darolles, Florens, and Renault (2003), Newey and Powell (2003), Hall and Horowitz (2005), Horowitz (2007), Blundell, Chen, and Kristensen (2007)), and the review paper by Carrasco, Florens, and Renault (CFR, 2007). Here we follow Gagliardini and Scaillet (GS, 2006), and study a Tikhonov Regularized (TiR) estimator where regularization is achieved via a compactness-inducing penalty term, the Sobolev norm (Tikhonov (1963a,b), Groetsch (1984), Kress (1999)). For nonparametric instrumental variable estimation of endogenous quantile regression (NIVQR), Chernozhukov, Imbens and Newey (2007) discuss identification and estimation via a constrained minimum distance criterion. Horowitz and Lee (2007) give optimal consistency rates for an $L^2$-norm penalized estimator.

In independent work for a general setting, Chen and Pouzo (2008, 2009) study semiparametric sieve estimation of conditional moment models based on possibly nonsmooth generalized residual functions. Specifically, Chen and Pouzo (2009) focus on the semiparametric efficiency, asymptotic normality, and a weighted bootstrap procedure for the finite-dimensional parameter, and use a finite-dimensional sieve to estimate the functional parameter. They cover partially linear IVQR as a particular example. Chen and Pouzo (2008) give an in-depth, unifying treatment of convergence rates of penalized sieve-based estimators, and characterize when the sieve or the penalization dominates the convergence rates. Our results and those of Chen and Pouzo (2008, 2009) are complementary to each other (their results do not nest our results, and vice versa, since we work in an infinite-dimensional parameter space, while Chen and Pouzo (2008) work in finite-dimensional parameter spaces of increasing dimensions). The most important difference is the derivation of pointwise asymptotic normality, which is not available in Chen and Pouzo (2008, 2009). Our other specific contributions for NIVQR include a proof of ill-posedness and a proof of consistency under weak conditions on the penalization parameter.

We organize the rest of the paper as follows. In Section 2, we prove local ill-posedness, and clarify the importance of including a derivative in the penalization. In Section 3, we prove consistency of our Q-TiR estimator. In Section 4, we show pointwise asymptotic normality, and introduce a consistent estimator of the asymptotic variance. In Section 5, we provide computational experiments and present an empirical illustration. In the Appendix, we gather the technical assumptions and some proofs. We place all omitted proofs in the online supplementary materials, where we use results in Alt (1992), Andrews (1994), Bosq (1998, 2000), Hall and Hosseini-Nasab (2006), Hansen (2008) and Reed and Simon (1980).

2. Ill-posedness in nonseparable models

From (1.1), the quantile structural effect $\varphi_0$ is a solution of the nonlinear functional equation $\mathcal{A}(\varphi_0) = \tau$, where the operator $\mathcal{A}$ is defined by

$$\mathcal{A}(\varphi)(z) = \int F_{Y|X,Z}(\varphi(x) \mid x, z)\, f_{X|Z}(x \mid z)\, dx, \qquad z \in \mathcal{Z},$$

and $F_{Y|X,Z}$ and $f_{X|Z}$ denote the c.d.f. of $Y$ given $X, Z$, and the p.d.f. of $X$ given $Z$, respectively. Alternatively, in terms of the conditional c.d.f. $F_{U|X,Z}$ of $U$ given $X, Z$, we can rewrite $\mathcal{A}(\varphi)(z) = \int F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) \mid x, z\right) f_{X|Z}(x \mid z)\, dx$, where $g^{-1}(x, .)$ denotes the generalized inverse of function $g(x, .)$ w.r.t. its second argument.

The functional parameter $\varphi_0$ belongs to a subset $\Theta$ of the weighted Sobolev space $H^l[0,1]$, $l \in \mathbb{N} \cup \{\infty\}$, that is the completion of $\{\varphi \in C^l[0,1] \mid \|\varphi\|_H < \infty\}$ w.r.t. the weighted Sobolev norm $\|\varphi\|_H := \langle \varphi, \varphi \rangle_H^{1/2}$, where $\langle \varphi, \psi \rangle_H := \sum_{s=0}^{l} a_s \langle \nabla^s \varphi, \nabla^s \psi \rangle$, $a_s \ge 0$, is the weighted Sobolev scalar product, and $\langle \varphi, \psi \rangle = \int \varphi(x)\psi(x)\, dx$. Below we focus on (i) $a_s = 1$ for $s \le l$ and $a_s = 0$ for $s > l$, when $l < \infty$, (ii) $a_s = 1/s!$, when $l = \infty$. The former gives the classical Sobolev space of order $l$ (Adams and Fournier (2003)) while the latter gives the Sobolev space $H^\infty[0,1] := \left\{\varphi \in C^\infty[0,1] \mid \sum_{s=0}^{\infty} \frac{1}{s!} \langle \nabla^s \varphi, \nabla^s \varphi \rangle < \infty\right\}$ of infinite order (Dubinskij (1986)). These Sobolev spaces are Hilbert spaces w.r.t. $\langle ., . \rangle_H$, and the embeddings $H^\infty[0,1] \subset H^l[0,1] \subset L^2[0,1]$, $l \in \mathbb{N}$, are compact (Adams and Fournier (2003), Theorem 6.3). We use the $L^2$-norm $\|\varphi\| = \langle \varphi, \varphi \rangle^{1/2}$ as consistency norm, and we assume that $\Theta$ is bounded and closed w.r.t. $\|.\|$.

We assume that $\varphi_0$ is globally identified on $\Theta$ (see Appendix C in Chernozhukov and Hansen (2005) for a discussion) and interior.

Assumption 1: (i) $\mathcal{A}(\varphi) - \tau = 0$, $\varphi \in \Theta$, if and only if $\varphi = \varphi_0$; (ii) function $\varphi_0$ is an interior point of set $\Theta$ w.r.t. norm $\|.\|$.

We use the conditional moment restriction $m(\varphi_0, z) := \mathcal{A}(\varphi_0)(z) - \tau = 0$, $z \in \mathcal{Z}$, and consider the criterion

$$Q(\varphi) := \frac{1}{\tau(1-\tau)} E\left[m(\varphi, Z)^2\right] =: \|\mathcal{A}(\varphi) - \tau\|^2_{L^2(F_Z, \tau)},$$

where $F_Z$ denotes the marginal distribution of $Z$ and $L^2(F_Z, \tau)$ denotes the $L^2$ space w.r.t. measure $F_Z/(\tau(1-\tau))$. The true structural function $\varphi_0$ minimizes this criterion function $Q$. The following proposition shows that the minimum distance problem above is ill-posed, namely there are sequences of increasingly oscillatory functions that also approximately minimize $Q$. Therefore, ill-posedness can lead to inconsistency of the naive analog estimators based on the empirical analog of $Q$. In order to rule out these explosive solutions we use penalization.

Proposition 1: Under Assumptions 1 (i) and A.3: (a) the problem is locally ill-posed, namely for any $r > 0$ small enough, there exist $\varepsilon \in (0, r)$ and a sequence $(\varphi_n) \subset B_r(\varphi_0) := \{\varphi \in L^2[0,1] : \|\varphi - \varphi_0\| < r\}$ such that $\|\varphi_n - \varphi_0\| \ge \varepsilon$ and $Q(\varphi_n) \to 0$; (b) any sequence $(\varphi_n) \subset B_r(\varphi_0)$ such that $\|\varphi_n - \varphi_0\| \ge \varepsilon$ for $r > \varepsilon > 0$ and $Q(\varphi_n) \to 0$ satisfies

$$\limsup_{n \to \infty} \|\nabla \varphi_n\| = +\infty.$$

The proof of result (a) gives explicit sequences $(\varphi_n)$ generating ill-posedness. Moreover, since there is no general characterization of the ill-posedness of a nonlinear problem through conditions on its linearization, i.e., on the Frechet derivative of the operator (Engl, Kunisch and Neubauer (1989), Schock (2002)), this result does not follow from the ill-posedness of the linearized version of our problem. Part (b) provides a theoretical underpinning for including the norm $\|\nabla \varphi\|$ of the derivative in the penalty term.
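The behaviour described in Proposition 1 can be illustrated numerically. The sketch below is not the construction used in the proof; it relies on a hypothetical toy design chosen only for tractability: $Z \sim U(0,1)$, $U$ independent of $(X, Z)$, $f_{X|Z}(x|z) = 1 + 0.5\cos(\pi x)\cos(\pi z)$ and $g(x,u) = (x+u)/2$, so that $F_{U|X,Z}(g^{-1}(x,\varphi(x))) = 2\varphi(x) - x$. The oscillating sequence $\varphi_n = \varphi_0 + \varepsilon\sqrt{2}\sin(2\pi n x)$ stays at $L^2$ distance $\varepsilon$ from $\varphi_0$ while the criterion $Q(\varphi_n)$ shrinks and $\|\nabla\varphi_n\|$ grows.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def illposedness_demo(ns=(1, 2, 4, 8, 16, 32), tau=0.5, eps=0.05):
    """Rows of (Q(phi_n), ||phi_n - phi_0||, ||grad phi_n||) for each n."""
    x = np.linspace(0.0, 1.0, 4001)
    z = np.linspace(0.0, 1.0, 201)
    # Hypothetical toy design (an assumption, not taken from the paper):
    # f_{X|Z}(x|z) = 1 + 0.5 cos(pi x) cos(pi z), g(x, u) = (x + u) / 2.
    f_x_given_z = 1.0 + 0.5 * np.outer(np.cos(np.pi * z), np.cos(np.pi * x))
    phi0 = 0.5 * (x + tau)  # phi_0(x) = g(x, tau)
    rows = []
    for n in ns:
        phi_n = phi0 + eps * np.sqrt(2.0) * np.sin(2.0 * np.pi * n * x)
        # With U uniform and independent of (X, Z): F_U(g^{-1}(x, y)) = 2y - x.
        u = np.clip(2.0 * phi_n - x, 0.0, 1.0)
        a_phi = trapezoid(u[None, :] * f_x_given_z, x)       # A(phi_n)(z)
        q = trapezoid((a_phi - tau) ** 2, z) / (tau * (1.0 - tau))
        dist = np.sqrt(trapezoid((phi_n - phi0) ** 2, x))    # L2 distance
        grad = np.sqrt(trapezoid(np.gradient(phi_n, x) ** 2, x))
        rows.append((q, dist, grad))
    return np.array(rows)


results = illposedness_demo()
```

In this toy design the first column of `results` decreases with `n`, the second stays at `eps`, and the third increases, which is the pattern that motivates penalizing the derivative.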

3. Consistency of the Q-TiR estimator

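We consider a penalized criterion $L_T(\varphi) := Q_T(\varphi) + \lambda_T \|\varphi\|_H^2$, where $\lambda_T > 0$, P-a.s., and

$$Q_T(\varphi) := \frac{1}{T\tau(1-\tau)} \sum_{t=1}^{T} \hat{m}(\varphi, Z_t)^2. \tag{3.1}$$

The conditional moment $m(\varphi, z)$ is estimated nonparametrically by

$$\hat{m}(\varphi, z) := \int \hat{F}_{Y|X,Z}(\varphi(x) \mid x, z)\, \hat{f}_{X|Z}(x \mid z)\, dx - \tau =: \hat{\mathcal{A}}(\varphi)(z) - \tau, \qquad z \in \mathcal{Z}, \tag{3.2}$$

where $\hat{f}_{X|Z}$ and $\hat{F}_{Y|X,Z}$ denote kernel estimators of $f_{X|Z}$ and $F_{Y|X,Z}$ with bandwidth $h_T > 0$ and kernel $K$ satisfying Assumption A.2.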
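A minimal sketch of (3.1)-(3.2) for a scalar instrument may help fix ideas. It is written under stated assumptions rather than as the authors' implementation: a Gaussian kernel with one common bandwidth `h`, an unsmoothed indicator $1\{Y_t \le \varphi(x)\}$ in place of a kernel-smoothed c.d.f., and trapezoidal integration over a grid on $[0,1]$. With these choices the product $\hat{F}_{Y|X,Z}\hat{f}_{X|Z}$ reduces to $\sum_t 1\{Y_t \le \varphi(x)\} K_h(x - X_t) K_h(z - Z_t) / \sum_t K_h(z - Z_t)$. The function names `m_hat` and `q_T` are illustrative.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def m_hat(phi, z_eval, X, Y, Z, h, tau, n_grid=201):
    """Kernel estimate of m(phi, z) = A(phi)(z) - tau for scalar Z."""
    x = np.linspace(0.0, 1.0, n_grid)
    # K_h(x - X_t) on the integration grid, shape (n_grid, T).
    k_x = gaussian_kernel((x[:, None] - X[None, :]) / h) / h
    # Indicator 1{Y_t <= phi(x)}: a simplification of the smoothed c.d.f.
    below = (Y[None, :] <= phi(x)[:, None]).astype(float)
    integrand = below * k_x
    # w_t = int_0^1 1{Y_t <= phi(x)} K_h(x - X_t) dx (trapezoidal rule).
    w = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)[:, None], axis=0)
    # Nadaraya-Watson weights in the instrument direction, shape (M, T).
    k_z = gaussian_kernel((np.atleast_1d(z_eval)[:, None] - Z[None, :]) / h)
    return (k_z @ w) / k_z.sum(axis=1) - tau


def q_T(phi, X, Y, Z, h, tau):
    """Empirical minimum distance criterion (3.1)."""
    m = m_hat(phi, Z, X, Y, Z, h, tau)
    return np.mean(m ** 2) / (tau * (1.0 - tau))
```

Here `phi` is any vectorized callable on $[0,1]$; the Q-TiR criterion adds $\lambda_T\|\varphi\|_H^2$ to `q_T(phi, ...)`.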

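Proposition 2: Suppose $\lambda_T$ is a stochastic sequence such that $\lambda_T > 0$, $\lambda_T \to 0$, P-a.s., and

$$\frac{1}{\lambda_T} \left( \frac{\log T}{T h_T^{d_Z+1}} + h_T^{2m} \right) = O_p(1),$$

where $m \ge 2$ is the order of differentiability of the joint density $f_{X,Y,Z}$ of $(X, Y, Z)$. Then, under Assumptions 1 (i) and A.1-A.3, the Q-TiR estimator $\hat{\varphi}$ defined by

$$\hat{\varphi} := \arg\inf_{\varphi \in \Theta}\; Q_T(\varphi) + \lambda_T \|\varphi\|_H^2, \tag{3.3}$$

is consistent, namely $\|\hat{\varphi} - \varphi_0\| \overset{p}{\to} 0$, as sample size $T \to \infty$.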

To show Proposition 2 we use two results. First, the Sobolev penalty implies that the sequence of estimates $\hat{\varphi}$ for $T \in \mathbb{N}$ is tight in $\left(L^2[0,1], \|.\|\right)$. This induces an effective compactification of the parameter space: there exists a compact set that contains $\hat{\varphi}$ for any large $T$ with probability $1 - \delta$, for any arbitrarily small $\delta > 0$. Second, we obtain a suitable uniform convergence result for $Q_T$ on an infinite-dimensional and possibly non-totally bounded parameter set $\Theta$ by exploiting the specific expression of $\hat{m}(\varphi, z)$ given in (3.2). We are able to reduce the sup over $\Theta$ to a sup over a bounded subset of a finite-dimensional space. Proposition 6.2 in Chen and Pouzo (2008) states a consistency result for nonparametric additive IVQR using a series estimator for $m(\varphi, z)$ under similar conditions. In the rest of the paper we assume a deterministic regularization parameter $\lambda_T$. The assumption in Proposition 2 becomes $h_T \asymp T^{-\eta}$ and $\lambda_T \asymp T^{-\gamma}$ for $0 < \eta < \frac{1}{d_Z+1}$, $0 < \gamma < \min\{1 - \eta(d_Z+1),\, 2m\eta\}$; the relation $a_T \asymp b_T$, for positive sequences $a_T$ and $b_T$, means that $a_T/b_T$ is bounded away from $0$ and $\infty$ as $T \to \infty$.
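The rate restrictions above are easy to check numerically. The following sketch encodes the two inequalities for power-law choices $h_T = T^{-\eta}$ and $\lambda_T = T^{-\gamma}$ (constants set to one, an assumption made only for illustration) and evaluates the ratio that Proposition 2 requires to stay bounded. The function names are hypothetical.

```python
import math


def consistency_rates_ok(eta, gamma, d_z, m):
    """Check 0 < eta < 1/(d_Z+1) and 0 < gamma < min{1 - eta(d_Z+1), 2 m eta}."""
    if not (0.0 < eta < 1.0 / (d_z + 1)):
        return False
    return 0.0 < gamma < min(1.0 - eta * (d_z + 1), 2.0 * m * eta)


def rate_ratio(T, eta, gamma, d_z, m):
    """(log T / (T h^(d_Z+1)) + h^(2m)) / lambda with h = T^-eta, lambda = T^-gamma."""
    h = T ** (-eta)
    lam = T ** (-gamma)
    return (math.log(T) / (T * h ** (d_z + 1)) + h ** (2 * m)) / lam
```

For instance, with $d_Z = 1$ and $m = 2$, the pair $(\eta, \gamma) = (0.2, 0.3)$ passes the check and `rate_ratio` decreases between $T = 10^3$ and $T = 10^6$, whereas $(\eta, \gamma) = (0.2, 0.7)$ fails the check and the ratio grows.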

4. Asymptotic distribution of the Q-TiR estimator

In this section we derive a feasible asymptotic normality theorem. After deriving the first- order condition (Section 4.1) we show how to control the error induced by linearization of the problem under suitable smoothness assumptions (Sections 4.2 and 4.3). The validity of a Bahadur-type representation for the functional estimator makes it possible to show asymptotic normality (Section 4.4). We then provide a consistent estimator for the asymptotic variance (Section 4.5).

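4.1. First-order condition. Let us introduce the operators from $L^2[0,1]$ to $L^2(F_Z, \tau)$ corresponding to the Frechet derivative $A := D\mathcal{A}(\varphi_0)$ of operator $\mathcal{A}$ at $\varphi_0$, and to the Frechet derivative $\hat{A} := D\hat{\mathcal{A}}(\hat{\varphi})$ of operator $\hat{\mathcal{A}}$ at $\hat{\varphi}$, respectively:

$$A\psi(z) = \int f_{X,Y|Z}(x, \varphi_0(x) \mid z)\, \psi(x)\, dx, \tag{4.1}$$

and

$$\hat{A}\psi(z) = \int \hat{f}_{X,Y|Z}(x, \hat{\varphi}(x) \mid z)\, \psi(x)\, dx, \tag{4.2}$$

where $z \in \mathcal{Z}$, $\psi \in L^2[0,1]$. Under Assumptions A.4 (ii)-(iii), operator $A$ is compact. Under Assumption 1 (ii), the Q-TiR estimator satisfies w.p.a. 1 the first-order condition

$$0 = \left.\frac{d}{d\varepsilon} L_T(\hat{\varphi} + \varepsilon\psi)\right|_{\varepsilon=0} = 2\left\langle \hat{A}^*\left(\hat{\mathcal{A}}(\hat{\varphi}) - \tau\right) + \lambda_T \hat{\varphi},\; \psi \right\rangle_H, \tag{4.3}$$

for all $\psi \in H^l[0,1]$, where operator $\hat{A}^*$ is defined by $\left\langle \varphi, \hat{A}^*\psi \right\rangle_H := \frac{1}{T\tau(1-\tau)} \sum_{t=1}^{T} \hat{A}\varphi(Z_t)\, \psi(Z_t)$, for any $\varphi \in H^l[0,1]$ and $\psi \in L^2(F_Z, \tau)$. Operator $\hat{A}^*$ is the empirical counterpart of the adjoint operator $A^*$ of $A$ w.r.t. the Sobolev scalar product on $H^l[0,1]$. For $l = 1$, we have $A^* = \mathcal{D}^{-1}\tilde{A}$, where $\tilde{A}$ denotes the adjoint of operator $A$ w.r.t. the $L^2$ scalar product, defined by $\tilde{A}\psi(x) = \frac{1}{\tau(1-\tau)} \int f_{X,Y,Z}(x, \varphi_0(x), z)\, \psi(z)\, dz$. Operator $\mathcal{D}^{-1}$ denotes the inverse of $\mathcal{D} : H_0^2[0,1] \to L^2[0,1]$ with $H_0^2[0,1] := \{\varphi \in H^2[0,1] : \nabla\varphi(0) = \nabla\varphi(1) = 0\}$ and $\mathcal{D} := 1 - \nabla^2$ (see the supplementary materials for the derivation for general $l$). From (4.3) $\hat{\varphi}$ satisfies the nonlinear integro-differential equation of Type II:

$$\hat{A}^*\left(\hat{\mathcal{A}}(\hat{\varphi}) - \tau\right) + \lambda_T \hat{\varphi} = 0. \tag{4.4}$$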

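4.2. Highlighting the nonlinearity issue. We can rewrite Equation (4.4) by using the second-order expansion

$$\hat{\mathcal{A}}(\hat{\varphi}) = \hat{\mathcal{A}}(\varphi_0) + \hat{A}_0 \Delta\hat{\varphi} + \hat{R},$$

where

$$\hat{A}_0\psi(z) := \int \hat{f}_{X,Y|Z}(x, \varphi_0(x) \mid z)\, \psi(x)\, dx, \qquad \Delta\hat{\varphi} := \hat{\varphi} - \varphi_0,$$

and $\hat{R} := \hat{R}(\hat{\varphi}, \varphi_0)$ is the second-order residual term (see Lemma A.2). Then, after rearranging terms and performing an asymptotic expansion, we get the linearization (Appendix A.2):

$$\Delta\hat{\varphi} = (\lambda_T + A^*A)^{-1} A^*\hat{\zeta} + B_T + R_T + \hat{K}_T(\Delta\hat{\varphi}), \tag{4.5}$$

where $\hat{\zeta}(z) := \int\!\!\int \left(1\{y \le \varphi_0(x)\} - \tau\right) \frac{\Delta\hat{f}_{X,Y,Z}(x, y, z)}{f_Z(z)}\, dy\, dx$, $\Delta\hat{f} = \hat{f} - f$, and

$$B_T := \left[(\lambda_T + A^*A)^{-1} A^*A - 1\right]\varphi_0 \tag{4.6}$$

is the regularization bias function. The remainder term $R_T$ accounts for kernel estimation of operator $A$ and its expression is given in (A.3) below. The Bahadur-type representation (4.5) of the Q-TiR estimator (see e.g. Koenker (2005)) is key to asymptotic normality.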

The major difference between our ill-posed setting and standard finite-dimensional parametric estimation problems, or well-posed functional estimation problems, concerns the behaviour (and the complex control) of the nonlinearity term $\hat{K}_T(\Delta\hat{\varphi}) := -\left(\lambda_T + \hat{A}_0^*\hat{A}_0\right)^{-1} \hat{A}_0^*\hat{R}$, where $\hat{A}_0^*$ is defined as $\hat{A}^*$, but with $\varphi_0$ substituted for $\hat{\varphi}$. We prove in Lemma A.3 that $\hat{K}_T(\Delta\hat{\varphi})$ satisfies a quadratic bound

$$\left\|\hat{K}_T(\Delta\hat{\varphi})\right\| \le \frac{C}{\sqrt{\lambda_T}} \|\Delta\hat{\varphi}\|^2, \tag{4.7}$$

w.p.a. 1, with a suitable constant $C$. In the RHS of (4.7) the coefficient of the quadratic bound diverges as the sample size increases. Hence, the usual argument that the quadratic nonlinearity term is negligible w.r.t. the first-order term no matter the convergence rate of the latter, does not apply. Still, under Assumptions 2-4 discussed below and ensuring $\|\Delta\hat{\varphi}\| = o_p\left(\sqrt{\lambda_T}\right)$, we can control the remainder terms $R_T$ and $\hat{K}_T(\Delta\hat{\varphi})$.

4.3. Assumptions for asymptotic normality. Let $\varphi_\lambda := \arg\inf_{\varphi \in \Theta} Q(\varphi) + \lambda\|\varphi\|_H^2$, $\lambda > 0$, denote the nonlinear Tikhonov solution in the population. We will make the following assumptions as well as the more technical Assumptions A.4-5 stated in the Appendix.

Assumption 2: The solution $\varphi_\lambda$ is unique and the equation $D\mathcal{A}(\varphi)^*\left(\mathcal{A}(\varphi) - \tau\right) + \lambda\varphi = 0$, $\varphi \in \Theta$, admits the unique solution $\varphi = \varphi_\lambda$, for all small $\lambda > 0$.

This assumption involves the first-order condition for minimization of the penalized minimum distance criterion in the population. It rules out local extrema different from the global minimum $\varphi_\lambda$.

Assumption 3: $A\psi = 0$ for $\psi \in L^2[0,1]$ if and only if $\psi = 0$.

This is a local identification condition; see Chernozhukov, Imbens and Newey (2007) for examples and discussion.

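Assumption 4: (i) The function $\varphi_0$ satisfies the source condition $\sum_{j=1}^{\infty} \frac{\langle \phi_j, \varphi_0 \rangle_H^2}{\nu_j^{2\delta}} < \infty$, $\delta \in (1/2, 1]$, where $\nu_j \downarrow 0$ and $\phi_j$ are the eigenvalues and eigenfunctions of the compact, self-adjoint operator $A^*A$ on $H^l[0,1]$, with $\|\phi_j\|_H = 1$. (ii) $\sum_{j=1}^{\infty} \frac{\langle \phi_j, \varphi_0 \rangle_H^2}{\nu_j} < 1/c^2$, where $c := \sup_{x,y,z} \left|\nabla_y f_{X,Y|Z}(x, y \mid z)\right|$. (iii) $\Gamma(\lambda) := \inf_{\psi \in H^l[0,1] : \|\psi\| = 1} \|A\psi\|^2_{L^2(F_Z,\tau)} + \lambda\|\psi\|_H^2 \ge C\lambda^a$, for $C > 0$, $a \in (0, 1/2)$.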

The source condition in Assumption 4 (i) controls the bias contribution in the linearized problem as in the proof of Proposition 3.11 in CFR, and implies (see Appendix A.2)

$$\|B_T\|_H = O\left(\lambda_T^\delta\right). \tag{4.8}$$
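The order of the bias in (4.8) can be checked numerically from the spectral form of (4.6). Since $(\lambda + A^*A)^{-1}A^*A - 1$ has eigenvalues $-\lambda/(\lambda + \nu_j)$, one gets $\|B_T\|_H^2 = \sum_j \left(\frac{\lambda_T}{\lambda_T + \nu_j}\right)^2 \langle \phi_j, \varphi_0 \rangle_H^2$ when the $\phi_j$ form an orthonormal basis. The sketch below uses a hypothetical spectrum $\nu_j = j^{-2}$ and hypothetical coefficients $\langle \phi_j, \varphi_0 \rangle_H = \nu_j^{\delta}/j$, so that a source condition of the form $\sum_j \langle \phi_j, \varphi_0 \rangle_H^2 / \nu_j^{2\delta} < \infty$ holds with sum $\pi^2/6$ (up to truncation). These choices are assumptions made for illustration only.

```python
import numpy as np


def bias_norm(lam, nu, coef):
    """||B_T||_H from B_T = -sum_j lam / (lam + nu_j) <phi_j, phi_0>_H phi_j."""
    return np.sqrt(np.sum((lam / (lam + nu)) ** 2 * coef ** 2))


delta = 0.75
j = np.arange(1, 200001, dtype=float)
nu = j ** -2.0             # hypothetical hyperbolic spectrum
coef = nu ** delta / j     # sum coef^2 / nu^(2 delta) = sum 1/j^2 < infinity
lams = 10.0 ** -np.arange(1, 7)
# Ratios ||B_T||_H / lambda^delta stay bounded as lambda decreases.
ratios = np.array([bias_norm(lam, nu, coef) / lam ** delta for lam in lams])
```

The bound follows from $\lambda\nu^\delta/(\lambda+\nu) \le \lambda^\delta$ for $\delta \in [0,1]$, so every entry of `ratios` is at most $\sqrt{\pi^2/6}$ in this example.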

Assumption 4 (ii) is the analog of Assumption 6 in Horowitz and Lee (2007) for Sobolev penalization. It controls the bias in the nonlinear problem along the lines of Proposition 10.7 in Engl, Hanke and Neubauer (2000). Together with Assumption 4 (i) it implies $\|\varphi_{\lambda_T} - \varphi_0\|_H = O\left(\lambda_T^\delta\right)$ (see the proof of Lemma B.8 in the supplementary materials). Finally, Assumption 4 (iii) implies that

$$\inf_{\varphi \in \Theta : \|\varphi - \varphi_0\| \ge d\sqrt{\lambda}} Q(\varphi) + \lambda\|\varphi\|_H^2 - \lambda\|\varphi_0\|_H^2 \ge c\lambda\Gamma(\lambda), \tag{4.9}$$

as $\lambda \to 0$, for any $c < d^2$ (see Lemma B.8 in the supplementary materials). Inequality (4.9) replaces the usual "identifiable uniqueness" condition (White and Wooldridge (1991)) $\inf_{\varphi \in \Theta : \|\varphi - \varphi_0\| \ge \varepsilon} Q(\varphi) - Q(\varphi_0) > 0$, which does not hold with ill-posedness (see Proposition 1 (a)). It is sharp in the sense that for a linear problem the LHS of (4.9) is equal to $d^2\lambda\Gamma(\lambda) + O\left(\lambda^{3/2}\right)$. Assumptions 2-4 are used to control the large deviation probability $P\left[\|\Delta\hat{\varphi}\| \ge d\sqrt{\lambda_T}\right]$, $d > 0$ (see Lemma B.9 in the supplementary materials). This yields an upper bound for the difference between $\Delta\hat{\varphi}$ and its linearization (see Lemmas A.4, A.5 and A.7).

In order to discuss plausibility of Assumption 4 (iii), let $\{\tilde{\phi}_j : j \in \mathbb{N}\}$ be an orthonormal basis in $L^2[0,1]$ of eigenfunctions of $\tilde{A}A$ to eigenvalues $\tilde{\nu}_j$. Consider the two conditions

$$\text{(a)}\ \sum_{j,k=1 : j \ne k}^{\infty} \frac{\langle \tilde{\phi}_j, \tilde{\phi}_k \rangle_H^2}{\|\tilde{\phi}_j\|_H^2\, \|\tilde{\phi}_k\|_H^2} < 1, \qquad \text{(b)}\ \|\tilde{\phi}_j\|_H^2 \ge C\omega_j,\ C > 0,\ \forall j \ge 1.$$

The first condition states that the eigenfunctions are not very correlated. The second condition gives a lower bound on the speed of divergence of the Sobolev norms of the eigenfunctions in terms of a sequence of constants $\omega_j$ specified below. Then conditions (a) and (b) imply Assumption 4 (iii) with $a = \frac{\tilde{\alpha}}{\tilde{\alpha} + 2l}$ for a hyperbolic decay $\tilde{\nu}_j \ge Cj^{-\tilde{\alpha}}$, $C, \tilde{\alpha} > 0$, corresponding to mild ill-posedness, if

$$\omega_j = j^{2l} \quad\text{and}\quad 2l > \tilde{\alpha}.$$

Moreover, conditions (a) and (b) imply Assumption 4 (iii) with any $a \in (0, 1/2)$ for a geometric decay of $\tilde{\nu}_j$, corresponding to severe ill-posedness, if

$$\omega_j = \exp\left(j^2\right) \quad\text{and}\quad l = \infty.$$

Note that the orthonormal basis of $L^2[0,1]$ given by $\tilde{\psi}_1(x) = 1$, $\tilde{\psi}_j(x) = \sqrt{2}\cos(\pi(j-1)x)$, $j = 2, 3, \ldots$, satisfies the inequalities underlying (a) and (b) since $\langle \tilde{\psi}_j, \tilde{\psi}_k \rangle_H = 1\{j = k\} \sum_{s=0}^{l} a_s (\pi j)^{2s}$. Hence conditions (a) and (b) mean that the behavior of the eigenfunctions $\tilde{\phi}_j$ and the orthonormal basis $\tilde{\psi}_j$ should not be too different. They are related to the adaptation condition (8.28) for operator $A$ in Engl, Hanke and Neubauer (2000).

The regularity conditions on the eigenfunctions of $A^*A$ are more restrictive than in Horowitz and Lee (2007), and Chen and Pouzo (2008) (see also the discussion of Assumption A.5 in Appendix 1). They are required to establish pointwise asymptotic normality.

4.4. Asymptotic normality. Let us define the following quantities

$$\sigma_T^2(x) := \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T + \nu_j)^2}\, \phi_j^2(x) \quad\text{and}\quad V_T(\lambda_T) := \frac{1}{T} \int \sigma_T^2(x)\, dx.$$

The following proposition computes the limit distribution of the Q-TiR estimator for (any sequence of) typical points. By (sequence of) typical points, we mean sets $\mathcal{X}_T \subset \mathcal{X} = [0,1]$ such that for $x \in \mathcal{X}_T$

$$\frac{V_T(\lambda_T)}{\sigma_T^2(x)/T} = O(1), \tag{4.10}$$

that is, the variance at a typical point $x$ is not much smaller than the integrated variance.

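Proposition 3: Suppose Assumptions 1-4 and A hold. Let $h_T$ and $\lambda_T$ be such that: (a) $h_T \asymp T^{-\eta}$ and $\lambda_T \asymp T^{-\gamma}$ with $0 < \eta < \frac{1}{2(d_Z+2)}$, $0 < \gamma < \frac{1}{2}\min\left\{1 - \eta(d_Z+1),\, m\eta,\, \frac{1}{1+a}\right\}$; (b) $\lambda_T^{2\delta} = O(V_T(\lambda_T))$, $V_T(\lambda_T) = o(\lambda_T)$; (c) $h_T^{1/4}\, \frac{V_T(\lambda_T; 2)}{V_T(\lambda_T)} = o(1)$ and $h_T^{2m}\, \frac{V_T(\lambda_T; 2m+\varepsilon_1)}{V_T(\lambda_T)} = o(1)$, for some $\varepsilon_1 > 1$, where $V_T(\lambda_T; \varepsilon) := \frac{1}{T} \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T + \nu_j)^2} \|\phi_j\|^2 j^{\varepsilon}$. Then for any sequence $x \in \mathcal{X}_T$

$$\sqrt{T/\sigma_T^2(x)}\, \left(\hat{\varphi}(x) - \varphi_0(x) - B_T(x)\right) \overset{d}{\to} N(0,1).$$

Furthermore, if $\lambda_T^{2\delta} = o(V_T(\lambda_T))$,

$$\sqrt{T/\sigma_T^2(x)}\, \left(\hat{\varphi}(x) - \varphi_0(x)\right) \overset{d}{\to} N(0,1).$$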

The variance function $\sigma_T^2(x)$ involves the spectrum of operator $A^*A$ and the regularization parameter $\lambda_T$. Since $\sum_{j=1}^{\infty} \nu_j^{-1}\|\phi_j\|^2 = \infty$ and by using (4.10), $\sigma_T^2(x) \to \infty$ as $\lambda_T \to 0$. Thus, $\sigma_T^2(x)$ summarizes the impact of ill-posedness on the nonparametric convergence rate $\sqrt{T/\sigma_T^2(x)}$. The conditions (a)-(c) on $h_T$ and $\lambda_T$ ensure that the asymptotic distribution of the nonlinear estimator $\hat{\varphi}$ is the same as the one of its linearized counterpart. These conditions depend on the instrument dimension $d_Z$, the smoothness properties of the parameter $\varphi_0$ via $m$ and the coefficients $\langle \phi_j, \varphi_0 \rangle_H$, and the severity of ill-posedness, through the decay behaviour of the spectrum. In particular, we need $\delta > 1/2 > a$ in Assumption 4. Since $B_T(x) = O(\|B_T\|_H) = O\left(\lambda_T^\delta\right)$ as shown in the proof of Proposition 3, $\lambda_T^{2\delta} = o(V_T(\lambda_T))$ is a sufficient condition for bias negligibility at a typical point. Below in Remark 1 we state an example to discuss the mutual compatibility of the conditions on $h_T$ and $\lambda_T$ for linearization and bias negligibility. Finally, we show in the supplementary materials that under a strengthening of Assumption A.5 (iii), the mean integrated square error (MISE) is asymptotically like $E\left[\|\hat{\varphi} - \varphi_0\|^2\right] = M_T(\lambda_T)(1 + o(1))$, where $M_T(\lambda_T) := \int \left(\frac{1}{T}\sigma_T^2(x) + B_T(x)^2\right) dx$.

4.5. Estimating the asymptotic variance. The estimation of the asymptotic variance of $\hat{\varphi}$ requires the estimation of the spectrum of operator $A^*A$. Let us assume that the eigenvalues are such that $\nu_j = C_1 e^{-\alpha_1 j} j^{-\alpha_2}$, for large $j$ and some constants $C_1 > 0$, $\alpha_1, \alpha_2 \ge 0$. This specification accommodates both geometric decay ($\alpha_1 > 0$ and $\alpha_2 = 0$, severe ill-posedness) and hyperbolic decay ($\alpha_1 = 0$ and $\alpha_2 > 0$, mild ill-posedness). Similarly, the decay of the eigenfunction values at $x \in \mathcal{X}$ is such that $\phi_j(x)^2 = C_2 e^{-\beta_1 j} j^{-\beta_2}$ for large $j$ and some constants $C_2 > 0$, $\beta_1, \beta_2 \ge 0$. Let $n_T \le N_T$ be integers growing with $T$. Let $\bar{\nu}_j$ and $\bar{\phi}_j$, for $j = 1, \cdots, n_T$, be the first $n_T$ eigenvalues and eigenfunctions of operator $\bar{A}^*\bar{A}$, with $\|\bar{\phi}_j\|_H = 1$, P-a.s., where $\bar{A} = D\hat{\mathcal{A}}(\bar{\varphi})$ is based on a pilot estimator $\bar{\varphi}$. The pilot estimator $\bar{\varphi}$ is a Q-TiR estimator with regularization parameter $\bar{\lambda}_T$ and bandwidth $\bar{h}_T$ satisfying the assumptions of Proposition 3.

The estimators $\bar{\alpha}_1$ and $\bar{\alpha}_2$ are computed by Ordinary Least Squares (OLS) in the regression $\log\bar{\nu}_j = c - \alpha_1 j - \alpha_2 \log j + \varepsilon_j$ for $j = 1, \cdots, n_T$, where $c = \log C_1$ and $\varepsilon_j$ is the error term. Similarly, we get the estimators $\bar{\beta}_1$ and $\bar{\beta}_2$ by OLS regression of $\log\bar{\phi}_j(x)^2$ on a constant, $j$ and $\log j$. Define $\hat{\nu}_j = \bar{\nu}_j$ and $\hat{\phi}_j(x)^2 = \bar{\phi}_j(x)^2$ for $1 \le j \le n_T$, and

$$\hat{\nu}_j = \bar{\nu}_{n_T}\, e^{-\bar{\alpha}_1 (j - n_T)} \left(\frac{j}{n_T}\right)^{-\bar{\alpha}_2} \quad\text{and}\quad \hat{\phi}_j(x)^2 = \bar{\phi}_{n_T}(x)^2\, e^{-\bar{\beta}_1 (j - n_T)} \left(\frac{j}{n_T}\right)^{-\bar{\beta}_2}$$

for $n_T < j \le N_T$. The estimator of the variance function is $\hat{\sigma}_T^2(x) = \sum_{j=1}^{N_T} \frac{\hat{\nu}_j}{(\hat{\nu}_j + \lambda_T)^2}\, \hat{\phi}_j(x)^2$. The spectrum of $A^*A$ may be difficult to estimate precisely for $j$ close to the truncation parameter $N_T$. This explains the introduction of parameter $n_T$ and the extrapolation procedure between $n_T$ and $N_T$ for the estimated spectrum.
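The extrapolation step can be sketched as follows, taking the first $n_T$ pilot eigenvalues $\bar{\nu}_j$ and squared eigenfunction values $\bar{\phi}_j(x)^2$ as given inputs (how they are obtained from $\bar{A}^*\bar{A}$ is not shown). The function names are illustrative, and the decay exponents are fitted with the sign convention $\log v_j = c - a_1 j - a_2 \log j$, which is an assumption consistent with the parametrization $C e^{-a_1 j} j^{-a_2}$.

```python
import numpy as np


def fit_decay(values):
    """OLS of log(values_j) on (1, -j, -log j); returns (a1, a2)."""
    n = len(values)
    j = np.arange(1, n + 1, dtype=float)
    design = np.column_stack([np.ones(n), -j, -np.log(j)])
    coef, *_ = np.linalg.lstsq(design, np.log(values), rcond=None)
    return coef[1], coef[2]


def extrapolate(values, n_total):
    """Keep the first n values and extend them to j = n+1, ..., n_total."""
    n = len(values)
    a1, a2 = fit_decay(values)
    j = np.arange(n + 1, n_total + 1, dtype=float)
    tail = values[-1] * np.exp(-a1 * (j - n)) * (j / n) ** (-a2)
    return np.concatenate([values, tail])


def sigma2_hat(nu_bar, phi2_bar, lam, n_total):
    """Variance estimate sum_j nu_j / (nu_j + lam)^2 * phi_j(x)^2, j <= n_total."""
    nu = extrapolate(np.asarray(nu_bar, dtype=float), n_total)
    phi2 = extrapolate(np.asarray(phi2_bar, dtype=float), n_total)
    return np.sum(nu / (nu + lam) ** 2 * phi2)
```

When the inputs follow the assumed parametric decay exactly, the extrapolated values coincide with the true ones, so the estimate equals the truncated sum of the population formula; at least three pilot values are needed for the regression to be identified.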

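Proposition 4: Assume that $\nu_j = C_1 e^{-\alpha_1 j} j^{-\alpha_2}$ and $\phi_j(x)^2 = C_2 e^{-\beta_1 j} j^{-\beta_2}$, for large $j$. Denote $\Delta\nu_j := \min_{k \le j}(\nu_{k+1} - \nu_k)$ and $S(n) := \sum_{j=n+1}^{\infty} \nu_j \|\phi_j\|^2$, $n \in \mathbb{N}$. Let $N_T$ be such that

$$S(N_T) = o\left(T\lambda_T^2 V_T(\lambda_T)\right), \tag{4.11}$$

and $n_T$ such that $N_T - n_T = O(\log T)$ and

$$\frac{1}{\phi_{n_T}(x)^2\, \Delta\nu_{n_T}} \left(\frac{1}{T h_T^2} + h_T^{2m} + M_T\left(\bar{\lambda}_T\right)\right)^{1/2} = O_p\left(T^{-b}\right), \tag{4.12}$$

for $b > 0$. Then, under Assumptions 1-4 and A, $\frac{\hat{\sigma}_T^2(x)}{\sigma_T^2(x)} \overset{p}{\to} 1$.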

Condition (4.11) ensures that the truncation bias is negligible. Condition (4.12) ensures that $\bar{\nu}_j$ and $\bar{\phi}_j(x)^2$ are consistent in relative terms uniformly over $1 \le j \le n_T$. It involves the asymptotic MISE $M_T(\bar{\lambda}_T)$ of the pilot estimator. The condition $N_T - n_T = O(\log T)$ and the extrapolation procedure yield relative consistency of $\hat{\nu}_j$ and $\hat{\phi}_j(x)^2$ uniformly over $1 \le j \le N_T$. Proposition 4 can be modified to handle $\nu_j/(e^{-\alpha_1 j} j^{-\alpha_2})$ and $\phi_j(x)^2/(e^{-\beta_1 j} j^{-\beta_2})$ slowly varying in $j$.

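Remark 1. Let us verify that, for severe ill-posedness and hyperbolic decay of the eigenfunctions ($\alpha_1, \beta_2 > 0$ and $\alpha_2 = \beta_1 = 0$ in Proposition 4), there exist admissible values of $\gamma, \eta, \bar{\gamma}, \bar{\eta}$, so that the conditions on $\lambda_T, h_T, \bar{\lambda}_T, \bar{h}_T, n_T, N_T$, to get Propositions 3 and 4 are all compatible. Assume $\|\phi_j\|^2 \asymp j^{-\beta_2}$. Then $V_T(\lambda_T; \varepsilon) \asymp \frac{1}{T\lambda_T [\log(1/\lambda_T)]^{\beta_2 - \varepsilon}}$ for $\varepsilon \ge 0$, and $\sigma_T^2(x) \asymp \frac{1}{\lambda_T [\log(1/\lambda_T)]^{\beta_2}}$ (see GS). If $\frac{2}{m(1+2\delta)} < \bar{\eta} < \frac{1}{d_Z+1}\, \frac{2\delta - 1}{2\delta + 1}$, $\frac{1}{1+2\delta} < \bar{\gamma} < \frac{1}{2}\min\left\{1 - (d_Z+1)\bar{\eta},\, m\bar{\eta},\, \frac{1}{1+a}\right\}$, the pilot estimator satisfies the assumptions of Proposition 3. If $\frac{2}{m(1+2\delta)} < \eta < \frac{1}{d_Z+1}\, \frac{2\delta - 1}{2\delta + 1}$, $\frac{1}{1+2\delta} < \gamma < \frac{1}{2}\min\left\{1 - (d_Z+1)\eta,\, m\eta,\, \frac{1}{1+a}\right\}$, estimator $\hat{\varphi}$ is asymptotically normal with vanishing bias. The conditions on $\gamma, \eta, \bar{\gamma}, \bar{\eta}$ are compatible if $m > 2(d_Z+1)$ and $\delta > \frac{1}{2} + \max\left\{\frac{d_Z+1}{m},\, a\right\}$. Since $S(n) \le Ce^{-\alpha_1 n}$ and $\Delta\nu_j = \nu_{j+1} - \nu_j \asymp e^{-\alpha_1 j}$, conditions (4.11) and (4.12) are satisfied for $n_T = c_1 \log T$ and $N_T = c_2 \log T$ such that $c_1 < \frac{1}{\alpha_1}\min\left\{\frac{1-2\eta}{2},\, m\eta,\, \frac{1-\bar{\gamma}}{2},\, \delta\bar{\gamma}\right\}$ and $c_2 > \frac{1}{\alpha_1}\gamma$. Then $\sqrt{T/\hat{\sigma}_T^2(x)}\, \left(\hat{\varphi}(x) - \varphi_0(x)\right) \overset{d}{\to} N(0,1)$, from which asymptotically valid pointwise confidence intervals can be computed.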
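Given a point estimate $\hat{\varphi}(x)$, a variance estimate $\hat{\sigma}_T^2(x)$ and the sample size $T$, the normal approximation in Remark 1 translates into the interval $\hat{\varphi}(x) \pm z_{1-\alpha/2}\sqrt{\hat{\sigma}_T^2(x)/T}$. The following sketch assumes negligible regularization bias, as in the second statement of Proposition 3; the function name is illustrative.

```python
import math
from statistics import NormalDist


def pointwise_ci(phi_hat_x, sigma2_hat_x, T, level=0.95):
    """Two-sided asymptotic confidence interval for phi_0(x) at a given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    half_width = z * math.sqrt(sigma2_hat_x / T)
    return phi_hat_x - half_width, phi_hat_x + half_width
```

The interval width shrinks at the rate $\sqrt{\hat{\sigma}_T^2(x)/T}$, which is slower than the parametric rate because $\sigma_T^2(x)$ diverges as $\lambda_T \to 0$.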

5. Monte-Carlo results and empirical illustration

5.1. Monte-Carlo results. We consider an experiment, following Newey and Powell (2003), where the errors $U_1$, $U_2$ and the instrument $Z$ follow a trivariate normal distribution, with zero means, unit variances and a correlation coefficient of 0.5 between $U_1$ and $U_2$. We take $X^* = Z + U_2$, and build $X = \Phi(X^*)$, $U = \Phi(U_1)$ and $Y = \sin(\pi X) + U$. The median condition is $P[Y - \varphi_0(X) \le 0 \mid Z] = .5$, where the functional parameter is $\varphi_0(x) = \sin(\pi x)$, $x \in [0,1]$. We consider sample size $T = 1000$ and a classical Sobolev penalty of order $l \in \{1, 2\}$. For $l = 1$ we consider $\bar{\lambda} = \lambda \in \{.00002, .00005, .0001\}$, while for $l = 2$ we consider $\bar{\lambda} = \lambda \in \{.0000003, .000001, .000003\}$. We use Gaussian kernels and select the bandwidth with the standard rule of thumb (Silverman (1986)). To compute $\hat{\varphi}$ and $\bar{\varphi}$ we opt for a numerical series approximation based on standardized shifted Chebyshev polynomials of the first kind, and a user-supplied analytical gradient and Hessian optimization procedure. We report results using 16 polynomials (order 0 to 15); results using the first 8, or more, polynomials are nearly identical. The variance estimator $\hat{\sigma}_T^2(x)$ is computed with $N_T = n_T = 4$. Figure 1 displays the QQ-plots of the finite sample distributions of $\sqrt{T/\hat{\sigma}_T^2(x)}\, (\hat{\varphi}(x) - \varphi_0(x))$, $x = .5$, for $l = 1$, $\bar{\lambda} = \lambda = .00005$ (left), and $l = 2$, $\bar{\lambda} = \lambda = .000001$ (right), built on 1000 replications. The finite sample distributions are close to the standard normal distribution. For the selected values of $\bar{\lambda}, \lambda$, the regularization bias is rather small. Table 1 reports the finite sample coverage of pointwise confidence intervals for $\varphi_0(x)$, $x \in \{.10, .25, .50, .75, .90\}$, using the different values of $l$, $\bar{\lambda}$, $\lambda$. Let us first consider the results for $l = 1$ (left panel). The finite sample coverage is close to the nominal coverage at 90%, 95%, 99% for $x = .5$ and $\bar{\lambda} = \lambda = .00005$. At $x = .5$ we observe overrejection for $\bar{\lambda} = \lambda = .0001$ because of regularization bias, and underrejection for $\bar{\lambda} = \lambda = .00002$ because too few terms enter the estimated variance. For instance, with $N_T = n_T = 5$ and at $x = .5$, the finite sample coverage at 90%, 95%, 99% is .964, .985, .998 for $l = 1$, $\bar{\lambda} = \lambda = .00002$, and .947, .974, .999 for $l = 2$, $\bar{\lambda} = \lambda = .0000003$. For $x = .10, .25, .75, .90$ and the considered values of $\bar{\lambda}, \lambda$, we typically observe some overrejection. Finally, the results for $l = 2$ are qualitatively similar to those for $l = 1$.
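The series approximation mentioned above can be sketched with NumPy's Chebyshev utilities. With $\varphi(x) = \sum_k \theta_k P_k(x)$ and $P_k(x) = T_k(2x - 1)$ the shifted Chebyshev polynomials on $[0,1]$, the Sobolev penalty becomes the quadratic form $\|\varphi\|_H^2 = \theta' G\, \theta$, where $G$ collects the Sobolev inner products of the basis functions. The code below is an illustration under assumptions: the polynomials are not standardized, the classical weights $a_s = 1$ for $s \le l$ are used, and the integrals are computed by Gauss-Legendre quadrature. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial.legendre import leggauss


def basis(x, n_basis=16):
    """Shifted Chebyshev polynomials P_k(x) = T_k(2x - 1), k = 0..n_basis-1."""
    return C.chebvander(2.0 * np.asarray(x, dtype=float) - 1.0, n_basis - 1)


def sobolev_gram(n_basis=16, order=1, n_quad=64):
    """Gram matrix G[i, k] = sum_{s <= order} <d^s P_i, d^s P_k> on [0, 1]."""
    nodes, weights = leggauss(n_quad)      # quadrature rule on [-1, 1]
    w = 0.5 * weights                      # weights after mapping to [0, 1]
    gram = np.zeros((n_basis, n_basis))
    for s in range(order + 1):
        derivs = np.empty((n_quad, n_basis))
        for k in range(n_basis):
            coef = np.zeros(k + 1)
            coef[k] = 1.0                  # coefficients of T_k
            d = C.chebder(coef, m=s) if s > 0 else coef
            # Chain rule: d^s/dx^s T_k(2x - 1) = 2^s T_k^(s)(2x - 1).
            derivs[:, k] = (2.0 ** s) * C.chebval(nodes, d)
        gram += derivs.T @ (w[:, None] * derivs)
    return gram


def sobolev_norm_sq(theta, gram):
    """Squared Sobolev norm of phi = sum_k theta_k P_k."""
    theta = np.asarray(theta, dtype=float)
    return float(theta @ gram @ theta)
```

As a check, for $\varphi(x) = 2x - 1$ and $l = 1$ the penalty equals $\int_0^1 (2x-1)^2 dx + \int_0^1 4\, dx = 13/3$.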

5.2. Empirical illustration. This section presents an empirical illustration with U.S. long-distance call data extracted from the sample of Hausman and Sidak (2004). They investigate nonlinear price schedules chosen by consumers of message toll service offered by long-distance interexchange carriers. We estimate median structural effects for nonlinear pricing curves based on the conditional quantile condition $P[Y \le \varphi_0(X) \mid Z] = .5$, with $X = \Phi(min)$ and $Z = \Phi(Inc)$. Variable $Y$ is the price per minute in dollars, $min$ is the standardized amount of use in minutes, and $Inc$ is the standardized logarithm of annual income. We look at clients of a leading long-distance interexchange carrier with age between 30 and 45. To study the effect of education on the chosen nonlinear price schedule we divide the sample into people with at most 12 years of education ($T = 978$), and people with more than 12 years of education ($T = 435$).

We set $\bar{\lambda} = .0001$, and start the optimization algorithm with the NIVR estimates of GS. The specification test of Gagliardini and Scaillet (2007) does not reject the null hypothesis of the correct specification of the moment restriction used in estimating the mean pricing curve at the 5% significance level (p-value = .32, .77). We apply a reduction factor of .8 to the regularization parameters selected by the heuristic data-driven procedure of GS run on the linearization. For the two education categories, we get $\lambda = .154, .034$ under a Sobolev penalty with $l = 1$, and $\lambda = .088, .001$ under a Sobolev penalty with $l = 2$. We present the empirical results with sixteen polynomials. They remain virtually unchanged when increasing gradually the number of polynomials from eight to sixteen. There is a stabilization of the value of the optimized objective function, of the loadings in the numerical series approximation, and of the data-driven regularization parameter. Higher order polynomials receive loadings which are closer and closer to zero. This suggests that we can limit ourselves to a small number of polynomials in this application.

Figure 2 plots the estimated NIV median structural effect, pointwise asymptotic confidence intervals at 95%, and the linear IVQR estimate for the two education categories. The upper panel for $l = 1$ and the lower panel for $l = 2$ show that estimated NIVQR and IVQR structural effects are close, and their patterns differ across the two education categories. As in Hausman and Sidak (2004) we observe that less educated customers pay more than better educated customers when the number of minutes of use increases. A possible explanation is that the latter exploit better the tariff options for long-distance calls available at those ranges.

Figure 3 is a picture "à la box-plot" where we represent the estimated quartile structural effects and the estimated mean structural effect with $l = 1$. The box-plot interpretation comes from the conditional probability of the shaded area being asymptotically $P[g(X, 0.25) \le Y \le g(X, 0.75) \mid Z = z] = .5$, for any given value $z$ of the instrument. Vertical sections of the shaded areas correspond to measures of dispersion. For both education categories, the dispersion is smaller at high usage of the service, likely because of the effort of people to find the most convenient tariffs.

References

[1] Adams, R. and J. Fournier (2003): Sobolev Spaces, Academic Press, Boston.

[2] Ai, C. and X. Chen (2003): ”Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions”, Econometrica, 71, 1795-1843.

[3] Alt, H. (1992): Linear Functional Analysis, Springer, Berlin.

[4] Andrews, D. (1994): ”Empirical Process Methods in Econometrics”, in the Handbook of Econometrics, Vol. 4, Engle, R. and D. McFadden, Eds., North Holland, 2247-2294.

[5] Blundell, R., Chen, X. and D. Kristensen (2007): ”Semi-Nonparametric IV Estimation of Shape Invariant Engel Curves”, Econometrica, 75, 1613-1669.

[6] Bosq, D. (1998): Nonparametric Statistics for Stochastic Processes, Springer Verlag, Berlin.

[7] Bosq, D. (2000): Linear Processes in Function Spaces. Theory and Applications, Springer Verlag, Berlin.

[8] Carrasco, M., Florens, J.-P. and E. Renault (2007): ”Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization”, Handbook of Econometrics, Vol. 6, Heckman, J. and E. Leamer, Eds., North Holland, 5633-5751.

[9] Chen, X. and D. Pouzo (2008): ”Estimation of Nonparametric Conditional Moment Models with Possibly Nonsmooth Moments”, Cowles Foundation Discussion Paper 1650.

[10] Chen, X. and D. Pouzo (2009): ”Efficient Estimation of Semiparametric Conditional Moment Models with Possibly Nonsmooth Residuals”, forthcoming in Journal of Econometrics.

[11] Chernozhukov, V. and C. Hansen (2005): ”An IV Model of Quantile Treatment Effect”, Econometrica, 73, 245-261.

[12] Chernozhukov, V., Imbens, G. and W. Newey (2007): ”Instrumental Variable Estimation of Nonseparable Models”, Journal of Econometrics, 139, 4-14.

[13] Darolles, S., Florens, J.-P. and E. Renault (2003): ”Nonparametric Instrumental Regression”, Working Paper.

[14] Dubinskij, J. (1986): Sobolev Spaces of Infinite Order and Differential Equations, Kluwer Academic Publishers, Dordrecht.

[15] Engl, H., Hanke, M. and A. Neubauer (2000): Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht.

[16] Engl, H., Kunisch, K. and A. Neubauer (1989): ”Convergence Rates for Tikhonov Regularization of Nonlinear Ill-posed Problems”, Inverse Problems, 5, 523-540.

[17] Gagliardini, P. and O. Scaillet (2006): ”Tikhonov Regularization for Nonparametric Instrumental Variable Estimators”, Swiss Finance Institute DP 2006.33.

[18] Gagliardini, P. and O. Scaillet (2007): ”A Specification Test for Nonparametric Instrumental Variable Regression”, Swiss Finance Institute DP 2007.13.

[19] Groetsch, C. (1984): The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman Advanced Publishing Program, Boston.

[20] Hall, P. and J. Horowitz (2005): ”Nonparametric Methods for Inference in the Presence of Instrumental Variables”, Annals of Statistics, 33, 2904-2929.
