
Tikhonov Regularization for Nonparametric Instrumental Variable Estimators

GAGLIARDINI, P., SCAILLET, Olivier


GAGLIARDINI, P., SCAILLET, Olivier. Tikhonov Regularization for Nonparametric Instrumental Variable Estimators. 2006

Available at:

http://archive-ouverte.unige.ch/unige:5742

Disclaimer: layout of this document may differ from the published version.


2006.08

FACULTE DES SCIENCES ECONOMIQUES ET SOCIALES
HAUTES ETUDES COMMERCIALES

TIKHONOV REGULARIZATION FOR NONPARAMETRIC INSTRUMENTAL VARIABLE ESTIMATORS

P. GAGLIARDINI
O. SCAILLET


TIKHONOV REGULARIZATION FOR NONPARAMETRIC INSTRUMENTAL VARIABLE ESTIMATORS

P. Gagliardini* and O. Scaillet†

This version: December 2009‡ (First version: May 2006)

* University of Lugano and Swiss Finance Institute. Corresponding author: Patrick Gagliardini, University of Lugano, Faculty of Economics, Via Buffi 13, CH-6900 Lugano, Switzerland. Tel.: ++41 58 666 4660. Fax: ++41 58 666 4734. Email: patrick.gagliardini@usi.ch.

† HEC Université de Genève and Swiss Finance Institute.

‡ We thank the editor, the associate editor, and the two referees for helpful comments. An earlier version of this paper circulated under the title "Tikhonov regularization for functional minimum distance estimators". Both authors received support from the Swiss National Science Foundation through the National Center of Competence in Research: Financial Valuation and Risk Management (NCCR FINRISK). We also thank Joel Horowitz for providing the dataset of the empirical section and many valuable suggestions, as well as Manuel Arellano, Xiaohong Chen, Victor Chernozhukov, Jean-Pierre Florens, Oliver Linton, Enno Mammen, and seminar participants at the University of Geneva, Catholic University of Louvain, University of Toulouse, Princeton University, Columbia University, ECARES, MIT/Harvard, CREST, Queen Mary's College, Maastricht University, Carlos III University, ESRC 2006 Annual Conference in Bristol, SSES 2007 Annual Meeting in St. Gallen, the Workshop on Statistical Inference for Dependent Data in Hasselt, ESAM 2007 in Brisbane and ESEM 2007 in Budapest for helpful comments.


Tikhonov Regularization for Nonparametric Instrumental Variable Estimators

Abstract

We study a Tikhonov Regularized (TiR) estimator of a functional parameter identified by conditional moment restrictions in a linear model with both exogenous and endogenous regressors. The nonparametric instrumental variable estimator is based on a minimum distance principle with penalization by the norms of the parameter and its derivative. After showing its consistency in the Sobolev norm we derive the expression of the asymptotic Mean Integrated Square Error. The convergence rate with optimal value of the regularization parameter is characterized in two examples. We illustrate our theoretical findings and the small sample properties with simulation results. Finally, we provide an empirical application to estimation of an Engel curve, and discuss a data driven selection procedure for the regularization parameter.

Keywords and phrases: Minimum Distance, Nonparametric Estimation, Ill-posed Inverse Problems, Tikhonov Regularization, Endogeneity, Instrumental Variable, Engel curve.

JEL classification: C13, C14, C15, D12.

AMS 2000 classification: 62G08, 62G20.


1 Introduction

Kernel and sieve estimators provide inference tools for nonparametric regression in empirical economic analysis. Recently, several suggestions have been made to correct for endogeneity in such a context, mainly motivated by functional instrumental variable (IV) estimation of structural equations. Newey and Powell (NP, 2003) consider nonparametric estimation of a function, which is identified by conditional moment restrictions given a set of instruments.

Ai and Chen (AC, 2003) opt for a similar approach to estimate semiparametric specifications.

Darolles, Florens and Renault (DFR, 2003) and Hall and Horowitz (HH, 2005) concentrate on nonparametric IV estimation of a regression function. Horowitz (2007) shows pointwise asymptotic normality for an asymptotically negligible bias. Horowitz and Lee (2007) extend HH to nonparametric IV quantile regression (NIVQR). Florens (2003) and Blundell and Powell (2003) give further background on endogenous nonparametric regressions.

There is a growing recent literature in econometrics extending the above methods and considering empirical applications. Blundell, Chen and Kristensen (BCK, 2007) investigate application of index models to Engel curve estimation with endogenous total expenditure. As argued, e.g., in Blundell and Horowitz (2007), the knowledge of the shape of an Engel curve is a key ingredient of any consumer behaviour analysis. Chen and Pouzo (2008, 2009) consider a general semiparametric setting including partially linear quantile IV regression, and apply their results to sieve estimation of Engel curves. Further, Chen and Ludvigson (2009) consider asset pricing models with functional specifications of habit preferences; Chernozhukov, Imbens and Newey (2007) estimate nonseparable models for quantile regression analysis; Loubes and Vanhems (2004) discuss the estimation of the solution of a differential equation with endogenous variables for microeconomic applications. Other related works include Chernozhukov and Hansen (2005), Florens, Johannes and Van Bellegem (2005), Horowitz (2006), Hoderlein and Holzmann (2007), and Hu and Schennach (2008).

The main theoretical difficulty in nonparametric estimation with endogeneity is overcoming ill-posedness of the associated inverse problem (see Kress (1999), and Carrasco, Florens and Renault (CFR, 2007) for overviews). It occurs since the mapping of the reduced form parameter (that is, the distribution of the data) into the structural parameter (the instrumental regression function) is not continuous. We need a regularization of the estimation to recover consistency. For instance, DFR and HH adopt an $L^2$ regularization technique resulting in a kind of ridge regression in a functional setting.

The aim of this paper is to introduce a new minimum distance estimator for a functional parameter identified by conditional moment restrictions in a linear model with both exogenous and endogenous regressors. We consider a penalized extremum estimator which minimizes $Q_T(\varphi) + \lambda_T G(\varphi)$, where $Q_T(\varphi)$ is a minimum distance criterion in the functional parameter $\varphi$, $G(\varphi)$ is a penalty function, and $\lambda_T$ is a positive sequence converging to zero. The penalty function $G(\varphi)$ exploits the Sobolev norm of function $\varphi$, which involves the $L^2$ norms of both $\varphi$ and its derivative $\nabla\varphi$. The basic idea is that the penalty term $\lambda_T G(\varphi)$ damps highly oscillating components of the estimator. These oscillations are otherwise unduly amplified by the minimum distance criterion $Q_T(\varphi)$ because of ill-posedness. Parameter $\lambda_T$ tunes the regularization. We call our estimator a Tikhonov Regularized (TiR) estimator by reference to the pioneering papers of Tikhonov (1963a,b), where regularization is achieved via a penalty term incorporating the function and its derivative (Groetsch (1984)). The TiR estimator admits a closed form and is numerically tractable.
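As a toy illustration of this mechanism (not the paper's estimator, and with purely illustrative constants), consider a finite-dimensional least-squares problem whose design matrix mimics a compact operator through fast-decaying singular values; the unpenalized minimum distance solution amplifies noise, while a small Tikhonov penalty stabilizes it:

```python
import numpy as np

# Hedged sketch: min ||A phi - r||^2 vs. min ||A phi - r||^2 + lam*||phi||^2,
# where A has singular values decaying to zero (a finite-dimensional stand-in
# for a compact operator). All constants below are illustrative choices.
rng = np.random.default_rng(0)
n = 30
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
s = 0.8 ** np.arange(n)                       # singular values -> 0
A = U @ np.diag(s) @ V.T

phi0 = np.sin(np.linspace(0.0, np.pi, n))     # "true" parameter values
r = A @ phi0 + 0.05 * rng.normal(size=n)      # noisy observed image

naive = np.linalg.solve(A.T @ A, A.T @ r)     # unpenalized minimum distance
lam = 1e-2
tikh = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ r)  # penalized

err_naive = np.linalg.norm(naive - phi0)
err_tikh = np.linalg.norm(tikh - phi0)
assert err_tikh < err_naive                   # penalization reduces the error
```

The small eigenvalues of $A'A$ turn the noise in $r$ into huge spurious oscillations of the unpenalized solution; adding $\lambda I$ bounds the inverse and trades a little bias for a large variance reduction, which is exactly the role of $\lambda_T G(\varphi)$ above.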

The key contribution of our paper is the computation of an explicit asymptotic expression for the mean integrated squared error (MISE) of a Sobolev penalized estimator in an NIVR setting with both exogenous and endogenous regressors. Such a sharp result extends the asymptotic bounds of HH obtained under an $L^2$ penalty. Our other specific contributions are consistency of the TiR estimator in the Sobolev norm (and as a consequence uniform consistency), and a detailed analytic treatment of two examples yielding the optimal value of the regularization parameter.

Our paper is related to different contributions in the literature. To address ill-posedness, NP and AC propose to introduce bounds on the norms of the functional parameter of interest and of its derivatives. This amounts to imposing compactness of the parameter space. This approach does not yield a closed-form estimator because of the inequality constraint on the functional parameter. In their empirical application, BCK compute a penalized estimator similar to ours. Their estimation relies on series estimators instead of the kernel smoothers that we use. Chen and Pouzo (2008, 2009) examine the convergence rate of a sieve approach for an implementation as in BCK.

In defining the estimator directly on a function space, we follow the route of Horowitz and Lee (2007) and the suggestion of NP, p. 1573 (see also Gagliardini and Gouriéroux (2007), Chernozhukov, Gagliardini, and Scaillet (CGS, 2006)). Working directly over an infinite-dimensional parameter space (and not over finite-dimensional parameter spaces of increasing dimensions) allows us to develop a well-defined theoretical framework which uses the penalization parameter as the single regularization parameter. In a sieve approach, either the number of sieve terms, or both the number of sieve terms and the penalization coefficient, are regularization parameters that need to be controlled (see Chen and Pouzo (2008, 2009) for a detailed treatment). As in the implementation of a sieve approach, our computed estimator uses a projection on a finite-dimensional basis of polynomials. The approximation error is of a purely numerical nature, and not of a statistical nature as in a sieve approach, where the number of sieve terms can be used as a regularization parameter. The dimension of the basis should be selected large enough to get a small approximation error. In some cases, for example when the parameter of interest is close to a line, a few basis functions are enough to successfully implement our approach. We cannot see our approach as a sieve approach with an infinite number of terms, and the two asymptotic theoretical treatments do not nest each other (see CGS for similar comments in the quantile regression case). However, we expect an asymptotic equivalence between our approach and a sieve approach under a number of sieve terms growing sufficiently fast to dominate the decay of the penalization term. The proof of such an equivalence is left for future research.

While the regularization approach in DFR and HH can be viewed as a Tikhonov regularization, their penalty term involves the $L^2$ norm of the function only (without any derivative). By construction, this penalization dispenses with a differentiability assumption on the function $\varphi$. To avoid confusion, we refer to the DFR and HH estimators as regularized estimators with $L^2$ norm. In our Monte-Carlo experiments and in an analytic example, we find that the use of the Sobolev penalty substantially enhances the performance of the regularized estimator relative to the use of the $L^2$ penalty. Finally, CGS focus on a feasible asymptotic normality theorem for a TiR estimator in an NIVQR setting. Their results can be easily specialized to the linear setting of this paper, and are not further considered here.

In Section 2 we discuss ill-posedness in nonparametric IV regression. We introduce the TiR estimator in Section 3 and prove its consistency in Sobolev norm in Section 4. In Section 5, we derive the exact asymptotic MISE of the TiR estimator. In Section 6 we discuss optimal rates of convergence in two examples, and provide an analytic comparison with $L^2$ regularization. We discuss the numerical implementation in Section 7, and we present the Monte-Carlo results in Section 8. In Section 9 we provide an empirical example where we estimate an Engel curve nonparametrically, and discuss a data driven selection procedure for the regularization parameter. Gagliardini and Scaillet (GS, 2006) give further simulation results and implementation details. The set of regularity conditions and the proofs of propositions are gathered in the Appendices. Omitted proofs of technical Lemmas are collected in a Technical Report, which is available online at our web pages.

2 Ill-posedness in nonparametric regression

Let $\{(Y_t, X_t, Z_t) : t = 1, \ldots, T\}$ be i.i.d. copies of the vector $(Y, X, Z)$, where the vectors $X$ and $Z$ are decomposed as $X := (X_1, X_2)$ and $Z := (Z_1, X_1)$. Let the supports of $X$ and $Z$ be $\mathcal{X} := \mathcal{X}_1 \times \mathcal{X}_2$ and $\mathcal{Z} := \mathcal{Z}_1 \times \mathcal{X}_1$, where $\mathcal{X}_i := [0,1]^{d_{X_i}}$, $i = 1, 2$, and $\mathcal{Z}_1 = [0,1]^{d_{Z_1}}$, while the support of $Y$ is $\mathcal{Y} \subset \mathbb{R}$. The parameter of interest is a function $\varphi_0$ defined on $\mathcal{X}$ which satisfies the NIVR:

$$E[Y - \varphi_0(X) \mid Z] = 0. \qquad (1)$$

The subvectors $X_1$ and $X_2$ correspond to exogenous and endogenous regressors, respectively, while $Z$ is a vector of instruments. The conditional moment restriction (1) is equivalent to:

$$m_{x_1}(\varphi_{x_1,0}, Z_1) := E[Y - \varphi_{x_1,0}(X_2) \mid Z_1, X_1 = x_1] = 0, \quad \text{for all } x_1 \in \mathcal{X}_1,$$

where $\varphi_{x_1,0}(\cdot) := \varphi_0(x_1, \cdot)$. For any given $x_1 \in \mathcal{X}_1$, the function $\varphi_{x_1,0}$ satisfies a NIVR with endogenous regressors $X_2$ only. Parameter $\varphi_0$ is such that, for all $x_1 \in \mathcal{X}_1$, the function $\varphi_{x_1,0}$ belongs to a subset $\Theta$ of the Sobolev space $H^1(\mathcal{X}_2)$ of order 1, i.e., the completion of the linear space $\{\psi \in C^1(\mathcal{X}_2) \mid \nabla^\alpha \psi \in L^2(\mathcal{X}_2), |\alpha| \leq 1\}$ with respect to the scalar product $\langle \psi_1, \psi_2 \rangle_{H^1(\mathcal{X}_2)} := \sum_{|\alpha| \leq 1} \langle \nabla^\alpha \psi_1, \nabla^\alpha \psi_2 \rangle_{L^2(\mathcal{X}_2)}$, where $\langle \psi_1, \psi_2 \rangle_{L^2(\mathcal{X}_2)} := \int_{\mathcal{X}_2} \psi_1(u)\,\psi_2(u)\, du$ and $\alpha \in \mathbb{N}^{d_{X_2}}$ is a multi-index. See CGS for the use of Sobolev spaces of higher order. The Sobolev space $H^1(\mathcal{X}_2)$ is a Hilbert space w.r.t. the scalar product $\langle \cdot, \cdot \rangle_{H^1(\mathcal{X}_2)}$, and the corresponding Sobolev norm is denoted by $\|\cdot\|_{H^1(\mathcal{X}_2)} := \langle \cdot, \cdot \rangle_{H^1(\mathcal{X}_2)}^{1/2}$. We denote the $L^2$ norm by $\|\cdot\|_{L^2(\mathcal{X}_2)} := \langle \cdot, \cdot \rangle_{L^2(\mathcal{X}_2)}^{1/2}$. Further, we assume the following identification condition.

Assumption 1: $\varphi_{x_1,0}$ is the unique function $\varphi_{x_1} \in \Theta$ that satisfies the conditional moment restriction $m_{x_1}(\varphi_{x_1}, Z_1) = 0$, for all $x_1 \in \mathcal{X}_1$.

We refer to NP, Theorems 2.2-2.4, for sufficient conditions ensuring Assumption 1. In particular, the order condition $d_{Z_1} \geq d_{X_2}$ is not a necessary condition for identification. Since we work below with a penalized quadratic criterion in the parameter of interest, we do not need further assumptions on the parameter set $\Theta$, such as compactness. See Chen (2007), Horowitz and Lee (2007), and Chen and Pouzo (2008, 2009) for similar noncompact settings.

Let us now consider a given $x_1 \in \mathcal{X}_1$ and a nonparametric minimum distance approach for $\varphi_{x_1,0}$. This relies on $\varphi_{x_1,0}$ minimizing

$$Q_{x_1,\infty}(\varphi_{x_1}) := E\Big[\Omega_{x_1,0}(Z_1)\, m_{x_1}(\varphi_{x_1}, Z_1)^2 \,\Big|\, X_1 = x_1\Big], \quad \varphi_{x_1} \in \Theta, \qquad (2)$$

where $\Omega_{x_1,0}$ is a nonnegative function on $\mathcal{Z}_1$. The conditional moment function $m_{x_1}(\varphi_{x_1}, z_1)$ can be written as:

$$m_{x_1}(\varphi_{x_1}, z_1) = (A_{x_1}\varphi_{x_1})(z_1) - r_{x_1}(z_1) = (A_{x_1}\Delta\varphi_{x_1})(z_1), \qquad (3)$$

where $\Delta\varphi_{x_1} := \varphi_{x_1} - \varphi_{x_1,0}$, the linear operator $A_{x_1}$ is defined by $(A_{x_1}\varphi_{x_1})(z_1) := \int \varphi_{x_1}(x_2)\, f_{X_2|Z}(x_2|z)\, dx_2$, and $r_{x_1}(z_1) := \int y\, f_{Y|Z}(y|z)\, dy$, where $f_{X_2|Z}$ and $f_{Y|Z}$ are the conditional densities of $X_2$ given $Z$, and of $Y$ given $Z$. Assumption 1 on identification of $\varphi_{x_1,0}$ holds if and only if the operator $A_{x_1}$ is injective for all $x_1 \in \mathcal{X}_1$. Further, we assume that $A_{x_1}$ is a bounded operator from $L^2(\mathcal{X}_2)$ to $L^2_{x_1}(\mathcal{Z}_1)$, where $L^2_{x_1}(\mathcal{Z}_1)$ is the $L^2$ space of square integrable functions of $Z_1$ defined by the scalar product $\langle \psi_1, \psi_2 \rangle_{L^2_{x_1}(\mathcal{Z}_1)} = E[\Omega_{x_1,0}(Z_1)\, \psi_1(Z_1)\, \psi_2(Z_1) \mid X_1 = x_1]$. The limit criterion (2) becomes

$$Q_{x_1,\infty}(\varphi_{x_1}) = \langle A_{x_1}\Delta\varphi_{x_1},\, A_{x_1}\Delta\varphi_{x_1} \rangle_{L^2_{x_1}(\mathcal{Z}_1)} = \langle \Delta\varphi_{x_1},\, A_{x_1}^* A_{x_1} \Delta\varphi_{x_1} \rangle_{H^1(\mathcal{X}_2)} = \langle \Delta\varphi_{x_1},\, \tilde{A}_{x_1} A_{x_1} \Delta\varphi_{x_1} \rangle_{L^2(\mathcal{X}_2)}, \qquad (4)$$

where $A_{x_1}^*$, resp. $\tilde{A}_{x_1}$, denotes the adjoint operator of $A_{x_1}$ w.r.t. the scalar products $\langle \cdot, \cdot \rangle_{H^1(\mathcal{X}_2)}$, resp. $\langle \cdot, \cdot \rangle_{L^2(\mathcal{X}_2)}$, and $\langle \cdot, \cdot \rangle_{L^2_{x_1}(\mathcal{Z}_1)}$.


Assumption 2: The linear operator $A_{x_1}$ from $L^2(\mathcal{X}_2)$ to $L^2_{x_1}(\mathcal{Z}_1)$ is compact for all $x_1 \in \mathcal{X}_1$.

Assumption 2 on compactness of operator $A_{x_1}$ holds under mild conditions on the conditional density $f_{X_2|Z}$ and the weighting function $\Omega_{x_1,0}$ (see Assumptions B.3 (i) and B.6 in Appendix 1). Then, operator $A_{x_1}^* A_{x_1}$ is compact and self-adjoint in $H^1(\mathcal{X}_2)$, while $\tilde{A}_{x_1} A_{x_1}$ is compact and self-adjoint in $L^2(\mathcal{X}_2)$. We denote by $\{\psi_{x_1,j} : j \in \mathbb{N}\}$ an orthonormal basis in $H^1(\mathcal{X}_2)$ of eigenfunctions of operator $A_{x_1}^* A_{x_1}$, and by $\nu_{x_1,1} \geq \nu_{x_1,2} \geq \cdots > 0$ the corresponding eigenvalues (see Kress (1999), Section 15.3, for the spectral decomposition of compact, self-adjoint operators). Similarly, $\{\tilde{\psi}_{x_1,j} : j \in \mathbb{N}\}$ is an orthonormal basis in $L^2(\mathcal{X}_2)$ of eigenfunctions of operator $\tilde{A}_{x_1} A_{x_1}$ for eigenvalues $\tilde{\nu}_{x_1,1} \geq \tilde{\nu}_{x_1,2} \geq \cdots > 0$. By compactness of $A_{x_1}^* A_{x_1}$ and $\tilde{A}_{x_1} A_{x_1}$, the eigenvalues are such that $\nu_{x_1,j}, \tilde{\nu}_{x_1,j} \to 0$ as $j \to \infty$, for any given $x_1 \in \mathcal{X}_1$. The limit criterion $Q_{x_1,\infty}(\varphi_{x_1})$ can be minimized by a sequence $(\varphi_{x_1,n})$ in $\Theta$ such that

$$\varphi_{x_1,n} = \varphi_{x_1,0} + \varepsilon\, \tilde{\psi}_{x_1,n}, \quad n \in \mathbb{N}, \qquad (5)$$

for $\varepsilon > 0$, which does not converge to $\varphi_{x_1,0}$ in the $L^2$ norm $\|\cdot\|_{L^2(\mathcal{X}_2)}$. Indeed, we have $Q_{x_1,\infty}(\varphi_{x_1,n}) = \varepsilon^2 \langle \tilde{\psi}_{x_1,n}, \tilde{A}_{x_1} A_{x_1} \tilde{\psi}_{x_1,n} \rangle_{L^2(\mathcal{X}_2)} = \varepsilon^2 \tilde{\nu}_{x_1,n} \to 0$ as $n \to \infty$, but $\|\varphi_{x_1,n} - \varphi_{x_1,0}\|_{L^2(\mathcal{X}_2)} = \varepsilon$, $\forall n$. Since $\varepsilon > 0$ is arbitrary, the usual "identifiable uniqueness" assumption (e.g., White and Wooldridge (1991))

$$\inf_{\varphi_{x_1} \in \Theta :\, \|\varphi_{x_1} - \varphi_{x_1,0}\|_{L^2(\mathcal{X}_2)} \geq \varepsilon} Q_{x_1,\infty}(\varphi_{x_1}) > 0 = Q_{x_1,\infty}(\varphi_{x_1,0}), \quad \text{for } \varepsilon > 0, \qquad (6)$$

is not satisfied. In other words, function $\varphi_{x_1,0}$ is not identified in $\Theta$ as an isolated minimum of $Q_{x_1,\infty}$. This is the identification problem of minimum distance estimation with functional parameter and endogenous regressors. Failure of Condition (6) despite validity of Assumption 1 comes from $0$ being a limit point of the eigenvalues of operator $\tilde{A}_{x_1} A_{x_1}$ (and $A_{x_1}^* A_{x_1}$). This shows that the minimum distance problem for any given $x_1 \in \mathcal{X}_1$ is ill-posed. The minimum distance estimator of $\varphi_{x_1,0}$ which minimizes the empirical counterpart of criterion $Q_{x_1,\infty}(\varphi_{x_1})$ over the set $\Theta$ is not consistent w.r.t. the $L^2$ norm $\|\cdot\|_{L^2(\mathcal{X}_2)}$.
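The spectral argument above can be mimicked numerically. In this hedged sketch, a discretized Gaussian smoothing kernel stands in for the compact operator (the kernel, bandwidth and grid are our own illustrative choices, not the paper's construction):

```python
import numpy as np

# Discretize an integral smoothing operator A on [0,1]; A'A is then a
# symmetric matrix whose eigenvalues decay toward zero, mirroring the
# compactness that breaks identifiable uniqueness in (5)-(6).
n = 200
x = np.linspace(0.0, 1.0, n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
A = K / n                                    # quadrature weight 1/n

nu, Psi = np.linalg.eigh(A.T @ A)            # eigh returns ascending order
nu, Psi = nu[::-1], Psi[:, ::-1]             # reorder eigenpairs descending

# Along phi_n = phi_0 + eps * psi_n the criterion is eps^2 * nu_n -> 0,
# while ||phi_n - phi_0|| = eps stays fixed (psi_n are unit vectors),
# so the infimum in (6) is zero: the problem is ill-posed.
assert nu[0] > 1e3 * abs(nu[50])             # eigenvalues sink to (numerical) zero
assert abs(np.linalg.norm(Psi[:, 50]) - 1.0) < 1e-8
```

High-index eigenfunctions are highly oscillatory, which is why the minimum distance criterion is asymptotically flat along exactly those rough directions.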

To conclude this section, let us further discuss the link between function $\varphi_0$ and the functions $\varphi_{x_1,0}$, $x_1 \in \mathcal{X}_1$. First, $\varphi_0 \in L^2(\mathcal{X})$. Indeed, the set $\mathcal{P} := \{\varphi : \varphi_{x_1} \in \Theta, \ \forall x_1 \in \mathcal{X}_1\}$ is a subset of $L^2(\mathcal{X})$, since $\|\varphi\|^2_{L^2(\mathcal{X})} = \int_{\mathcal{X}_1} \|\varphi_{x_1}\|^2_{L^2(\mathcal{X}_2)}\, dx_1$. Second, Assumption 1 implies identification of $\varphi_0 \in \mathcal{P}$. Third, minimizing $Q_{x_1,\infty}$ w.r.t. $\varphi_{x_1} \in \Theta$ for all $x_1 \in \mathcal{X}_1$ is equivalent to minimizing the global criterion:

$$Q_\infty(\varphi) := E\big[\Omega_0(Z)\, m(\varphi, Z)^2\big] = E\big[Q_{X_1,\infty}(\varphi_{X_1})\big],$$

w.r.t. $\varphi \in \mathcal{P}$, where $m(\varphi, z) := E[Y - \varphi(X) \mid Z = z]$ and $\Omega_0(z) = \Omega_{x_1,0}(z_1)$. Under Assumptions B.3 (i) and B.6, ill-posedness of the minimum distance approach for $\varphi_{x_1}$, $x_1 \in \mathcal{X}_1$, transfers by the Lebesgue theorem to ill-posedness of the minimum distance approach for $\varphi$. Indeed, the sequence $(\varphi_n)$ induced by (5) yields $Q_\infty(\varphi_n) \to 0$ and $\varphi_n \nrightarrow \varphi_0$ as $n \to \infty$. Finally, Assumption 2 cannot hold for the conditional expectation operator of $X$ given $Z$. Indeed, as discussed in DFR, this operator is not compact in the presence of exogenous regressors. This explains why we work $x_1$ by $x_1$ as in HH to estimate $\varphi_0$.


3 The Tikhonov Regularized (TiR) estimator

We address ill-posedness by Tikhonov regularization (Tikhonov (1963a,b); see Kress (1999), Chapter 16). We consider a penalized criterion $\mathcal{L}_{x_1,T}(\varphi_{x_1}) := Q_{x_1,T}(\varphi_{x_1}) + \lambda_{x_1,T} \|\varphi_{x_1}\|^2_{H^1(\mathcal{X}_2)}$, where $Q_{x_1,T}(\varphi_{x_1})$ is an empirical counterpart of $Q_{x_1,\infty}(\varphi_{x_1})$ defined by

$$Q_{x_1,T}(\varphi_{x_1}) = \int_{\mathcal{Z}_1} \hat{\Omega}_{x_1}(z_1)\, \hat{m}_{x_1}(\varphi_{x_1}, z_1)^2\, \hat{f}_{Z_1|X_1}(z_1|x_1)\, dz_1, \qquad (7)$$

and $\hat{\Omega}_{x_1}$ is a sequence of positive functions converging in probability to $\Omega_{x_1,0}$. In (7) we estimate the conditional moment nonparametrically with

$$\hat{m}_{x_1}(\varphi_{x_1}, z_1) = \int \varphi_{x_1}(x_2)\, \hat{f}_{X_2|Z}(x_2|z)\, dx_2 - \int y\, \hat{f}_{Y|Z}(y|z)\, dy =: (\hat{A}_{x_1}\varphi_{x_1})(z_1) - \hat{r}_{x_1}(z_1),$$

where $\hat{f}_{X_2|Z}$ and $\hat{f}_{Y|Z}$ denote kernel estimators of the density of $X_2$ given $Z$, and of $Y$ given $Z$. We use a common kernel $K$ and two different bandwidths: $h_T$ for $Y$, $X_2$, $Z_1$, and $h_{x_1,T}$ for $X_1$.

Definition 1: The Tikhonov Regularized (TiR) minimum distance estimator for $\varphi_{x_1,0}$ is defined by

$$\hat{\varphi}_{x_1} := \arg\inf_{\varphi_{x_1} \in \Theta} \mathcal{L}_{x_1,T}(\varphi_{x_1}), \qquad (8)$$

where $\lambda_{x_1,T} > 0$ and $\lambda_{x_1,T} \to 0$, for any $x_1 \in \mathcal{X}_1$. The TiR estimator $\hat{\varphi}$ for $\varphi_0$ is defined by $\hat{\varphi}(x) := \hat{\varphi}_{x_1}(x_2)$, $x \in \mathcal{X}$.

To emphasize the difference between $\hat{\varphi}_{x_1}$ for a given $x_1 \in \mathcal{X}_1$, and $\hat{\varphi}$, we refer to the former as a local estimator, and to the latter as a global estimator.

From the proof of Proposition 1 in CGS, we know that sequences $(\varphi_{x_1,n})$ such that $Q_{x_1,\infty}(\varphi_{x_1,n}) \to 0$ and $\varphi_{x_1,n} \nrightarrow \varphi_{x_1,0}$ have the property $\limsup_{n\to\infty} \|\nabla\varphi_{x_1,n}\|_{L^2(\mathcal{X}_2)} = \infty$. This explains why we prefer in definition (8) to use a Sobolev penalty $\lambda_{x_1,T}\|\varphi_{x_1}\|^2_{H^1(\mathcal{X}_2)}$ instead of an $L^2$ penalty $\lambda_{x_1,T}\|\varphi_{x_1}\|^2_{L^2(\mathcal{X}_2)}$ to dampen the highly oscillating components in the estimated function. Without penalization, oscillations are unduly amplified, since ill-posedness yields a criterion $Q_{x_1,T}(\varphi_{x_1})$ asymptotically flat along some directions. The tuning parameter $\lambda_{x_1,T}$ in Definition 1 controls the amount of regularization, and how this depends on point $x_1$ and sample size $T$. Its rate of convergence to zero affects that of $\hat{\varphi}_{x_1}$.
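Why the Sobolev penalty targets oscillations can be checked directly: the components $\sin(j\pi x)$ all share the same $L^2$ norm, while their $H^1$ norm grows linearly in the frequency $j$. A small numerical check (the grid and frequencies are illustrative):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]

def l2_norm(f):
    return np.sqrt(np.sum(f ** 2) * dx)      # Riemann-sum L2 norm on [0,1]

norms = {}
for j in (1, 10, 100):
    f = np.sin(j * np.pi * x)                # oscillating component
    df = j * np.pi * np.cos(j * np.pi * x)   # its exact derivative
    norms[j] = (l2_norm(f), np.sqrt(l2_norm(f) ** 2 + l2_norm(df) ** 2))

# L2 norms are all close to 1/sqrt(2); the H1 norm grows roughly like j*pi/sqrt(2).
assert abs(norms[1][0] - norms[100][0]) < 1e-2
assert norms[100][1] > 50.0 * norms[1][1]
```

An $L^2$ penalty charges every component equally, whereas the derivative term in the $H^1$ norm charges a frequency-$j$ component on the order of $j^2$ in the squared penalty, which is what damps the rough directions left unconstrained by the criterion.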

The TiR estimator admits a closed form expression. The objective function in (8) can be rewritten as (see Lemma A.2 (i) in Appendix 2)

$$\mathcal{L}_{x_1,T}(\varphi_{x_1}) = \langle \varphi_{x_1}, \hat{A}_{x_1}^* \hat{A}_{x_1} \varphi_{x_1} \rangle_{H^1(\mathcal{X}_2)} - 2\langle \varphi_{x_1}, \hat{A}_{x_1}^* \hat{r}_{x_1} \rangle_{H^1(\mathcal{X}_2)} + \lambda_{x_1,T} \langle \varphi_{x_1}, \varphi_{x_1} \rangle_{H^1(\mathcal{X}_2)}, \qquad (9)$$

up to a term independent of $\varphi_{x_1}$. Operator $\hat{A}_{x_1}^*$ is given by

$$\hat{A}_{x_1}^* = D^{-1}\hat{\tilde{A}}_{x_1}, \qquad (\hat{\tilde{A}}_{x_1}\psi)(x_2) := \int_{\mathcal{Z}_1} \hat{\Omega}_{x_1}(z_1)\, \hat{f}_{X_2,Z_1|X_1}(x_2, z_1|x_1)\, \psi(z_1)\, dz_1, \qquad (10)$$

where $D^{-1}$ denotes the inverse of the operator $D : H^2_0(\mathcal{X}_2) \to L^2(\mathcal{X}_2)$ with $D := 1 - \sum_{i=1}^{d_{X_2}} \nabla_i^2$ and $H^2_0(\mathcal{X}_2) = \{\psi \in H^2(\mathcal{X}_2) \mid \nabla_i \psi(x_2) = 0 \text{ for } x_{2,i} = 0, 1, \text{ and } i = 1, \ldots, d_{X_2}\}$. The space $H^2(\mathcal{X}_2)$ is the Sobolev space of order 2, i.e., the completion of the linear space $\{\psi \in C^2(\mathcal{X}_2) \mid \nabla^\alpha \psi \in L^2(\mathcal{X}_2), |\alpha| \leq 2\}$ w.r.t. the scalar product $\langle \psi_1, \psi_2 \rangle_{H^2(\mathcal{X}_2)} := \sum_{|\alpha| \leq 2} \langle \nabla^\alpha \psi_1, \nabla^\alpha \psi_2 \rangle_{L^2(\mathcal{X}_2)}$. Operators $\hat{A}_{x_1}^*$ and $\hat{\tilde{A}}_{x_1}$ are the empirical counterparts of $A_{x_1}^*$ and $\tilde{A}_{x_1}$, which are linked by $A_{x_1}^* = D^{-1}\tilde{A}_{x_1}$ (see Lemma A.1 in Appendix 2). The boundary conditions $\nabla_i \psi(x_2) = 0$ for $x_{2,i} = 0, 1$ and $i = 1, \ldots, d_{X_2}$ in the definition of $H^2_0(\mathcal{X}_2)$ are not restrictive since they concern the estimate $\hat{\varphi}_{x_1}$, but not the true function $\varphi_{x_1,0}$. More precisely, we study in Propositions 1-4 below the properties of $\hat{\varphi}_{x_1}$ in the $L^2$ and uniform

From Lemma A.2 (ii), operator A^x

1

A^x1 is compact, and hence T + ^Ax

1

A^x1 is invertible (Kress (1999), Theorem 3.4). Then, Criterion (9) admits a global minimum'^x1 onH1(X2), which solves the …rst order condition

x1;T + ^Ax

1

A^x1 'x1 = ^Ax

1r^x1: (11)

This is an integro-di¤erential Fredholm equation of Type II (see e.g. Mammen, Linton and Nielsen (1999), Linton and Mammen (2005), Gagliardini and Gouriéroux (2007), Linton and Mammen (2008), and the survey by CFR for other examples). The transformation of the ill-posed problem (1) in the well-posed estimating equation (11) is induced by the penalty term involving the Sobolev norm. The TiR estimator of 'x1;0 is the explicit solution of Equation (11):

^ 'x

1 = x1;T + ^Ax

1

A^x1 1A^x

1^rx1: (12)

4 Consistency

Equation (12) can be rewritten as (see Appendix 3):

$$\hat{\varphi}_{x_1} - \varphi_{x_1,0} = \big(\lambda_{x_1,T} + A_{x_1}^* A_{x_1}\big)^{-1} A_{x_1}^* \hat{\zeta}_{x_1} + B^r_{x_1,T} + \big(\lambda_{x_1,T} + A_{x_1}^* A_{x_1}\big)^{-1} A_{x_1}^* \zeta_{x_1} + R_{x_1,T} =: V_{x_1,T} + B^r_{x_1,T} + B^e_{x_1,T} + R_{x_1,T}, \qquad (13)$$


where

$$\hat{\zeta}_{x_1}(z_1) := \int \big(y - \varphi_{x_1,0}(x_2)\big)\, \frac{\hat{f}_{W,Z}(w,z) - E\big[\hat{f}_{W,Z}(w,z)\big]}{f_Z(z)}\, dw, \qquad \zeta_{x_1}(z_1) := \int \big(y - \varphi_{x_1,0}(x_2)\big)\, \frac{E\big[\hat{f}_{W,Z}(w,z)\big] - f_{W,Z}(w,z)}{f_Z(z)}\, dw, \qquad (14)$$

and $W := (Y, X_2) \in \mathcal{W} := \mathcal{Y} \times \mathcal{X}_2$. In Equation (13) the first three terms $V_{x_1,T}$, $B^r_{x_1,T} := (\lambda_{x_1,T} + A_{x_1}^* A_{x_1})^{-1} A_{x_1}^* A_{x_1}\varphi_{x_1,0} - \varphi_{x_1,0} =: \varphi_{x_1,\lambda} - \varphi_{x_1,0}$, and $B^e_{x_1,T}$ are the leading terms asymptotically, while $R_{x_1,T}$ is a remainder term given in (26). The stochastic term $V_{x_1,T}$ has mean zero and contributes to the variance. The deterministic term $B^e_{x_1,T}$ corresponds to the kernel estimation bias. The deterministic term $B^r_{x_1,T}$ corresponds to the regularization bias in the theory of Tikhonov regularization (Kress (1999), Groetsch (1984)). Indeed, function $\varphi_{x_1,\lambda}$ minimizes the penalized limit criterion $Q_{x_1,\infty}(\varphi_{x_1}) + \lambda_{x_1,T}\|\varphi_{x_1}\|^2_{H^1(\mathcal{X}_2)}$ w.r.t. $\varphi_{x_1} \in \Theta$. Thus, $B^r_{x_1,T}$ is the asymptotic bias term arising from introducing the penalty $\lambda_{x_1,T}\|\varphi_{x_1}\|^2_{H^1(\mathcal{X}_2)}$ in the criterion. To control $B^r_{x_1,T}$ we introduce a source condition (see DFR).

Assumption 3: The function $\varphi_{x_1,0}$ satisfies $\displaystyle\sum_{j=1}^{\infty} \frac{\langle \psi_{x_1,j}, \varphi_{x_1,0} \rangle^2_{H^1(\mathcal{X}_2)}}{\nu_{x_1,j}^{2\delta_{x_1}}} < \infty$ for $\delta_{x_1} \in (0,1]$.

As in the proof of Proposition 3.11 in CFR, Assumption 3 implies:

$$\big\| B^r_{x_1,T} \big\|_{H^1(\mathcal{X}_2)} = O\big(\lambda_{x_1,T}^{\delta_{x_1}}\big). \qquad (15)$$

By bounding the Sobolev norms of the other terms $V_{x_1,T}$, $B^e_{x_1,T}$, and $R_{x_1,T}$ (see Appendix 3), we get the following consistency result. The relation $a_T \sim b_T$, for positive sequences $a_T$ and $b_T$, means that $a_T/b_T$ is bounded away from $0$ and $\infty$ as $T \to \infty$.


Proposition 1: Let the bandwidths $h_T \sim T^{-\eta}$ and $h_{x_1,T} \sim T^{-\eta_{x_1}}$ and the regularization parameter $\lambda_{x_1,T} \sim T^{-\gamma_{x_1}}$ be such that:

$$\eta > 0, \quad \eta_{x_1} > 0, \quad \gamma_{x_1} > 0, \qquad (16)$$

$$\gamma_{x_1} + d_{X_1}\eta_{x_1} + (d_{Z_1} + d_{X_2})\eta < 1, \qquad (17)$$

and:

$$\gamma_{x_1} < \min\Big\{ m\eta_{x_1},\; m\eta,\; \frac{1 - d_{X_1}\eta_{x_1} - \max\{d_{Z_1}, d_{X_2}\}\eta}{2} \Big\}, \qquad (18)$$

where $m \geq 2$ is the order of differentiability of the joint density of $(W, Z)$. Under Assumptions 1-3, B.1-B.3, B.6, B.7 (i)-(ii): $\|\hat{\varphi}_{x_1} - \varphi_{x_1,0}\|_{H^1(\mathcal{X}_2)} = o_p(1)$.
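Conditions (16)-(18) can also be explored numerically. The following hedged sketch grid-searches the powers $(\eta, \eta_{x_1}, \gamma_{x_1})$ in the illustrative configuration $d_{X_1} = d_{X_2} = d_{Z_1} = 1$ and $m = 2$ (our own choice of a simple case), confirming that the constraint region is non-empty:

```python
import itertools

import numpy as np

# Brute-force feasibility check of the rate conditions (16)-(18) for an
# illustrative configuration d_X1 = d_X2 = d_Z1 = 1, m = 2.
d_X1, d_X2, d_Z1, m = 1, 1, 1, 2
grid = np.linspace(0.02, 0.4, 20)            # positive powers, cf. (16)

feasible = [
    (e, ex, g)
    for e, ex, g in itertools.product(grid, repeat=3)
    if g + d_X1 * ex + (d_Z1 + d_X2) * e < 1                               # (17)
    and g < min(m * ex, m * e, (1 - d_X1 * ex - max(d_Z1, d_X2) * e) / 2)  # (18)
]
assert feasible                              # the admissible region is non-empty
```

For example, $\eta = \eta_{x_1} = 0.1$ and $\gamma_{x_1} = 0.02$ satisfies all three constraints in this configuration, consistent with the discussion that (16)-(18) are not mutually exclusive.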

Proposition 1 shows that the powers $\eta_{x_1}$, $\gamma_{x_1}$, and $\eta$ need to be sufficiently small for large dimensions $d_{X_1}$, $d_{X_2}$, and $d_{Z_1}$ and a small order of differentiability $m$ to ensure consistency. An analysis of $\eta_{x_1}$, $\gamma_{x_1}$, and $\eta$ close to the origin reveals that conditions (16)-(18) are not mutually exclusive, and that these conditions do not yield an empty region. Consistency of $\hat{\varphi}_{x_1}$ in the Sobolev norm $H^1(\mathcal{X}_2)$ implies consistency of both $\hat{\varphi}_{x_1}$ and $\nabla\hat{\varphi}_{x_1}$ in the norm $L^2(\mathcal{X}_2)$. Lemma C.1 in CGS states that for any $\varphi \in H^1(\mathcal{X}_2)$, $\sup_{x_2 \in \mathcal{X}_2} |\varphi(x_2)| \leq 2\|\varphi\|_{H^1(\mathcal{X}_2)}$. Hence we also get uniform consistency of $\hat{\varphi}_{x_1}$, i.e. $\sup_{x_2 \in \mathcal{X}_2} |\hat{\varphi}_{x_1}(x_2) - \varphi_{x_1,0}(x_2)| = o_p(1)$, for a given $x_1 \in \mathcal{X}_1$.

Building on the bounds for the terms $V_{x_1,T}$, $B^e_{x_1,T}$, and $R_{x_1,T}$ in the proof of Proposition 1, we can further show uniform consistency of the global estimator $\hat{\varphi}$ (and as a consequence consistency in the $L^2(\mathcal{X})$ norm) if we introduce a strengthening of the source condition.


Assumption 3 bis: The function $\varphi_0$ satisfies $\displaystyle\sup_{x_1 \in \mathcal{X}_1} \sum_{j=1}^{\infty} \frac{\langle \psi_{x_1,j}, \varphi_{x_1,0} \rangle^2_{H^1(\mathcal{X}_2)}}{\nu_{x_1,j}^{2\delta_{x_1}}} < \infty$, for $\delta_{x_1} \in (0,1]$, $x_1 \in \mathcal{X}_1$.

Assumption 3 bis implies:

$$\sup_{x_1 \in \mathcal{X}_1} \big\| B^r_{x_1,T} \big\|^2_{H^1(\mathcal{X}_2)} = O\Big(\sup_{x_1 \in \mathcal{X}_1} \lambda_{x_1,T}^{2\delta_{x_1}}\Big), \qquad (19)$$

and we get the next uniform consistency result.

Proposition 2: Let the bandwidths $h_T \sim T^{-\eta}$ and $h_{x_1,T} \sim T^{-\eta_{x_1}}$ and the regularization parameter $\lambda_{x_1,T} \sim T^{-\gamma_{x_1}}$ be such that:

$$\eta > 0, \quad \eta_{x_1} \geq \varepsilon, \quad \gamma_{x_1} \geq \varepsilon, \quad \Big|\frac{\partial \gamma_{x_1}}{\partial x_1}\Big| \leq \bar{\varepsilon},$$

$$\gamma_{x_1} + d_{X_1}\eta_{x_1} + (d_{Z_1} + d_{X_2})\eta \leq 1 - \varepsilon,$$

and:

$$\gamma_{x_1} \leq \min\Big\{ m\eta_{x_1},\; m\eta,\; \frac{1 - d_{X_1}\eta_{x_1} - \max\{d_{Z_1}, d_{X_2}\}\eta}{2} \Big\} - \varepsilon,$$

for some $\varepsilon, \bar{\varepsilon} > 0$ and any $x_1 \in \mathcal{X}_1$. Under Assumptions 1-3 bis, B.1-B.3, B.6, B.7 (i)-(ii):

$$\sup_{x_1 \in \mathcal{X}_1} \big\| \hat{\varphi}_{x_1} - \varphi_{x_1,0} \big\|_{H^1(\mathcal{X}_2)} = o_p(1).$$

Again from Lemma C.1 in CGS, Proposition 2 implies consistency of the global estimator $\hat{\varphi}$ in the sup-norm: $\sup_{x \in \mathcal{X}} |\hat{\varphi}(x) - \varphi_0(x)| = o_p(1)$. This in turn implies the $L^2$-consistency $\|\hat{\varphi} - \varphi_0\|_{L^2(\mathcal{X})} = o_p(1)$.


5 Mean Integrated Square Error

As in AC, Assumption 4.1, we assume the following choice of the weighting matrix.

Assumption 4: The asymptotic weighting matrix is $\Omega_0(z) = V[Y - \varphi_0(X) \mid Z = z]^{-1}$.

In a semiparametric setting, AC show that this choice of the weighting matrix yields efficient estimators of the finite-dimensional component. Here, Assumption 4 is used to derive the exact asymptotic expansion of the MISE of the TiR estimator provided in the next proposition.

Proposition 3: Under Assumptions 1-4, Assumptions B, the conditions (16)-(18) and

$$\frac{1}{T h_{x_1,T}^{d_{X_1}} h_T^{d_{Z_1}+d_{X_2}}} + h_{x_1,T}^{2m} + h_T^{2m} = o\big(\lambda_{x_1,T}\, b(\lambda_{x_1,T}, h_{x_1,T})\big), \qquad \frac{h_T h_{x_1,T}^{m-1} + h_T^m}{\sqrt{\lambda_{x_1,T}}} = o\big(b(\lambda_{x_1,T}, h_{x_1,T})\big), \qquad (20)$$

the MISE of $\hat{\varphi}_{x_1}$ is given by

$$E\Big[\big\|\hat{\varphi}_{x_1} - \varphi_{x_1,0}\big\|^2_{L^2(\mathcal{X}_2)}\Big] = M_{x_1,T}(\lambda_{x_1,T}, h_{x_1,T})\,(1 + o(1)), \qquad (21)$$

where

$$M_{x_1,T}(\lambda_{x_1,T}, h_{x_1,T}) := \frac{1}{T h_{x_1,T}^{d_{X_1}}}\, \sigma^2_{x_1}(\lambda_{x_1,T}) + b_{x_1}(\lambda_{x_1,T}, h_{x_1,T})^2, \qquad (22)$$

and:

$$\sigma^2_{x_1}(\lambda_{x_1,T}) := \omega^2 f_{X_1}(x_1) \sum_{j=1}^{\infty} \frac{\nu_{x_1,j}}{(\lambda_{x_1,T} + \nu_{x_1,j})^2}\, \big\|\psi_{x_1,j}\big\|^2_{L^2(\mathcal{X}_2)}, \qquad b_{x_1}(\lambda_{x_1,T}, h_{x_1,T}) := \Big\| B^r_{x_1,T} + h_{x_1,T}^m \big(\lambda_{x_1,T} + A_{x_1}^* A_{x_1}\big)^{-1} A_{x_1}^* \xi_{x_1} \Big\|_{L^2(\mathcal{X}_2)},$$

with $\omega^2 = \int K(x_1)^2\, dx_1$ and

$$\xi_{x_1}(z_1) := \frac{1}{m!} \sum_{|\alpha|=m} \int \big(y - \varphi_{x_1,0}(x_2)\big)\, \frac{\nabla^\alpha_{X_1} f_{W,Z}(w,z)}{f_Z(z)}\, dw.$$
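The bias-variance structure of (22) can be evaluated numerically for illustrative inputs. In the sketch below, geometric eigenvalues $\nu_j$, polynomial weights, and a regularization bias of order $\lambda^{\delta}$ (cf. (15) under the source condition) are our own stand-ins, not quantities from the paper:

```python
import numpy as np

# Illustrative MISE profile over lambda at a fixed bandwidth:
# variance uses the regularized inverse eigenvalues nu_j/(lam+nu_j)^2,
# bias is taken of order lam^delta (source condition). Constants are toy values.
nu = 0.5 ** np.arange(1, 40)         # geometrically decaying eigenvalues
w = 1.0 / np.arange(1, 40) ** 2      # illustrative weights ||psi_j||^2
delta, T, C = 0.5, 1000.0, 1.0

def mise(lam):
    variance = (nu / (lam + nu) ** 2 * w).sum() / T
    bias2 = (C * lam ** delta) ** 2
    return variance + bias2

lams = np.logspace(-6, 0, 200)
vals = np.array([mise(l) for l in lams])
best = lams[vals.argmin()]

# The profile is U-shaped: variance explodes as lam -> 0, bias dominates
# as lam grows, so the minimizer is interior.
assert vals.min() < vals[0] and vals.min() < vals[-1]
assert lams[0] < best < lams[-1]
```

This is the trade-off resolved by the optimal choice of the regularization parameter discussed in the two examples of Section 6 and by the data driven selection procedure of Section 9.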

The asymptotic expansion (22) of the MISE consists of one bias component and one variance component, which we comment on in turn.

(i) The bias function $b_{x_1}(\lambda_{x_1,T}, h_{x_1,T})$ is the $L^2$ norm of the sum of two contributions, namely the Tikhonov regularization bias $B^r_{x_1,T}$ and the function $h_{x_1,T}^m (\lambda_{x_1,T} + A_{x_1}^* A_{x_1})^{-1} A_{x_1}^* \xi_{x_1}$. The latter contribution corresponds to a population Tikhonov regression applied to the function $h_{x_1,T}^m \xi_{x_1}$. Function $h_{x_1,T}^m \xi_{x_1}$ arises from smoothing the exogenous regressors $X_1$ and is derived by a standard Taylor expansion w.r.t. $X_1$ of the kernel estimation bias $E[\hat{f}_{W,Z}(w,z)] - f_{W,Z}(w,z)$ in $B^e_{x_1,T}$ (see (14)).

(ii) The variance term is $\mathcal{V}_{x_1,T} := \frac{1}{T h_{x_1,T}^{d_{X_1}}}\, \sigma^2_{x_1}(\lambda_{x_1,T})$. The ratio $1/(T h_{x_1,T}^{d_{X_1}})$ and the multiplicative factor $\omega^2 f_{X_1}(x_1)$ are standard for kernel regression in dimension $d_{X_1}$ and are induced by smoothing $X_1$. The coefficient $\sigma^2_{x_1}(\lambda_{x_1,T})$ involves a weighted sum of the regularized inverse eigenvalues $\nu_{x_1,j}/(\lambda_{x_1,T} + \nu_{x_1,j})^2$ of operator $A_{x_1}^* A_{x_1}$, with weights $\|\psi_{x_1,j}\|^2_{L^2(\mathcal{X}_2)}$ (since $\nu_{x_1,j}/(\lambda_{x_1,T} + \nu_{x_1,j})^2 \leq 1/\nu_{x_1,j}$, the infinite sum converges under Assumption B.8 (ii) in Appendix 1). To have an interpretation, note that the inverse of operator $A_{x_1}^* A_{x_1}$ corresponds to the standard asymptotic variance matrix $(Q_{XZ} V_0^{-1} Q_{ZX})^{-1}$ of the 2-Stage Least Squares (2SLS) estimator of the finite-dimensional parameter $\theta$ in the instrumental regression $Y = X'\theta + U$ with $E[U|Z] = 0$, where $Q_{ZX} = E[ZX']$ and $V_0 = E[U^2 ZZ']$. In the ill-posed
