Efficient estimation with many weak instruments using regularization techniques

Texte intégral

(1)2013s-21. Efficient estimation with many weak instruments using regularization techniques Marine Carrasco, Guy Tchuente. Série Scientifique Scientific Series. Montréal Juillet 2013. © 2013 Marine Carrasco, Guy Tchuente. Tous droits réservés. All rights reserved. Reproduction partielle permise avec citation du document source, incluant la notice ©. Short sections may be quoted without explicit permission, if full credit, including © notice, is given to the source..

(2) CIRANO Le CIRANO est un organisme sans but lucratif constitué en vertu de la Loi des compagnies du Québec. Le financement de son infrastructure et de ses activités de recherche provient des cotisations de ses organisations-membres, d’une subvention d’infrastructure du Ministère du Développement économique et régional et de la Recherche, de même que des subventions et mandats obtenus par ses équipes de recherche. CIRANO is a private non-profit organization incorporated under the Québec Companies Act. Its infrastructure and research activities are funded through fees paid by member organizations, an infrastructure grant from the Ministère du Développement économique et régional et de la Recherche, and grants and research mandates obtained by its research teams. Les partenaires du CIRANO Partenaire majeur Ministère de l'Enseignement supérieur, de la Recherche, de la Science et de la Technologie Partenaires corporatifs Autorité des marchés financiers Banque de développement du Canada Banque du Canada Banque Laurentienne du Canada Banque Nationale du Canada Banque Scotia Bell Canada BMO Groupe financier Caisse de dépôt et placement du Québec Fédération des caisses Desjardins du Québec Financière Sun Life, Québec Gaz Métro Hydro-Québec Industrie Canada Investissements PSP Ministère des Finances et de l’Économie Power Corporation du Canada Rio Tinto Alcan State Street Global Advisors Transat A.T. Ville de Montréal Partenaires universitaires École Polytechnique de Montréal École de technologie supérieure (ÉTS) HEC Montréal Institut national de la recherche scientifique (INRS) McGill University Université Concordia Université de Montréal Université de Sherbrooke Université du Québec Université du Québec à Montréal Université Laval Le CIRANO collabore avec de nombreux centres et chaires de recherche universitaires dont on peut consulter la liste sur son site web. Les cahiers de la série scientifique (CS) visent à rendre accessibles des résultats de recherche effectuée au CIRANO afin de susciter échanges et commentaires. Ces cahiers sont écrits dans le style des publications scientifiques. Les idées et les opinions émises sont sous l’unique responsabilité des auteurs et ne représentent pas nécessairement les positions du CIRANO ou de ses partenaires. This paper presents research carried out at CIRANO and aims at encouraging discussion and comment. The observations and viewpoints expressed are the sole responsibility of the authors. They do not necessarily represent positions of CIRANO or its partners.. ISSN 1198-8177. Partenaire financier.

(3) Efficient estimation with many weak instruments using regularization techniques * Marine Carrasco †, Guy Tchuente ‡. Résumé / Abstract The problem of weak instruments is due to a very small concentration parameter. To boost the concentration parameter, we propose to increase the number of instruments to a large number or even up to a continuum. However, in finite samples, the inclusion of an excessive number of moments may be harmful. To address this issue, we use regularization techniques as in Carrasco (2012) and Carrasco and Tchuente (2013). We show that normalized regularized 2SLS and LIML are consistent and asymptotically normally distributed. Moreover, their asymptotic variances reach the semiparametric efficiency bound unlike most competing estimators. Our simulations show that the leading regularized estimators (LF and T of LIML) work very well (is nearly median unbiased) even in the case of relatively weak instruments. Mots clés/Keywords : Many weak instruments, LIML, 2SLS, regularization methods.. *. Carrasco gratefully acknowledges financial support from SSHRC. University of Montreal, CIREQ, CIRANO, email: [email protected]. ‡ University of Montreal, CIREQ, email: [email protected]. †.

(4) 1. Introduction. The problem of weak instruments or weak identi…cation has recently received considerable attention in both theoretical and applied econometrics1 . Empirical examples include Angrist and Krueger (1991) who measure return to schooling, Eichenbaum, Hansen, and Singleton (1988) who consider consumption asset pricing models. Theoretical literature on weak instruments includes papers by Staiger and Stock (1997), Zivot, Startz, and Nelson (1998), Guggenberger and Smith (2005), Chao and Swanson (2005), Han and Phillips (2006), Hansen, Hausman, and Newey (2008), and Newey and Windmeijer (2009) among others2 . Staiger and Stock (1997) proposed an asymptotic framework with local-to-zero parametrization of the coe¢ cients of the instruments in the …rst-stage regression. They show that with the number of instruments …xed, the two-stage least squares (2SLS) and limited information maximum likelihood (LIML) estimators are not consistent and converge to nonstandard distributions. Subsequent work focused on situations where the number of instruments is large, using an asymptotic framework that lets the number of instruments go to in…nity as a function of sample size. In these settings, the use of many moments can improve estimator accuracy. Unfortunately, usual Gaussian asymptotic approximation can be poor and IV estimators can be biased. Carrasco (2012) and Carrasco and Tchuente (2013) proposed respectively regularized versions of 2SLS and LIML estimators for many strong instruments. The regularization permits to address the singularity of the covariance matrix resulting from many instruments. These papers use three regularization methods borrowed from inverse problem literature. The …rst estimator is based on Tikhonov (ridge) regularization, the second estimator is based on an iterative method called Landweber-Fridman (LF), the third estimator is based on the principal components associated with the largest eigenvalues. We extend these previous works to allow for the presence of a 1. Hahn and Hausman (2003) de…ne weak instruments, by two features: (i) two-stage least squares (2SLS) analysis is badly biased toward the ordinary least-squares (OLS) estimate, and alternative unbiased estimators such as limited-information maximum likelihood (LIML) may not solve the problem; and (ii) the standard (…rst-order) asymptotic distribution does not give an accurate framework for inference. Weak instrument may also be an important cause of the …nding that the often-used test of over identifying restrictions (OID test) rejects too often. 2 Section 4 discusses related literature in more details.. 2.

(5) large number of weak instruments or weak identi…cation. We consider a linear model with homoskedastic error and allow for weak identi…cation as in Hansen, Hausman, and Newey (2008) and Newey and Windmeijer (2009). This speci…cation helps us to have di¤erent types of weak instruments sequences, including the many instruments sequence of Bekker (1994) and the many weak instruments sequence of Chao and Swanson (2005). We impose no condition on the number of moment conditions since our framework allows for an in…nite countable or even a continuum of instruments. The advantage of regularization is that all available moments can be used without discarding any a priori. We show that regularized 2SLS and LIML are consistent in the presence of many weak instruments. If properly normalized, the regularized 2SLS and LIML are asymptotically normal and reach the semiparametric e¢ ciency bounds. Therefore, their asymptotic variance is smaller than that of Hansen, Hausman, and Newey (2008) and Newey and Windmeijer (2009). All these methods involve a regularization parameter, which is the counterpart of the smoothing parameter that appears in the nonparametric literature. A data driven method was developed in Carrasco (2012) and Carrasco and Tchuente (2013) to select the best regularization parameter when the instruments are strong. We use these methods in our simulations for selecting the regularization parameter when the instruments are weak but we do not prove that these methods are valid in this case. A Monte-carlo experiment shows that the leading regularized estimators (LF and T LIML) perform very well (are nearly median unbiased) even in the case of weak instruments. The paper is organized as follows. Section 2 introduces four regularization methods we consider and the associated estimators. Section 3 derives the asymptotic properties of the estimators. Section 4 discusses e¢ ciency and related results. Section 5 presents Monte Carlo experiments. Section 6 concludes. The proofs are collected in Appendix.. 3.

(6) 2. Presentation of the regularized 2SLS and LIML estimators. This section presents the weak instruments setup used in this paper and the estimators. Estimators studied here are the regularized 2SLS and LIML estimators introduced in Carrasco (2012) and Carrasco and Tchuente (2013). They can be used with many or even a continuum of instruments. This work extends previous works by allowing for weak instruments as in Hansen, Hausman, and Newey (2008). Our model is inspired from Hausman, Newey, Woutersen, Chao, and Swanson (2012). The model is. 8 < yi = W 0 i : W = i. i = 1; 2; ::::; n. The vector of interest. 0. E(ui jxi ) = E("i jxi ) = 0; E("2i jxi ) =. is p 2 ".. 0. + "i ;. i. + ui ; 1.. yi is a scalar and xi is a vector of exogenous. variables. Some rows of Wi may be exogenous, with the corresponding rows of ui being zero.. i. = E(Wi jxi ) is a p. 1 vector of reduced form values with E( i "i ) = 0.. i. is the optimal instrument which is typically unknown. The estimation is based on a sequence of instruments Zi = Z( ; xi ).. 2 S may be an integer or an index taking its. values in an interval. Examples of Zi are the following. - S = f1; 2; ::::Lg thus we have L instruments. - Zij = (xi )j. 1. with j 2 S = N in…nite countable instruments.. - Zi = Z( ; xi ) = exp(i 0 xi ) where Let. 2 S = Rdim(xi ) continuum of moment. be a positive measure on S. We denote L2 ( ) the Hilbert space of square. integrable functions with respect to . This model allows for model also allows for i. i. i. to be a linear or a non linear combination of Zi . The. to approximate the reduced form. For example, we could let. be a vector of unknown functions of a vector xi of underlying instruments. The estimate. is based on the orthogonality condition. E[(yi. Wi0 )Zi ] = 0:. 4.

(7) The extension of the generalized method of moments described in Carrasco and Florens (2000), Carrasco and Florens (2012), and Carrasco (2012) gives the regularized 2SLS. 1 1 0 0 u01 W10 C C B B C B 0 C B B u2 C B W20 C C B C B C C B B Let W = B : C n p and u = B : C n p. C B C B C B C B B : C B : C A @ A @ 0 0 un Wn We de…ne the covariance operator K of the instruments as K : L2 ( ) ! L2 ( ) (Kg)( 1 ) =. Z. E(Z( 1 ; xi )Z( 2 ; xi ))g( 2 ) ( 2 )d. 2. where Z( 2 ; xi ) denotes the complex conjugate of Z( 2 ; xi ). K is assumed to be a Hilbert-Schmidt operator. Let. j. an. j;. j = 1; 2; ::: be re-. spectively, the eigenvalues (ranked in decreasing order) and orthonormal eigenfunctions of K. K can be estimated by Kn de…ned as: Kn : L2 ( ) ! L2 ( ) (Kn g)( 1 ) =. Z. n. 1X Z( 1 ; xi )Z( 2 ; xi )g( 2 ) ( 2 )d 2 : n i=1. If the number of moment conditions is in…nite then the inverse of Kn needs to be regularized because it is not continuous. By de…nition (see Kress, 1999, page 269), a regularized inverse of an operator K is R : L2 ( ) ! L2 ( ) such that lim R K' = ', 8 ' 2 L2 ( ): !0. Three di¤erent types of regularization schemes are considered: Tikhonov (T), Landwerber Fridman (LF), Spectral cut-o¤ (SC) or Principal Components (PC). They are de…ned as follows:. 5.

(8) 1. Tikhonov(T) This regularization scheme is related to the ridge regression.. 1. (K ) (K ). 1. = (K 2 + I). r=. 1 X j=1. where. j 2 j. 1. K. r;. +. j. j. > 0 is the regularization parameter. A …xed. would result in a loss of. e¢ ciency. For the estimator to be asymptotically e¢ cient,. has to go to zero at. a certain rate which will be determined later on. 2. Landweber Fridman (LF) This method of regularization is iterative. Let 0 < c < 1=kKk2 where kKk is the largest eigenvalue of K (which can be estimated by the largest eigenvalue of Kn ). ' ^ = (K ). 1. r is computed using the following algorithm: 8 < ' ^ l = (1. where. 1. cK 2 )^ 'l. 1. + cKr; l=1,2,...,. 1. 1;. : ' ^ 0 = cKr; 1 is some positive integer. We also have. (K ). 1. r=. 1 X [1. (1. c. 2 1 j) ]. r;. j. j=1. j. j:. 3. Spectral cut-o¤ (SC) This method consists in selecting the eigenfunctions associated with the eigenvalues greater than some threshold. The aim is to select those who have greater contribution. (K ). 1. r=. X 1 2 j. for. j. r;. j. j. > 0:. This method is similar to principal components (PC) which consists in using the. 6.

(9) …rst eigenfunctions: (K ). 1. 1= X 1. r=. where. 1. r;. j. j=1. j. j. is some positive integer. It is equivalent to projecting on the …rst. principal components of W . Interestingly, this approach is used in factor models where Wi is assumed to depend on a …nite number of factors (see Bai and Ng (2002), Stock and Watson (2002)) As the estimators based on PC and SC are identical, we will use PC and SC interchangeably. These regularized inverses can be rewritten in common notation as: (K ). 1. r=. 1 X q( ;. for LF: q( ;. 2 j). for SC: q( ;. 2 j). 2 j). =. 2 j. + (1 c. = [1 = I(. 2 j. 2 j. r;. j. j=1. where for T: q( ;. 2 j). j. j. ; 2 1= j). ],. ), for PC q( ;. 2 j). = I(j. 1= ).. In order to compute the inverse of Kn we have to choose the regularization parameter . Let (Kn ). 1. be the regularized inverse of Kn and P. as in Carrasco (2012) by P = T (Kn ). 1. T where. T : L2 ( ) ! Rn 0. Z1 ; g. B B B Z2 ; g B B Tg = B : B B B : @ Zn ; g. 1 C C C C C C C C C A. and T : Rn ! L2 ( ). 7. an. n matrix de…ned.

(10) n. T v=. 1X Zi vi n i=1. Zi ; Zj . n ::: > 0, j = 1; 2; ::: be the orthonormalized eigenfunctions and. such that Kn = T T and T T Let ^ j , ^ 1. ^2. is an n. n matrix with typical element. eigenvalues of Kn : ^ j are consistent estimators of j the eigenvalues of T T . We then p p ^ have T ^ j = j j and T j j. j = 1 X For v 2 Rn , P v = q( ; 2j ) v; j j : It follows that for any vectors v and w of j=1. Rn :. v 0 P w = v 0 T (Kn ) *. 1. T w n X 1=2 (Kn ) Zi (:) vi ; (Kn ). =. i=1. n. X 1=2 1 Zi (:) wi n i=1. +. :. (1). Consider regularized k-class estimators de…ned as follows: ^ = (W 0 (P. In )W ). 1. W 0 (P. In )y:. where. is either a constant term or a random variable. We are interested in the case. where. = 0 corresponding to regularized 2SLS studied in Carrasco (2012): ^ = (W 0 P W ). 1. W 0P y. (y W )0 P (y W ) corresponding to the regularized (y W )0 (y W ) LIML studied in Carrasco and Tchuente (2013). We denote ^ the 2SLS estimators and and in the case. =. = min. ^L the LIML estimators. We study both 2SLS and LIML because LIML may have some advantages over 2SLS. For example when the number of instruments, L, increases with the sample size, n, so that L=n ! c (with c constant), the standard 2SLS estimator is not consistent whereas standard LIML estimator is consistent.. 8.

(11) 3. Asymptotic properties. In Carrasco (2012) and Carrasco and Tchuente (2013), modi…ed IV estimators based on di¤erent regularization schemes were proposed. These estimators improved the small sample properties of IV estimators when the number of instruments is very large or in…nite. The focus was on strong instruments. They found that regularized 2SLS and LIML estimators are asymptotically normal and attain the semiparametric e¢ ciency bound. We extend Carrasco (2012) and Carrasco and Tchuente (2013) results to the case of weak instruments. Assumption 1: i) There exists a p. p matrix Sn = S~n diag(. 1n ; :::;. pn ). such that S~n is bounded,. the smallest eigenvalue of S~n S~n0 is bounded away from zero; for each j, either p p n (strong identi…cation) or jn = n ! 0 (weak identi…cation), jn = n. = min. 1 j p. jn. ! 1 and. ! 0.. p ii) There exists a function fi = f (xi ) such that i = Sn fi = n and n Sn 1 ! S0 : n n X X 4 2 kfi k =n ! 0, fi fi0 =n is bounded and uniformly nonsingular. i=1. i=1. These conditions allow for many weak instruments. If. jn. =. p. n this leads to. asymptotic theory like in Kunitomo (1980), Morimune (1983), and Bekker (1994), but here we use regularization parameter instead of having an increasing sequence of p instruments. For 2n growing slower than n the convergence rate will be slower that n, leading to an asymptotic approximation like that of Chao and Swanson (2005). This is the case where we have many instruments without strong identi…cation. Assumption 1 also allows for some components of the reduced form to give only weak identi…cation p (corresponding to jn = n ! 0 which allows the concentration parameter to grow p p slower than n), and other components (corresponding to jn = n) to give strong identi…cation for some coe¢ cients of the reduced form. In particular, this condition allows for …xed constant coe¢ cients in the reduced form. This speci…cation of weak instruments can also be viewed as a generalization of Chao and Swanson (2007). This speci…cation of weak identi…cation is di¤erent of Antoine and Lavergne (2012) whose identi…cation strength is directly de…ned through the conditional moments that ‡atten. 9.

(12) as the sample size increases. To illustrate Assumption 1 let us consider the following example.. 0. 1. Example 1: Assume that p = 2, S~n = @. 0. 21. 1. 0 0 0 Then for f (xi ) = (f1i ; f2i ) the reduced form is. i. We also have. 0. B =@. n Sn. 1. f1i n 21 f1i + p f2i n. 0. 0. ! S0 = @. 1. A, and. n. j=1. .. j=2. 1. C A: 0. 21. jn. 8 p < n; = : ;. 1. 1. A:. Proposition 1. (Asymptotic properties of regularized 2SLS with many weak instruments) Assume fyi ; Wi ; xi g are iid, E("2i jX) = goes to in…nity. Moreover,. a. 2 ",. K is compact,. goes to zero and n. belongs to the closure of the linear span of fZ(:; x)g. for a = 1; :::; p; E(Z(:; xi )fia ) belong to the range of K and Assumption 1 is satis…ed. Then, the T, LF, and SC estimators of 2SLS satisfy: 1. Consistency: Sn0 (^ This implies (^. 0 )= n 0). ! 0 in probability as n, n. 1 2. and. n. go to in…nity.. ! 0.. 2. Asymptotic normality: moreover, if each element of E(Z(:; xi )Wi ) belongs to the range of K, then Sn0 (^ as n, n. 0). d. ! N 0;. and. n. 2 ". E(fi fi0 ). 1. go to in…nity; where E(fi fi0 ) is the p. p matrix.. Proof In Appendix. This result shows that the three estimators have the same asymptotic distribution. Instead of restricting the number of instruments, we have a regularization parameter which goes to zero. This insures us that all available and valid instruments are used in an e¢ cient way even if there are weak. To obtain consistency, the condition on n. 1 2. is. go in…nity, whereas for the asymptotic normality, we need n go to in…nity. This. means that. is allowed to go to zero at a slower rate. However this rate does not. depend on the weakness of the instruments.. 10.

(13) Interestingly, our regularized 2SLS estimators reach the semiparametric e¢ ciency bound. This result will be further discussed in Section 4. We are now going derive the asymptotic properties of the regularized LIML with many weak instruments. Proposition 2. (Asymptotic properties of regularized LIML with many weak instruments) Assume fyi ; Wi ; xi g are iid, E("2i jX) = compact,. 2 ",. E "4i jX < 1; E u4bi jX < 1; K is. goes to zero and n goes to in…nity. Moreover,. a. belongs to the closure of. the linear span of fZ(:; x)g for a = 1,..., p, E(Z(:; xi )fia ) belong to the range of K and Assumption 1 is satis…ed. Then, the T, LF, and SC estimators of LIML with weak instruments satisfy: 1. Consistency: Sn0 (^L This implies (^. 0). 0 )= n. ! 0 in probability as n,. n. 2 n. and. go to in…nity.. ! 0.. 2. Asymptotic normality: moreover, if each element of E(Z(:; xi )Wi ) belongs to the range of K, then Sn0 (^L as n,. 0) n. d. ! N 0;. and. 2 n. 2 ". E(fi fi0 ). 1. go to in…nity where E(fi fi0 ) is the p. p matrix.. Proof In Appendix. Interestingly we obtain the same asymptotic distribution as in the many strong instruments case (with a slower rate of convergence). We also …nd the same speed of convergence as in Newey and Windmeijer (2009) and Hansen, Hausman, and Newey (2008). For the consistency and asymptotic normality,. 2 n. needs to go to in…nity,. which means that the regularization parameter should go to zero at a slower rate than the concentration parameter. The asymptotic variance of regularized LIML corresponds to the lower bound and is smaller than that obtained in Hansen, Hausman, and Newey (2008). Example 1:(cont) 0 p n[(^1 Sn0 (^ 0) = @. The linear combination (^1. 01 ) +. ^ 21 ( 2. 02 )]. ^ n( 2. 02 ) ^ 01 ) + 21 ( 2. 11. 02 ). 1. A is jointly asymptotically normal.. converges at rate. p. n. This is the co-.

(14) e¢ cient of fi1 in the reduced form equation for yi . And the estimator of the coe¢ cients 1 . 2 of Wi2 variables converges at rate n. Now as in Newey and Windmeijer (2009) we consider a t-ratios for a linear combination c0 of the parameter of interest. The following proposition is a corollary of Proposition 2. Proposition 3. Under the assumptions of Proposition 2 and if we assume that there exist rn , c and c 6= 0 such that rn Sn 1 c ! c and Sn0 ^ Sn =n ! =. 2 ". 1. E(fi fi0 ). in probability with. .. Then,. as n and. 2 n. c0 (^L 0) d p ! N (0; 1) c0 ^ c. go to in…nity.. This result allows us to form con…dence intervals and test statistics for a single linear combination of parameters in the usual way.. 4. E¢ ciency and Related Literature. 4.1. E¢ ciency. If the optimal instrument. were known, the estimator would be solution of. i. n. 1X n. i. Wi0. yi. = 0:. i=1. Hence,. ^ =. n X. 0 i Wi. i=1. ^. 0. =. n X. 0 i Wi. i=1. =. Sn. Pn. !. !. 1 n X. 1 n X. 0 i=1 fi fi. n. i yi ;. i=1. i "i. i=1. Sn0. Pn fi ui p + Sn i=1 n. 12. 1. Pn fi "i p Sn i=1 ; n.

(15) Sn ^. 0 i=1 fi fi. =. 0. Pn. d. n. ! N 0;. 2 ". Pn fi ui 0 p + i=1 Sn n 1. E fi fi0. 1 1. Pn fi "i i=1 p n. :. So the lowest asymptotic variance that can be obtained is. 2 ". E fi fi0. 1. : We will. refer to this as the semiparametric e¢ ciency bound. Now, if the. i. is unknown but estimated by a consistent nonparametric estimator. ^ i . The resulting estimator n X. ^=. ^ i Wi0. i=1. will satisfy Sn ^. d. 0. ! N 0;. 2 ". !. E fi fi0. 1 n X. ^ i yi ;. i=1. 1. regardless of the convergence rate of. ^ i . This can be proved using the same arguments as in Newey (1990). In Carrasco (2012, Section 2.4), it was shown that the regularized 2SLS estimator coincides with a 2SLS estimator that uses a speci…c nonparametric estimator of This explains why for the regularized 2SLS estimator, the conditions on related to. n:. i:. are not. On the contrary, LIML can not written as such a 2SLS estimator. This. may explain why in the case of LIML, the rate of convergence of. depends on how. weak the instruments are.. 4.2. Related Literature. In the literature on many weak instruments the asymptotic behavior of estimator depends on the relation between the number of moment conditions and sample size. For the CUE, the number of moments L and the sample size n needs to satisfy L2 =n ! 0 to. have consistency. Asymptotic normality, requires L3 =n ! 0. Under homoskedasticity, Stock and Yogo (2005) require L2 =n ! 0. Hansen, Hausman, and Newey (2008) al-. lowed L to grow at the same rate as n, but restricted L to grow slower than the square of the concentration parameter, for the consistency of LIML and FULL. Andrews and Stock (2006) require L3 =n ! 0 when normality is not imposed. Caner and Yildiz (2012) in a recent work consider a Continuous Updating Estimator (CUE) with many weak moments under nearly singular design. They show that the. 13.

(16) nearly singular design a¤ects the form of asymptotic covariance matrix of the estimator compared to that of Newey and Windmeijer (2009). Our work is also related to Hausman, Lewis, Menzel, and Newey (2011) who propose the Regularized CUE (RCUE) as a solution in presence of many weak instruments. The RCUE solves a modi…cation of the …rst-order conditions for the CUE estimator and is shown to be asymptotically equivalent to CUE under many weak moments asymptotics. Belloni, Chen, Chernozhukov, and Hansen (2012) propose to use an alternative regularization named lasso in the IV context. This regularization imposes a l1 type penalty on the …rst stage coe¢ cient. Assuming that the …rst stage equation is approximately sparse, they show that the postlasso estimator reaches the asymptotic e¢ ciency bound. Just as 2SLS is not consistent if L is too large relative to n, LIML estimator is not feasible if L > N because the matrix Z 0 Z is not invertible. Therefore, some form of regularization needs to be implemented to obtain consistent estimators when the number of instruments is really large. Regularization has also the advantage to deliver an asymptotically e¢ cient estimator. Table 1 gives an overview of the assumptions used in the main papers on many weak instruments.. Table 1: Comparison of di¤erent IV asymptotics Conventional Phillips (1989) Staiger and Stock (1997) Bekker (1994) Han and Phillips (2006). Chao and Swanson (2005). Number of instruments Fixed L Fixed L, Cov(W; x) = 0 Fixed L, Cov(W; x) = O(n L=n ! c < 1; 2n = O(n) L L ! 1 and !c ncn cn n constant or zero L L1=2 ! 0 or !0 2 2 n. L. n. L. 1=2. ). Hansen et al. (2008). (I). Newey and Windmeijer (2009). L ! 1 and. Carrasco (2012). No condition on L, possibly continuum strong instruments log(L) = o(n1=3 ); strong instruments. Belloni et al. (2012). 2 n. bounded or (II). Extra assumptions. L 2 n. 14. 2 n. !1. bounded. X. zi zi0 =n nonsingular. Compactness of covariance matrix Approximately sparse …rst stage equation.

(17) 5. Monte Carlo study. We now carry out a Monte Carlo simulation for the simple linear IV model where the disturbances and instruments have a Gaussian distribution as in Newey and Windmeijer (2009). The parameters of this experiment are the correlation coe¢ cient. between. the structural and reduced form errors, the concentration parameter (CP ), and the number of instruments L The data generating process is given by: yi = Wi0. 0. + "i ;. Wi = x0i + ui ; "i = u i + ui. where. L. N (0; 1); vi. p. 1. 2v ; i. N (0; 1); xi r CP = L Ln. N (0; IL ). is an L-vector of ones. The sample size is n = 500; the number of instruments. L equals to 15 and 30; value of. = 0:5;. 0. = 0:1; values of CP equals to 4, 35 or 65.. The estimators we proposed in this paper depend on a regularization (smoothing) parameter for selecting. that needs to be selected. In the simulations, we use a data-driven method based on an expansion of the MSE and proposed in Carrasco (2012) and. Carrasco and Tchuente (2013). These selection criteria were derived assuming strong instruments and may not be valid in presence of weak instruments. Providing a robust to weak instruments selection procedure is beyond the scope of this paper. We report the median bias (Med.bias), the median of the absolute deviation of the estimator from the true value (Med.abs), the di¤erence between the 0.1 and 0.9 quantiles(dis) of the distribution of each estimator, and the coverage rate(Cov.) of a nominal 95% con…dence interval for unfeasible instrumental variable regression(IV), regularized two-stage least squares (T2SLS (Tikhonov), L2SLS (Landweber Fridman), P2SLS (Principal component)), LIML and regularized LIML (TLIML (Tikhonov), LLIML (Landweber Fridman), PLIML(Principal component)) and Donald and Newey’s (2001). 15.

(18) 2SLS and LIML (D2SLS and DLIML). For con…dence intervals we compute the coverage probabilities using the following estimate of asymptotic variance as in Donald and Newey (2001) and Carrasco (2012). (y V^ (^) =. W ^)0 (y n. W ^). ^ 0W W. 1. 1. ^ 0W ^ W 0W ^ W. 1. ^ = P W. where W. Table 2: Simulations results with Cp = 4, 8, 35 and 65 ; n = 500; L = 15 and 30; 1000 replications. L= 15. Cp=4. Cp=8. C p= 35. C p= 65. L= 30. Cp=4. Cp=8. C p= 35. C p= 65. M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov. T 2SLS 0 .3 9 1 8 0 .3 9 2 5 0 .5 5 9 0 0 .5 0 4 0 0 .3 2 4 8 0 .3 2 6 5 0 .5 0 4 4 0 .5 6 3 0 0 .1 4 2 1 0 .1 5 1 9 0 .3 4 2 6 0 .7 6 8 0 0 .0 8 7 6 0 .1 0 1 2 0 .2 7 9 6 0 .8 3 5 0. L2SLS 0 .3 8 7 6 0 .3 8 8 3 0 .5 8 0 6 0 .5 3 3 0 0 .3 1 8 9 0 .3 2 0 5 0 .5 3 8 8 0 .5 9 0 0 0 .1 4 0 9 0 .1 5 2 4 0 .3 6 5 6 0 .7 8 1 0 0 .0 8 6 5 0 .1 0 5 1 0 .2 9 3 2 0 .8 3 0 0. P2SLS 0 .3 5 6 7 0 .5 3 3 4 2 .6 3 2 8 0 .7 7 5 0 0 .3 0 3 6 0 .3 9 1 7 1 .5 8 5 4 0 .7 4 2 0 0 .1 5 8 4 0 .1 9 3 8 0 .4 9 7 4 0 .7 7 7 0 0 .0 9 5 3 0 .1 2 2 2 0 .3 3 5 9 0 .8 3 4 0. D 2SLS 0 .4 0 9 3 0 .5 6 3 3 2 .4 6 3 9 0 .7 7 5 0 0 .3 3 8 5 0 .4 0 8 0 1 .7 1 6 7 0 .7 3 8 0 0 .1 9 4 0 0 .2 1 2 3 0 .4 2 9 7 0 .7 4 3 0 0 .1 1 9 8 0 .1 3 4 3 0 .3 0 3 9 0 .8 1 2 0. T L IM L 0 .1 4 4 3 0 .5 6 4 9 3 .2 2 9 4 0 .4 5 0 0 0 .0 3 8 1 0 .3 7 2 2 1 .9 1 9 3 0 .5 6 9 0 0 .0 0 0 3 0 .1 3 3 7 0 .5 3 4 8 0 .8 5 8 0 -0 .0 0 1 0 0 .0 9 2 7 0 .3 4 8 2 0 .9 0 2 0. L L IM L 0 .1 3 7 8 0 .5 6 4 4 2 .9 8 8 9 0 .4 6 6 0 0 .0 3 8 3 0 .3 6 9 8 1 .9 0 1 2 0 .5 7 9 0 0 .0 0 1 3 0 .1 3 5 7 0 .5 4 4 8 0 .8 6 8 0 0 .0 0 2 5 0 .0 9 0 5 0 .3 5 0 5 0 .8 9 6 0. P L IM L 0 .3 4 9 7 0 .7 0 2 1 3 .5 7 8 3 0 .9 0 1 0 0 .2 4 7 1 0 .6 0 8 7 3 .1 5 3 4 0 .9 2 4 0 0 .0 6 6 9 0 .2 7 3 4 1 .4 2 6 8 0 .9 3 4 0 0 .0 2 9 5 0 .1 5 3 3 0 .7 4 7 8 0 .9 3 3 0. D L IM L 0 .3 9 0 7 0 .5 2 0 8 1 .8 4 7 4 0 .7 1 3 0 0 .2 8 5 6 0 .3 8 6 4 1 .3 3 9 5 0 .7 2 1 0 0 .0 3 8 3 0 .1 3 1 6 0 .4 9 0 7 0 .8 6 4 0 0 .0 0 6 1 0 .0 9 2 0 0 .3 4 8 3 0 .8 9 8 0. IV 0 .0 2 1 2 0 .3 4 7 6 1 .5 4 0 7 0 .9 5 2 0 0 .0 0 3 1 0 .2 5 0 6 1 .0 2 1 0 0 .9 5 0 0 0 .0 0 1 2 0 .1 2 1 4 0 .4 5 5 1 0 .9 5 4 0 0 .0 0 0 9 0 .0 8 8 1 0 .3 2 4 0 0 .9 5 5 0. M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov M e d .b ia s M e d .a b s D isp C ov. 0 .4 4 9 3 0 .4 4 9 3 0 .4 0 3 7 0 .2 0 1 0 0 .4 0 1 9 0 .4 0 1 9 0 .3 8 5 7 0 .2 4 7 0 0 .2 3 1 6 0 .2 3 1 6 0 .2 9 3 8 0 .4 8 1 0 0 .1 5 8 6 0 .1 5 8 6 0 .2 5 4 5 0 .6 1 2 0. 0 .4 4 0 1 0 .4 4 0 1 0 .4 3 7 8 0 .2 6 8 0 0 .3 8 9 3 0 .3 8 9 3 0 .4 2 6 2 0 .3 2 7 0 0 .2 1 7 4 0 .2 1 7 6 0 .3 1 9 5 0 .5 5 8 0 0 .1 4 6 6 0 .1 4 7 5 0 .2 7 1 0 0 .6 6 1 0. 0 .3 6 2 2 0 .5 7 3 7 2 .8 4 4 5 0 .7 8 7 0 0 .3 2 6 0 0 .4 6 6 5 2 .4 1 0 1 0 .7 5 3 0 0 .2 2 1 1 0 .2 6 1 7 0 .6 1 2 5 0 .6 8 7 0 0 .1 5 8 2 0 .1 8 6 8 0 .4 2 1 9 0 .7 3 7 0. 0 .4 1 9 7 0 .6 3 7 4 3 .2 9 2 8 0 .7 7 8 0 0 .3 8 0 2 0 .5 4 1 9 2 .6 3 3 3 0 .7 1 8 0 0 .2 6 8 1 0 .2 9 1 2 0 .6 0 9 7 0 .6 2 7 0 0 .2 0 2 6 0 .2 1 8 6 0 .4 5 1 7 0 .6 7 5 0. 0 .2 1 4 7 0 .6 5 1 7 3 .7 4 8 3 0 .2 8 5 0 0 .0 6 1 8 0 .4 4 1 3 2 .4 0 9 0 0 .3 7 9 0 0 .0 0 2 8 0 .1 5 3 0 0 .6 1 1 5 0 .7 1 9 0 0 .0 0 3 8 0 .1 0 2 3 0 .3 9 1 8 0 .8 3 3 0. 0 .1 8 2 5 0 .6 5 5 9 3 .6 3 0 5 0 .3 1 8 0 0 .0 8 0 0 0 .4 5 6 8 2 .2 5 5 8 0 .4 0 1 0 0 .0 0 7 8 0 .1 5 5 8 0 .6 2 7 5 0 .7 3 4 0 0 .0 0 3 7 0 .1 0 2 7 0 .3 9 8 9 0 .8 3 2 0. 0 .3 7 9 2 0 .6 6 2 9 2 .9 9 0 5 0 .8 8 6 0 0 .2 9 1 5 0 .5 9 4 0 2 .7 0 9 5 0 .9 0 0 0 0 .1 0 5 4 0 .3 9 9 1 2 .0 3 1 1 0 .9 5 0 0 0 .0 5 5 8 0 .2 4 4 0 1 .2 3 9 9 0 .9 4 1 0. 0 .4 1 5 6 0 .5 2 6 5 1 .9 8 2 0 0 .7 0 1 0 0 .3 5 9 1 0 .4 5 1 3 1 .4 9 7 6 0 .6 9 6 0 0 .1 2 4 3 0 .1 8 2 7 0 .6 3 8 5 0 .7 1 8 0 0 .0 3 3 5 0 .1 0 2 8 0 .3 9 6 2 0 .8 2 7 0. 0 .0 2 8 0 0 .3 8 2 2 1 .5 8 8 8 0 .9 6 0 0 -0 .0 0 9 6 0 .2 6 2 4 1 .0 1 8 4 0 .9 5 4 0 -0 .0 0 4 6 0 .1 2 0 3 0 .4 3 6 3 0 .9 5 8 0 -0 .0 0 3 2 0 .0 8 8 3 0 .3 1 5 6 0 .9 5 9 0. Table 2 reports simulation results. We use di¤erent strength (measured by the concentration parameter) of instruments and number of instruments. We investigate the case of very weak instruments for example, when CP = 8 and L = 30, the …rst CP stage F-statistic equals = 0:26. L We observe that (a) The performances of the regularized estimators increase with the strength of instruments but decrease with the number of instruments. Providing regularization parameter selection procedure robust to weak instruments would certainly improve these results.. 16.

(19) (b) The bias of regularized LIML is quite a bit smaller than that of regularized 2SLS. (c) The bias of our regularized estimators are smaller that those of the corresponding Donald and Newey’s estimators. (d) LF LIML and T-LIML estimators have very low median bias even in the case of relatively weak instruments (CP=8). (e) The coverage of our estimators deteriorates when the instruments are weak.. 6. Conclusion. This paper illustrate the usefulness of regularization techniques for estimation in many weak instruments framework. We derived the properties of the regularized 2SLS and LIML estimators in the presence of many or a continuum of moments that may be weak. We show that if well normalized the regularized 2SLS and LIML are consistent and reach the semiparametric e¢ ciency bound. Our simulations show that the leading regularized estimators (LF and T of LIML) perform well. In this work we restricted our investigation to 2SLS and LIML with weak instruments. It would be interesting, for future research to study the behavior of regularized version of other k-class estimators, such as FULL or B2SLS or other estimators as GMM or GEL, in presence of many weak instruments. This will help us to have results that can be compared with those of Newey and Windmeijer (2009) and Hansen, Hausman, and Newey (2008). Another topic of interest is the use of our regularization tools to provide version of robust tests for weak instruments as AR, LM, CLR or KLM tests, that can be used with a large number or a continuum of moment conditions.. 17.

(20) References Andrews, D. W., and J. H. Stock (2006): “Inference with Weak Instruments,” in Advances in Economics and Econometrics, ed. by R. Blundell, W. Newey, and T. Persson, vol. 3. Cambridge University Press. Angrist, J. D., and A. B. Krueger (1991): “Does Compulsory School Attendance A¤ect Schooling and Earnings?,”The Quarterly Journal of Economics, 106(4), 979– 1014. Antoine, B., and P. Lavergne (2012): “Conditional Moment Models under SemiStrong Identi…cation,”Discussion Papers dp11-04, Department of Economics, Simon Fraser University. Bai, J., and S. Ng (2002): “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70(1), 191–221. Bekker, P. A. (1994): “Alternative Approximations to the Distributions of Instrumental Variable Estimators,” Econometrica, 62(3), 657–81. Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): “Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain,” Econometrica, 80(6), 2369–2429. Caner, M., and N. Yildiz (2012): “CUE with many weak instruments and nearly singular design,” Journal of Econometrics, 170(2), 422–441. Carrasco, M. (2012): “A regularization approach to the many instruments problem,” Journal of Econometrics, 170(2), 383–398. Carrasco, M., and J.-P. Florens (2000): “Generalization Of Gmm To A Continuum Of Moment Conditions,” Econometric Theory, 16(06), 797–834. (2012): “On the Asymptotic E¢ ciency of GMM,”forthcoming in Econometric Theory.. 18.

(21) Carrasco, M., and G. Tchuente (2013): “Regularized LIML for many instruments,” Discussion paper, University of Montreal. Chao, J. C., and N. R. Swanson (2005): “Consistent Estimation with a Large Number of Weak Instruments,” Econometrica, 73(5), 1673–1692. (2007): “Alternative approximations of the bias and MSE of the IV estimator under weak identi…cation with an application to bias correction,” Journal of Econometrics, 137(2), 515–555. Eichenbaum, M. S., L. P. Hansen, and K. J. Singleton (1988): “A Time Series Analysis of Representative Agent Models of Consumption and Leisure Choice under Uncertainty,” The Quarterly Journal of Economics, 103(1), 51–78. Guggenberger, P., and R. J. Smith (2005): “Generalized Empirical Likelihood Estimators And Tests Under Partial, Weak, And Strong Identi…cation,”Econometric Theory, 21(04), 667–709. Hahn, J., and J. Hausman (2003): “Weak Instruments: Diagnosis and Cures in Empirical Econometrics,” American Economic Review, 93(2), 118–125. Han, C., and P. C. B. Phillips (2006): “GMM with Many Moment Conditions,” Econometrica, 74(1), 147–192. Hansen, C., J. Hausman, and W. Newey (2008): “Estimation With Many Instrumental Variables,” Journal of Business & Economic Statistics, 26, 398–422. Hausman, J., R. Lewis, K. Menzel, and W. Newey (2011): “Properties of the CUE estimator and a modi…cation with moments,”Journal of Econometrics, 165(1), 45 –57, Moment Restriction-Based Econometric Methods. Hausman, J. A., W. K. Newey, T. Woutersen, J. C. Chao, and N. R. Swanson (2012): “Instrumental variable estimation with heteroskedasticity and many instruments,” Quantitative Economics, 3(2), 211–255.. 19.

(22) Kunitomo, N. (1980): “Asymptotic Expansions of the Distributions of Estimators in a Linear Functional Relationship and Simultaneous Equations,” Journal of the American Statistical Association, 75(371), 693–700. Morimune, K. (1983): “Approximate Distributions of k-Class Estimators When the Degree of Overidenti…ability Is Large Compared with the Sample Size,” Econometrica, 51(3), 821–41. Newey, W. K. (1990): “E¢ cient Estimation of Models with Conditional Moment Restrictions,” in Handbook of Statistics, ed. by G. Maddala, C. Rao, and H. Vinod, vol. 11, pp. 419–454. Elsevier. Newey, W. K., and F. Windmeijer (2009): “Generalized Method of Moments With Many Weak Moment Conditions,” Econometrica, 77(3), 687–719. Staiger, D., and J. H. Stock (1997): “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65(3), 557–586. Stock, J. H., and M. W. Watson (2002): “Forecasting Using Principal Components from a Large Number of Predictors,”Journal of the American Statistical Association, 97(460), pp. 1167–1179. Stock, J. H., and M. Yogo (2005): “Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments,” in Identi…cation and Inference for Econometric Models, ed. by D. W. K. Andrews, and J. H. Stock, pp. 109–120. Cambridge University Press. Zivot, E., R. Startz, and C. R. Nelson (1998): “Valid Con…dence Intervals and Inference in the Presence of Weak Instruments,” International Economic Review, 39(4), 1119–46.. 20.

(23) A. Proofs. Proof of Proposition 1: We …rst prove the consistency our estimator. " of # n n n p p 1X 1X 1X Let gn = Zi Wi = Sn Zi fi = n + Zi ui = Sn gn1 = n + gn2 (rememn n n i=1 i=1 i=1 ber that gn is a function indexed by and Zi is also a function of , such a representation can handle both countable and continuum of instruments). Note3 that n p p 1X gn2 = Zi ui = op (1); ngn2 = Op (1) and Sn = n is bounded from assumption n i=1 1(i).. ^ We have Sn0 (^. 0 )= n. 0. = (W 0 P W ) 0. = [Sn 1 W 0 P W Sn 1 ]. 1. 1. W 0P ". [Sn 1 W 0 P "=. n]. and by construction of. P : W 0P W. Sn 1 W 0 P W S n 1. 0. D E = n (Kn ) 1=2 gn ; (Kn ) 1=2 gn0 D E 0 = Sn (Kn ) 1=2 gn1 ; (Kn ) 1=2 gn1 Sn0 D Ep 0 n +Sn (Kn ) 1=2 gn1 ; (Kn ) 1=2 gn2 D E p 0 + (Kn ) 1=2 gn2 ; (Kn ) 1=2 gn1 Sn0 n D E 0 + (Kn ) 1=2 gn2 ; (Kn ) 1=2 gn2 n: =. D. E 0 (Kn ) 1=2 gn1 ; (Kn ) 1=2 gn1 D p 0 E 10 + (Kn ) 1=2 gn1 ; (Kn ) 1=2 ngn2 Sn E D p 0 +Sn 1 (Kn ) 1=2 ngn2 ; (Kn ) 1=2 gn1 D p p 0 E 10 +Sn 1 (Kn ) 1=2 ngn2 ; (Kn ) 1=2 ngn2 Sn :. Hence, 0. Sn 1 [W 0 P W ]Sn 1 = (Kn ) 3. 1 2. gn1 ; (Kn ). 1 2. 0 gn1 + op (1):. Let g and h be two p vectors of functions of L2 ( ). By a slight abuse of notation, g; h0 ; denotes the matrix with elements ga ; hb a; b = 1; :::; p. 21.

(24) At this stage, we can apply the same proof as that of Proposition 1 of Carrasco (2012) which shows that 1 2. (Kn ) in probability as n and n element K. 1 2. 1 2. n. =. 1 2. 0 gn1 ! g1 ; g10. go to in…nity, with g1 ; g10 1 2. E(Z(:; xi )fia ); K. Sn 1 W 0 P ". gn1 ; (Kn ). nSn 1 n. *. K. K. ap. p matrix with (a; b). E(Z(:; xi )fib ) which is assumed to be nonsingular. *. n. 1X (Kn ) 1=2 gn ; (Kn ) 1=2 Zi "i n i=1. +. + n X 1 1 = (Kn ) 1=2 gn1 ; (Kn ) 1=2 p Zi "i n n i=1 + * p X n Sn 1 1=2 n 1=2 p ngn2 ; (Kn ) + (Kn ) Zi "i n n i=1. = op (1) : n. 1 X because p Zi "i = Op (1). This proves the consistency of the regularized 2SLS. n i=1 For the asymptotic normality we write Sn0 (^. 0). = [Sn 1 W 0 P W Sn0 1 ]. 1. [Sn 1 W 0 P "]. We then have n. 1. 0. Sn W P " = nSn. 1. (Kn ). 1. 1X Zi "i gn ; n i=1. + n X 1 Zi "i = (Kn ) 1=2 gn1 ; (Kn ) 1=2 p n i=1 * + p X n 1 1=2 p 1=2 n +Sn (Kn ) ngn2 ; (Kn ) Zi "i n i=1 * + n X 1 = (Kn ) 1=2 gn1 ; (Kn ) 1=2 p Zi "i n *. i=1. +op (1) :. 22.

(25) Moreover, *. n. 1=2. (Kn ). =. gn1 ; (Kn ). 1=2. 1 X p Zi "i n i=1. (Kn ). 1. +. n 1 X 1 p K g1 ; Zi "i n. gn1. (2) (3). i=1. n. +. K. 1. 1 X g1 ; p Zi "i ) : n i=1. The …rst term is negligible since n n 1 X 1 X 1 1 1 1 p p (Kn ) gn1 K g1 ; Zi "i ) k(Kn ) gn1 K g1 kk Zi "i k = op (1)Op (1): n n i=1 i=1 By the functional central limit theorem, we obtain the following result n 1 X K 1 g1 ; p Zi "i ! N (0; 2" g1 ; g10 K ) as n and n , n go to in…nity. n i=1 We then apply the continuous mapping theorem and Slutzky lemma to show that Sn0 (^. 0). d. ! N (0;. 2 ". g1 ; g10. 1 ): K. By assumption, g1a = E(Z(:; xi )fia ) belong to the range of K. Let L2 (Z) be the closure of the space spanned by fZ(x; );. 2 Ig and g1 is an element of this space. If. fi 2 L2 (Z) we can compute the inner product in the RKHS and show that g1a ; g1b. K. = E(fia fib ):. This can be seen by applying Theorem 6.4 of Carrasco, Florens, and Renault (2007). It follows that Sn0 (^. 0). d. ! N 0;. 2 ". E fi fi0. 1. This completes the proof of Proposition 1. Proof of Proposition 2: To prove this proposition we need some lemmas. The …rst lemma corresponds to lemma A0 of Hansen, Hausman, and Newey (2008). Lemma 1: Under assumption 1 if kSn0 (^L 0 )= n k. 0 )= n k. 2. =(1+k^L k2 ) ! 0 then kSn0 (^L. ! 0.. proof: The proof of this lemma is the same as in Hansen, Hausman, and Newey (2008).. 23.

(26) Lemma 2: Let us assume that there exists a constant C such that E(k"i k4 jX). and E(kuai k4 jX). C. C for all i. Then, X C( qj2 );. V ar("0 P ua ). j. "0 P u a. X 1 E("0 P ua jX) = O(( qj2 ) 2 ); j. "0 P " 2 n. 1. = Op. = op (1) :. 2 n. proof: For notational simplicity we suppress the conditioning on X. Let E("2i ) = "ua. and E(u0ai uai ) =. 2 ",. E("i uai ) =. 2 ua ,. V ar("0 P ua ) = E("0 P ua u0a P "). E("0 P ua )E(u0a P "):. Using the spectral decomposition of P , we have n o 1 X 0 0 0 0 0 q q E (" )(u ) (" )(u ) j l l j a l a j n2 j;l nX X 1 X 0 2 "b uab 2jb q q E " u = j i l ai li n2 i b j;l X X + "c u0ac lc jc "d uad jd ld. E("0 P ua u0a P ") =. c. +. X. "2c. lc. jc. c. X. d. u0ad uad. jd. ld. d. X = ( qj )2. 0 "ua "ua. +(. 0 "ua "ua. o. +. 2 2 " ua ). j. X. qj2. j. by the fact that (uai ; "i ) are independent across i and the eigenvectors are orthonormal. E("0 P ua ) = = =. X 1X ql Ef( u0ak n l k 1X 0 ql n "ua n l X 0 qj ): "ua ( j. 24. X. lk )(. i. "i. li )g.

(27) Thus V ar("0 P ua ) = (. 0 "ua "ua. 2 2 " ua ). +. X. X C( qj2 ):. qj2. j. j. The second conclusion follows by Markov inequality.. E "0 P ". = tr P E ""0 X 2 = qj ) = Op (1= ) : "( j. Using the result for "0 P ua with " in place of ua , we obtain X C( qj2 ):. V ar("0 P "). j. It follows that "0 P ". E "0 P ". =. 2 n. 00 11=2 X B = Op @@ qj2 A = j. Hence, the third equality holds.. 1. 2C nA. 0. = op @. X j. qj =. 1. 2A n :. 0 f 0P f ^ = W W with W = [y; W ] , there exist two and B Lemma 3: Let A^ = n n ^ constants C and C 0 such that A^ CIp and kBk C 0:. Proof: By the de…nition of P , we have (see Equation (1)): f 0P f A^ = = (Kn ) n with fn =. 1 2. fn ; (Kn ). 1 2. fn0. 1X Zi fi : n i. By Lemma 5(i) of Carrasco (2012) and the law of large numbers, f 0P f f 0f = + op (1) = E(fi0 fi ) + op (1) n n as. goes to zero. Because E(fi0 fi ) is positive de…nite, there exists a constant C such. that A^. CIp. 25.

(28) with probability one. We have W = [y; W ] = W D0 + "e where D0 = [ 0 ; I],. 0. is the true value of the. parameter and e is the …rst unit vector. W 0W n. ^ = B. p f 0f 0 f 0" p f 0u Sn D0 =n + D00 Sn D0 = n + D00 Sn e= n n n n p u0 f 0 u0 u u0 " + D00 Sn D0 = n + D00 D0 + D00 e n n n p "0 f 0 "0 u "0 " + e0 Sn D0 = n + e0 D0 + e0 e n n n. = D00 Sn. ^ Using the law of large numbers, we can conclude that kBk. C 0 , where C 0 is a constant,. with probability one. Proof of consistency Let us consider ^ ) = (y Q(. W )0 P (y W )= (y W )0 (y W )=n. 2 n. ^L = argminQ( ) For. =. 0,. 0 ^ 0 ) = " P "= Q( "0 "=n. 2 n. : With probability one "0 "=n > C and by lemma 2 "0 P "=. 2 n. = op (1):. ^ 0 ) = op (1). Hence Q( Since 0. ^ ^L ) Q(. ^ 0 ) it is easy to see that Q( ^ ^L ) = op (1). Q(. Let us show that. n. Let D ( ) =. n. 2. (y. 2. (y. W )0 P (y. W )0 P (y. possible to proof that D ( ) =. n. W ). CkSn0 (^L. (1;. n. 2. from lemma 3 that D ^L. 2. :. (1; 0 )W 0 P W (1; 0 )0 and it is f 0P f 0 0 Sn D0 (1; 0 )0 + op (1). It follows )D00 Sn n. W ), D = 2. 0 )= n k. CkSn0 (^L. 26. 0 )= n k. 2. :.

(29) We also have that (y. W )0 (y. W )=n = (1;. kSn0 (^L. (1 +. 0 )= n k k^L k2 ). Then by Lemma 1 we have Sn0 (^L. 0 )= n. 2. 0. ^ )B(1;. 0 02. ) ). Hence,. ^ ^L ) C Q( 2 n. ! 0 in probability as n and. go to. in…nity. This proves the consistency of LIML with many weak instruments. Now let us prove the asymptotic normality. Proof of asymptotic normality: Denote A( ) = (y. W )0 P (y. W )0 (y. W ) and. [A ( ). ( )B ( )]. W )=2 , B( ) = (y A( ) B( ). ( )=. We know that the LIML is ^L = argmin ( ): We calculate the gradient and Hessian ( ) = B( ). 1. [A ( ). ( )B ( )]. ( ) = B( ) B( ). 1. 1. 0. [B ( ). ( ). ( )B 0 ( )]:. Then by a standard mean-value expansion of the …rst-order conditions. (^) = 0;. we have with probability one.. Sn0 (^L. 0). 0 (~)Sn 1 ]. [Sn 1. =. 1. [Sn0. ( 0 )]. where ~ is the mean-value. By the consistency of ^L , ~ !. 0.. It then follows that B (~)=n =. 2. X. Wi~"i =n;. i. =. 2. X. (. i. + ui )~"i =n. i. =. p X 2Sn = n( fi~"i =n) i. =. 2. u". + op (1). X 2( ui~"i =n) i. p under the assumption that Sn = n is bounded, with ~"i = (yi P B(~)=n !. 2 ";. P B (~)=n !. 27. 2. u". Wi0 ~) and. u". = E(ui "i )..

(30) ( )= For. =. 2 n. n;. 0,. ( 0) =. (y. W )0 P (y W )=2n (y W )0 (y W )=n. "0 P "=2n : With probability one "0 "=n > C and with lemma 2 and "0 "=n "0 P "=n = op (1):. We have. ( 0 ) = op (1). Therefore,. P (~) ! 0: By the …rst order condition, we also. have P (~) ! 0: P B (~) = 2W 0 W=n ! 2E(Wi Wi0 ); A (~)=n = W 0 W=n:. We can then conclude that n~ 2". (~) = nB. 1. (~)[A (~)=n] + op (1): Hence. (~) = W 0 P W = Sn (Kn ). 1 2. gn1 ; (Kn ). 1 2. 0 gn1 Sn0 + op (1). = Sn HSn0 + op (1) with H = E(f (xi )f (xi )0 ) and ~ 2" = (y Hence n~ 2" Sn 1. W e)0 (y. W e)=n:. 0 (~)Sn 1 = H + op (1). p W 0" u" Let ^ = 0 , = 2 and v = u " 0 . It is useful to remark that v 0 P " = Op (1= ) "" " p using Lemma 2 with v in place of u and E (ui vi ) = 0. Moreover, ^ = Op (1= n) by p the central limit theorem and the delta method. Hence, ( ^ )"0 P " = Op (1= n). p Furthermore, f 0 (I P ) "= n = Op ( 2 ) = op (1) by Lemma 5(ii) Carrasco (2012) with. = tr(f 0 (I. n~ 2 Sn 1. P )2 f =n) = Op. min( ;2). = op (1) :. W 0" ( 0 ) = Sn 1 (W 0 P " "0 P " 0 ) "" p 0 p 0 = f "= n f (I P ) "= n + Sn 1 v 0 P " Sn 1 ( ^ p p p = f 0 "= n + op (1) + Sn 1 Op (1= ) + Sn 1 Op (1= n) p d = f 0 "= n + op (1) ! N (0; 2" H). 28. )"0 P ".

(31) as n,. 2 n. go to in…nity under the assumption. n Sn. 1. ! S0 .. The conclusion follows from the use of Slutzky theorem.. 29.

(32)