working paper
department of economics

COMPARING SPECIFICATION TESTS AND CLASSICAL TESTS

Jerry A. Hausman
William E. Taylor*

Number 266    August 1980

massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139

First Draft
Comments Welcome
* MIT and Bell Labs, respectively.
COMPARING SPECIFICATION TESTS AND CLASSICAL TESTS
by
Jerry A. Hausman
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
and
William E. Taylor
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey 07974
ABSTRACT

A parametric restriction is often interesting because it makes possible simplifications or improvements in estimators for the parameters of primary interest. In such cases, a specification test examines the effect of imposing the restrictions on the estimator, whereas classical tests examine the restrictions themselves in light of the data. In some circumstances, this leads to discrepancies in large sample behavior between (i) specification tests and (ii) likelihood ratio, Wald, and Lagrange multiplier tests. We examine this distinction in three cases of recent interest: exclusion restrictions in a simple linear model, parametric restrictions in a general non-linear implicit model, and exogeneity restrictions in a linear simultaneous equations system.
1. Introduction
A tenet of large sample statistical theory is the
sufficiency of the trinity of tests: that any reasonable
test of a statistical hypothesis is at least asymptotically
equivalent to a likelihood ratio, Wald, or Lagrange multiplier
test. Recently, a class of mis-specification tests was
introduced (Hausman (1978)) which makes use of the difference
between parameter estimates which impose and do not impose
a null hypothesis; and some speculation has ensued concerning
the relationships among these tests and the trinity. Holly
(1980a,b) in particular has compared the specification test of (i) parametric restrictions with nuisance parameters, and (ii) exogeneity in a triangular simultaneous equations system with the conventional tests and has found, in some circumstances, significant differences in large sample
terms which shed some light on the hypothesis that the
specification test is actually testing.
A parametric restriction is often interesting because
it enables us to simplify or improve our estimator for the
parameters of primary interest. Thus an uninteresting variable may be excluded from a linear regression, or a variable may be treated as endogenous solely to ensure consistent estimates of the parameters of interest. In both cases, the restriction involved may be tested, but the cost of a type I or II error depends upon the effect of that error on the point estimates of the parameters that matter. Thus excluding a variable from a linear regression is costly only to the extent that estimates of the remaining slope coefficients change, given their sampling errors. Similarly, the cost of treating a variable as predetermined depends upon the difference that restriction makes in the point estimates involved.
The specification test is based upon the difference in the point estimates caused by imposing the restrictions. In the cases above, imposing the restrictions gains a little efficiency (if true) but sacrifices consistency if false. Thus the specification test detects departures from the restrictions in an appropriate norm, and this is shown below to characterize the difference between specification tests and conventional tests. Indeed, in the cases considered below, we are able to show that the specification test of a set of restrictions is identical to a conventional test of the specification hypothesis that imposing the restrictions does not affect the point estimates. Being asymptotically equivalent to a classical test of this hypothesis, the specification test inherits the familiar optimal local power properties, so that in circumstances in which the specification hypothesis is the relevant hypothesis, the specification test is probably the test of choice.
In essence, these results point out the importance of carefully specifying the null hypothesis one wishes to test. In the linear regression model in section 2, we show that the common practice of omitting variables whose estimated coefficients fail an F test is often based on a test whose nominal size is smaller than its true size. In such cases, the specification test is uniformly most powerful among invariant tests of the specification hypothesis and clearly dominates the F test. These and other results are extended in section 3 to explain anomalies in the specification
test in non-linear models outlined in Holly (1980a). These principles are applied to exogeneity tests in simultaneous equations systems (section 4). This generalizes and explains the difference between the specification test and the Lagrange multiplier test for recursiveness in Holly (1980b).
2. Specification tests in the linear model

2.1. The model and the m test

The basic idea discussed in the introduction can be demonstrated simply by specification tests involving linear homogeneous restrictions in the linear model. Suppose

(2.1) $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$,

where $\beta_1$, $\beta_2$ are $k_1$, $k_2$ vectors of unknown parameters $(k_1+k_2=k)$ and $\varepsilon$ is a $T$ vector of Gauss-Markov disturbances. The coefficients
$\beta_1$ are presumed to be of primary interest, and the columns of $X_2$ are included in equation (2.1) solely to avoid specification error in our estimates of $\beta_1$. For a practical example, interpret equation (2.1) as a demand function for the services of a regulated public utility in which the scalar $X_1$ represents the service price. In a rate hearing, point estimates of the price elasticity are required in order to set prices in the next period to achieve an assigned rate of return. Effects of changes in other prices and income (the columns of $X_2$) are of interest only insofar as they influence our estimate of $\beta_1$.
One typically tests $H_0$: $\beta_2 = 0$ against the alternative $H_1$: $\beta_2 \neq 0$. In this case, though, there is some suggestion that certain functions of $\beta_2$ are more important than others. This point will be developed at length below.
An alternative interpretation of this problem can be set in the context of limited information simultaneous equations estimation, which we discuss in Section 4. Consider

(2.2) $Y = X_1\beta_1 + [X_2\beta_2 + \varepsilon] = X_1\beta_1 + \eta$,

where $\mathrm{plim}\,T^{-1}X_i'\varepsilon = 0$ $(i=1,2)$. We are concerned with estimating $\beta_1$ and must determine if

$\mathrm{plim}\,T^{-1}X_1'\eta = \mathrm{plim}\,T^{-1}X_1'X_2\beta_2 = 0$

in order that least squares estimates of $\beta_1$ be consistent. Again, one typically examines $H_0$: $\beta_2 = 0$ but recognizes that estimable functions of the form $A'\beta_2 = C'X_1'X_2\beta_2$ are of particular interest.

Following Hausman (1978), both interpretations of the problem lead to the same specification test. For the linear model case, the specification test involves the difference between the least squares estimates of $\beta_1$ including and excluding $X_2$ from the regression:

$q = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y$,

where $Q_i = I_T - X_i(X_i'X_i)^{-1}X_i' = I_T - P_i$ for $i = 1,2$. For the
simultaneous equations interpretation, we compare the two stage least squares (2SLS) estimates of $\beta_1$ in equation (2.2) using $Q_2X_1$ as instruments with those using $W = [X_1 : Q_2X_1]$ as instruments. Since the columns of $Q_2X_1$ are uncorrelated with $\eta = X_2\beta_2 + \varepsilon$ under both $H_0$ and $H_1$, this comparison represents a test of the hypothesis that $\mathrm{plim}\,T^{-1}X_1'\eta = 0$. This specification test is based upon

$(X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'P_WX_1)^{-1}X_1'P_WY = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y = q$,

where $P_W$ represents the orthogonal projection operator onto the column space of $W$ and $P_WX_1 = X_1$.

Under $H_0$, $E(q|X_1,X_2) = 0$, so that we reject the null hypothesis if the length of $q$ differs from 0 by more than sampling error. Hence the specification test statistic is

$m = q'[\mathrm{Var}(q)]^+q$,

where $[\cdot]^+$ denotes the Moore-Penrose generalized inverse of $[\cdot]$. (All results in this paper hold for any consistently defined generalized inverse.) This represents a modification of the procedure proposed by Hausman (1978) to allow for the possible singularity of the matrix

$\mathrm{Var}(q) = \sigma^2[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]$,

where $\mathrm{Var}(\varepsilon) = \sigma^2 I_T$. If we write

$q = [(X_1'Q_2X_1)^{-1}X_1'Q_2 - (X_1'X_1)^{-1}X_1']Y = DY$,

note that $\mathrm{Var}(q) = \sigma^2 DD'$.
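The construction of $q$, $D$, and the m statistic is easy to check numerically. The following sketch (Python with NumPy; the simulated design, seed, and variable names are illustrative assumptions, not from the paper) generates data under $H_0$, forms $D$, and verifies the identity $DD' = (X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}$ together with the rank claim proved later in Proposition 2.3:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1[:, [0]]     # make X1 and X2 correlated
Y = X1 @ np.array([1.0, -1.0]) + rng.normal(size=T)  # H0: beta2 = 0 holds

Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
# D maps Y into q = (X1'Q2 X1)^{-1} X1'Q2 Y - (X1'X1)^{-1} X1' Y
D = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2) - np.linalg.solve(X1.T @ X1, X1.T)
q = D @ Y
m = float(q @ np.linalg.pinv(D @ D.T) @ q)           # sigma^2 = 1 by construction
d = int(np.linalg.matrix_rank(D))                    # degrees of freedom of the m test
```

Under $H_0$ the realized `m` behaves like a $\chi^2$ draw with `d` $= \min(k_1,k_2)$ degrees of freedom.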
Lemma 2.1 (Eaton (1972), Proposition 3.19, p. 3.10): Suppose $Y \sim N(\mu,\Sigma)$, where $\mu$ is an element of the range space of $\Sigma$, which is possibly singular. If $A$ is symmetric and $\Sigma = DD'$, then $Y'AY$ is distributed as non-central $\chi^2$ with degrees of freedom $= \mathrm{rank}(D'AD)$ and non-centrality parameter $= \mu'A\mu$ if and only if $D'AD$ is idempotent.
Proposition 2.1: If $\varepsilon$ is normally distributed, then under $H_0$, $m \sim \chi^2_d$, where $d = \mathrm{rank}(D)$.

Proof: Under $H_0$, $q = DY = D\varepsilon$, so that $\sigma^2 m = \varepsilon'D'(DD')^+D\varepsilon$. The proof follows from the lemma, since $D'(DD')^+D$ is idempotent for any generalized inverse $(\cdot)^+$.

The assumption of normality can be relaxed; and assuming that $T^{-1}X_1'X_1$ and $T^{-1}X_2'X_2$ converge to non-singular matrices, we can derive
Proposition 2.2: Under the null hypothesis, $m$ converges in distribution to a $\chi^2_d$ random variable.

Proof: Writing

$m = \sqrt{T}q'[T\,\mathrm{Var}(q)]^+\sqrt{T}q$,

observe that $\sqrt{T}q \xrightarrow{d} N(0,\sigma^2DD')$, where in an abuse of notation $\sigma^2DD'$ denotes $\lim_{T\to\infty} T\,\mathrm{Var}(q) = \lim_{T\to\infty}[(T^{-1}X_1'Q_2X_1)^{-1} - (T^{-1}X_1'X_1)^{-1}]\sigma^2$. The proof follows from the lemma again, since $D'(DD')^+D$ is idempotent.
Using the lemma, it follows immediately that

Corollary 2.1: Under $H_1$, $m$ is distributed as non-central $\chi^2_d$ with non-centrality parameter

$\lambda_m = \sigma^{-2}\beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1}-(X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2$,

either asymptotically or, assuming normality, in finite samples.

The unknown constant $\sigma^2$ can be removed from the above propositions. For Proposition 2.2, any consistent estimator can be substituted for $\sigma^2$. For Proposition 2.1, let $s^2$ denote $\frac{1}{T-k}$ times the sum of squared least squares residuals from equation (2.1).
Corollary 2.2: Assuming $\varepsilon$ is normally distributed,

$m_1 = \frac{1}{d}\,q'(DD')^+q/s^2 \sim F(d,\,T-k)$

under $H_0$.

Proof: Let $Q = I_T - X(X'X)^{-1}X'$, where $X = [X_1 : X_2]$. Then $(T-k)s^2/\sigma^2 = Y'QY/\sigma^2$ is distributed as $\chi^2_{T-k}$ under $H_0$. Since the columns of $D'$ lie in the column space of $X$, $QD'(DD')^+D = 0$; $s^2$ and $m$ are thus independent by Proposition 3.30 (page 3.15) of Eaton (1972).
In general, we will assume that the columns of $X_1$ and $X_2$ are not orthogonal; in particular, that $X_1'X_2$ has full rank, so that $\mathrm{rank}(X_1'X_2) = \mathrm{rank}(X_2'X_1) = \min(k_1,k_2)$. Moreover, we will assume we have sufficient observations to estimate all $k$ parameters; thus $T-k_i > k_j$ and $X_i'Q_jX_i$ is non-singular for $i \neq j = 1,2$. Under these assumptions:
Proposition 2.3: The degrees of freedom for the m test are given by $d = \mathrm{rank}(D) = \min(k_1,k_2)$.

Proof: From the definition of $D$,

$-X_1'X_1D = X_1' - (X_1'X_1)(X_1'Q_2X_1)^{-1}X_1'Q_2$.

Since $P_2+Q_2 = I_T$,

$-X_1'X_1D = X_1'P_2 + [I - (X_1'X_1)(X_1'Q_2X_1)^{-1}]X_1'Q_2 = X_1'P_2 + [X_1'Q_2X_1 - X_1'X_1](X_1'Q_2X_1)^{-1}X_1'Q_2 = X_1'P_2 - [X_1'P_2X_1](X_1'Q_2X_1)^{-1}X_1'Q_2$.

Thus

$-D = (X_1'X_1)^{-1}X_1'P_2[I - X_1(X_1'Q_2X_1)^{-1}X_1'Q_2]$,

so that $d \le \min\{\mathrm{rank}[(X_1'X_1)^{-1}X_1'P_2],\ \mathrm{rank}[I-X_1(X_1'Q_2X_1)^{-1}X_1'Q_2]\}$. Since $\mathrm{rank}(X_1'P_2) = \mathrm{rank}(X_1'X_2) = \min(k_1,k_2)$, $d \le \min(k_1,k_2)$. Conversely, observe that $DP_2 = -(X_1'X_1)^{-1}X_1'P_2$, whose rank equals $\min(k_1,k_2)$; thus $\mathrm{rank}(D) \ge \min(k_1,k_2)$, which, in conjunction with $d \le \min(k_1,k_2)$, completes the proof.
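Proposition 2.3 can be spot-checked by computing $\mathrm{rank}(D)$ for random designs of several shapes. The sketch below (Python with NumPy; the design is an illustrative assumption, not from the paper) covers the cases $k_1 < k_2$, $k_1 = k_2$, and $k_1 > k_2$:

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_of_D(T, k1, k2):
    # rank of D from the definition q = DY, for one random design
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + 0.3 * X1[:, [0]]
    Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    D = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2) - np.linalg.solve(X1.T @ X1, X1.T)
    return int(np.linalg.matrix_rank(D))

ranks = {(k1, k2): rank_of_D(100, k1, k2) for k1, k2 in [(2, 5), (3, 3), (4, 2)]}
```

In each case the computed rank should equal $\min(k_1,k_2)$.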
2.2. Comparing the m and F tests

For the null hypothesis $H_0$: $\beta_2 = 0$, the F test is based on the length of the least squares estimate of $\beta_2$ in equation (2.1):

(2.3) $F = \hat\beta_2'[\mathrm{Var}(\hat\beta_2)]^{-1}\hat\beta_2 = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y/\sigma^2$.

From now on, for convenience, we will assume that $\sigma^2$ is known to be 1, so that all relevant tests are $\chi^2$ tests. Since $s^2$ is independent of both m and F, this simplification will not affect our results. Note that under $H_0$, F is distributed as $\chi^2_{k_2}$, and that under $H_1$ it is distributed as non-central $\chi^2$ with non-centrality parameter $\lambda_F = \beta_2'X_2'Q_1X_2\beta_2$.
In the context of specification tests, a related null hypothesis of some interest is

$H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$,

with corresponding alternative $H_1^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 \neq 0$. Note that $H_0^*$ represents the hypothesis that the bias in the least squares estimate of $\beta_1$ when $\beta_2$ is omitted is zero. In some circumstances, the potential distinction between $H_0$ and $H_0^*$ will be quite important. Heuristically, $H_0$ is a set of restrictions on all estimable functions of $\beta_2$, whereas $H_0^*$ restricts only a subset of them.
Proposition 2.4: The restrictions represented by $H_0$ and $H_0^*$ are identical if and only if $k_1 \ge k_2$.

Proof: $H_0$: $\beta_2 = 0$ always implies $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$, so we need to verify only the reverse implication. If $k_1 \ge k_2$, then $X_2'P_1X_2$ is non-singular. Premultiplying $H_0^*$ by $(X_2'P_1X_2)^{-1}X_2'X_1$ yields

$(X_2'P_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0 \Rightarrow \beta_2 = 0$.

If $k_1 < k_2$, the null space of $X_1'X_2$ is non-empty, so that $X_1'X_2\beta_2 = 0$ does not imply that $\beta_2 = 0$.
The relationship between $H_0$ and $H_0^*$ is reflected in the corresponding F tests. Let $F^*$ denote the length of the least squares estimate of $(X_1'X_1)^{-1}X_1'X_2\beta_2$; i.e.,

$F^* = \hat\beta_2'X_2'X_1(X_1'X_1)^{-1}\{\mathrm{Var}[(X_1'X_1)^{-1}X_1'X_2\hat\beta_2]\}^+(X_1'X_1)^{-1}X_1'X_2\hat\beta_2$

(2.4) $\quad = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'X_1[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]^+X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$.

Under $H_0^*$, $F^*$ is distributed as $\chi^2$ with degrees of freedom equal to $\mathrm{rank}[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]$; under $H_1^*$, its non-centrality parameter can be shown to be

$\lambda_{F^*} = \beta_2'X_2'X_1(X_1'X_1)^{-1}\{\mathrm{Var}[(X_1'X_1)^{-1}X_1'X_2\hat\beta_2]\}^+(X_1'X_1)^{-1}X_1'X_2\beta_2$.
To compare F and $F^*$, two little-known (to us) matrix identities will be useful. Let $A$, $B$, $C$, and $D$ be conformable matrices with $D$ non-singular (and square). Assume $C'A^+B + D^{-1}$ is non-singular, and denote the column space of a matrix by $M(\cdot)$.
Lemma 2.2 (Rao and Mitra (1971), pp. 70-71): If $M(B) \subset M(A)$ and $M(C) \subset M(A)$, then

$[A+BDC']^+ = A^+ - A^+B[C'A^+B + D^{-1}]^{-1}C'A^+$.

A version of this result for non-singular $A$ and $BDC'$ appears in Smith (1973).
Lemma 2.3 (Rao and Mitra (1971), p. 22): If $\mathrm{rank}(ABC) = \mathrm{rank}(B)$, then $C[ABC]^+A = B^+$.

Both of these lemmas hold with minor modifications for any generalized inverse. From Lemma 2.2, we can easily derive a result which is very useful in linear model manipulations.
Lemma 2.4: In the notation of equation (2.1),

$(X_i'Q_jX_i)^{-1} = (X_i'X_i)^{-1} + (X_i'X_i)^{-1}X_i'X_j[X_j'Q_iX_j]^{-1}X_j'X_i(X_i'X_i)^{-1}$,

where $i \neq j = 1,2$.

Proof: Let $A = X_i'X_i$, $B = C' = X_i'X_j$, and $D = -(X_j'X_j)^{-1}$. Since $A$ is non-singular, $M(B) = M(C') \subset M(A)$; applying Lemma 2.2, some algebra concludes the proof. Note that if $k_i > k_j$, $BDC'$ must be singular, so that the version of the identity for non-singular $A$ and $BDC'$ does not apply.
Given the relationship between $H_0$ and $H_0^*$ in Proposition 2.4, the following relationship between their associated F tests is obvious:

Proposition 2.5: If $k_1 \ge k_2$, $F = F^*$.

Proof: From equation (2.4),

$F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}[(X_2'Q_1X_2)^{-1}]^{-1}(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

by Lemma 2.3, since $\mathrm{rank}[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1] = \mathrm{rank}[(X_2'Q_1X_2)^{-1}]$ if and only if $k_1 \ge k_2$. Thus if $k_1 \ge k_2$,

$F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y = F$

from equation (2.3).
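Proposition 2.5 can be illustrated numerically. The sketch below (Python with NumPy; the design and names are illustrative assumptions) computes $F$ from (2.3) and $F^*$ from (2.4) for a design with $k_1 \ge k_2$, where the two statistics should coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, k2 = 120, 3, 2                 # k1 >= k2, so F and F* should coincide
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.4 * X1[:, :2]
Y = X1 @ rng.normal(size=k1) + 0.2 * (X2 @ np.ones(k2)) + rng.normal(size=T)

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
A = X2.T @ Q1 @ X2
F = float(Y @ Q1 @ X2 @ np.linalg.solve(A, X2.T @ Q1 @ Y))   # eq. (2.3), sigma^2 = 1

G = X1.T @ X2                                   # X1'X2, here k1 x k2 of rank k2
M = G @ np.linalg.solve(A, G.T)                 # X1'X2 (X2'Q1X2)^{-1} X2'X1
v = G @ np.linalg.solve(A, X2.T @ Q1 @ Y)
Fstar = float(v @ np.linalg.pinv(M) @ v)        # eq. (2.4)
```

The Moore-Penrose inverse in (2.4) is what makes the Lemma 2.3 cancellation exact here.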
These tests are then related to the m test for specification error by the following argument. $F^*$ and m can be thought of as the lengths of two different estimates of the bias in the least squares estimate of $\beta_1$ from omitting $X_2$: the actual bias is

$B = (X_1'X_1)^{-1}X_1'X_2\beta_2$

and the two estimates are

(2.5) $\hat B_{F^*} = (X_1'X_1)^{-1}X_1'X_2\hat\beta_2 = (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

and

(2.6) $\hat B_m = q = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y$.

Note that $\hat B_m$ can be regarded as an estimate of the bias since its expectation is $-(X_1'X_1)^{-1}X_1'X_2\beta_2$.
Proposition 2.6: For any $k_1$, $k_2$, $F^* = m$.

Proof: $F^* = \hat B_{F^*}'[\mathrm{Var}(\hat B_{F^*})]^+\hat B_{F^*}$ and $m = \hat B_m'[\mathrm{Var}(\hat B_m)]^+\hat B_m$, so that $F^* = m$ if $\hat B_{F^*} = -\hat B_m$, which we verify below. Using equation (2.4) and Lemma 2.4, we can write

$\hat B_{F^*} = (X_1'X_1)^{-1}X_1'P_2Q_1Y + (X_1'X_1)^{-1}X_1'P_2X_1(X_1'Q_2X_1)^{-1}X_1'P_2Q_1Y = (X_1'Q_2X_1)^{-1}X_1'P_2Q_1Y$.

Replacing $P_2$ by $I_T-Q_2$ and noting that $X_1'Q_1 = 0$ yields

$\hat B_{F^*} = -(X_1'Q_2X_1)^{-1}X_1'Q_2Q_1Y = (X_1'Q_2X_1)^{-1}X_1'Q_2P_1Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y = (X_1'X_1)^{-1}X_1'Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y = -\hat B_m$.
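The key identity $\hat B_{F^*} = -\hat B_m$ holds exactly for any sample, which a short numerical sketch makes vivid (Python with NumPy; the design is an illustrative assumption, not from the paper), here with $k_1 < k_2$ where $F^*$ and $F$ differ but $F^* = m$ still:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k1, k2 = 150, 2, 4
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1[:, [0]]
Y = X1 @ np.array([1.0, 2.0]) + 0.1 * (X2 @ rng.normal(size=k2)) + rng.normal(size=T)

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# B_m = q: difference of the two least squares estimates of beta1, eq. (2.6)
B_m = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2 @ Y) - np.linalg.solve(X1.T @ X1, X1.T @ Y)

# B_F*: estimated omitted-variable bias (X1'X1)^{-1} X1'X2 beta2-hat, eq. (2.5)
beta2 = np.linalg.solve(X2.T @ Q1 @ X2, X2.T @ Q1 @ Y)
B_Fs = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)
```

Since the two vectors differ only in sign, their lengths in the common metric coincide, which is Proposition 2.6.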
In summary, the specification test statistic m is equal (for all $k_1$ and $k_2$) to the F test statistic $F^*$ for the hypothesis that certain linear functions of $\beta_2$ equal zero. If $k_1 \ge k_2$, both m and $F^*$ are equal to the F test statistic F for the hypothesis that all linear functions of $\beta_2$ are zero. Since the latter F test is in common use, it is interesting to compare it with the m (and $F^*$) test when $k_1 < k_2$ and they are different.
2.3. F and m compared when $k_1 < k_2$

At the outset, we must specify which null hypothesis is under consideration. Under $H_0$, F and m are distributed as $\chi^2_{k_2}$ and $\chi^2_{k_1}$ respectively, recalling that $k_1 < k_2$ in this section. Under $H_0^*$, however, m is distributed as central $\chi^2$ with $k_1$ degrees of freedom but F has a non-central $\chi^2_{k_2}$ distribution with non-centrality parameter $\lambda_F = \beta_2'X_2'Q_1X_2\beta_2 \ge 0$. Thus as a test statistic for $H_0^*$, F does not have the usual $\chi^2_{k_2}$ distribution. If we mistakenly use that distribution, i.e., if we test $H_0$ when we should test $H_0^*$, we obtain a test whose nominal size is smaller than its true size.
Considered, then, as a test of $H_0$, m has strictly fewer degrees of freedom than F. However, under the alternative hypothesis:

Proposition 2.7: The non-centrality parameter of the m test is less than or equal to that of the F test; i.e., when $k_1 < k_2$, $\lambda_m \le \lambda_F$ for all $\beta_2$.
Proof: From Corollary 2.1,

$\lambda_m = \beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2 = \beta_2'X_2'[X_1(X_1'P_2X_1)^{-1}X_1' - P_1]X_2\beta_2$,

using Lemma 2.4. Thus

$\lambda_m = \beta_2'X_2'[P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2 - P_1]X_2\beta_2$;

since $P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2$ is idempotent,

$\beta_2'X_2'[P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2]X_2\beta_2 \le \beta_2'X_2'X_2\beta_2$

for all $\beta_2$. Thus $\lambda_m \le \beta_2'X_2'(I-P_1)X_2\beta_2 = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$.
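Both expressions for $\lambda_m$ above, and the bound $\lambda_m \le \lambda_F$, can be checked numerically. The sketch below (Python with NumPy; design and names are illustrative assumptions, not from the paper) compares the Corollary 2.1 form and the projection form over random directions $\beta_2$:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k1, k2 = 80, 1, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.6 * X1

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
Q1, Q2 = np.eye(T) - P1, np.eye(T) - P2

V = np.linalg.inv(X1.T @ Q2 @ X1) - np.linalg.inv(X1.T @ X1)   # Var(q) with sigma^2 = 1
M = P2 @ X1 @ np.linalg.solve(X1.T @ P2 @ X1, X1.T @ P2)       # projection onto span(P2 X1)

vals = []
for _ in range(100):
    b2 = rng.normal(size=k2)
    mu = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ b2)            # omitted-variable bias
    lam_m = float(mu @ np.linalg.pinv(V) @ mu)                 # Corollary 2.1 form
    lam_m_proj = float(b2 @ X2.T @ (M - P1) @ X2 @ b2)         # projection form
    lam_F = float(b2 @ X2.T @ Q1 @ X2 @ b2)
    vals.append((lam_m, lam_m_proj, lam_F))
```

The two $\lambda_m$ forms agree to numerical precision, and $\lambda_F - \lambda_m \ge 0$ in every direction.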
Thus m has fewer degrees of freedom and a smaller non-centrality than F for the hypothesis $H_0$: $\beta_2 = 0$. To compare m and F, recall that the power of a $\chi^2$ test of fixed size (a) increases with the non-centrality parameter for fixed degrees of freedom, and (b) decreases with the degrees of freedom for fixed $\lambda$. The m test will thus be relatively more powerful when $k_1$ is much smaller than $k_2$ and $\lambda_m$ is close to $\lambda_F$. In general, the relative power of the tests depends upon the trade-off between degrees of freedom and non-centrality; this can be calculated numerically from the tables of the non-central $\chi^2$ distribution, but nothing much can be said analytically.
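The trade-off can also be computed directly by simulation rather than from tables. The following sketch (Python with NumPy; the degrees of freedom, non-centralities, and the hard-coded 5% critical values 3.841 and 18.307 for $\chi^2_1$ and $\chi^2_{10}$ are illustrative choices, not from the paper) shows a configuration where fewer degrees of freedom outweigh a slightly smaller non-centrality:

```python
import numpy as np

rng = np.random.default_rng(5)

def power_chi2(df, lam, crit, n=200_000):
    # Monte Carlo power of a chi-square test: a noncentral chi2(df, lam)
    # draw is ||Z + mu||^2 for standard normal Z with ||mu||^2 = lam
    Z = rng.normal(size=(n, df))
    Z[:, 0] += np.sqrt(lam)
    return float(np.mean((Z ** 2).sum(axis=1) > crit))

power_m = power_chi2(df=1, lam=9.0, crit=3.841)     # m-like test: few df, smaller lambda
power_F = power_chi2(df=10, lam=10.0, crit=18.307)  # F-like test: many df, larger lambda
```

Here the m-type test is substantially more powerful despite its smaller non-centrality, illustrating the remark in the text.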
Recognizing that the m test does not treat all estimable functions of $\beta_2$ symmetrically, one is led to calculate the direction in which the m test has greatest power. Without loss of generality, we restrict our attention to $\beta_2$ of unit length:

Proposition 2.8: $\lambda_m$ is maximized over $\beta_2$ whenever $\beta_2$ lies in the column space of $(X_2'X_2)^{-1}X_2'X_1$; i.e., for $\beta_2$ of the form $\beta_2 = (X_2'X_2)^{-1}X_2'X_1\ell$, for any $k_1$ vector $\ell$.
This can be inferred from the Pearson and Hartley (1951) charts of the power of the F test, which are reproduced in Scheffé (1959), pp. 438-445. A particularly convenient set of tables of the distribution of a non-central $\chi^2$ variate is Haynam, Govindarajulu, and Leone (1962).
Proof: Making the substitution $\beta_2 = (X_2'X_2)^{-1}X_2'X_1\ell$ in Corollary 2.1, we obtain

$\lambda_m = \ell'X_1'P_2[I-P_1]P_2X_1\ell = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$,

and the result follows from Proposition 2.7. It may be of some interest to note that this direction of maximum power is precisely the direction in which the least squares estimate of $\beta_2$ is biased when $X_1$ is omitted from equation (2.1).
For the null hypothesis $H_0$ against the alternative $H_1$, there are thus estimable functions of $\beta_2$ against which the m test has strictly greater power than the F test. On the other hand, there are functions of $\beta_2$ against which the F test is more powerful. In light of the optimum properties of the F test, this ambiguity should not appear surprising. The F test is uniformly most powerful among invariant tests of $H_0$: $\beta_2 = 0$; the m test is not invariant for $H_0$ since it depends upon the covariance between $X_1$ and $X_2$.
There are some alternatives against which the m test of $H_0$ performs particularly poorly. For $k_1 < k_2$, the null space of $X_1'X_2$ is non-empty; for $\beta_2$ lying in that space, the power function of the m test is flat, with power equal to the size of the test. As this persists in large samples, we conclude that

Proposition 2.9: For $k_1 < k_2$, the m test is inconsistent for $H_0$ against $H_1$.
Of course, from the viewpoint of testing for mis-specification in the estimation of $\beta_1$, this inconsistency is irrelevant, since alternatives in the null space of $X_1'X_2$ do not contribute to the bias of the least squares estimate of $\beta_1$. If, however, $H_0$: $\beta_2 = 0$ is of interest in its own right, then Proposition 2.9 is a serious indictment of the m test for $H_0$.
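The blind spot described in Proposition 2.9 is easy to exhibit. The sketch below (Python with NumPy; the design is an illustrative assumption, not from the paper) puts a large $\beta_2$ exactly in the null space of $X_1'X_2$: the F statistic explodes, while $q$ (and hence m) registers nothing:

```python
import numpy as np

rng = np.random.default_rng(6)
T, k1, k2 = 1000, 1, 2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1

# beta2 in the (non-empty, since k1 < k2) null space of X1'X2
g = (X1.T @ X2).ravel()
b2 = np.array([-g[1], g[0]])
b2 /= np.linalg.norm(b2)
Y = 5.0 * (X2 @ b2) + rng.normal(size=T)       # beta1 = 0, beta2 far from zero

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
q = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2 @ Y) - np.linalg.solve(X1.T @ X1, X1.T @ Y)
F = float(Y @ Q1 @ X2 @ np.linalg.solve(X2.T @ Q1 @ X2, X2.T @ Q1 @ Y))
```

The violation of $H_0$ is enormous, but it contributes nothing to the omitted-variable bias in $\hat\beta_1$, which is all the m test looks at.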
For the linear model, this comparison of the F and m tests clearly depends upon which null hypothesis is being considered. From the viewpoint of specification tests, the relevant null hypothesis is $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$; the m test is precisely equivalent to the F test for this hypothesis, and thus is equivalent to the likelihood ratio, Wald, and Lagrange multiplier tests for $H_0^*$. $H_0^*$, in turn, is equivalent to $H_0$ for $k_1 \ge k_2$, so in this case the m and F tests for $H_0$ are equivalent. For $k_1 < k_2$, the m and F tests for $H_0$ differ. The m test has fewer degrees of freedom but also a smaller noncentrality parameter than the F test for $H_0$; thus neither test is more powerful than the other for all alternatives $H_1$.
- 19
-relevant null hypothesis Is the mis-specification hypothesis
H*. The m test is uniformly most powerful invariant for this
o
hypothesis, whereas the F statistic for H defines a test for
H* of the wrong size, despite being UMP invariant for H . For
the mis-specificatlon hypothesis, the m test is clearly
20
3. The general non-linear model

The characterization of the m test for linear models in the previous section extends easily to a non-linear framework. Using a model and some results from Holly (1980a), we establish that the m test is asymptotically equivalent to the likelihood ratio, Wald, and Lagrange multiplier tests of the hypothesis analogous to $H_0^*$ in the previous section; i.e., that the asymptotic bias in maximum likelihood estimates of a subset of parameters is zero when the remaining parameters are constrained.
Following Holly (1980a), consider a family of models having log-likelihood $L(\theta,\gamma)$ for T observations, where $(\theta,\gamma)$ are $(p,q)$ vectors of unknown parameters respectively. A null hypothesis of interest is

$H_0$: $\theta = \theta^0$

against a sequence of local alternatives

$H_1$: $\theta = \theta_T = \theta^0 + \delta/\sqrt{T}$.

Deviating from Holly, we assume the framework of a specification test: that we are primarily interested in estimating $\gamma$ and are concerned about $H_0$ only insofar as it affects that estimation.
Let $\hat\gamma^0$ denote the maximum likelihood estimator of $\gamma$ imposing $H_0$, and let $(\hat\theta,\hat\gamma)$ denote the maximum likelihood estimators not imposing $H_0$. For large T, these estimators are the solutions of $\frac{\partial L}{\partial\gamma}(\theta^0,\hat\gamma^0) = 0$ and $\frac{\partial L}{\partial\delta}(\hat\delta) = 0$ respectively, where $\delta' = (\theta' : \gamma')'$ and the true parameter vector is $\delta^{0\prime} = (\theta^{0\prime} : \gamma^{0\prime})'$.

Under suitable regularity conditions, Holly (1980a) shows that

(3.1) $\sqrt{T}(\hat\gamma-\gamma^0) \cong [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}\frac{\partial L}{\partial\theta}\right](\delta^0)$

(3.2) $\sqrt{T}(\hat\gamma^0-\gamma^0) \cong I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0)$,

and using his equations (3) and (4), one can show that

(3.3) $\sqrt{T}(\hat\theta-\theta^0) \cong \delta + [I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}\frac{\partial L}{\partial\gamma}\right](\delta^0)$.

Note that $I_{\gamma\theta} = -\mathrm{plim}\,\frac{1}{T}\frac{\partial^2L}{\partial\gamma\,\partial\theta'}(\delta^0) = I_{\theta\gamma}'$, and that sufficient regularity is assumed so that $-\frac{1}{T}\frac{\partial^2L}{\partial\delta\,\partial\delta'}$ converges almost surely to the information matrix as $\hat\delta \to \delta^0$; see Holly (1980a) for details.
From equations (3.1) and (3.2), one can show that

$\sqrt{T}(\hat\gamma-\gamma^0) \xrightarrow{d} N(0,\ [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1})$

and

$\sqrt{T}(\hat\gamma^0-\gamma^0) \xrightarrow{d} N(I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta,\ I_{\gamma\gamma}^{-1})$.

By the argument in Hausman (1978),

(3.4) $\sqrt{T}(\hat\gamma-\hat\gamma^0) \xrightarrow{d} N(-I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta,\ [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} - I_{\gamma\gamma}^{-1})$,

which confirms Holly's algebraic derivation of his equation (6).
Under some circumstances, the limiting covariance matrix in equation (3.4) may be singular. Accordingly, we define the m test statistic as

$m = \sqrt{T}(\hat\gamma-\hat\gamma^0)'[\widehat{\mathrm{Var}}(\hat\gamma-\hat\gamma^0)]^+\sqrt{T}(\hat\gamma-\hat\gamma^0)$,

since under $H_0$: $\theta = \theta^0$, $\sqrt{T}(\hat\gamma-\hat\gamma^0)$ converges to a random variable having zero mean, and under $H_1$ the mean of the limiting distribution is $-I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta$.

In general, we assume a minimal structure for the information matrix. In particular, in contrast to Holly's (1980a) equation (10), we assume that $\mathrm{rank}(I_{\gamma\theta}) = \mathrm{rank}(I_{\theta\gamma}) = \min(p,q)$, so that all parameters provide information useful for estimating any
other parameter. Since

$[I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} = I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}$,

$\mathrm{rank}[\mathrm{Var}(\hat\gamma-\hat\gamma^0)] = \mathrm{rank}(I_{\gamma\theta}) = \min(p,q)$ under our assumptions. Thus

Proposition 3.1: Under $H_0$, m converges in distribution to a $\chi^2$ random variable with $\min(p,q)$ degrees of freedom.
An alternative hypothesis of some interest in the context of specification tests is that

$H_0^*$: $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta_T = I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta^0 \iff I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta = 0$;

i.e., that the asymptotic bias in $\hat\gamma^0$, the estimator which uses the information that $\theta = \theta^0$, is zero. Note from equation (3.4) that the limiting distribution of m under $H_0^*$ is the same as that under $H_0$.
The Wald test of $H_0^*$ is based on the length of the vector of unconstrained estimates

$\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$,

which, from equation (3.3), converges in probability to

(3.5) $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}\frac{\partial L}{\partial\gamma}\right](\delta^0)$.

Proposition 3.2: For any $(q,p)$, $\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$ converges in probability to $\sqrt{T}(\hat\gamma^0-\hat\gamma)$.
Proof: Subtracting equation (3.1) from (3.2),

(3.6) $\sqrt{T}(\hat\gamma^0-\hat\gamma) \cong I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0) - [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}\frac{\partial L}{\partial\theta}\right](\delta^0)$.

Using the identity

$[I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} = I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}$,

the $\delta$ and $\partial L/\partial\gamma$ terms in equations (3.5) and (3.6) are equal. For the $\partial L/\partial\theta$ terms, begin with the identity

$[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}][I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = I_p$

and premultiply both sides by $I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$ to obtain, upon rearrangement,

(3.7) $I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$.

Lemma 3.1: $I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = [I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$.

Proof: Since

$[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta} = I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]$,

the lemma follows by pre- and post-multiplying both sides by $[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$.

Substituting the lemma into the second term in equation (3.7), we obtain

$I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = \left[I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}\right]I_{\gamma\theta}I_{\theta\theta}^{-1} = [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$,

so that the coefficients of $\partial L/\partial\theta$ in equations (3.5) and (3.6) coincide, which establishes the proposition.
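The two block-matrix identities used above hold for any positive definite partitioned information matrix, and can be verified numerically. The sketch below (Python with NumPy; the random matrix is an illustrative assumption) checks the inverse identity and the coefficient equality $[I_{\gamma\gamma}-I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1} = I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta}-I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 3, 2
M = rng.normal(size=(p + q, p + q))
I = M @ M.T + (p + q) * np.eye(p + q)       # positive definite "information matrix"
Itt, Itg = I[:p, :p], I[:p, p:]
Igt, Igg = I[p:, :p], I[p:, p:]

A = Itt - Itg @ np.linalg.solve(Igg, Igt)   # I_tt - I_tg I_gg^{-1} I_gt
B = Igg - Igt @ np.linalg.solve(Itt, Itg)   # I_gg - I_gt I_tt^{-1} I_tg

# identity behind the delta and dL/dgamma terms
lhs1 = np.linalg.inv(B)
rhs1 = np.linalg.inv(Igg) + np.linalg.solve(Igg, Igt) @ np.linalg.solve(A, Itg @ np.linalg.inv(Igg))
# identity behind the dL/dtheta terms (via Lemma 3.1 and eq. (3.7))
lhs2 = np.linalg.solve(B, Igt @ np.linalg.inv(Itt))
rhs2 = np.linalg.solve(Igg, Igt) @ np.linalg.inv(A)
```

Both pairs agree to machine precision, which is all the proof of Proposition 3.2 requires of the information matrix.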
Since $\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$ has the same limiting distribution as $\sqrt{T}(\hat\gamma^0-\hat\gamma)$, the m test statistic and the Wald test statistic for $H_0^*$ have the same limiting distribution. Thus, asymptotically, the m test is equivalent to a Wald, likelihood ratio (LR), and Lagrange multiplier test of $H_0^*$, the hypothesis that imposing $H_0$: $\theta=\theta^0$ leaves the maximum likelihood estimator for $\gamma$ asymptotically unbiased.

The relationship between the m test and the Wald test of $H_0$ is perfectly analogous to that discussed for the linear case in sections 2.2 and 2.3. Briefly, assuming $\mathrm{rank}(I_{\gamma\theta}) = \min(p,q)$,
- 26
-I'^i^gCe-e") = iff (e-e") = o
for q >_ p, so that
Proposition 3' 3
- If q
L
P ^"^ rank(I ) = p, the m teststatistic has the same limiting distribution
as the LR (or Wald) test statistic for H .
Moreover, for $q < p$, the m test has fewer degrees of freedom and a smaller noncentrality parameter than the LR test of $H_0$. Neither test for $H_0$ dominates the other for all estimable functions of $(\theta-\theta^0)$; there exist directions in which the power of the m test equals its size and there exist directions in which the m test has strictly greater power than the LR test.

If the mis-specification hypothesis $H_0^*$ is the correct hypothesis, the m test is the correct test. The LR test statistic for $H_0$ has the wrong size for $H_0^*$ (when $q < p$), and the m test, being asymptotically equivalent to the LR test of $H_0^*$, possesses the usual local power properties. Echoing the conclusion of Section 2, when interest in $H_0$: $\theta=\theta^0$ derives from the desire to impose this restriction when estimating $\gamma$, the relevant null hypothesis is $H_0^*$: $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta = 0$, and the relevant test is the m test.

For $(\theta-\theta^0)$ in the null space of $I_{\gamma\gamma}^{-1}I_{\gamma\theta}$ and $(\theta-\theta^0)$ in the column space of $I_{\theta\theta}^{-1}I_{\theta\gamma}$, respectively.
4. Testing the legitimacy of instruments

In this section, we derive specification tests of overidentifying assumptions for a single structural equation. Specifically, we develop a test of the hypothesis that certain variables are uncorrelated with the structural disturbance term. This hypothesis, as we shall show, includes both overidentifying exclusion restrictions of the Cowles Commission type and restrictions on the structural disturbance covariance matrix.

Let

(4.1) $y_1 = Y_1\beta_1 + Z_1\gamma_1 + \varepsilon_1 = X_1\delta_1 + \varepsilon_1$

be the first structural equation in a system of simultaneous equations denoted

$YB + Z\Gamma = E$, $\quad \mathrm{cov}(\varepsilon) = \Sigma$.
As usual, assume there are $g_1$ endogenous variables $Y_1$ and $k_1$ predetermined variables $Z_1$ present in the first equation, and that we can use no coefficient restrictions from equations other than the first. To identify and estimate the parameters of this equation, we need $g_1+k_1$ instruments; and we may use (i) the $k_1$ predetermined variables $Z_1$, (ii) any other predetermined variables which are correlated with $Y_1$, and (iii) any endogenous variables $y_2,\ldots,y_G$ which are correlated with $Y_1$ but asymptotically uncorrelated with $\varepsilon_1$. The prior information that certain variables are uncorrelated with the disturbance in the first equation is precisely what we propose to test.

Let $W$ denote a $T \times w$ matrix of observations on $w$ instruments which we maintain to be uncorrelated with $\varepsilon_1$ under all circumstances. Moreover, we must assume that $w \ge g_1+k_1$, so that equation (4.1) is at least just-identified. We need not, however, include all of the $Z_1$ in $W$; if the exogeneity of a particular $Z_1$ is in doubt, it may be tested, provided the equation is at least just-identified under both the null and alternative hypotheses.
Let $W_1$ denote a $T \times w_1$ matrix of observations on $w_1 > w$ instruments, which include all the instruments in $W$. Specifically, we assume that the column space of $W$ is a proper subspace of the column space of $W_1$; the difference in dimensions will be denoted $w_1 - w = w^* > 0$, and a set of vectors spanning the column space of $W_1$ will be denoted $[W : W^*]$. The orthocomplement of the column space of $W$ in the column space of $W_1$ is a $w^*$ dimensional subspace spanned by the columns of $W^\perp = Q_WW^*$, where $Q_W = I - W(W'W)^{-1}W' = I - P_W$.
A null hypothesis of some interest is that the $w^*$ "extra" instruments $W^*$ are asymptotically uncorrelated with the structural disturbance:

$H_0$: $\mathrm{plim}\,\frac{1}{T}W^{*\prime}\varepsilon_1 = 0$,

against the alternative $H_1$: $\mathrm{plim}\,\frac{1}{T}W^{*\prime}\varepsilon_1 \neq 0$. This is important for two reasons. If we treat the columns of $W^*$ as uncorrelated with $\varepsilon_1$ and they are not, the resulting instrumental variables estimator for $\delta_1$ is inconsistent; if we treat a column of $W^*$ as correlated with $\varepsilon_1$ and it is not, the corresponding instrumental variables estimator will be inefficient in the following (presumably well-known) sense. Let $\hat\delta_1$ and $\bar\delta_1$ denote the two stage least squares (2SLS) estimates of $\delta_1$ using $W$ and $W_1$ as instruments respectively.
Proposition 4.1: If $\mathrm{plim}\,\frac{1}{T}W_1'\varepsilon_1 = 0$, the difference between the covariance matrices of the limiting distributions of $\hat\delta_1$ and $\bar\delta_1$ is a non-negative definite matrix.

Proof: The proof follows directly from the observation that $X_1'P_{W_1}X_1 - X_1'P_WX_1 = X_1'P_{W^\perp}X_1$ is non-negative definite, where $P_A$ denotes the orthogonal projection operator onto the column space of $A$.
The hypothesis $H_0$ is somewhat unusual among tests of assumptions used in simultaneous equations estimation. For exogenous variables among the columns of $W^*$, $H_0$ tests for exogeneity. For endogenous columns of $W^*$ (e.g., $y_i$), $H_0$ imposes the restriction that

$(B^{-1}\Sigma)_{i1} = 0$,

which is a complicated combination of disturbance covariance and coefficient restrictions. In general, for $y_i$ and $\varepsilon_1$ to be uncorrelated, $\mathrm{cov}(\varepsilon_i,\varepsilon_1)$ must equal 0, equations $(i,1)$ must be relatively triangular, and $\mathrm{cov}(\varepsilon_i,\varepsilon_j)$ must equal 0 for the intervening equations $j$; see Hausman-Taylor (1980b) for definitions and details. If $W^* = Y_1$ and $W = Z_1$, we have the situation treated by Wu (1973); for $B$ triangular, we have the limited information version of Holly's (1980b) test for the diagonality of $\Sigma$.
In the spirit of specification tests, we assume we are not interested in $H_0$ directly. Rather, we wish to know the consequences for estimating $\delta_1$ of imposing $H_0$; this is given by the length of the vector

(4.2) $q = \bar\delta_1 - \hat\delta_1 = [(X_1'P_{W_1}X_1)^{-1}X_1'P_{W_1} - (X_1'P_WX_1)^{-1}X_1'P_W]\,y_1 = Cy_1$.

Note that under $H_0$, $q$ converges to a normal random variable with mean 0 and variance $\sigma^2_{\varepsilon_1}[(X_1'P_WX_1)^{-1} - (X_1'P_{W_1}X_1)^{-1}]$; under $H_1$, the mean vector is no longer 0. Thus significant deviations of $q$ from the zero vector cast doubt upon $H_0$.
Using results in Section 2, it is easy to verify that

Proposition 4.2: If rank(C) = c, then under H_0,

    m = (1/\hat\sigma_\varepsilon^2) q'[CC']^{+} q
      = (1/\hat\sigma_\varepsilon^2) q'[(X_1'P_{W_0}X_1)^{-1} - (X_1'P_W X_1)^{-1}]^{+} q

is asymptotically distributed as \chi^2 with c
degrees of freedom, where \hat\sigma_\varepsilon^2 is any consistent
estimate of \sigma_\varepsilon^2.
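The computation of m is mechanical once the two 2SLS fits are in hand. A minimal simulation sketch follows; it is a modern illustration under hypothetical data (the design, dimensions, and parameter values are invented for the example, and H_0 holds by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# Hypothetical design: W0 maintained instruments, Wstar the suspect
# instruments (valid here), X1 the right-hand variables of (4.1).
W0 = rng.standard_normal((T, 4))
Wstar = rng.standard_normal((T, 2))
W = np.hstack([W0, Wstar])
X1 = W0 @ rng.standard_normal((4, 2)) + rng.standard_normal((T, 2))
y1 = X1 @ np.array([1.0, -0.5]) + rng.standard_normal(T)

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def tsls(y, X, Z):
    """2SLS estimate of y on X using instruments Z."""
    PZ = proj(Z)
    return np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

d_hat = tsls(y1, X1, W)     # imposes H0: uses all of W = [W0, Wstar]
d_tilde = tsls(y1, X1, W0)  # does not use the suspect instruments
q = d_tilde - d_hat

s2 = np.mean((y1 - X1 @ d_tilde) ** 2)  # consistent estimate of sigma^2
V = np.linalg.inv(X1.T @ proj(W0) @ X1) - np.linalg.inv(X1.T @ proj(W) @ X1)
m = q @ np.linalg.pinv(s2 * V) @ q  # asymptotically chi^2(c) under H0
print(float(m))
```

Here the generalized inverse (np.linalg.pinv) handles the case where rank(C) = c falls short of the dimension of q; in this sketch c = 2.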
Proposition 4.3: rank(C) = c = min[rank(X_1'Q_{W_0}), w*].

Proof: Repeating the steps of the proof of Proposition 2.3, we
obtain rank(C) = rank(X_1'P_{Q_{W_0}W^*}), since T - k_1 - g_1 > w*.
The result then follows by noting that

    X_1'P_{Q_{W_0}}W^* = X_1'Q_{W_0}W^*.

Since some columns of X_1 may be columns of W_0 in some
applications, rank(X_1'Q_{W_0}) may be strictly less than k_1 + g_1.
This will happen whenever we accept the exogeneity of an
explanatory variable in equation (4.1) rather than subject it
to test.

As in Sections 2 and 3, consider the null hypothesis

    H_0^*:  plim (X_1'P_W X_1)^{-1} X_1'P_W \varepsilon_1 = 0
       <=>  plim (X_1'P_W X_1)^{-1} X_1'Q_{W_0}W^*(W^{*'}Q_{W_0}W^*)^{-1}W^{*'}\varepsilon_1 = 0,
which states that the asymptotic bias in the 2SLS estimator for
\delta_1 is zero when the columns of W^* are used as instruments in
addition to those of W_0. As before, the null hypothesis H_0
restricts all linear functions of the w*-vector plim (1/T) W^{*'}\varepsilon_1
to be zero, whereas H_0^* restricts only a subset of those functions.
If rank(X_1'Q_{W_0}) >= w*, the two hypotheses are equivalent; if
rank(X_1'Q_{W_0}) < w*, they differ. Thus particular tests (e.g.,
Wald or LR tests) of H_0 and H_0^* will be identical if rank(X_1'Q_{W_0})
>= w* and will differ otherwise, as we observed for the linear
model in Proposition 2.5.
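The second form of H_0^* rests on the orthogonal decomposition P_W = P_{W_0} + P_{Q_{W_0}W^*}, which holds because W = [W_0, W^*]. A quick numerical check of this identity, using hypothetical matrices purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
W0 = rng.standard_normal((T, 3))
Wstar = rng.standard_normal((T, 2))
W = np.hstack([W0, Wstar])

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

P_W0 = proj(W0)
Q_W0 = np.eye(T) - P_W0
# Projection onto W decomposes into the projection onto W0 plus the
# projection onto the part of Wstar orthogonal to W0 -- the piece
# that drives H0*.
lhs = proj(W)
rhs = P_W0 + proj(Q_W0 @ Wstar)
print(np.allclose(lhs, rhs))  # True
```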
Moreover, in the limited information framework, we can
relate the m test in Proposition 4.2 to the familiar trinity
of asymptotically equivalent tests of H_0^*. We argued elsewhere
(Hausman-Taylor (1980a), Section 4.2) that the 2SLS estimator
for \delta_1 in equation (4.1) is asymptotically equivalent to the
full information maximum likelihood (FIML) estimator for \delta_1
in the system

(4.3)    y_1 = Y_1\delta_1 + Z_1\gamma_1 + \varepsilon_1
         Y_1 = Z\Pi + V

where the correlations between the columns of V and \varepsilon_1 are
unrestricted and all instruments are columns of Z. If we
do not impose the restrictions H_0^*, the FIML estimate of \delta_1
is \tilde\delta_1, and the Wald test of H_0^* is based on the length of the
vector

    \hat\delta_1 - \tilde\delta_1 = -q.

Estimating \sigma_\varepsilon^2 from the 2SLS residuals, we have
Proposition 4.4: The m test statistic is identical to the
Wald test statistic for the null hypothesis H_0^*.
Asymptotically, then, the m test for the legitimacy of
instruments is equivalent to a Lagrange multiplier or LR
test of the mis-specification hypothesis H_0^*. Since W_0 and W^*
can be chosen arbitrarily, there are a number of interesting
special cases of this test, involving both coefficient restrictions
of the Cowles Commission type and disturbance variance and
covariance restrictions of the type discussed by Fisher (1966),
Chapters 3 and 4. We discuss these applications elsewhere
(Hausman-Taylor (1980b)); the point developed here is the
same as that of Sections 2 and 3: that the m test is asymptotically
equivalent to the usual tests of the mis-specification hypothesis
that imposing H_0 causes no asymptotic bias in the maximum
likelihood estimator.
REFERENCES
Eaton, M.L., Multivariate Statistical Analysis, Copenhagen:
Institute of Mathematical Statistics, University of
Copenhagen, 1972.

Fisher, F.M., The Identification Problem in Econometrics, New
York: McGraw-Hill, 1966.

Harter, H.L., and D.B. Owen (eds.), Selected Tables in
Mathematical Statistics, Chicago: Markham, 1970.

Hausman, J.A., "Specification Tests in Econometrics,"
Econometrica 46 (1978), pp. 1251-1272.

Hausman, J.A., and W.E. Taylor, "Panel Data and Unobservable
Individual Effects," M.I.T. Discussion Paper, (1980a).

--------, "Identification, Estimation, and Testing in Simultaneous
Equations Models with Disturbance Covariance Restrictions,"
unpublished manuscript, (1980b).

Haynam, G.E., Z. Govindarajulu, and F.C. Leone, Tables of the
Cumulative Non-Central Chi-Square Distribution, Case
Statistical Laboratory Publication No. 104, 1962.

Holly, A., "A Remark on Hausman's Specification Test," Harvard
Institute of Economic Research, Discussion Paper No. 763,
(1980a).

--------, "Testing Recursiveness in a Triangular Simultaneous
Equations Model," unpublished manuscript, (1980b).

Pearson, E.S., and H.O. Hartley, "Charts of the Power Function
of the Analysis of Variance Tests Derived from the
Non-Central F Distribution," Biometrika 38 (1951), pp. 112-130.

Rao, C.R., and S.K. Mitra, Generalized Inverse of Matrices and
Its Applications, New York: John Wiley & Sons, 1971.

Scheffe, H., The Analysis of Variance, New York: John Wiley &
Sons, 1959.

Smith, A.F.M., "A General Bayesian Linear Model," Journal of the
Royal Statistical Society, Series B, 35 (1973), pp. 67-75.

Wu, D., "Alternative Tests of Independence Between Stochastic
Regressors and Disturbances," Econometrica 41 (1973).