working paper
department of economics

COMPARING SPECIFICATION TESTS AND CLASSICAL TESTS

Jerry A. Hausman
William E. Taylor*

Number 266    August 1980

massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139

First Draft
Comments Welcome
* MIT and Bell Labs, respectively.
COMPARING SPECIFICATION TESTS AND CLASSICAL TESTS
by
Jerry A. Hausman
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
and
William E. Taylor
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey 07974
ABSTRACT

A parametric restriction is often interesting because it makes possible simplifications or improvements in estimators for the parameters of primary interest. In such cases, a specification test examines the effect of imposing the restrictions on the estimator, whereas classical tests examine the restrictions themselves in light of the data. In some circumstances, this leads to discrepancies in large sample behavior between (i) specification tests and (ii) likelihood ratio, Wald, and Lagrange multiplier tests. We examine this distinction in three cases of recent interest: exclusion restrictions in a simple linear model, parametric restrictions in a general non-linear implicit model, and exogeneity restrictions in a linear simultaneous equations system.
1. Introduction
A tenet of large sample statistical theory is the
sufficiency of the trinity of tests: that any reasonable
test of a statistical hypothesis is at least asymptotically
equivalent to a likelihood ratio, Wald, or Lagrange multiplier
test. Recently, a class of mis-specification tests was
introduced (Hausman (1978)) which makes use of the difference
between parameter estimates which impose and do not impose
a null hypothesis; and some speculation has ensued concerning
the relationships among these tests and the trinity. Holly
(1980a,b) in particular has compared the specification test of (i) parametric restrictions with nuisance parameters, and (ii) exogeneity in a triangular simultaneous equations system with the conventional tests and has found, in some circumstances, significant differences in large sample
terms which shed some light on the hypothesis that the
specification test is actually testing.
A parametric restriction is often interesting because
it enables us to simplify or improve our estimator for the
parameters of primary interest. Thus an uninteresting variable may be excluded from a linear regression, or a variable may be treated as endogenous solely to ensure consistent estimates of the parameters of interest. In both cases, the restriction involved may be tested, but the cost of a type I or II error depends upon the effect of that error on the point estimates of the parameters that matter. Thus excluding a variable from a linear regression is costly only to the extent that estimates of the remaining slope coefficients change, given their sampling errors. Similarly, the cost of treating a variable as predetermined depends upon the difference that restriction makes in the point estimates involved.
The specification test is based upon the difference in the point estimates caused by imposing the restrictions. In the cases above, imposing the restrictions gains a little efficiency (if true) but sacrifices consistency if false. Thus the specification test detects departures from the restrictions in an appropriate norm, and this is shown below to characterize the difference between specification tests and conventional tests. Indeed, in the cases considered below, we are able to show that the specification test of a set of restrictions is identical to a conventional test of the specification hypothesis that imposing the restrictions does not affect the point estimates. Being asymptotically equivalent to a classical test of this hypothesis, the specification test inherits the familiar optimal local power properties, so that in circumstances in which the specification hypothesis is the relevant hypothesis, the specification test is probably the test of choice.
In essence, these results point out the importance of carefully specifying the null hypothesis one wishes to test. In the linear regression model in section 2, we show that the common practice of omitting variables whose estimated coefficients fail an F test is often based on a test whose nominal size is smaller than its true size. In such cases, the specification test is uniformly most powerful among invariant tests of the specification hypothesis and clearly dominates the F test. These and other results are extended in section 3 to explain anomalies in the specification
test in non-linear models outlined in Holly (1980a). These principles are applied to exogeneity tests in simultaneous equations systems (section 4). This generalizes and explains the difference between the specification test and the Lagrange multiplier test for recursiveness in Holly (1980b).
2. Specification tests in the linear model

2.1. The model and the m test

The basic idea discussed in the introduction can be demonstrated simply by specification tests involving linear homogeneous restrictions in the linear model. Suppose

(2.1) $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$,

where $\beta_1$, $\beta_2$ are $k_1$, $k_2$ vectors of unknown parameters $(k_1+k_2=k)$ and $\varepsilon$ is a $T$ vector of Gauss-Markov disturbances. The coefficients
$\beta_1$ are presumed to be of primary interest, and the columns of $X_2$ are included in equation (2.1) solely to avoid specification error in our estimates of $\beta_1$. For a practical example, interpret equation (2.1) as a demand function for the services of a regulated public utility in which the scalar $X_1$ represents the service price. In a rate hearing, point estimates of the price elasticity are required in order to set prices in the next period to achieve an assigned rate of return. Effects of changes in other prices and income (the columns of $X_2$) are of interest only insofar as they influence our estimate of $\beta_1$.
One typically tests $H_0$: $\beta_2 = 0$ against the alternative $H_1$: $\beta_2 \neq 0$. In this case, though, there is some suggestion that certain functions of $\beta_2$ are more important than others. This point will be developed at length below.
An alternative interpretation of this problem can be set in the context of limited information simultaneous equations estimation, which we discuss in Section 4. Consider

(2.2) $Y = X_1\beta_1 + [X_2\beta_2 + \varepsilon] = X_1\beta_1 + \eta$,

where $\mathrm{plim}\,T^{-1}X_i'\varepsilon = 0$ $(i=1,2)$. We are concerned with estimating $\beta_1$ and must determine if

$\mathrm{plim}\,T^{-1}X_1'\eta = \mathrm{plim}\,T^{-1}X_1'X_2\beta_2 = 0$

in order that least squares estimates of $\beta_1$ be consistent. Again, one typically examines $H_0$: $\beta_2 = 0$ but recognizes that estimable functions of the form $A'\beta_2 = C'X_1'X_2\beta_2$ are of particular interest.

Following Hausman (1978), both interpretations of the problem lead to the same specification test. For the linear model case, the specification test involves the difference between the least squares estimates of $\beta_1$ including and excluding $X_2$ from the regression:

$q = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y$,

where $Q_i = I_T - X_i(X_i'X_i)^{-1}X_i' = I_T - P_i$ for $i = 1,2$. For the
simultaneous equations interpretation, we compare the two stage least squares (2SLS) estimates of $\beta_1$ in equation (2.2) using $Q_2X_1$ as instruments with those using $W = [X_1 : Q_2X_1]$ as instruments. Since the columns of $Q_2X_1$ are uncorrelated with $\eta = X_2\beta_2 + \varepsilon$ under both $H_0$ and $H_1$, this comparison represents a test of the hypothesis that $\mathrm{plim}\,T^{-1}X_1'\eta = 0$. This specification test is based upon

$(X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'P_WX_1)^{-1}X_1'P_WY = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y = q$,

where $P_W$ represents the orthogonal projection operator onto the column space of $W$ and $P_WX_1 = X_1$.

Under $H_0$, $E(q|X_1,X_2) = 0$, so that we reject the null hypothesis if the length of $q$ differs from 0 by more than sampling error. Hence the specification test statistic is

$m = q'[\mathrm{Var}(q)]^+q$,

where $[\cdot]^+$ denotes the Moore-Penrose generalized inverse of $[\cdot]$. (All results in this paper hold for any consistently defined generalized inverse.) This represents a modification of the procedure proposed by Hausman (1978) to allow for the possible singularity of the matrix

$\mathrm{Var}(q) = \sigma^2[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]$,

where $\mathrm{Var}(\varepsilon) = \sigma^2 I_T$. If we write

$q = [(X_1'Q_2X_1)^{-1}X_1'Q_2 - (X_1'X_1)^{-1}X_1']Y = DY$,

note that $\mathrm{Var}(q) = \sigma^2 DD'$.
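The construction of $q$, $D$, and the m statistic is easy to check numerically. The following sketch (Python with NumPy; the simulated design, seed, and variable names are illustrative assumptions, not from the paper) generates data under $H_0$, forms $D$, and verifies the identity $DD' = (X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}$ together with the rank claim proved later in Proposition 2.3:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1[:, [0]]     # make X1 and X2 correlated
Y = X1 @ np.array([1.0, -1.0]) + rng.normal(size=T)  # H0: beta2 = 0 holds

Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
# D maps Y into q = (X1'Q2 X1)^{-1} X1'Q2 Y - (X1'X1)^{-1} X1' Y
D = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2) - np.linalg.solve(X1.T @ X1, X1.T)
q = D @ Y
m = float(q @ np.linalg.pinv(D @ D.T) @ q)           # sigma^2 = 1 by construction
d = int(np.linalg.matrix_rank(D))                    # degrees of freedom of the m test
```

Under $H_0$ the realized `m` behaves like a $\chi^2$ draw with `d` $= \min(k_1,k_2)$ degrees of freedom.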
Lemma 2.1 (Eaton (1972), Proposition 3.19, p. 3.10): Suppose $Y \sim N(\mu,\Sigma)$, where $\mu$ is an element of the range space of $\Sigma$, which is possibly singular. If $A$ is symmetric and $\Sigma = DD'$, then $Y'AY$ is distributed as non-central $\chi^2$ with degrees of freedom $= \mathrm{rank}(D'AD)$ and non-centrality parameter $= \mu'A\mu$ if and only if $D'AD$ is idempotent.
Proposition 2.1: If $\varepsilon$ is normally distributed, then under $H_0$, $m \sim \chi^2_d$, where $d = \mathrm{rank}(D)$.

Proof: Under $H_0$, $q = DY = D\varepsilon$, so that $\sigma^2 m = \varepsilon'D'(DD')^+D\varepsilon$. The proof follows from the lemma, since $D'(DD')^+D$ is idempotent for any generalized inverse $(\cdot)^+$.

The assumption of normality can be relaxed; and assuming that $T^{-1}X_1'X_1$ and $T^{-1}X_2'X_2$ converge to non-singular matrices, we can derive
Proposition 2.2: Under the null hypothesis, $m$ converges in distribution to a $\chi^2_d$ random variable.

Proof: Writing

$m = \sqrt{T}q'[T\,\mathrm{Var}(q)]^+\sqrt{T}q$,

observe that $\sqrt{T}q \xrightarrow{d} N(0,\sigma^2DD')$, where in an abuse of notation $\sigma^2DD'$ denotes $\lim_{T\to\infty} T\,\mathrm{Var}(q) = \lim_{T\to\infty}[(T^{-1}X_1'Q_2X_1)^{-1} - (T^{-1}X_1'X_1)^{-1}]\sigma^2$. The proof follows from the lemma again, since $D'(DD')^+D$ is idempotent.
Using the lemma, it follows immediately that

Corollary 2.1: Under $H_1$, $m$ is distributed as non-central $\chi^2_d$ with non-centrality parameter

$\lambda_m = \sigma^{-2}\beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1}-(X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2$,

either asymptotically or, assuming normality, in finite samples.

The unknown constant $\sigma^2$ can be removed from the above propositions. For Proposition 2.2, any consistent estimator can be substituted for $\sigma^2$. For Proposition 2.1, let $s^2$ denote $\frac{1}{T-k}$ times the sum of squared least squares residuals from equation (2.1).
Corollary 2.2: Assuming $\varepsilon$ is normally distributed,

$m_1 = \frac{1}{d}\,q'(DD')^+q/s^2 \sim F(d,\,T-k)$

under $H_0$.

Proof: Let $Q = I_T - X(X'X)^{-1}X'$, where $X = [X_1 : X_2]$. Then $(T-k)s^2/\sigma^2 = Y'QY/\sigma^2$ is distributed as $\chi^2_{T-k}$ under $H_0$. Since the columns of $D'$ lie in the column space of $X$, $QD'(DD')^+D = 0$; $s^2$ and $m$ are thus independent by Proposition 3.30 (page 3.15) of Eaton (1972).
In general, we will assume that the columns of $X_1$ and $X_2$ are not orthogonal; in particular, that $X_1'X_2$ has full rank, so that $\mathrm{rank}(X_1'X_2) = \mathrm{rank}(X_2'X_1) = \min(k_1,k_2)$. Moreover, we will assume we have sufficient observations to estimate all $k$ parameters; thus $T-k_i > k_j$ and $X_i'Q_jX_i$ is non-singular for $i \neq j = 1,2$. Under these assumptions:
Proposition 2.3: The degrees of freedom for the m test are given by $d = \mathrm{rank}(D) = \min(k_1,k_2)$.

Proof: From the definition of $D$,

$-X_1'X_1D = X_1' - (X_1'X_1)(X_1'Q_2X_1)^{-1}X_1'Q_2$.

Since $P_2+Q_2 = I_T$,

$-X_1'X_1D = X_1'P_2 + [I - (X_1'X_1)(X_1'Q_2X_1)^{-1}]X_1'Q_2 = X_1'P_2 + [X_1'Q_2X_1 - X_1'X_1](X_1'Q_2X_1)^{-1}X_1'Q_2 = X_1'P_2 - [X_1'P_2X_1](X_1'Q_2X_1)^{-1}X_1'Q_2$.

Thus

$-D = (X_1'X_1)^{-1}X_1'P_2[I - X_1(X_1'Q_2X_1)^{-1}X_1'Q_2]$,

so that $d \le \min\{\mathrm{rank}[(X_1'X_1)^{-1}X_1'P_2],\ \mathrm{rank}[I-X_1(X_1'Q_2X_1)^{-1}X_1'Q_2]\}$. Since $\mathrm{rank}(X_1'P_2) = \mathrm{rank}(X_1'X_2) = \min(k_1,k_2)$, $d \le \min(k_1,k_2)$. Conversely, observe that $DP_2 = -(X_1'X_1)^{-1}X_1'P_2$, whose rank equals $\min(k_1,k_2)$; thus $\mathrm{rank}(D) \ge \min(k_1,k_2)$, which, in conjunction with $d \le \min(k_1,k_2)$, completes the proof.
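Proposition 2.3 can be spot-checked by computing $\mathrm{rank}(D)$ for random designs of several shapes. The sketch below (Python with NumPy; the design is an illustrative assumption, not from the paper) covers the cases $k_1 < k_2$, $k_1 = k_2$, and $k_1 > k_2$:

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_of_D(T, k1, k2):
    # rank of D from the definition q = DY, for one random design
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + 0.3 * X1[:, [0]]
    Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    D = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2) - np.linalg.solve(X1.T @ X1, X1.T)
    return int(np.linalg.matrix_rank(D))

ranks = {(k1, k2): rank_of_D(100, k1, k2) for k1, k2 in [(2, 5), (3, 3), (4, 2)]}
```

In each case the computed rank should equal $\min(k_1,k_2)$.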
2.2. Comparing the m and F tests

For the null hypothesis $H_0$: $\beta_2 = 0$, the F test is based on the length of the least squares estimate of $\beta_2$ in equation (2.1):

(2.3) $F = \hat\beta_2'[\mathrm{Var}(\hat\beta_2)]^{-1}\hat\beta_2 = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y/\sigma^2$.

From now on, for convenience, we will assume that $\sigma^2$ is known to be 1, so that all relevant tests are $\chi^2$ tests. Since $s^2$ is independent of both m and F, this simplification will not affect our results. Note that under $H_0$, F is distributed as $\chi^2_{k_2}$, and that under $H_1$ it is distributed as non-central $\chi^2$ with non-centrality parameter $\lambda_F = \beta_2'X_2'Q_1X_2\beta_2$.
In the context of specification tests, a related null hypothesis of some interest is

$H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$,

with corresponding alternative $H_1^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 \neq 0$. Note that $H_0^*$ represents the hypothesis that the bias in the least squares estimate of $\beta_1$ when $\beta_2$ is omitted is zero. In some circumstances, the potential distinction between $H_0$ and $H_0^*$ will be quite important. Heuristically, $H_0$ is a set of restrictions on all estimable functions of $\beta_2$, whereas $H_0^*$ restricts only a subset of them.
Proposition 2.4: The restrictions represented by $H_0$ and $H_0^*$ are identical if and only if $k_1 \ge k_2$.

Proof: $H_0$: $\beta_2 = 0$ always implies $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$, so we need to verify only the reverse implication. If $k_1 \ge k_2$, then $X_2'P_1X_2$ is non-singular. Premultiplying $H_0^*$ by $(X_2'P_1X_2)^{-1}X_2'X_1$ yields

$(X_2'P_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0 \Rightarrow \beta_2 = 0$.

If $k_1 < k_2$, the null space of $X_1'X_2$ is non-empty, so that $X_1'X_2\beta_2 = 0$ does not imply that $\beta_2 = 0$.
The relationship between $H_0$ and $H_0^*$ is reflected in the corresponding F tests. Let $F^*$ denote the length of the least squares estimate of $(X_1'X_1)^{-1}X_1'X_2\beta_2$; i.e.,

$F^* = \hat\beta_2'X_2'X_1(X_1'X_1)^{-1}\{\mathrm{Var}[(X_1'X_1)^{-1}X_1'X_2\hat\beta_2]\}^+(X_1'X_1)^{-1}X_1'X_2\hat\beta_2$

(2.4) $\quad = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'X_1[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]^+X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$.

Under $H_0^*$, $F^*$ is distributed as $\chi^2$ with degrees of freedom equal to $\mathrm{rank}[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]$; under $H_1^*$, its non-centrality parameter can be shown to be

$\lambda_{F^*} = \beta_2'X_2'X_1(X_1'X_1)^{-1}\{\mathrm{Var}[(X_1'X_1)^{-1}X_1'X_2\hat\beta_2]\}^+(X_1'X_1)^{-1}X_1'X_2\beta_2$.
To compare F and $F^*$, two little-known (to us) matrix identities will be useful. Let $A$, $B$, $C$, and $D$ be conformable matrices with $D$ non-singular (and square). Assume $C'A^+B + D^{-1}$ is non-singular, and denote the column space of a matrix by $M(\cdot)$.
Lemma 2.2 (Rao and Mitra (1971), pp. 70-71): If $M(B) \subset M(A)$ and $M(C) \subset M(A)$, then

$[A+BDC']^+ = A^+ - A^+B[C'A^+B + D^{-1}]^{-1}C'A^+$.

A version of this result for non-singular $A$ and $BDC'$ appears in Smith (1973).
Lemma 2.3 (Rao and Mitra (1971), p. 22): If $\mathrm{rank}(ABC) = \mathrm{rank}(B)$, then $C[ABC]^+A = B^+$.

Both of these lemmas hold with minor modifications for any generalized inverse. From Lemma 2.2, we can easily derive a result which is very useful in linear model manipulations.
Lemma 2.4: In the notation of equation (2.1),

$(X_i'Q_jX_i)^{-1} = (X_i'X_i)^{-1} + (X_i'X_i)^{-1}X_i'X_j[X_j'Q_iX_j]^{-1}X_j'X_i(X_i'X_i)^{-1}$,

where $i \neq j = 1,2$.

Proof: Let $A = X_i'X_i$, $B = C' = X_i'X_j$, and $D = -(X_j'X_j)^{-1}$. Since $A$ is non-singular, $M(B) = M(C') \subset M(A)$; applying Lemma 2.2, some algebra concludes the proof. Note that if $k_i > k_j$, $BDC'$ must be singular, so that the version of the identity for non-singular $A$ and $BDC'$ does not apply.
Given the relationship between $H_0$ and $H_0^*$ in Proposition 2.4, the following relationship between their associated F tests is obvious:

Proposition 2.5: If $k_1 \ge k_2$, $F = F^*$.

Proof: From equation (2.4),

$F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}[(X_2'Q_1X_2)^{-1}]^{-1}(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

by Lemma 2.3, since $\mathrm{rank}[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1] = \mathrm{rank}[(X_2'Q_1X_2)^{-1}]$ if and only if $k_1 \ge k_2$. Thus if $k_1 \ge k_2$,

$F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y = F$

from equation (2.3).
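Proposition 2.5 can be illustrated numerically. The sketch below (Python with NumPy; the design and names are illustrative assumptions) computes $F$ from (2.3) and $F^*$ from (2.4) for a design with $k_1 \ge k_2$, where the two statistics should coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, k2 = 120, 3, 2                 # k1 >= k2, so F and F* should coincide
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.4 * X1[:, :2]
Y = X1 @ rng.normal(size=k1) + 0.2 * (X2 @ np.ones(k2)) + rng.normal(size=T)

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
A = X2.T @ Q1 @ X2
F = float(Y @ Q1 @ X2 @ np.linalg.solve(A, X2.T @ Q1 @ Y))   # eq. (2.3), sigma^2 = 1

G = X1.T @ X2                                   # X1'X2, here k1 x k2 of rank k2
M = G @ np.linalg.solve(A, G.T)                 # X1'X2 (X2'Q1X2)^{-1} X2'X1
v = G @ np.linalg.solve(A, X2.T @ Q1 @ Y)
Fstar = float(v @ np.linalg.pinv(M) @ v)        # eq. (2.4)
```

The Moore-Penrose inverse in (2.4) is what makes the Lemma 2.3 cancellation exact here.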
These tests are then related to the m test for specification error by the following argument. $F^*$ and m can be thought of as the lengths of two different estimates of the bias in the least squares estimate of $\beta_1$ from omitting $X_2$: the actual bias is

$B = (X_1'X_1)^{-1}X_1'X_2\beta_2$

and the two estimates are

(2.5) $\hat B_{F^*} = (X_1'X_1)^{-1}X_1'X_2\hat\beta_2 = (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

and

(2.6) $\hat B_m = q = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y$.

Note that $\hat B_m$ can be regarded as an estimate of the bias since its expectation is $-(X_1'X_1)^{-1}X_1'X_2\beta_2$.
Proposition 2.6: For any $k_1$, $k_2$, $F^* = m$.

Proof: $F^* = \hat B_{F^*}'[\mathrm{Var}(\hat B_{F^*})]^+\hat B_{F^*}$ and $m = \hat B_m'[\mathrm{Var}(\hat B_m)]^+\hat B_m$, so that $F^* = m$ if $\hat B_{F^*} = -\hat B_m$, which we verify below. Using equation (2.4) and Lemma 2.4, we can write

$\hat B_{F^*} = (X_1'X_1)^{-1}X_1'P_2Q_1Y + (X_1'X_1)^{-1}X_1'P_2X_1(X_1'Q_2X_1)^{-1}X_1'P_2Q_1Y = (X_1'Q_2X_1)^{-1}X_1'P_2Q_1Y$.

Replacing $P_2$ by $I_T-Q_2$ and noting that $X_1'Q_1 = 0$ yields

$\hat B_{F^*} = -(X_1'Q_2X_1)^{-1}X_1'Q_2Q_1Y = (X_1'Q_2X_1)^{-1}X_1'Q_2P_1Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y = (X_1'X_1)^{-1}X_1'Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y = -\hat B_m$.
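The key identity $\hat B_{F^*} = -\hat B_m$ holds exactly for any sample, which a short numerical sketch makes vivid (Python with NumPy; the design is an illustrative assumption, not from the paper), here with $k_1 < k_2$ where $F^*$ and $F$ differ but $F^* = m$ still:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k1, k2 = 150, 2, 4
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1[:, [0]]
Y = X1 @ np.array([1.0, 2.0]) + 0.1 * (X2 @ rng.normal(size=k2)) + rng.normal(size=T)

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# B_m = q: difference of the two least squares estimates of beta1, eq. (2.6)
B_m = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2 @ Y) - np.linalg.solve(X1.T @ X1, X1.T @ Y)

# B_F*: estimated omitted-variable bias (X1'X1)^{-1} X1'X2 beta2-hat, eq. (2.5)
beta2 = np.linalg.solve(X2.T @ Q1 @ X2, X2.T @ Q1 @ Y)
B_Fs = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)
```

Since the two vectors differ only in sign, their lengths in the common metric coincide, which is Proposition 2.6.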
In summary, the specification test statistic m is equal (for all $k_1$ and $k_2$) to the F test statistic $F^*$ for the hypothesis that certain linear functions of $\beta_2$ equal zero. If $k_1 \ge k_2$, both m and $F^*$ are equal to the F test statistic F for the hypothesis that all linear functions of $\beta_2$ are zero. Since the latter F test is in common use, it is interesting to compare it with the m (and $F^*$) test when $k_1 < k_2$ and they are different.
2.3. F and m compared when $k_1 < k_2$

At the outset, we must specify which null hypothesis is under consideration. Under $H_0$, F and m are distributed as $\chi^2_{k_2}$ and $\chi^2_{k_1}$ respectively, recalling that $k_1 < k_2$ in this section. Under $H_0^*$, however, m is distributed as central $\chi^2$ with $k_1$ degrees of freedom but F has a non-central $\chi^2_{k_2}$ distribution with non-centrality parameter $\lambda_F = \beta_2'X_2'Q_1X_2\beta_2 \ge 0$. Thus as a test statistic for $H_0^*$, F does not have the usual $\chi^2_{k_2}$ distribution. If we mistakenly use that distribution, i.e., if we test $H_0$ when we should test $H_0^*$, we obtain a test whose nominal size is smaller than its true size.
Considered, then, as a test of $H_0$, m has strictly fewer degrees of freedom than F. However, under the alternative hypothesis:

Proposition 2.7: The non-centrality parameter of the m test is less than or equal to that of the F test; i.e., when $k_1 < k_2$, $\lambda_m \le \lambda_F$ for all $\beta_2$.
Proof: From Corollary 2.1,

$\lambda_m = \beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2 = \beta_2'X_2'[X_1(X_1'P_2X_1)^{-1}X_1' - P_1]X_2\beta_2$,

using Lemma 2.4. Thus

$\lambda_m = \beta_2'X_2'[P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2 - P_1]X_2\beta_2$;

since $P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2$ is idempotent,

$\beta_2'X_2'[P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2]X_2\beta_2 \le \beta_2'X_2'X_2\beta_2$

for all $\beta_2$. Thus $\lambda_m \le \beta_2'X_2'(I-P_1)X_2\beta_2 = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$.
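Both expressions for $\lambda_m$ above, and the bound $\lambda_m \le \lambda_F$, can be checked numerically. The sketch below (Python with NumPy; design and names are illustrative assumptions, not from the paper) compares the Corollary 2.1 form and the projection form over random directions $\beta_2$:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k1, k2 = 80, 1, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.6 * X1

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
Q1, Q2 = np.eye(T) - P1, np.eye(T) - P2

V = np.linalg.inv(X1.T @ Q2 @ X1) - np.linalg.inv(X1.T @ X1)   # Var(q) with sigma^2 = 1
M = P2 @ X1 @ np.linalg.solve(X1.T @ P2 @ X1, X1.T @ P2)       # projection onto span(P2 X1)

vals = []
for _ in range(100):
    b2 = rng.normal(size=k2)
    mu = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ b2)            # omitted-variable bias
    lam_m = float(mu @ np.linalg.pinv(V) @ mu)                 # Corollary 2.1 form
    lam_m_proj = float(b2 @ X2.T @ (M - P1) @ X2 @ b2)         # projection form
    lam_F = float(b2 @ X2.T @ Q1 @ X2 @ b2)
    vals.append((lam_m, lam_m_proj, lam_F))
```

The two $\lambda_m$ forms agree to numerical precision, and $\lambda_F - \lambda_m \ge 0$ in every direction.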
Thus m has fewer degrees of freedom and a smaller non-centrality than F for the hypothesis $H_0$: $\beta_2 = 0$. To compare m and F, recall that the power of a $\chi^2$ test of fixed size (a) increases with the non-centrality parameter for fixed degrees of freedom, and (b) decreases with the degrees of freedom for fixed $\lambda$. The m test will thus be relatively more powerful when $k_1$ is much smaller than $k_2$ and $\lambda_m$ is close to $\lambda_F$. In general, the relative power of the tests depends upon the trade-off between degrees of freedom and non-centrality; this can be calculated numerically from the tables of the non-central $\chi^2$ distribution, but nothing much can be said analytically.
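The trade-off can also be computed directly by simulation rather than from tables. The following sketch (Python with NumPy; the degrees of freedom, non-centralities, and the hard-coded 5% critical values 3.841 and 18.307 for $\chi^2_1$ and $\chi^2_{10}$ are illustrative choices, not from the paper) shows a configuration where fewer degrees of freedom outweigh a slightly smaller non-centrality:

```python
import numpy as np

rng = np.random.default_rng(5)

def power_chi2(df, lam, crit, n=200_000):
    # Monte Carlo power of a chi-square test: a noncentral chi2(df, lam)
    # draw is ||Z + mu||^2 for standard normal Z with ||mu||^2 = lam
    Z = rng.normal(size=(n, df))
    Z[:, 0] += np.sqrt(lam)
    return float(np.mean((Z ** 2).sum(axis=1) > crit))

power_m = power_chi2(df=1, lam=9.0, crit=3.841)     # m-like test: few df, smaller lambda
power_F = power_chi2(df=10, lam=10.0, crit=18.307)  # F-like test: many df, larger lambda
```

Here the m-type test is substantially more powerful despite its smaller non-centrality, illustrating the remark in the text.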
Recognizing that the m test does not treat all estimable functions of $\beta_2$ symmetrically, one is led to calculate the direction in which the m test has greatest power. Without loss of generality, we restrict our attention to $\beta_2$ of unit length:

Proposition 2.8: $\lambda_m$ is maximized over $\beta_2$ whenever $\beta_2$ lies in the column space of $(X_2'X_2)^{-1}X_2'X_1$; i.e., for $\beta_2$ of the form $\beta_2 = (X_2'X_2)^{-1}X_2'X_1\ell$, for any $k_1$ vector $\ell$.
This can be inferred from the Pearson and Hartley (1951) charts of the power of the F test, which are reproduced in Scheffé (1959), pp. 438-445. A particularly convenient set of tables of the distribution of a non-central $\chi^2$ variate is Haynam, Govindarajulu, and Leone (1962).
Proof: Making the substitution $\beta_2 = (X_2'X_2)^{-1}X_2'X_1\ell$ in Corollary 2.1, we obtain

$\lambda_m = \ell'X_1'P_2[I-P_1]P_2X_1\ell = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$,

and the result follows from Proposition 2.7. It may be of some interest to note that this direction of maximum power is precisely the direction in which the least squares estimate of $\beta_2$ is biased when $X_1$ is omitted from equation (2.1).
For the null hypothesis $H_0$ against the alternative $H_1$, there are thus estimable functions of $\beta_2$ against which the m test has strictly greater power than the F test. On the other hand, there are functions of $\beta_2$ against which the F test is more powerful. In light of the optimum properties of the F test, this ambiguity should not appear surprising. The F test is uniformly most powerful among invariant tests of $H_0$: $\beta_2 = 0$; the m test is not invariant for $H_0$ since it depends upon the covariance between $X_1$ and $X_2$.
There are some alternatives against which the m test of $H_0$ performs particularly poorly. For $k_1 < k_2$, the null space of $X_1'X_2$ is non-empty; for $\beta_2$ lying in that space, the power function of the m test is flat, with power equal to the size of the test. As this persists in large samples, we conclude that

Proposition 2.9: For $k_1 < k_2$, the m test is inconsistent for $H_0$ against $H_1$.
Of course, from the viewpoint of testing for mis-specification in the estimation of $\beta_1$, this inconsistency is irrelevant, since alternatives in the null space of $X_1'X_2$ do not contribute to the bias of the least squares estimate of $\beta_1$. If, however, $H_0$: $\beta_2 = 0$ is of interest in its own right, then Proposition 2.9 is a serious indictment of the m test for $H_0$.
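The blind spot described in Proposition 2.9 is easy to exhibit. The sketch below (Python with NumPy; the design is an illustrative assumption, not from the paper) puts a large $\beta_2$ exactly in the null space of $X_1'X_2$: the F statistic explodes, while $q$ (and hence m) registers nothing:

```python
import numpy as np

rng = np.random.default_rng(6)
T, k1, k2 = 1000, 1, 2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + 0.5 * X1

# beta2 in the (non-empty, since k1 < k2) null space of X1'X2
g = (X1.T @ X2).ravel()
b2 = np.array([-g[1], g[0]])
b2 /= np.linalg.norm(b2)
Y = 5.0 * (X2 @ b2) + rng.normal(size=T)       # beta1 = 0, beta2 far from zero

Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
q = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2 @ Y) - np.linalg.solve(X1.T @ X1, X1.T @ Y)
F = float(Y @ Q1 @ X2 @ np.linalg.solve(X2.T @ Q1 @ X2, X2.T @ Q1 @ Y))
```

The violation of $H_0$ is enormous, but it contributes nothing to the omitted-variable bias in $\hat\beta_1$, which is all the m test looks at.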
For the linear model, this comparison of the F and m tests clearly depends upon which null hypothesis is being considered. From the viewpoint of specification tests, the relevant null hypothesis is $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$; the m test is precisely equivalent to the F test for this hypothesis, and thus is equivalent to the likelihood ratio, Wald, and Lagrange multiplier tests for $H_0^*$. $H_0^*$, in turn, is equivalent to $H_0$ for $k_1 \ge k_2$, so in this case the m and F tests for $H_0$ are equivalent. For $k_1 < k_2$, the m and F tests for $H_0$ differ. The m test has fewer degrees of freedom but also a smaller noncentrality parameter than the F test for $H_0$; thus neither test is more powerful than the other for all alternatives $H_1$.
- 19
-relevant null hypothesis Is the mis-specification hypothesis
H*. The m test is uniformly most powerful invariant for this
o
hypothesis, whereas the F statistic for H defines a test for
H* of the wrong size, despite being UMP invariant for H . For
the mis-specificatlon hypothesis, the m test is clearly
20
3. The general non-linear model

The characterization of the m test for linear models in the previous section extends easily to a non-linear framework. Using a model and some results from Holly (1980a), we establish that the m test is asymptotically equivalent to the likelihood ratio, Wald, and Lagrange multiplier tests of the hypothesis analogous to $H_0^*$ in the previous section; i.e., that the asymptotic bias in maximum likelihood estimates of a subset of parameters is zero when the remaining parameters are constrained.
Following Holly (1980a), consider a family of models having log-likelihood $L(\theta,\gamma)$ for T observations, where $(\theta,\gamma)$ are $(p,q)$ vectors of unknown parameters respectively. A null hypothesis of interest is

$H_0$: $\theta = \theta^0$

against a sequence of local alternatives

$H_1$: $\theta = \theta_T = \theta^0 + \delta/\sqrt{T}$.

Deviating from Holly, we assume the framework of a specification test: that we are primarily interested in estimating $\gamma$ and are concerned about $H_0$ only insofar as it affects that estimation.
Let $\hat\gamma^0$ denote the maximum likelihood estimator of $\gamma$ imposing $H_0$, and let $(\hat\theta,\hat\gamma)$ denote the maximum likelihood estimators not imposing $H_0$. For large T, these estimators are the solutions of $\frac{\partial L}{\partial\gamma}(\theta^0,\hat\gamma^0) = 0$ and $\frac{\partial L}{\partial\delta}(\hat\delta) = 0$ respectively, where $\delta' = (\theta' : \gamma')'$ and the true parameter vector is $\delta^{0\prime} = (\theta^{0\prime} : \gamma^{0\prime})'$.

Under suitable regularity conditions, Holly (1980a) shows that

(3.1) $\sqrt{T}(\hat\gamma-\gamma^0) \cong [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}\frac{\partial L}{\partial\theta}\right](\delta^0)$

(3.2) $\sqrt{T}(\hat\gamma^0-\gamma^0) \cong I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0)$,

and using his equations (3) and (4), one can show that

(3.3) $\sqrt{T}(\hat\theta-\theta^0) \cong \delta + [I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}\frac{\partial L}{\partial\gamma}\right](\delta^0)$.

Note that $I_{\gamma\theta} = -\mathrm{plim}\,\frac{1}{T}\frac{\partial^2L}{\partial\gamma\,\partial\theta'}(\delta^0) = I_{\theta\gamma}'$, and that sufficient regularity is assumed so that $-\frac{1}{T}\frac{\partial^2L}{\partial\delta\,\partial\delta'}$ converges almost surely to the information matrix as $\hat\delta \to \delta^0$; see Holly (1980a) for details.
From equations (3.1) and (3.2), one can show that

$\sqrt{T}(\hat\gamma-\gamma^0) \xrightarrow{d} N(0,\ [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1})$

and

$\sqrt{T}(\hat\gamma^0-\gamma^0) \xrightarrow{d} N(I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta,\ I_{\gamma\gamma}^{-1})$.

By the argument in Hausman (1978),

(3.4) $\sqrt{T}(\hat\gamma-\hat\gamma^0) \xrightarrow{d} N(-I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta,\ [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} - I_{\gamma\gamma}^{-1})$,

which confirms Holly's algebraic derivation of his equation (6).
Under some circumstances, the limiting covariance matrix in equation (3.4) may be singular. Accordingly, we define the m test statistic as

$m = \sqrt{T}(\hat\gamma-\hat\gamma^0)'[\widehat{\mathrm{Var}}(\hat\gamma-\hat\gamma^0)]^+\sqrt{T}(\hat\gamma-\hat\gamma^0)$,

since under $H_0$: $\theta = \theta^0$, $\sqrt{T}(\hat\gamma-\hat\gamma^0)$ converges to a random variable having zero mean, and under $H_1$ the mean of the limiting distribution is $-I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta$.

In general, we assume a minimal structure for the information matrix. In particular, in contrast to Holly's (1980a) equation (10), we assume that $\mathrm{rank}(I_{\gamma\theta}) = \mathrm{rank}(I_{\theta\gamma}) = \min(p,q)$, so that all parameters provide information useful for estimating any
other parameter. Since

$[I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} = I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}$,

$\mathrm{rank}[\mathrm{Var}(\hat\gamma-\hat\gamma^0)] = \mathrm{rank}(I_{\gamma\theta}) = \min(p,q)$ under our assumptions. Thus

Proposition 3.1: Under $H_0$, m converges in distribution to a $\chi^2$ random variable with $\min(p,q)$ degrees of freedom.
An alternative hypothesis of some interest in the context of specification tests is that

$H_0^*$: $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta_T = I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta^0 \iff I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta = 0$;

i.e., that the asymptotic bias in $\hat\gamma^0$, the estimator which uses the information that $\theta = \theta^0$, is zero. Note from equation (3.4) that the limiting distribution of m under $H_0^*$ is the same as that under $H_0$.
The Wald test of $H_0^*$ is based on the length of the vector of unconstrained estimates

$\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$,

which, from equation (3.3), converges in probability to

(3.5) $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}\frac{\partial L}{\partial\gamma}\right](\delta^0)$.

Proposition 3.2: For any $(q,p)$, $\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$ converges in probability to $\sqrt{T}(\hat\gamma^0-\hat\gamma)$.
Proof: Subtracting equation (3.1) from (3.2),

(3.6) $\sqrt{T}(\hat\gamma^0-\hat\gamma) \cong I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0) - [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}\frac{1}{\sqrt{T}}\left[\frac{\partial L}{\partial\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}\frac{\partial L}{\partial\theta}\right](\delta^0)$.

Using the identity

$[I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} = I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}$,

the $\delta$ and $\partial L/\partial\gamma$ terms in equations (3.5) and (3.6) are equal. For the $\partial L/\partial\theta$ terms, begin with the identity

$[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}][I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = I_p$

and premultiply both sides by $I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$ to obtain, upon rearrangement,

(3.7) $I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$.

Lemma 3.1: $I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = [I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$.

Proof: Since

$[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]I_{\theta\theta}^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta} = I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]$,

the lemma follows by pre- and post-multiplying both sides by $[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$.

Substituting the lemma into the second term in equation (3.7), we obtain

$I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1} = \left[I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}\right]I_{\gamma\theta}I_{\theta\theta}^{-1} = [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}$,

so that the coefficients of $\partial L/\partial\theta$ in equations (3.5) and (3.6) coincide, which establishes the proposition.
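The two block-matrix identities used above hold for any positive definite partitioned information matrix, and can be verified numerically. The sketch below (Python with NumPy; the random matrix is an illustrative assumption) checks the inverse identity and the coefficient equality $[I_{\gamma\gamma}-I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1} = I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta}-I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 3, 2
M = rng.normal(size=(p + q, p + q))
I = M @ M.T + (p + q) * np.eye(p + q)       # positive definite "information matrix"
Itt, Itg = I[:p, :p], I[:p, p:]
Igt, Igg = I[p:, :p], I[p:, p:]

A = Itt - Itg @ np.linalg.solve(Igg, Igt)   # I_tt - I_tg I_gg^{-1} I_gt
B = Igg - Igt @ np.linalg.solve(Itt, Itg)   # I_gg - I_gt I_tt^{-1} I_tg

# identity behind the delta and dL/dgamma terms
lhs1 = np.linalg.inv(B)
rhs1 = np.linalg.inv(Igg) + np.linalg.solve(Igg, Igt) @ np.linalg.solve(A, Itg @ np.linalg.inv(Igg))
# identity behind the dL/dtheta terms (via Lemma 3.1 and eq. (3.7))
lhs2 = np.linalg.solve(B, Igt @ np.linalg.inv(Itt))
rhs2 = np.linalg.solve(Igg, Igt) @ np.linalg.inv(A)
```

Both pairs agree to machine precision, which is all the proof of Proposition 3.2 requires of the information matrix.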
Since $\sqrt{T}\,I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta-\theta^0)$ has the same limiting distribution as $\sqrt{T}(\hat\gamma^0-\hat\gamma)$, the m test statistic and the Wald test statistic for $H_0^*$ have the same limiting distribution. Thus, asymptotically, the m test is equivalent to a Wald, likelihood ratio (LR), and Lagrange multiplier test of $H_0^*$, the hypothesis that imposing $H_0$: $\theta=\theta^0$ leaves the maximum likelihood estimator for $\gamma$ asymptotically unbiased.

The relationship between the m test and the Wald test of $H_0$ is perfectly analogous to that discussed for the linear case in sections 2.2 and 2.3. Briefly, assuming $\mathrm{rank}(I_{\gamma\theta}) = \min(p,q)$,
- 26
-I'^i^gCe-e") = iff (e-e") = o
for q >_ p, so that
Proposition 3' 3
- If q
L
P ^"^ rank(I ) = p, the m teststatistic has the same limiting distribution
as the LR (or Wald) test statistic for H .
Moreover, for $q < p$, the m test has fewer degrees of freedom and a smaller noncentrality parameter than the LR test of $H_0$. Neither test for $H_0$ dominates the other for all estimable functions of $(\theta-\theta^0)$; there exist directions in which the power of the m test equals its size and there exist directions in which the m test has strictly greater power than the LR test.

If the mis-specification hypothesis $H_0^*$ is the correct hypothesis, the m test is the correct test. The LR test statistic for $H_0$ has the wrong size for $H_0^*$ (when $q < p$), and the m test, being asymptotically equivalent to the LR test of $H_0^*$, possesses the usual local power properties. Echoing the conclusion of Section 2, when interest in $H_0$: $\theta=\theta^0$ derives from the desire to impose this restriction when estimating $\gamma$, the relevant null hypothesis is $H_0^*$: $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta = 0$, and the relevant test is the m test.

For $(\theta-\theta^0)$ in the null space of $I_{\gamma\gamma}^{-1}I_{\gamma\theta}$ and $(\theta-\theta^0)$ in the column space of $I_{\theta\theta}^{-1}I_{\theta\gamma}$, respectively.
4. Testing the legitimacy of instruments

In this section, we derive specification tests of overidentifying assumptions for a single structural equation. Specifically, we develop a test of the hypothesis that certain variables are uncorrelated with the structural disturbance term. This hypothesis, as we shall show, includes both overidentifying exclusion restrictions of the Cowles Commission type and restrictions on the structural disturbance covariance matrix.

Let

(4.1) $y_1 = Y_1\beta_1 + Z_1\gamma_1 + \varepsilon_1 = X_1\delta_1 + \varepsilon_1$

be the first structural equation in a system of simultaneous equations denoted

$YB + Z\Gamma = E$, $\quad \mathrm{cov}(\varepsilon) = \Sigma$.
As usual, assume there are $g_1$ endogenous variables $Y_1$ and $k_1$ predetermined variables $Z_1$ present in the first equation, and that we can use no coefficient restrictions from equations other than the first. To identify and estimate the parameters of this equation, we need $g_1+k_1$ instruments; and we may use (i) the $k_1$ predetermined variables $Z_1$, (ii) any other predetermined variables which are correlated with $Y_1$, and (iii) any endogenous variables $y_2,\ldots,y_G$ which are correlated with $Y_1$ but asymptotically uncorrelated with $\varepsilon_1$. The prior information that certain variables are uncorrelated with the disturbance in the first equation is precisely what we propose to test.

Let $W$ denote a $T \times w$ matrix of observations on $w$ instruments which we maintain to be uncorrelated with $\varepsilon_1$ under all circumstances. Moreover, we must assume that $w \ge g_1+k_1$, so that equation (4.1) is at least just-identified. We need not, however, include all of the $Z_1$ in $W$; if the exogeneity of a particular $Z_1$ is in doubt, it may be tested, provided the equation is at least just-identified under both the null and alternative hypotheses.
Let $W_1$ denote a $T \times w_1$ matrix of observations on $w_1 > w$ instruments, which include all the instruments in $W$. Specifically, we assume that the column space of $W$ is a proper subspace of the column space of $W_1$; the difference in dimensions will be denoted $w_1 - w = w^* > 0$, and a set of vectors spanning the column space of $W_1$ will be denoted $[W : W^*]$. The orthocomplement of the column space of $W$ in the column space of $W_1$ is a $w^*$ dimensional subspace spanned by the columns of $W^\perp = Q_WW^*$, where $Q_W = I - W(W'W)^{-1}W' = I - P_W$.
A null hypothesis of some interest is that the $w^*$ "extra" instruments $W^*$ are asymptotically uncorrelated with the structural disturbance:

$H_0$: $\mathrm{plim}\,\frac{1}{T}W^{*\prime}\varepsilon_1 = 0$,

against the alternative $H_1$: $\mathrm{plim}\,\frac{1}{T}W^{*\prime}\varepsilon_1 \neq 0$. This is important for two reasons. If we treat the columns of $W^*$ as uncorrelated with $\varepsilon_1$ and they are not, the resulting instrumental variables estimator for $\delta_1$ is inconsistent; if we treat a column of $W^*$ as correlated with $\varepsilon_1$ and it is not, the corresponding instrumental variables estimator will be inefficient in the following (presumably well-known) sense. Let $\hat\delta_1$ and $\bar\delta_1$ denote the two stage least squares (2SLS) estimates of $\delta_1$ using $W$ and $W_1$ as instruments respectively.
Proposition 4.1: If $\mathrm{plim}\,\frac{1}{T}W_1'\varepsilon_1 = 0$, the difference between the covariance matrices of the limiting distributions of $\hat\delta_1$ and $\bar\delta_1$ is a non-negative definite matrix.

Proof: The proof follows directly from the observation that $X_1'P_{W_1}X_1 - X_1'P_WX_1 = X_1'P_{W^\perp}X_1$ is non-negative definite, where $P_A$ denotes the orthogonal projection operator onto the column space of $A$.
The hypothesis $H_0$ is somewhat unusual among tests of assumptions used in simultaneous equations estimation. For exogenous variables among the columns of $W^*$, $H_0$ tests for exogeneity. For endogenous columns of $W^*$ (e.g., $y_i$), $H_0$ imposes the restriction that

$(B^{-1}\Sigma)_{i1} = 0$,

which is a complicated combination of disturbance covariance and coefficient restrictions. In general, for $y_i$ and $\varepsilon_1$ to be uncorrelated, $\mathrm{cov}(\varepsilon_i,\varepsilon_1)$ must equal 0, equations $(i,1)$ must be relatively triangular, and $\mathrm{cov}(\varepsilon_i,\varepsilon_j)$ must equal 0 for the intervening equations $j$; see Hausman-Taylor (1980b) for definitions and details. If $W^* = Y_1$ and $W = Z_1$, we have the situation treated by Wu (1973); for $B$ triangular, we have the limited information version of Holly's (1980b) test for the diagonality of $\Sigma$.
In the spirit of specification tests, we assume we are not interested in $H_0$ directly. Rather, we wish to know the consequences for estimating $\delta_1$ of imposing $H_0$; this is given by the length of the vector

(4.2) $q = \bar\delta_1 - \hat\delta_1 = [(X_1'P_{W_1}X_1)^{-1}X_1'P_{W_1} - (X_1'P_WX_1)^{-1}X_1'P_W]\,y_1 = Cy_1$.

Note that under $H_0$, $q$ converges to a normal random variable with mean 0 and variance $\sigma^2_{\varepsilon_1}[(X_1'P_WX_1)^{-1} - (X_1'P_{W_1}X_1)^{-1}]$; under $H_1$, the mean vector is no longer 0. Thus significant deviations of $q$ from the zero vector cast doubt upon $H_0$.
Using results in Section 2, it is easy to verify that

Proposition 4.2: If rank(C) = c, then under H_0,

    m = (1/\hat\sigma_\varepsilon^2) q'[CC']^{+} q
      = (1/\hat\sigma_\varepsilon^2) q'[(X_1'P_{W_0}X_1)^{-1} - (X_1'P_W X_1)^{-1}]^{+} q

is asymptotically distributed as \chi^2 with c
degrees of freedom, where \hat\sigma_\varepsilon^2 is any consistent
estimate of \sigma_\varepsilon^2.
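The computation of m is mechanical once the two 2SLS fits are in hand. A minimal simulation sketch follows; it is a modern illustration under hypothetical data (the design, dimensions, and parameter values are invented for the example, and H_0 holds by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# Hypothetical design: W0 maintained instruments, Wstar the suspect
# instruments (valid here), X1 the right-hand variables of (4.1).
W0 = rng.standard_normal((T, 4))
Wstar = rng.standard_normal((T, 2))
W = np.hstack([W0, Wstar])
X1 = W0 @ rng.standard_normal((4, 2)) + rng.standard_normal((T, 2))
y1 = X1 @ np.array([1.0, -0.5]) + rng.standard_normal(T)

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def tsls(y, X, Z):
    """2SLS estimate of y on X using instruments Z."""
    PZ = proj(Z)
    return np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

d_hat = tsls(y1, X1, W)     # imposes H0: uses all of W = [W0, Wstar]
d_tilde = tsls(y1, X1, W0)  # does not use the suspect instruments
q = d_tilde - d_hat

s2 = np.mean((y1 - X1 @ d_tilde) ** 2)  # consistent estimate of sigma^2
V = np.linalg.inv(X1.T @ proj(W0) @ X1) - np.linalg.inv(X1.T @ proj(W) @ X1)
m = q @ np.linalg.pinv(s2 * V) @ q  # asymptotically chi^2(c) under H0
print(float(m))
```

Here the generalized inverse (np.linalg.pinv) handles the case where rank(C) = c falls short of the dimension of q; in this sketch c = 2.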
Proposition 4.3: rank(C) = c = min[rank(X_1'Q_{W_0}), w*].

Proof: Repeating the steps of the proof of Proposition 2.3, we
obtain rank(C) = rank(X_1'P_{Q_{W_0}W^*}), since T - k_1 - g_1 > w*.
The result then follows by noting that

    X_1'P_{Q_{W_0}}W^* = X_1'Q_{W_0}W^*.

Since some columns of X_1 may be columns of W_0 in some
applications, rank(X_1'Q_{W_0}) may be strictly less than k_1 + g_1.
This will happen whenever we accept the exogeneity of an
explanatory variable in equation (4.1) rather than subject it
to test.

As in Sections 2 and 3, consider the null hypothesis

    H_0^*:  plim (X_1'P_W X_1)^{-1} X_1'P_W \varepsilon_1 = 0
       <=>  plim (X_1'P_W X_1)^{-1} X_1'Q_{W_0}W^*(W^{*'}Q_{W_0}W^*)^{-1}W^{*'}\varepsilon_1 = 0,
which states that the asymptotic bias in the 2SLS estimator for
\delta_1 is zero when the columns of W^* are used as instruments in
addition to those of W_0. As before, the null hypothesis H_0
restricts all linear functions of the w*-vector plim (1/T) W^{*'}\varepsilon_1
to be zero, whereas H_0^* restricts only a subset of those functions.
If rank(X_1'Q_{W_0}) >= w*, the two hypotheses are equivalent; if
rank(X_1'Q_{W_0}) < w*, they differ. Thus particular tests (e.g.,
Wald or LR tests) of H_0 and H_0^* will be identical if rank(X_1'Q_{W_0})
>= w* and will differ otherwise, as we observed for the linear
model in Proposition 2.5.
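The second form of H_0^* rests on the orthogonal decomposition P_W = P_{W_0} + P_{Q_{W_0}W^*}, which holds because W = [W_0, W^*]. A quick numerical check of this identity, using hypothetical matrices purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
W0 = rng.standard_normal((T, 3))
Wstar = rng.standard_normal((T, 2))
W = np.hstack([W0, Wstar])

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

P_W0 = proj(W0)
Q_W0 = np.eye(T) - P_W0
# Projection onto W decomposes into the projection onto W0 plus the
# projection onto the part of Wstar orthogonal to W0 -- the piece
# that drives H0*.
lhs = proj(W)
rhs = P_W0 + proj(Q_W0 @ Wstar)
print(np.allclose(lhs, rhs))  # True
```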
Moreover, in the limited information framework, we can
relate the m test in Proposition 4.2 to the familiar trinity
of asymptotically equivalent tests of H_0^*. We argued elsewhere
(Hausman-Taylor (1980a), Section 4.2) that the 2SLS estimator
for \delta_1 in equation (4.1) is asymptotically equivalent to the
full information maximum likelihood (FIML) estimator for \delta_1
in the system

(4.3)    y_1 = Y_1\delta_1 + Z_1\gamma_1 + \varepsilon_1
         Y_1 = Z\Pi + V

where the correlations between the columns of V and \varepsilon_1 are
unrestricted and all instruments are columns of Z. If we
do not impose the restrictions H_0^*, the FIML estimate of \delta_1
is \tilde\delta_1, and the Wald test of H_0^* is based on the length of the
vector

    \hat\delta_1 - \tilde\delta_1 = -q.

Estimating \sigma_\varepsilon^2 from the 2SLS residuals, we have
Proposition 4.4: The m test statistic is identical to the
Wald test statistic for the null hypothesis H_0^*.
Asymptotically, then, the m test for the legitimacy of
instruments is equivalent to a Lagrange multiplier or LR
test of the mis-specification hypothesis H_0^*. Since W_0 and W^*
can be chosen arbitrarily, there are a number of interesting
special cases of this test, involving both coefficient restrictions
of the Cowles Commission type and disturbance variance and
covariance restrictions of the type discussed by Fisher (1966),
Chapters 3 and 4. We discuss these applications elsewhere
(Hausman-Taylor (1980b)); the point developed here is the
same as that of Sections 2 and 3: that the m test is asymptotically
equivalent to the usual tests of the mis-specification hypothesis
that imposing H_0 causes no asymptotic bias in the maximum
likelihood estimator.
REFERENCES
Eaton, M.L., Multivariate Statistical Analysis, Copenhagen:
Institute of Mathematical Statistics, University of
Copenhagen, 1972.

Fisher, F.M., The Identification Problem in Econometrics, New
York: McGraw-Hill, 1966.

Harter, H.L., and D.B. Owen (eds.), Selected Tables in
Mathematical Statistics, Chicago: Markham, 1970.

Hausman, J.A., "Specification Tests in Econometrics,"
Econometrica 46 (1978), pp. 1251-1272.

Hausman, J.A., and W.E. Taylor, "Panel Data and Unobservable
Individual Effects," M.I.T. Discussion Paper, (1980a).

--------, "Identification, Estimation, and Testing in Simultaneous
Equations Models with Disturbance Covariance Restrictions,"
unpublished manuscript, (1980b).

Haynam, G.E., Z. Govindarajulu, and F.C. Leone, Tables of the
Cumulative Non-Central Chi-Square Distribution, Case
Statistical Laboratory Publication No. 104, 1962.

Holly, A., "A Remark on Hausman's Specification Test," Harvard
Institute of Economic Research, Discussion Paper No. 763,
(1980a).

--------, "Testing Recursiveness in a Triangular Simultaneous
Equations Model," unpublished manuscript, (1980b).

Pearson, E.S., and H.O. Hartley, "Charts of the Power Function
of the Analysis of Variance Tests Derived from the
Non-Central F Distribution," Biometrika 38 (1951), pp. 112-130.

Rao, C.R., and S.K. Mitra, Generalized Inverse of Matrices and
Its Applications, New York: John Wiley & Sons, 1971.

Scheffe, H., The Analysis of Variance, New York: John Wiley &
Sons, 1959.

Smith, A.F.M., "A General Bayesian Linear Model," Journal of the
Royal Statistical Society, Series B, 35 (1973), pp. 67-75.

Wu, D., "Alternative Tests of Independence Between Stochastic
Regressors and Disturbances," Econometrica 41 (1973).