MULTIPLE COMPARISON PROCEDURE THROUGH LATENT ROOTS

EMILIANA URSIANU and RADU URSIANU

The null hypothesis $H_0$ of equality of several means in the general linear regression model (GLM), against the alternative that some means of normal populations differ, is considered. The least squares estimates (l.s.e.) of the parameters are replaced by modified l.s.e. based on the latent roots of the correlation matrix of the GLM when the matrix of independent variables of the model is near singular. A special case of the GLM is considered, namely the analysis of variance (ANOVA).

Furthermore, if $H_0$ is rejected, methods can be obtained for making simultaneous tests on the means, i.e., the multiple comparison procedure (MCP) for means.

Two procedures with a fixed Type I error probability for testing hypotheses on all sets of means (i.e., for all comparisons of means) are the $T$ and $S$ methods, based respectively upon the studentized range of means extended to all contrasts of means, and upon the $F$ distribution of the statistic given by the ratio of the sum of squares due to $H_0$ to the residual sum of squares, for all contrasts of means.

The relative efficiency of the $S$-interval to that of the $T$-interval is obtained from the ratio of squared lengths of the corresponding confidence intervals. Another approach to this problem leads to the Rayleigh quotient of the l.s.e., using the eigenvalues (or latent values) of the design matrix.

AMS 2000 Subject Classification: 62G10, 62J05, 62J10.

Key words: general linear regression model with intercept, multiple comparison procedure, latent roots of the design matrix.

1. INTRODUCTION

The rejection of the null hypothesis when testing the equality of several means in the general linear model (GLM) leads to the multiple comparison procedure (MCP).

The relative efficiency of the $S$ and $T$ methods obtained from the ratio of squared lengths of the corresponding confidence intervals is studied in [6].

Another approach to this problem leads to the Rayleigh quotient of the least squares estimators using the eigenvalues of the design matrix [1].

The best approximation of the latent vectors and latent roots is obtained using the modified l.s. estimates.

MATH. REPORTS 9(59), 3 (2007), 319–325


2. RELATIVE EFFICIENCY OF TWO MCPs

The l.s. estimates (l.s.e.) for a fixed-effects linear model under the assumptions

(2.1) $\Omega:\qquad y = X\beta + e,\qquad E(e) = 0,\qquad E(ee') = \sigma^2 I$

are determined in the special case of $\Omega$ given by the one-way balanced layout, namely

(2.2) $y_{ij} = \beta_i + e_{ij},\qquad i = 1,\dots,I,\ j = 1,\dots,J,$

where the $e_{ij} \sim N(0,\sigma^2)$ are independent r.v. Let $\psi = \sum_i c_i\beta_i$, with $\sum_i c_i = 0$, be a contrast among the means and let $R$ be the ratio of the squared lengths of the $(1-\alpha)$-level simultaneous confidence intervals for this contrast given by Scheffé's S-method and Tukey's T-method (see [7]), defined as

(2.3) $R = \dfrac{L_S^2}{L_T^2} = \dfrac{4(I-1)\,F_{1-\alpha;\,I-1,\,I(J-1)}}{q^2_{1-\alpha;\,I,\,I(J-1)}}\cdot f,$

where, for $I$ fixed, we have

(2.4) $f = \dfrac{\sum_i c_i^2}{\bigl(\sum_i |c_i|\bigr)^2}$ \quad with \quad $\sum_{i=1}^{I} c_i = 0,$

and

(2.5) $Q = \dfrac{4(I-1)\,F_{1-\alpha;\,I-1,\,I(J-1)}}{q^2_{1-\alpha;\,I,\,I(J-1)}}.$

Note. The ratio $R > 1$ favours the T-method, and vice versa. Moreover, $Q = Q(I, \nu = I(J-1); \alpha)$ remains nearly constant as $\alpha$ varies between 0.01 and 0.10, and decreases very slightly as $\nu$ increases.
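As a numerical illustration (ours, not part of the paper), the quantities in (2.3)-(2.5) can be evaluated directly. The sketch below assumes SciPy 1.7+ for the studentized range distribution; the names `efficiency_ratio`, `I`, `J`, `alpha` and `c` are ours.

```python
# A minimal sketch: evaluate Q from (2.5) and R = Q*f from (2.3)-(2.4)
# for a given contrast in a balanced one-way layout.
import numpy as np
from scipy.stats import f as f_dist, studentized_range

def efficiency_ratio(c, I, J, alpha=0.05):
    """Return (Q, f, R) for contrast coefficients c summing to zero."""
    c = np.asarray(c, dtype=float)
    assert abs(c.sum()) < 1e-12, "a contrast must have coefficients summing to 0"
    nu = I * (J - 1)                                 # residual degrees of freedom
    F = f_dist.ppf(1 - alpha, I - 1, nu)             # F_{1-alpha; I-1, I(J-1)}
    q = studentized_range.ppf(1 - alpha, I, nu)      # q_{1-alpha; I, I(J-1)}
    Q = 4 * (I - 1) * F / q ** 2                     # (2.5)
    f_val = np.sum(c ** 2) / np.sum(np.abs(c)) ** 2  # (2.4)
    return Q, f_val, Q * f_val                       # R from (2.3)

# Pairwise contrast (1, -1, 0, 0, 0): f = 1/2, so R = Q/2.
print(efficiency_ratio([1, -1, 0, 0, 0], I=5, J=6))
```

For $I = 5$, $J = 6$ and $\alpha = 0.05$ the pairwise ratio comes out above 1, in agreement with the Note: the T-intervals are the shorter ones for pairwise comparisons.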

When will the T-method have the maximum advantage over the S-method?

Theorem 2.1. Let the bounds for $R$ established in [2] be

(2.6) $R_{\max} = \tfrac{1}{2}\,Q, \qquad R_{\min} = \tfrac{1}{4}\Bigl(\dfrac{1}{n_+} + \dfrac{1}{n_-}\Bigr)Q,$

where $n_+ = \operatorname{card}\{c_i \mid c_i > 0\}$ and $n_- = \operatorname{card}\{c_i \mid c_i < 0\}$. Then the T-method has the maximum advantage over the S-method for the $[1,1]$-contrast, i.e., for a pairwise comparison, and, conversely, the S-method is favoured for $[n_+, n_-]$-comparisons, i.e., the "higher order" contrasts.

Proof. In fact, the problem is to derive the bounds for $f$ in (2.4), which are

(2.7) $f_{\max} = \tfrac{1}{2}$

and

(2.8) $f_{\min} = \tfrac{1}{4}\Bigl(\dfrac{1}{n_+} + \dfrac{1}{n_-}\Bigr).$

The absolute maximum is $f_{\max} = \tfrac{1}{2}$. The bounds can also be established by using the Birnbaum inequality

(2.9) $\dfrac{1}{2}\sum_{i=1}^{I} |c_i|^m \le \Bigl(\dfrac{1}{2}\sum_{i=1}^{I} |c_i|\Bigr)^m, \qquad m \ge 1,$

for $m = 2$, as mentioned in [2].
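The two bounds are easy to check numerically. The following brute-force sketch (ours, not the paper's) draws random contrasts with a fixed sign pattern and verifies that $f$ of (2.4) always lies between (2.8) and (2.7); the names `n_plus`, `n_minus` and `f_ratio` are ours.

```python
# Brute-force check of the bounds (2.7)-(2.8) for contrasts with a fixed
# sign pattern (n_plus positive and n_minus negative coefficients).
import numpy as np

rng = np.random.default_rng(0)

def f_ratio(c):
    return np.sum(c ** 2) / np.sum(np.abs(c)) ** 2

n_plus, n_minus = 2, 4
f_min = 0.25 * (1 / n_plus + 1 / n_minus)                    # (2.8)
for _ in range(10_000):
    pos = rng.random(n_plus)                                 # positive part
    neg = rng.random(n_minus)                                # negative part
    c = np.concatenate([pos / pos.sum(), -neg / neg.sum()])  # sums to zero
    assert f_min - 1e-12 <= f_ratio(c) <= 0.5 + 1e-12        # (2.8) <= f <= (2.7)
print("bounds hold for n+ =", n_plus, "and n- =", n_minus)
```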

More about the connection between the algebraic inequalities and ANOVA models can be found in [4] and [6] and can be summarized as

(2.10) $f_{\max} = \max f = \begin{cases} f(1, I-1), & I \ge 13,\\ f(2, I-2), & I \ge 60, \end{cases}$

with $n_+ + n_- = I$ when $R = R_{\max}$. The absolute maximum for $f$ is $f_{\max} = \tfrac{1}{2}$, attained for the $[1,1]$-contrast, when the T method is preferable to the S method; respectively,

(2.11) $f_{\min} = \min f = \begin{cases} f\bigl(\tfrac{I}{2}, \tfrac{I}{2}\bigr), & I \text{ even},\\ f\bigl(\tfrac{I-1}{2}, \tfrac{I+1}{2}\bigr), & I \text{ odd}, \end{cases}$

when $R = R_{\min}$, which gives the contrasts for which the S method is better than the T method.
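As a small worked illustration (ours), take $I = 4$: the balanced $[2,2]$-contrast $c = (\tfrac12, \tfrac12, -\tfrac12, -\tfrac12)$ has $f = \frac{4\cdot\frac14}{2^2} = \tfrac14 = f(2,2)$, so $R_{\min} = Q/4 \approx 0.6$ since $Q \approx 2.4$ for $I = 4$ at $\alpha = 0.05$ (cf. the Note); the S-intervals are then the shorter ones. The pairwise $[1,1]$-contrast, on the other hand, has $f = \tfrac12$ and $R_{\max} = Q/2 \approx 1.2 > 1$, so the T-intervals are shorter there.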

3. EIGENVALUE APPROACH (ANOVA THROUGH LATENT ROOTS)

An alternative approach can be obtained through the latent roots theory, i.e., the eigenvalues approach to regression or ANOVA models.

There exist various straightforward applications in statistics of the latent roots theory.


The numerical methods for solving the linear system, i.e., the corresponding system of normal equations, lead to the minimization of the residuals under some restrictions in (2.1), namely,

(3.1) $X'X\beta = X'y$

(see [1]). This is the l.s. technique used in the statistical treatment of the sample data. The solution of (3.1) is

(3.2) $\hat\beta = (X'X)^{-1}X'y,$

with $E(\hat\beta_j) = \beta_j$ and $\operatorname{Var}\hat\beta_j = c_{jj}\sigma^2$, where $c_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$. This gives

Theorem 3.1. If the correlation matrix of the standardized dependent and independent variables from (2.1) is near singular (i.e., the ANOVA model is ill-conditioned), then the latent roots and latent vectors can be used to obtain the modified l.s. estimators for (2.1).

Proof. The following stepwise procedure leads to the proof.

i) For the case of full column rank, the solution (3.2) is unique; see [3] and [7].

ii) Consider now the one-way ANOVA, where the model matrix is not of full column rank, as a particular form of (2.1), i.e., the one-way ANOVA model with "intercept", general mean $\mu$ and fixed treatment effects $\alpha_i$:

(3.3) $y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1,\dots,I;\ j = 1,\dots,J,$

where the $e_{ij}$ are independent r.v. distributed $N(0, \sigma^2)$.

Using matrix notation, model (3.3) can be written as

(3.3$'$) $y = \mu\mathbf{1} + X\alpha + e = [\,\mathbf{1}\,\vdots\,X\,]\,\beta + e = [\,\mathbf{1}\,\vdots\,X\,]\begin{pmatrix}\beta_0\\ \alpha\end{pmatrix} + e,$

where $y$ is an $(n,1)$-vector of observable random variables (r.v. for short), with $n = IJ$, $\mu$ and $\alpha = (\alpha_1, \dots, \alpha_I)'$ are unknown parameters, $\mathbf{1}$ is an $(IJ,1)$-vector of ones, $X$ is an $(IJ, I)$-matrix (of rank $I$) of known independent variables with values "0" or "1", and $e$ is an $(IJ,1)$-vector of unobservable uncorrelated $N(0, \sigma^2)$ errors.

The rank of the model matrix is less than $I + 1$, the number of parameters. It follows that the normal equations have no unique solution, i.e., the matrix $[\,\mathbf{1}\,\vdots\,X\,]'[\,\mathbf{1}\,\vdots\,X\,]$ is singular. The modified l.s. estimates of the parameters are obtained using the latent roots $\lambda_j$ and the latent vectors $\gamma_j$ defined by

(3.4) $\det(A'A - \lambda_j I) = |A'A - \lambda_j I| = 0, \qquad j = 0, 1, \dots, J,$

and

$(A'A - \lambda_j I)\gamma_j = 0, \qquad j = 0, 1, \dots, J,$

of the matrix $A$, i.e., the $(n, I+1)$-matrix with $n = IJ$ (see [4]).
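To make (3.4) concrete, the sketch below (ours, not the paper's code) builds the one-way model matrix $[\,\mathbf{1}\,\vdots\,X\,]$ of (3.3), checks that it is rank deficient, and extracts latent roots and vectors with a symmetric eigensolver. For simplicity it works with the raw design matrix rather than the standardized matrix $A$ of (3.5).

```python
# Latent roots/vectors of the cross-product matrix of the rank-deficient
# one-way design [1 : X]; the smallest root is (numerically) zero.
import numpy as np

I, J = 4, 3
X = np.kron(np.eye(I), np.ones((J, 1)))        # (IJ, I) indicator matrix of 0/1 values
A = np.hstack([np.ones((I * J, 1)), X])        # (IJ, I+1) matrix [1 : X]
print(np.linalg.matrix_rank(A), A.shape[1])    # rank I, but I+1 columns

lam, gamma = np.linalg.eigh(A.T @ A)           # latent roots (ascending) and vectors
print(lam)                                     # smallest latent root is ~0
```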

The matrix $A'A$ is the "correlation matrix" of the standardized dependent and independent variables, with

(3.5) $A = [\,y^* \,\vdots\, X\,],$

where $y^* = (y_i - \bar y)/s$ with $s^2 = \sum_{i=1}^{I}(y_i - \bar y)^2$. The estimates of the parameters are

$\hat\mu = \bar y = \sum_i \bar y_{i*}/I = \sum_i\Bigl(\sum_j y_{ij}/J\Bigr)/I = \sum_i\sum_j y_{ij}/(IJ), \qquad j = 0, \dots, J.$

The solution of (3.4) is

(3.6) $\lambda_j = (A\gamma_j)'(A\gamma_j) = \sum_{i=1}^{I}\Bigl(Y_i\gamma_{0j} + \sum_{r=1}^{J} X_{ir}\gamma_{rj}\Bigr)^2,$

where $\gamma_j = (\gamma_{0j}, \gamma_{1j}, \dots, \gamma_{kj})'$ are the corresponding latent vectors, i.e., those of the standardized dependent and independent variables (see [10]).
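The identity in (3.6) is easy to verify numerically. In the sketch below (ours), $y$ is centered and scaled to unit length and the 0/1 indicators are kept as they are; this standardization convention is our assumption and may differ in detail from the paper's.

```python
# For simulated one-way data, form a standardized matrix A = [y* : X] in the
# spirit of (3.5) and verify lambda_j = (A gamma_j)'(A gamma_j) from (3.6).
import numpy as np

rng = np.random.default_rng(2)
I, J = 4, 3
X = np.kron(np.eye(I), np.ones((J, 1)))              # 0/1 indicators, (IJ, I)
y = (np.repeat([1.0, 2.0, 3.0, 4.0], J)              # group means
     + rng.normal(scale=0.3, size=I * J))            # N(0, sigma^2) errors
y_star = (y - y.mean()) / np.sqrt(np.sum((y - y.mean()) ** 2))  # standardized y
A = np.hstack([y_star[:, None], X])

lam, gamma = np.linalg.eigh(A.T @ A)                 # latent roots and vectors
for lam_j, g_j in zip(lam, gamma.T):
    assert np.isclose(lam_j, (A @ g_j) @ (A @ g_j))  # (3.6) holds for every j
print("lambda_j equals ||A gamma_j||^2 for all j")
```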

The residual sum of squares and the sums of squares corresponding to the linear parametric functions can be calculated using the modified l.s. estimators. The $(1-\alpha)$-level simultaneous confidence intervals for all contrasts $\psi$ are those established for model (2.1). Using the matrix notation of (2.1), if $X$ is of full rank, setting $y^* = (y_i - \bar y)/s$ with $s^2 = \sum_{i=1}^{I}(y_i - \bar y)^2$, we have

$\hat\mu = \bar y = \sum_i \bar y_{i*}/I = \sum_i\Bigl(\sum_{j=1}^{J} y_{ij}/J\Bigr)/I = \sum_i\sum_j y_{ij}/(IJ), \qquad j = 0, \dots, J.$
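For completeness, here is a hedged sketch (ours) of the classical $(1-\alpha)$-level S-interval from [7] for a single contrast in the balanced one-way layout, computed from the ordinary l.s. group means rather than the modified estimators; the function name and the simulated data are ours.

```python
# Scheffe-type simultaneous interval for psi = sum_i c_i * mean_i,
# balanced one-way layout with an (I, J) data array.
import numpy as np
from scipy.stats import f as f_dist

def scheffe_interval(data, c, alpha=0.05):
    """data: (I, J) array of observations; c: contrast coefficients (sum to 0)."""
    I, J = data.shape
    c = np.asarray(c, dtype=float)
    means = data.mean(axis=1)                        # group means
    sse = np.sum((data - means[:, None]) ** 2)       # residual sum of squares
    s2 = sse / (I * (J - 1))                         # error mean square
    psi_hat = c @ means
    half = np.sqrt((I - 1) * f_dist.ppf(1 - alpha, I - 1, I * (J - 1))
                   * s2 * np.sum(c ** 2) / J)        # half-length of the S-interval
    return psi_hat - half, psi_hat + half

rng = np.random.default_rng(3)
data = rng.normal(loc=[[0], [0], [1], [1]], scale=1.0, size=(4, 6))
print(scheffe_interval(data, [1, -1, 0, 0]))
```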

From the balanced one-way GLM we have the following result.

Theorem 3.2. For the ANOVA model with intercept (3.3), in the general form

$y = [\,\mathbf{1}\,\vdots\,X\,]\beta + e, \qquad \beta = (\beta_0 : \alpha')' = (\mu : \alpha_1, \dots, \alpha_I)',$

if the null hypothesis concerning the true (general) mean,

$H_\mu: \mu = 0,$

and the null hypothesis concerning the fixed factor effects,

$H_\alpha: \alpha_i = 0, \quad i = 1, \dots, I,$

are to be tested, then the $F$-procedures for testing $H_\alpha$ and $H_\mu$ are summarized in Tables 1 and 2 below (see [7], [9]).


Table 1

Source: fixed effects. \quad $SS_\alpha = J\sum_i(\bar y_{i*} - \bar y)^2$; \quad d.f. $\nu_\alpha = I-1$; \quad mean square $\overline{SS}_\alpha = SS_\alpha/(I-1)$; \quad $F_\alpha = \overline{SS}_\alpha/\overline{SS}_e$.
Source: residuals. \quad $SS_e = \sum_i\sum_j(y_{ij} - \bar y_{i*})^2$; \quad d.f. $\nu_e = I(J-1)$; \quad mean square $\overline{SS}_e = SS_e/[I(J-1)]$.

Decision: reject $H_\alpha$ if $F_\alpha > F_{\alpha;\,\nu_\alpha,\,\nu_e}$.

Table 2

Source: general mean (intercept). \quad $SS_\mu = IJ\,\bar y^2 = J\bigl(\sum_i \bar y_{i*}\bigr)^2/I$; \quad d.f. $\nu_\mu = 1$; \quad mean square $\overline{SS}_\mu = SS_\mu$; \quad $F_\mu = \overline{SS}_\mu/\overline{SS}_e$.
Source: residuals. \quad $SS_e = \sum_i\sum_j(y_{ij} - \bar y_{i*})^2$; \quad d.f. $\nu_e = I(J-1)$; \quad mean square $\overline{SS}_e = SS_e/[I(J-1)]$.

Decision: reject $H_\mu$ if $F_\mu > F_{\alpha;\,1,\,\nu_e}$.
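The $F$-statistics in Tables 1 and 2 are straightforward to compute. The sketch below (ours, with simulated data) follows the table entries directly and uses the standard one-degree-of-freedom sum of squares $IJ\,\bar y^2$ for the general mean.

```python
# F-tests of Tables 1 and 2 for a balanced one-way layout, from an (I, J) array.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(4)
I, J = 4, 6
data = rng.normal(loc=[[0.0], [0.5], [1.0], [1.5]], scale=1.0, size=(I, J))

grand = data.mean()
means = data.mean(axis=1)
ss_alpha = J * np.sum((means - grand) ** 2)          # SS for fixed effects (Table 1)
ss_mu = I * J * grand ** 2                           # SS for the general mean (Table 2)
ss_e = np.sum((data - means[:, None]) ** 2)          # residual SS
ms_e = ss_e / (I * (J - 1))                          # error mean square

F_alpha = (ss_alpha / (I - 1)) / ms_e
F_mu = ss_mu / ms_e
alpha = 0.05
print(F_alpha, F_alpha > f_dist.ppf(1 - alpha, I - 1, I * (J - 1)))   # reject H_alpha?
print(F_mu, F_mu > f_dist.ppf(1 - alpha, 1, I * (J - 1)))             # reject H_mu?
```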

Comments. 1. The exact $(1-\alpha)$-level simultaneous confidence intervals require the restriction $\sum_i \alpha_i = 0$, or the use of the modified l.s. estimation procedure to remove the effect of the near singularity.

2. An alternative technique in ANOVA models is the use of a generalized inverse in the normal equations system. A shortcoming of this approach is that several alternatives for the generalized inverse have been proposed, i.e., it is not unique.
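A minimal sketch of Comment 2 (ours): the Moore-Penrose inverse picks the minimum-norm solution of the rank-deficient normal equations of model (3.3); other generalized inverses yield other solutions for $\beta$, but estimable functions such as contrasts of the $\alpha_i$ take the same value for all of them.

```python
# Generalized-inverse solution of the rank-deficient one-way normal equations.
import numpy as np

rng = np.random.default_rng(5)
I, J = 4, 3
X = np.kron(np.eye(I), np.ones((J, 1)))
A = np.hstack([np.ones((I * J, 1)), X])              # [1 : X], rank I < I+1
y = np.repeat([1.0, 2.0, 3.0, 4.0], J) + rng.normal(scale=0.2, size=I * J)

beta_mp = np.linalg.pinv(A.T @ A) @ A.T @ y          # Moore-Penrose solution
beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimum-norm l.s. solution
c = np.array([0.0, 1.0, -1.0, 0.0, 0.0])             # contrast alpha_1 - alpha_2
print(np.allclose(beta_mp, beta_ls), c @ beta_mp)    # same solution; contrast estimate
```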

REFERENCES

[1] B. Dumitrescu, C. Popeea and B. Jora, Metode de calcul numeric matriceal. Algoritmi fundamentali. Ed. All, Bucureşti, 1998.

[2] F. Gonzacenco, E. Mărgăritescu and V. Vodă, Statistical consequences of some algebraic inequalities. Rev. Roumaine Math. Pures Appl. 37 (1992), 10, 877–886.

[3] Y. Hochberg and A.C. Tamhane, Multiple Comparison Procedures. Wiley, New York, 1987.

[4] M. Iosifescu, Gh. Mihoc and R. Theodorescu, Teoria probabilităţilor şi statistică matematică. Ed. Tehnică, Bucureşti, 1966.

[5] E. Mărgăritescu, Metode de comparaţie multiplă. Ed. Academiei, Bucureşti, 1981.

[6] E. Mărgăritescu and Emiliana Ursianu, On the comparison of S and T methods in the analysis of variance. Rev. Roumaine Math. Pures Appl. 22 (1977), 4, 525–535.

[7] H. Scheffé, The Analysis of Variance. Wiley, New York, 1959.

[8] Emiliana Ursianu and R. Ursianu, Multiple comparison procedure through latent roots. A 9-a Conferinţă SPSR, Univ. Bucureşti - Academia Română, 28-29 aprilie 2006, p. 36.

[9] I. Văduva, Analiza dispersională. Ed. Tehnică, Bucureşti, 1970.

[10] J.T. Webster, R.F. Gunst and R.L. Mason, Latent root regression analysis. Technometrics 16 (1974), 513–522.

Received 7 January 2007

Romanian Academy
"Gheorghe Mihoc-Caius Iacob" Institute for Mathematical Statistics and Applied Mathematics
Calea 13 Septembrie nr. 13
050711 Bucharest 5, Romania
emiliana.ursianu@k.ro

and

"POLITEHNICA" University of Bucharest
Faculty of Automatics
Splaiul Independenţei nr. 313
Bucharest, Romania
