Inferences on homogeneity of between-family components of variance and covariance among environments in balanced cross-classified designs

(1)

HAL Id: hal-00894022

https://hal.archives-ouvertes.fr/hal-00894022

Submitted on 1 Jan 1994

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Inferences on homogeneity of between-family

components of variance and covariance among

environments in balanced cross-classified designs

Jl Foulley, D Hébert, Rl Quaas

To cite this version:

(2)

Original

article

Inferences

on

homogeneity

of

between-family

components

of

variance and

covariance

_among

environments

in

balanced cross-classified

_designs

JL

_Foulley

D

Hébert

2 RL

_Quaas

₃

1

Institut National de la Recherche

_Agronomique,

Station de

_{Génétique Quantitative}

et

Appliquée,

Centre de Recherches de

_{Jouy-en-Josas,}

78352

_{Jouy-en-Josas Cedex;}

2 Domaine

Expérimental

Agronomie d’Auzeville,

Centre de Recherches de

_Toulouse,

BP

_27,

31326 Castanet Tolosan

_{Cedex, h}

_b

_ance;

3 Cornell

_{University, Department of Animal Science, Ithaca,}

NY

14853,

USA

(Received

4

_August

_1993;

_accepted

29 November

₁₉₉₃₎

Summary - Estimation and

_testing

of

homogeneity

of

_{between-family}

_componentsof variance and covariance among environments are

_investigated

for balanced cross-classified

designs.

The variance-covariance structure of the residuals is assumed to be

_diagonal

and heteroskedastic. The

testing

procedure

for

_homogeneity

of

_family

_componentsis based on the ratio of maximized

_{log-restricted}

likelihoods for the reduced

_(hypothesis

of

_homogeneity)

and saturated models. An

_{expectation-maximization}

_(EM)

_algorithm

is

_proposed

for

_calculating

restricted maximum likelihood

_(REML)

estimates of the residual and

_{between-family}

_componentsof variance and covariance. The EM formulae to

implement

this are iterative and use the classical

_analysis

of variance

(ANOVA)

_statistics,

ie the between- and

_{within-family}

sums of _squaresand

_{cross-products. They}

can be

_applied

both to the saturated and reduced models and _guaranteethe solutions to be in the parameter space. Procedures

presented

in this paper are illustrated with the

analysis

of 5

_vegetative

and

_reproductive

traits recorded in an

_experiment

on 20 full-sib families of

black medic

_(Medicago

_lupulina

_L)

tested in 3 environments.

_Application

_{to pure}maximum likelihood

_procedures,

extension to unbalanced

_designs

and

comparison

with

approaches

relying

on alternative models are also discussed.

genotype X environment interaction

_/

_{heteroskedasticity}

_/

expectation-maxi-mization

_/

restricted maximum likelihood

_/

likelihood ratio test

(3)

La

_procédure

de test

_{d’homogénéité}

des composantes

familiales

repose sur le _rapportdes vraisemblances restreintes maximisées sous les modèles réduit

_(hypothèse

_{d’homogénéité)}

et saturé. Un

_{algorithme d’espérance-maximisation}

_(EM)

est

proposé

pour calculer les

estimations du maximum de vraisemblance restreinte

_(REML)

des composantes résiduelles et

_familiales

de variance et de covariance. Les

_formules

EM à

appliquer

sont itératives et utilisent les

_statistiques

_classiques

de

_l’analyse

de variance

_(ANOVA),

c’est-à-dire les sommes de carrés et

_coproduits

inter- et

_{intrafamilles.}

Elles

s’appliquent

à la

fois

aux modèles réduit et saturé et garantissent

l’appartenance

des solutions à

_l’espace

des

paramètres.

Les méthodes

_présentées

dans cet article sont illustrées _par

_l’analyse

de 5 caractères

_végétatifs

et

_reproductifs

mesurés lors d’une

_expérience

_portantsur 20

_familles

de

pleins frères

testées dans 3 milieux chez la minette

_(Medicago

_lupulina

_L).

_{L’application}

au maximum de vraisemblance stricto _sensu,la

_{généralisation}

à des

_{dispositifs déséquilibrés}

ainsi _quela

_comparaison

à des

approches

reposant sur d’autres modèles sont

_également

discutées.

interaction _génotypex milieu

_/

hétéroscédasticité

_/

_{espérance-maximisation}

_/

maxi-mum de vraisemblance restreinte

_/

_rapportde vraisemblance

INTRODUCTION

There is a

_great

deal of interest

_today

in

_quantitative

and

applied genetics

in

heterogeneous

variances.

_Ignoring

such

_{heterogeneity,}

as is

_usually

_done,

_may

substantially

affect the

_reliability

of

_genetic

evaluation and thus reduce the

_efficiency

of selection

_(Hill,

_1984;

Visscher and

_Hill,

_1992).

There is concern not

_only

about

_{estimating dispersion}

_parameters

for hetero-skedastic

_models,

but also about

_testing

_hypotheses

for the real

_degree

of

hetero-geneity

which can be

expected

from

_experimental

results. In this

_respect,

Visscher

(1992)

investigated

the statistical _powerof the likelihood ratio test in balanced half-sib

_designs

for

_{detecting heterogeneity}

of

_phenotypic

variance and intra-class correlation between environments.

In that

_approach,

the

_(family)

correlation between environments

_(p)

is assumed to be

_equal

to

_1,

and

_{heterogeneity}

of

_{between-family}

_components

of covariance among environments in

only

due to

_scaling

of variances.

The aim of this paper is to extend that

_approach

to the case of true

genotype

by

environment interactions

_{(p #}

_1).

Our attention will be focused on:

i)

cross-classified balanced

_designs;

and

_ii)

the null

_hypothesis

_{involving homogeneity}

of

_{between-family}

_components

of variance and covariance between environments. This variance-covariance structure has been

_widely

used for

_{analyzing family}

data recorded in different

_{environments,}

in

_particular

due to its close link with a

2-factor classification model

_(ie

_family

and

_environment)

with interaction

_(Mallard

et

_{al, 1983; Foulley}

and

_Henderson,

_1989).

_Moreover,

even for balanced

_designs,

the estimation of the 2

_parameters

involved in this

_simple

structure via maximum likelihood

_procedures

has no

_analytical

solution in the

_general

case when no

assumption

is made about the residual variances. This motivated the

_proposal

made in this

_study

to use the

_{expectation-maximization}

_(EM)

_algorithm

_(Dempster

et

(4)

THEORY

Generalities

Let us assume that the records from the balanced cross-classified

layout family

(or

genotype)

x environment can be written as:

where _y2!!is the

_performance

of the kth progeny

(or individual) (k

=

1, 2, ... ,

n)

of the

_jth

family

(or

genotype) ( j

=

1, 2, ... ,

s)

evaluated in the ith environment

(i

=

1, 2, ... , p) ; b

ij

is the random effect of the

_{jth family}

in the ith

_environment,

assumed

_normally

distributed,

such that

_Var(b

_ij

₎

=

a

B

i,

Cov(bij,bi

’

j)

=

O’Bii

,,

for

i

_{! i’,}

and

_{Cov(bi!,bi!!!)}

= 0

for j # j’

and

_any

i and

i’;

and

e2!k is a residual

effect

_pertaining

to the kth progeny in the subclass

ij,

assumed

A!77D(0, <7!)

viz,

normally

and

_{independently}

distributed with mean zero and variance

_U2

_wi

Using

vector

_notation,

ie _y_jk =

{

Yijk

}, !!

=

Ig

il

,

bj =

{bij}

and ej k =

{e2!k}

for i =

1, 2, ... ,

p, the model

[1]

can

_{alternatively}

be written as:

where

_b

_j

_rv

_N(0,!3)

and _e_jk_rv

_N(0,E!),

with

E

B

=

{a-Bii’

}

’

standing

for the

(p

x

_p)

matrix of

_{between-family}

components

of variance and covariance between environments and

£w

=

Diag{

Q

w

i

} for

the

_(p

x

_p)

_diagonal

matrix of residual

components

of variance.

&dquo;

Actually,

this

approach

consists of

considering

the

_expression

of the trait in different environments

_{(i, i’)}

as that of 2

genetically

related traits with a coefficient

of correlation _pii,=

asi!!/!B!!s!&dquo;

(Falconer, 1952).

In a

_given

environment

_(i),

this

_1-way

linear model

generates

the classical ANOVA

statistics,

ie the

_{between-family}

_(SB!,

_B

_i

₎

and

within-family

(Swi, W

i

)

sums of _squaresand mean _squares,

respectively,

whose distributions are

propor-tional to

_chi-squares:

Due to the cross-classified structure of the

_design,

one also has to consider a sum

(S

B

..,)

and a mean

(BZi!)

between-family cross-product

for each

(ii’)

combination

(5)

If we let yj_ =

_{lyij. 1,}

_Y_..=

{Yi..},

then the matrix

S

B

=

{BBi&dquo;}

} with

elements from

_[3a]

and

_[4],

such that:

&dquo;&dquo;

has a Wishart

_{distribution,}

denoted

_W(r,

s -

1),

with

_parameters

_{(s — 1)}

and

r =

Eyv

₊

_nE

_B

_,

thus

generalizing

to a matrix of

_{between-family}

sums of _squares

and

_{cross-products}

the

_{(o,2 w}

_i

_{+naBi)X!s_1!}

distribution

_arising

in

_(3a!.

In the 1-dimensional _case,the set of

_SB!

and

_S

_w

_.

are

independent,

location

invariant sufficient statistics for

_a-

₁

and

_w

_i

_similarly

the matrices

_S

_B

_and

Syv

=

Diag{5w,} have

the _same

property

for

E

B

and

_{Eyv. Hence,}

one can write

the

_density

of

_[5]

as

Similarly,

Using

[6a]

and

_[6b]

in the

_expression

for the

_log

_likelihood,

where Ct is a constant. This leads to:

with W =

Diag{W

d

=

Syv/s(n - 1)

and

tr(.)

= the trace

_operator.

Notice that maximization of

_[7]

_yields

REML estimators of

_E

_B

and

_Eyj,

because the

_marginally

sufficient statistics

_S

_B

and

_Syir

are used in the

_{log-likelihood}

function. Under the saturated model

_{(a wi 2 7!}

₀

_{,2 Wi,}

and

_UBij

_{, 0}

_!8.!!.!!!

_{for any}

i, i’, i&dquo;

and

_i&dquo;’),

the

_partial

derivatives

with respect

to

_g

_o

_B

_and

_0’

₂

_wi

of minus twice the

_log

likelihood

_(-2L)

are:

(6)

8F/8aB;;,

is a

_(p

x

_p)

matrix

_{having n}

as the

_{(i, i’)}

element and 0

_elsewhere,

so that the

_equation

[8a]

= 0

_gives

f

= B.

Similarly,

8£w /

8 a

/

j

has 1 as the ith

diagonal

_element

and 0 elsewhere. Given that

E

w

and W are

diagonal

matrices and that

f

=

B,

the solutions to

equations

[8a]

= 0 and

[8b]

= 0 _are

provided

that B — W is

_positive

definite. The maximum of the

_{log-likelihood}

function is then

_(apart

from a

constant):

Otherwise,

REML estimates of

E

B

and

E

W

are no

_longer

identical to ANOVA

estimates and

_require

the use of another

algorithm

for their calculation

(see

Appendix

A).

The null

_hypothesis

consists of

_assuming

the

_homogeneity

of the

_{between-family}

components

of variance

_(’di,

_or

₂

₌

_CT

₂

₎

and covariance

_{(Vi #}

_i’,

a-Bii’ =

C

B

)

_as

postulated

in many

analyses

of genotype

by

environment

_experiments

_(Dickerson,

1962; Yamada, 1962;

Mallard et

_al,

_1983).

The

_{approach presented}

in this paper allows us to test this

_simplified

structure of

_E

_B

_against

Falconer’s saturated model for any structure of the residual variances. The nulle

hypothesis

(H

o

)

considered here can be written as:

where

_Ip

=

identity

matrix of order

p and

Jp

=

(p

x

p)

matrix of ones.

Under

H

o

,

REML estimation of

_E

_B

and

_Eyv

becomes much more

_complex.

Here

8F/8a

§

=

nIp

and ar/aC

B

=

n(J

P

-

Ip)

result in the

following equations:

were

lp = (p x 1)

vector of ones.

Since

_r-

₁

_-

_I

_-

_F-’BF-

₁

_,

the REML solution for the residual

components

([8b])

is no

_longer

Êw

= W and the

system

of

_equations

_{[8b] (see}

also

_{(B11!), [12a]}

and

[12b]

has no

analytical

solutions in the

general

case. This was the reason

_motivating

our search for another

approach

for

computing

REML solutions to

E

B

and

£

w

under

_Ho.

An EM

_{diagonalization approach}

The

_{expectation-maximization approach}

is a _veryefficient

_concept

in maximum

(7)

(auaas,

1992).

The basic

_principle

is to treat the unobservable random variables

_b

_ij

and _e2!!as

_missing

data.

_Actually,

the EM

_algorithm

will not be

_applied

_directly

to the model described in

_[1]

and

_[2]

but after a

_{spectral decomposition}

of

E

B

according

to its

_eigenvalues

and

_vectors,

ie:

In this

_formula,

A =

Diagf6

il

is the

(p

x

_p)

matrix of

_{eigenvalues 6}

_i

_,

_{with 6}

_i

repeated

as _manytimes as its

_multiplicity

_order,

and U =

(U

1 ,

U

2 , ... ,

Ui, ... ,

Up)

is the

_(p

x

_p)

matrix of the

corresponding

p normed

eigenvectors

U

i

of

_E

_B

_(U’U

=

I

P

).

Under the

_special

form shown in

_!11!,

_E

_B

has

_only

2 distinct

_eigenvalues:

with

_multiplicity

orders 1 and

_{(p — 1)}

_{respectively.}

_Moreover,

the matrix U of

eigenvectors

does not

_depend

on the values of

6

1

and

₆

₂

_,

U’

_being

the Helmert

matrix of order _p,see for

_{example Searle,}

1982

_(p

71 and

₃₂₂₎

for more details about such matrices. For

_instance,

for p =

_3,

Due to the invariance

_property

of

_(RE)ML

_estimators,

the one-to-one transfor-mation in

_[14a]

and

_[14b]

allows us to

_change

the

_{parameterization}

from

_(<

_T1

_,C

_B

₎

to

₍₆

₁

_,b

₂

_),

or more

_conveniently

to

_(<!i,T)

where T =

6

1

₊

(p — 1)6

2 ,

the back transformation is:

From the

_{spectral decomposition}

of

_E

_$

_,

the model in

_[2]

can be written as:

where U is

_defined

as before and

the vector

_f

_j

=

{ fi!

is

such

that

_f

_j

N

N(0,A).

Using

the

_Dempster

et al

₍₁₉₇₇₎

_terminology,

a

_complete

data set x can be

constructed from /-1,

f

j

,

jke

for j

=

1,2,...,s

and k =

1, 2, ... ,

n,

whereas the

incomplete

data set is the vector y of

observations.

’

Let us first consider the case of

E

B

.

If the

_f

_j

_’s

were

known,

sufficient statistics

(8)

REML would then be obtained

by equating

the

_expectation

of these sufficient

statistics,

ie:

to their calculated values

_{(M step).}

_Actually,

these sufficient statistics are not

directly

observable and the EM

_{algorithm proceeds}

first

_{by estimating}

them

_by

taking

their conditional

_expectation

_given

the observed data set

_{(E step).}

Since such an estimation

_depends

on the value of the unknown

_parameters,

the

procedure

is iterative and consists of

_implementing

the 2 usual

_steps:

E

_step:

at iteration

_(t!,

calculate

M

step: compute

6!&dquo;’I

and

T[

t+

1 ]

from the

_following

equations:

As shown in

_{Appendix A,}

the

_(p

x

_p)

matrix

A’

l

can be

expressed

as:

where

_U,

B and r are defined as above

(see !13!, [5a]

and

[5b] respectively)

and

C!t!

is the matrix of variance of

_prediction

errors of

[

t]

=

E(f

j

)

y,

8!t] ,

r

[t]

,

_S!),

the best

predictor

of

_f

_j

at iteration

_[t]

such that:

Similarly,

sufficient statistics for

E

w

under the

complete

data set x are:

and the E and

_{M ’steps}

are

as follows:

(9)

For the E

_step,

at iteration

_[t],

calculate:

using

the

_following

formula based on the same

_reasoning

as

_previously

_(see

Ap-pendix

A):

sums of _squaresand

cross-products

(y..

being

defined as

For the M

_step

compute

the next value of

_Eyv

from:

Formulae

_[25]

and

_[24a]-[24b]

define the E and M

_steps,

_{respectively,}

of an EM

(10)

parameters.

Notice that in this scheme

_{tr(P)/p (average}

_diagonal

element of

_P)

and

((1’P1) -

_{tr(P!!/p(p -}

1)

(average

off-diagonal

element of

_P)

behave as sufficient

statistics for _a_B and

_C

_B

with

respect

to the

complete

data set.

Formulae for the residual

_components

are

_unchanged

with

UC

[t]

U’

=

:E

W

1 E!y!

and

M[

t

]

₌

_{n£ §}

_I

_(r

_iti

1 . For

the saturated

_model,

the formulae to

_apply

are the same for

_E

_W

_{and, simply,}

&dquo;

+

t

f

E

=

Il

ltl

ls

for

E

B

.

Testing

procedures

Hypotheses

of interest concern the vector 0 of

_parameters

involved in the matrices of

_{between-family}

_(E

_B

₎

and

_{within-family}

_(Ey!)

_components

of variance and covariance between environments. The

_theory

of the

_generalized

likelihood ratio

can be

applied

to that purpose, as

already proposed by Foulley

et al

_(1990,

_1992),

Shaw

₍₁₉₉₁₎

and Visscher

₍₁₉₉₂₎

_amongothers.

Let

_o

_:

_H

0 E

8

0

be the null

_hypothesis

and

_:

_H

_I

0 6 8 -

8

0

its

_alternative,

where 8 refers to the

_complete

_parameter

space and

O

o

,

a subset of it

_pertaining

to

_H

_o

_.

Under

H

o

,

the statistic

where

_L(0;y)

is defined as in

!7!,

has an

_asymptotic

chi-square

distribution with r

degrees

of

_freedom,

r

_being

the difference in the numbers of estimable

_parameters

involved in e and

O

o

(Mood

et

al,

1974).

Here e contains

_p(p

+

_3)/2

parameters

corresponding

to _presidual

_components

of variance and

_p(p

+

_2)/2

_{between-family}

_components

of variance and covariance

between environments whilst

Oo

has

_p+2

_{2 parameters}

_only

_(p

residual

_components,

<T1 and

C

B

),

so that r =

!p(p+ 1)/2! -

2.

In the

_{Neyman-Pearson approach}

of

_hypothesis

_{testing, H}

_o

is

_rejected

at the

a level if the calculated value of

_A(y)

exceeds a critical value

À

c

such that

Pr(xr >

À

c

)

= _a.

_However,

the likelihood ratio statistic

.!(y)

in

[24]

_canalso

be

_interpreted

as the difference in

_degree

of fit via maximum likelihood

_procedures

by

2 models: a reduced

_model(R)

with

parameter

vector 0 E

_O

_o

and a full model

(F)

with 0 E O

_encompassing

both the null

_hypothesis

and its alternative. In the

theory

of

_significance

_testing

_(Kempthorne

and

_Folks,

_1971),

this statistic is also used as a measure of

_strength

of evidence

_against

the reduced model or the null

hypothesis.

The lower the

_probability

under

_H

_o

of

_exceeding

this statistic evaluated

from the data

_(also

referred to as the P-value or

_significance

level or size of the

test),

the

stronger

the evidence

_against

H

o

.

Example

Data used here to illustrate the

procedures

are from an

_experiment

carried out in

Montpellier

(south

west of

_France)

on 20 full-sib families of black medic

(Medicago

lupulina

L)

tested in 3 different environments

_(control,

_harvesting

and

competition

treatments).

The

_{experimental design}

was described in detail

_by

H6bert

(1991).

There were

(11)

replicate. Thirty-six

traits were recorded and the variable used was the mean of the 5

_plants

cultivated in each

_replicate

so that _p=

_3,

_s= 20 and _n= 2.

Basic ANOVA statistics for the

_{between-family}

and

_{within-family}

sums of squares and

cross-products

are

_given

in table I for a subset of 5 traits.

Firstly,

the null

_assumption

that the

diagonal

terms of

_E

_w

were

_equal

was tested

via a Bartlett’s test based on ANOVA mean _squaresstatistics. P values were

_0.007,

0.08, 1.4

x

10-

7 ,

8 x

10’!

and 0.04 so that this

_assumption

can be

_{reasonably rejected}

(except

perhaps

for trait

_2).

Test statistics about

E

B

and estimates of

E

B

and

£w

under both the reduced and saturated models are

_given

in table II. P-values for

_vegetative

yield

_traits,

represented

here

_{by dry}

matter

_weight

_(trait

No

₃₎

and

_dry

matter

_weight/max

plant

size diameter

_(trait

No

_4),

were _very

low,

indicating

a

large heterogeneity

in

genetic

variation between evironments with full-sib variances

_{substantially}

reduced in the

_harvesting

_(i

=

1)

and

competition

(i

=

3)

environments

compared

with the control

_(i

=

2).

In

_contrast,

the

harvesting

and

_competition

environments do not

generate

a

meaningful

level of stress

_compared

with the control for the

_expression

of

_genetic

variation of

_days

to 1st

_{ripe pod}

_(trait

No

₂₎

and relative

_{pod weight}

(trait

No

_5).

These 3 environments then behave as

_{’exchangeable’,}

as statisticians would _say.In this

_example,

_genetic

correlations between environments were rather

high

and it would have been

_interesting

to test for some traits

_(eg

No 1 and

₅₎

using

the

_assumption

that these correlations are

equal

to

_unity

_by

Visscher’s

₍₁₉₉₂₎

(12)

(13)

DISCUSSION

This _paperdescribes a further contribution to the solution of the

_problem

of

testing

homogeneity

of

_{between-family}

components

of variance and covariance between environments in the case of balanced cross-classified

designs.

The

_testing

procedure

is based on the likelihood ratio test as

_already

advocated

_by

Shaw

(1987)

in

_{quantitative genetics.}

This

_study

extends that of Visscher

_(1992),

which was

restricted to the case of _pure

_scaling

effects between environments

(ie

all

_genetic

correlations between environments

_equal

to

_one).

The choice of an EM

_algorithm

for

_computing

REML estimates of

E

B

and

E

w

under the null

_hypothesis

allows us to make

_explicit

the

_equations

of the iterative process to

_implement

via formulae based on the usual ANOVA statistics. This

algorithm

does not

_require

_anyconstraint on the value of these ANOVA statistics

( eg

B - W can be

_non-positive

_definite)

_provided

the

_starting

values for

_E

_B

are within

the

_parameter

space. A

simple

reason for this is that the E

phase

under the restricted

model involves the conditional

expectation

(given

the

_data)

of sums of _{squares, eg}

£ f£

and

_e

_{e !!}

as estimators of variance. Because E

(L

f

l

f

ly,

A = 8!t! )

is

j 7 k j

,

always positive

definite

_(Foulley

et

_al,

_1987),

this

_property

of the EM

_algorithm

is also true under the saturated

model;

it can then be used to

_provide

REML estimates of

_E

_B

and

_E

_w

when ANOVA estimators are not

_permissible

_(eg

for traits

_1,

_3,

4 and 5 in the

_example).

Some authors such as Anderson

(1984)

and Shaw

(1987)

advocate the use of

ML rather than REML

_procedures

to test

_hypotheses

about variance covariance matrices. Our EM

_algorithm

can be

easily adapted

to obtain such ML estimates of

E

B

and

_E

_w

_.

It suffices to

_replace

_(s-1)B+r

in formulae

_[19]

and

_!21!

for A and Q

respectively by

(s-1)B

corresponding

to the

change

in the conditional

_expectation

(given

the

_data)

_{of n (y}

_j

_{. -}

_{p) (yj_ - !)’}

_according

to

_{whether g}

is considered as

j

a

_parameter

of interest

_(ML)

or a nuisance factor to be

_integrated

out

_(REML).

This

_algorithm

can also retrieve the usual ANOVA estimates for

_E

_B

and

_Ew

using

the

_2-way

crossed-mixed model:

involving

fixed environmental effects

_(h

_i

_),

random

_family

effects

_[s

_j

_{- NIID (0, a/ ) ] ,}

random interactions

_[hs

_ij

_-

_{NIID(0,afl!)]}

and residuals

_[e

_ijk

_{°° NIID(0,a}

_£

_)].

In

_{fact, Foulley}

and Henderson

₍₁₉₈₉₎

showed that this is an

equivalent

model

for a

_simplified

version of the

_multiple

trait model in

_[1]

restricted to

_E

_B

=

(

U2

B -

C

B

)Ip

+ C

B

Jp

and

_E

_w

=

Q!2,yIP

with

0,2 B

=

Q

s

+

or2!’, h

C

B

= or

.

and

_ow

₂

=

O

re.

2

Notice however that this

_{simplified multiple}

trait model differs from that considered

throughout

this

_study

_(see

_[11])

not

_{only by}

the

_assumption

of homoskedastic

residual variances but also

_by

its restriction to a

_positive

covariance

(C

B

)

between

environments.

For

_instance,

for trait 1

_(days

to

_flowering),

EM-REML estimates of variance and covariance

_components

obtained from the

_algorithm

described in

_[17]

to

_[22]

with

(14)

values can

easily

be checked with ANOVA

_estimatps:

a2

=

79.89;

&2,

=

0.97;

and

a

2

= 26.39. Here

again,

the EM

algorithm provides

estimates within the

parameter

space, which is not

_{always the}

case with ANOVA estimators as shown for instance

with trait 5 :

_{&1 =}

_{79.50, C}

_B

= 79.43 and

a2

= 23.23 with EM

versus as

=

79.65,

!hs

= -1.02 and

Qe

=24.07 with ANOVA.

The EM

_algorithm

is a first-order

_procedure

and therefore has close

_{relationships}

with a maximization

procedure

based on zeroed first derivatives. As shown in

Appendix

B in the case of

_E

_B

_,

the difference between the formulae to

_implement

in the 2 iterative

_procedures

consists of

_replacing

_{(s - 1)B}

+

r!t!

in

_[19]

_by

sB in the 1st derivative

_algorithm

_(B

_being

a sufficient statistic for r in the saturated

model).

Again,

the use of EM

_guarantees

_staying

in the

_parameter

space whereas there is no obvious

_proof

of that for the 1st derivative

_algorithm.

_Moreover,

the EM

approach

turns out to be easier to understand and to use than the other one, which relies

_mainly

on

_algebraic

tricks. As far as REML estimates of

_£w

are

_concerned,

a functional iteration

algorithm

based on first derivatives was also

proposed

in

Appendix

B due to the lack of an obvious

_analogue

of the EM formulae.

Finally,

this EM

_reasoning

can be extended to an unbalanced structure of data and to additional nuisance fixed effects cross-classified with

_family

effects,

ie to an

extended version of model

_[1]

such as

where

₁₃

is a vector of fixed effects

_including

the ith classification for environment and effects of other factors to be

_adjusted

for,

and

_x’ij

_k

is the

_{corresponding}

incidence

_(row)

vector. Under that

model,

the E and M

steps

defined in

_[17]

and

[18a]

and

_[18b]

for

_E

_B

and in

_[21]

and

_[23]

for

E

w

are still valid.

_However,

the

E-statistics in

_{!17!, [21]}

and

_[24b]

are not evaluated as functions of ANOVA statistics but

_directly

from the numerical values of the BLUP and the variance of

_prediction

errors of the

_f

_j

or

_b!’s.

In this

_{situation, also,}

the EM

_algorithm

should be

_applied

systematically

to both the reduced and saturated model for

Ey!.

Another

approach

would be to write

_[28]

under its

_equivalent

form

_(for

_C

_B

>

_0):

where

_{hi, sj}

and

_hsjj

are defined as in

_[25]

and _{e2!! N}

_NID(O,

_o,2w.).

Under such

a mixed-model

_structure,

one can then use the methods

developed

_{by Foulley}

et al

_{(1990, 1992)}

and San Cristobal et al

₍₁₉₉₃₎

for

_calculating

REML estimates of variances in the _presenceof

_{heterogeneous}

residual

_components.

_However,

the

procedure

derived in this _{paper remains}

_definitively

more

_general,

for

_instance,

it

can also be

easily applied

to a

non-diagonal

structure of

E

w

using

formulae

_[17]

to

[22]

unchanged

and

_[23]

_slightly

modified into

E!+l]

=

S2!t!/ns.

This _paperdeals with a null

_hypothesis

of constant

_{between-family}

variance and covariance. In some

_instances,

a more

_appropriate

null

_hypothesis

would be a

constant

_{between-family}

correlation

_(p)

between environments

_(a-

_B

_ii

_’

=

/9<?’B,<!B,/)

and/or

of constant intraclass correlation

_[a-1i

₌

_t(a-1i

₊

_or2

_wi

_)].

_Testing

procedures

for these

_assumptions

will be

_reported

in a

separate

article.

(15)

ACKNOWLEDGMENTS

The authors are

_grateful

to I Olivieri

_{(INRA, Montpellier)}

for

_stimulating

discussions which motivated this

_study,

to J Ruane for the

_English

revision of the

_manuscript

and to the _anonymousreferees for their valuable comments.

_Special

thanks are

_expressed

to C Robert and M San Cristobal who read the

_{manuscript thoroughly}

and checked the numerical

_example.

REFERENCES

Anderson TW

₍₁₉₈₄₎

An Introduction to Multivariate Statistical

_Analysis.

J

Wiley

and

Sons,

New York

Dempster

AP,

Laird

_NM,

Rubin DB

₍₁₉₇₇₎

Maximum likelihood from

_incomplete

data via the EM

_algorithm.

J R Statist Soc B

_39,

1-38

Dickerson GE

₍₁₉₆₂₎

_Implications

of

_{genetic-environmental}

interaction in animal

breeding.

Anim Prod

4, 47-63

Falconer DS

₍₁₉₅₂₎

The

_problem

of environment and selection. Amer Nat

_86,

293-298

Foulley

JL,

Henderson CR

₍₁₉₈₉₎

A

_simple

model to deal with sire

_by

treatment interactions when sires are related. J

_Dairy

Sci

_72,

167-172

Foulley JL,

Im

S,

Gianola

_D,

Hoeschele I

₍₁₉₈₇₎

_{Empirical Bayes}

estimation of

parameters

for n

_{polygenic binary}

traits. Genet Sel

_{Evol 19,}

197-224

Foulley JL,

Gianola

D,

San Cristobal

M,

Im S

₍₁₉₉₀₎

A method for

_assessing

extent

and sources of

_{heterogeneity}

of residual variances in mixed linear models. J

_Dairy

Sci

_73,

1612-1624

Foulley

JL,

San Cristobal

_M,

Gianola

_D,

Im S

₍₁₉₉₂₎

_Marginal

likelihood and

Bayesian

approaches

to the

_analysis

of

_{heterogeneous}

residual variances in mixed linear Gaussian models.

_Comput

Stat Data Anal

_13,

291-305

H6bert D

₍₁₉₉₁₎

Plasticite

_ph6notypique

et interaction

_genotype

milieu chez Medi-cago

Lupulina.

These Doctorat Sciences. Univ Sciences

Techniques

du

_Languedoc,

Montpellier

Hill WG

₍₁₉₈₄₎

On selection among groups with

heterogeneous

variance. Anim Prod

_39,

473-477

Kempthorne

0,

Folks L

₍₁₉₇₁₎

_Probability,

statistics and data

_analysis.

The Iowa State

_University

Press,

Ames

₍₁₀₎

Mallard

_J,

Masson

_JP,

Douaire M

₍₁₉₈₃₎

Interaction

_genotype

x milieu et mod6le mixte. I. Mod6lisation. Genet Sel Evol

15,

379-394

Meyer

K

₍₁₉₉₀₎

Present status of

_knowledge

about statistical

procedures

and

algorithms

to estimate variance and covariance

_components.

In:

_4th

World

_Congress

on Genetics

_Applied

to Livestock

_Production,

_Edinburgh,

23-27

_July

1990,

vol

_13,

(WG

Hill,

R

_Thompson,

JA

Wooliams,

eds),

407-418

Mood

_{A, Graybill}

_FA,

Boes DC

₍₁₉₇₄₎

Introduction to the

_{Theory of}

Statistics. Mc Graw-Hill

Inc,

London

(auaas

RL

₍₁₉₉₂₎

REML NoteBook.

_{Mimeo, Dep}

Anim

_Sci,

Cornell Univ Ithaca

(NY)

San Cristobal

M, Foulley JL,

Manfredi E

₍₁₉₉₃₎

Inference about

multiplicative

heteroskedastic

_components

of variance in a mixed

_linear

Gaussian model with an

(16)

Searle SR

₍₁₉₈₂₎

Matrix

_Algebra

_Useful

to Statistics. J

_Wiley

and

_Sons,

New York

Shaw RG

₍₁₉₈₇₎

Maximum likelihood

_{approaches applied}

to

_{quantitative genetics}

of natural

_populations.

Evolution

41,

812-826

Shaw RG

₍₁₉₉₁₎

The

_comparison

of

_{quantitative genetic}

_parameters

between

populations.

Evolution

_45,

143-151

Visscher PM

₍₁₉₉₂₎

On the _powerof likelihood ratio tests for

_detecting

heterogene-ity

of intra-class correlations and variances in balanced half-sib

_designs.

J

_Dairy

Sci

75,

1320-1330

Visscher

_PM,

Hill WG

₍₁₉₉₂₎

_{Heterogeneity}

of variance and

_dairy

cattle

_breeding.

Anim Prod

_55,

321-329

Yamada Y

₍₁₉₆₂₎

_{Genotype by}

environment interaction and

_genetic

correlation of the same trait under different environments.

_Jap

J Genet

_37,

498-509

APPENDIX

An EM

_algorithm

for REML estimations

_{of E}

_B

and

_E

_w

_{(part A)}

E

B

and

_Eyv

under

_H

_o

An

_explicit

formula for A can be obtained

_{by successively conditioning}

and

deconditioning

the

_expression

in

_[17]

with

_respect

to the mean vector _{p, ie:}

where 0 stands for the vector of

parameters

involved in the matrices of

between-family

(E

B

)

and

_{within-family}

_(E

_w

₎

_components

of variance and covariance between environments and

0

1 ’

is the current estimate of 0 at iteration

_!t!.

s

Now, conditionally

on _{y, p and 0}=

0’

l

,

the

expectation

of

Ef

j

f

can

be j= 1

decomposed

into:

This

_{decomposition}

is

_{especially helpful}

because it

_allows

us to introduce the usual statistics of Gaussian

_models,

ie the conditional mean

f

j

=

E(f!

ly,

_1-1,

e)

and its

_prediction

error variance

_C

_j

=

(17)

The next

_step

is to

_specify

the

_expression

for:

with

respect

to the distribution of

₄₁

_y,

e. On account of

assumptions

made in

_!1),

the distribution of this random variable is

_{N[y.., (E}

_B/

_s)

+

_(Ew/ns)),

so that:

The formula for the variance of

_prediction

error in

[A3b]

does not

_depend

on _p, so that the

_expression

of

_(Al]

reduces to:

where

C!t!

is defined as in

[A3b]

_using

0!

as a current estimate of 0 in

Ew

and

E

B

,

the matrices B and U

_being

constant over rounds of iteration.

The next values of the unknown

_components

of

_E

_B

_(ie

_B

_a

and

C

B

or

6

1

and

T

)

are

_computed

at the M

_step

from the

_diagonal

elements of

A’

l

_(!18a!

and

_!18b!).

The

_diagonal

terms of the matrix

E

w

under the

_complete

data set x

since:

Because this statistic is not

_observable,

it is

_{replaced by}

its conditional expec-tation

_given

the data y and e =

9!.

As for

_{B-components,}

this

_expectation

is calculated after

_conditioning

and

_{deconditioning}

with

_respect

to the mean vector !, ie:

(18)

where

&dquo;

0

and C are as before.

From

_[A9a],

the

_quadratic

_{L êjkêjk}

can be written as

jk

with M =

nUCU’E-1

The next

_step

is to take the

_expectation

of each element in the

_right-hand

side with

_respect

to the distribution of

_gly,

0. Then

where T

_designates

the matrix of total mean _squaresand mean

cross-products,

Similarly,

Placing

[All], [A12]

and

_[A13]

in

_[A8]

and

_noting

that the conditional variance in

_[A9b]

does not

_depend

on _!,

_gives

In

_{fact, ft}

is evaluated

_{conditionally}

on 0 =

0!,

ie

by taking

F =

F

l

’

]

,

C =

Cl’l

and M =

M’

l

in

[A. 14].

The _nextvalue of

or

2

(19)

E

B

and

_Eyjr

under the saturated model

Actually,

the EM

_algorithm

described

_previously

can be

_easily

accomodated to deal with the saturated model. This is

_especially

_helpful

when the ANOVA estimates of

E

B

fall outside the

parameter

space.

Nothing

is

_changed

with

_respect

to

_Eyv,

which has the same

_diagonal

structure with p different elements in both situations. As far as

E

B

is

concerned,

a sufficient

statistic under the

_complete

data set is now the

(p

x

_p)

matrix

_bjb

_’

_.

₃

j

U ( ! f! f! ) U’.

However,

for a

_{given U,}

the

general

_expression

of the conditional j

expectation

of

_L

_:

_{fj f)}

_given

the data was

_already

derived

(see

[19]

and

[A5],

so that

j

the E

_step

remains the same.

Because all the elements of A are now

_required,

the

changes

to

_implement

at the M

_step

are the

_following:

_compute

the next value

I:!+1]

of

_E

_B

_(saturated

_model)

by

where

A’

l

is obtained from

_[A5]

with

U

M

being

the matrix of normed

_eigenvectors

of

E

W

.

Notice that here U is

_updated

at each iteration from the

_equation

E[t]U

lt]

=

U[t]![t].

Algorithms

based on first derivatives

_(part

_B)

Between

_family

_{components E}

B

Using

the

spectral

decomposition

in

_!13!,

ie r = nU’!U ₊

E

w

,

we obtain

where

_Uip

_Ui

₂

_{>&dquo;&dquo;}

_{Uie, ... ,}

_Ui

_r-

are

the ri

normed

_eigenvectors

of

E

B

correspond-ing

to the

_{eigenvalue 6}

_i

with

multiplicity

order ri. Remember that under the reduced

model,

rl = 1 and _r₂ =

p — 1 for

6

1

and

6

2

defined in

[14a]

and

[14b],

respectively.

Substituting

8F /861

by

its

_expression

_[B1]

in the

equation

<9(-2L)/<9<*’t

= 0 leads

to:

Let L =

vn:U ! 1/2

with L

partitioned

in the same _wayas

U,

ie

_Lie

=

vn

-

6iUi,,

(20)

where

_(A)

_i£ie

stands for the

_i

_f

_th

_diagonal

element of the A matrix. Now: and with

Then,

from and Furthermore: or so that and

Using

[B7]

and

_!B8!,

the

_system

_[B3]

reduces to

The EM

_procedure

described in

_(18a!,

_[18b]

and

_[19]

can be

_{alternatively}

written

(21)

The

_parallel

between

_[B9]

and

_[B10]

is

_{straightforward.}

Here B

_replaces

((s - 1)B

+

_r!/s,

since these 2

_quantities

have the same

_expectation

F under the

saturated model.

Residual

_components

_Ew

<9F

OEW

_{(OEW ! OEW}

_{..!v!. !}

Here,

ar - a!2 , C a!2 J -

1

and 2

= 0 for _anyi

_{! i’.}

_Then,

the

a!W -

0,W!

a

B!!.A.’

equation

[8b]

= 0 _canbe written _as

Equation

!B11!

defines a non-linear

_system

that can be solved

_iteratively

_using,

for

Inferences on homogeneity of between-family components of variance and covariance among environments in balanced cross-classified designs

HAL Id: hal-00894022

https://hal.archives-ouvertes.fr/hal-00894022

Submitted on 1 Jan 1994

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Inferences on homogeneity of between-family

components of variance and covariance among

environments in balanced cross-classified designs

Jl Foulley, D Hébert, Rl Quaas

To cite this version:

Original

article

Inferences

on

homogeneity

of

between-family

components

of

variance and

covariance

among

environments

in

balanced cross-classified

designs

JL

Foulley

D

Hébert

2

RL

Quaas

3

Agronomique,

Génétique Quantitative

Appliquée,

Jouy-en-Josas,

Jouy-en-Josas Cedex;

2

Domaine

Expérimental

Agronomie d’Auzeville,

Toulouse,

27,

Cedex, h

b

ance;

3

Cornell

University, Department of Animal Science, Ithaca,

14853,

(Received

August

accepted

1993)

testing

homogeneity

between-family

investigated

designs.

diagonal

testing

procedure

homogeneity

family

log-restricted

(hypothesis

homogeneity)

_among

_designs

_Foulley

_Quaas

₃

_Agronomique,

_{Génétique Quantitative}

_{Jouy-en-Josas,}

_{Jouy-en-Josas Cedex;}

_Toulouse,

_27,

_{Cedex, h}

_b

_ance;

_{University, Department of Animal Science, Ithaca,}

_August

_accepted

₁₉₉₃₎

_testing

_{between-family}

_investigated

_diagonal

_homogeneity

_family

_{log-restricted}

_(hypothesis

_homogeneity)

_{expectation-maximization}

_(EM)

_algorithm

_proposed

_calculating

_(REML)

_{between-family}

_analysis

_statistics,

_{within-family}

_{cross-products. They}

_applied

_vegetative

_reproductive

_experiment

_(Medicago

_lupulina

_L)

_Application

_procedures,

_designs

_/

_/

_/

_/

_procédure

_{d’homogénéité}

_(hypothèse

_{d’homogénéité)}

_{algorithme d’espérance-maximisation}

_(EM)

_(REML)

_familiales

_formules

_statistiques

_classiques

_l’analyse

_(ANOVA),

_coproduits

_{intrafamilles.}

_l’espace

_présentées

_l’analyse

_végétatifs

_reproductifs

_expérience

_familles

_(Medicago

_lupulina

_L).

_{L’application}

_{généralisation}

_{dispositifs déséquilibrés}