ECM approaches to heteroskedastic mixed models with constant variance ratios


HAL Id: hal-00894170

https://hal.archives-ouvertes.fr/hal-00894170

Submitted on 1 Jan 1997


Original article

JL Foulley

Station de génétique quantitative et appliquée, Institut national de la recherche agronomique, 78352 Jouy-en-Josas cedex, France

(Received 6 February 1997; accepted 28 May 1997)

Summary - This paper presents techniques of parameter estimation in heteroskedastic mixed models having constant variance ratios and heterogeneous log residual variances that are described by a linear model. Estimation of dispersion parameters is by standard (ML) and residual (REML) maximum likelihood. Estimating equations are derived using the expectation-conditional maximization (ECM) algorithm and simplified versions of it (gradient ECM). Direct and indirect approaches are proposed, with the latter allowing hypothesis testing about the variance ratios. The analysis of a small example is outlined to illustrate the theory.

heteroskedasticity / mixed model / maximum likelihood / EM algorithm

Résumé - ECM approaches to heteroskedastic mixed models with constant variance ratios. This article presents techniques for estimating the parameters of mixed models having constant variance ratios and residual variances described by a linear model on their logarithms. The dispersion parameters are estimated by standard (ML) and restricted (REML) maximum likelihood. The equations to be solved to obtain these estimates are derived from the expectation-conditional maximization (ECM) algorithm and from a simplified version known as gradient ECM. Direct and indirect approaches are proposed, the latter leading to a hypothesis test on the variance ratio. The theory is illustrated by the numerical analysis of a small example.

heteroskedasticity / mixed model / maximum likelihood / EM algorithm

INTRODUCTION

Heteroskedasticity has recently generated much interest in quantitative genetics and animal breeding. To begin with, there is now a large amount of experimental evidence of heterogeneous variances for most important livestock production traits [...] of heterogeneous variances arising in univariate mixed models (Foulley et al, 1990; Gianola et al, 1992; Weigel et al, 1993; DeStefano, 1994; Foulley and Quaas, 1995).

For many reasons (accuracy of estimation, ease of handling large data sets), a major objective in this area lies in making models as parsimonious as possible. This can be accomplished in at least two ways: i) by modelling variances in the case of potentially numerous sources of heteroskedasticity, and ii) by assuming that some functions of those parameters (eg, intra-class correlation or heritability) are constant. The first aspect corresponds to the so-called structural approach in which the heterogeneity of the log components of variances is described via a linear model structure similar to that used for means (Foulley et al, 1990, 1992; San Cristobal et al, 1993). Restrictions as in ii) were considered by Meuwissen et al (1996) and Robert et al (1995a,b). Meuwissen et al (1996) introduced a multiplicative mixed model to estimate breeding values and heteroskedasticity factors assuming heritability ($h^2$) constant across herd-years. Robert et al (1995a,b) developed estimation and testing procedures for homogeneity of heritability within and/or genetic correlations across environments. But Meuwissen's study postulates known $h^2$ and Robert's research applies to only a single classification of heteroskedasticity.

The purpose of this paper is to propose a complete inference approach for parameters having both features i) and ii), ie, for continuous data described by mixed models with constant variance ratios and heteroskedasticity analyzed via a structural approach. For simplicity, the theory will be presented using a one-way random mixed model for data and afterwards it will be generalized to several u-components. Inference is based on likelihood procedures (REML and ML) and estimating equations derived from the expectation-maximization (EM) theory, more precisely the expectation/conditional maximization (ECM) algorithm recently introduced by Meng and Rubin (1993).

THEORY

Statistical model

As usual, it is assumed that the population can be structured into strata ($i = 1, 2, \ldots, I$) corresponding to potential factors of heterogeneity. Let the one-way random model be written as:

$$y_i = X_i\beta + \sigma_{u_i} Z_i u^* + e_i \qquad [1]$$

where $y_i$ is the ($n_i \times 1$) data vector for stratum $i$; $\beta$ is a ($p \times 1$) vector of unknown fixed effects with incidence matrix $X_i$, and $e_i$ is the ($n_i \times 1$) vector of residuals. The contribution of random effects is expressed as in Foulley and Quaas (1995) as $\sigma_{u_i} Z_i u^*$, where $u^*$ is a ($q \times 1$) vector of standardized deviations, $Z_i$ is the corresponding incidence matrix and $\sigma_{u_i}$ is the square root of the u-component of variance, the value of which depends on stratum $i$. Classical assumptions are made for the distributions of $u^*$ and $e_i$, ie, $u^* \sim N(0, A)$, $e_i \sim N(0, \sigma^2_{e_i} I_{n_i})$, and $E(u^* e_i') = 0$.

The notation in [1] is unusual as compared to that used in the statistical literature [...] an expression of the random part especially in animal breeding. For instance, the between sire variance may vary according to the environment in which the progeny of the sires are raised. Note also that $\sigma_{u_i}$ can be viewed as a regression coefficient of any element of $y_i$ on the corresponding element of $Z_i u^*$. Thus, in animal breeding, $\sigma_{u_i}$ acts as a scaling factor of a vector $u^*$ of standardized sire values on which, for instance, selection can be based.

A structure is hypothesized on the residual variance so as to model the influence of factors causing heteroskedasticity. This is carried out along the lines presented in Foulley et al (1990, 1992) via a linear regression on log-variances:

$$\ln \sigma^2_{e_i} = p_i'\delta \qquad [2]$$

where $\delta$ is an unknown ($r \times 1$) real-valued vector of parameters and $p_i'$ is the corresponding ($1 \times r$) row incidence vector of qualitative or continuous covariates.

Furthermore, the assumption of a constant intra-class correlation (or heritability) implies setting

$$\sigma_{u_i}/\sigma_{e_i} = \tau \quad \text{(constant over strata)} \qquad [3]$$
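To make the model concrete, the following Python sketch draws one data set from a one-way model with log-linear residual variances and a constant ratio of standard deviations. It is only an illustration: the function name `simulate_strata`, the toy layout (two strata, three random-effect levels, an identity relationship matrix) and all numerical values are assumptions chosen for the example, not quantities taken from the paper.

```python
import numpy as np

def simulate_strata(X, Z, P, beta, delta, tau, A, rng):
    """Draw one data set: y_i = X_i beta + sigma_u_i Z_i u* + e_i."""
    sigma_e = np.exp(0.5 * (P @ delta))      # ln(sigma_e_i^2) = p_i' delta
    sigma_u = tau * sigma_e                  # sigma_u_i / sigma_e_i = tau
    # u* ~ N(0, A), drawn through the Cholesky factor of A
    u_star = np.linalg.cholesky(A) @ rng.standard_normal(A.shape[0])
    y = []
    for X_i, Z_i, s_e, s_u in zip(X, Z, sigma_e, sigma_u):
        e_i = s_e * rng.standard_normal(X_i.shape[0])   # e_i ~ N(0, sigma_e_i^2 I)
        y.append(X_i @ beta + s_u * (Z_i @ u_star) + e_i)
    return y, u_star, sigma_e, sigma_u

# Hypothetical toy layout: 2 strata, 3 random-effect levels, intercept only.
rng = np.random.default_rng(1)
X = [np.ones((4, 1)), np.ones((6, 1))]
Z = [np.eye(3)[[0, 1, 2, 0]], np.eye(3)[[0, 0, 1, 1, 2, 2]]]
P = np.eye(2)                                # one log-variance parameter per stratum
delta = np.log(np.array([400.0, 800.0]))
y, u_star, sigma_e, sigma_u = simulate_strata(
    X, Z, P, np.array([100.0]), delta, 8.75 ** -0.5, np.eye(3), rng)
```

Whatever the residual variance of a stratum, the ratio `sigma_u / sigma_e` returned by the function is the same, which is the constraint the rest of the theory exploits.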

EM-REML estimation

Use is made here of the EM algorithm of Dempster et al (1977) to compute REML estimates of parameters involved in variance components (Patterson and Thompson, 1971; Searle et al, 1992). The basic procedure proposed by Foulley and Quaas (1995) is applied here after some adjustment of the M-step taking advantage of the ECM algorithm of Meng and Rubin (1993).

The ECM algorithm is based on a complete data set defined by $x = (\beta', u^{*\prime}, e')'$ and its log-likelihood $L(\gamma; x)$, where $\gamma = (\delta', \tau)'$ is the vector of dispersion parameters. The iterative process takes place as follows.

The E-step is defined as usual, ie, at iteration $[t]$, calculate the conditional expectation of $L(\gamma; x)$ given the data $y$ and $\gamma = \gamma^{[t]}$ which, as shown in Foulley and Quaas (1995), reduces to

$$Q(\gamma; \gamma^{[t]}) = \text{const} - \frac{1}{2}\sum_{i=1}^{I}\left[n_i \ln\sigma^2_{e_i} + \sigma^{-2}_{e_i} E^{[t]}(e_i'e_i)\right] \qquad [4]$$

where $E^{[t]}(.)$ is a condensed notation for a conditional expectation taken with respect to the distribution of $x \mid y, \gamma = \gamma^{[t]}$.

Since the parameters to be estimated are heterogeneous, the estimating equations are derived at the maximization stage from a slightly different version of the EM algorithm, the so-called ECM algorithm. As explained in detail in Meng and Rubin (1993), a CM stage replaces the M-step by a sequence of several conditional [...] ascent maximization procedure (Zangwill, 1969). We suggest here the following procedure:

- maximize $Q$ with respect to $\delta$ to get $\delta^{[t+1]}$ with $\tau$ set at its last value $\tau^{[t]}$, ie

$$\delta^{[t+1]} = \arg\max_{\delta} Q(\delta, \tau^{[t]}; \gamma^{[t]})$$

- then, maximize $Q$ with respect to $\tau$ to obtain $\tau^{[t+1]}$ with $\delta$ in $\gamma$ of $Q(\gamma; \gamma^{[t]})$ set to $\delta^{[t+1]}$, ie,

$$\tau^{[t+1]} = \arg\max_{\tau} Q(\delta^{[t+1]}, \tau; \gamma^{[t]})$$

Thus, the maximization step consists of two CM-steps within the same E-step in order to reduce the need to compute the conditional expectation of $e_i'e_i$ and its components more than once. The algebra of differentiation is given in Appendix A.

The iterative system for computing $\delta$ can be written as [7], with the elements of the right-hand side being given in [8a,b].

Note that for this algorithm to be a true ECM, one would have to iterate the NR algorithm in [7] within an inner cycle (index $\ell$) until convergence to the conditional maximizer $\delta^{[t+1]}$ at each M-step $[t]$. In practice it may be advantageous to reduce the number of inner iterations, even down to only one, ie, by solving the system just once [...]. However, caution should be exercised when applying such a hybrid algorithm, which no longer guarantees the monotonic convergence in likelihood values (Lange, [...]).

The formula to update $\tau$ reduces to

$$\tau^{[t+1]} = \frac{\sum_{i=1}^{I} \left(\sigma^{[t+1]}_{e_i}\right)^{-1} E^{[t]}\left[u^{*\prime} Z_i'(y_i - X_i\beta)\right]}{\sum_{i=1}^{I} E^{[t]}\left(u^{*\prime} Z_i'Z_i u^*\right)} \qquad [9]$$

mimicking the form of a scaled regression coefficient pooled over strata.
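The two CM-steps can be sketched in Python once the E-step has been summarized by three moments per stratum. This is a minimal sketch under stated assumptions: the names `a`, `b`, `c` (for the conditional expectations of $(y_i - X_i\beta)'(y_i - X_i\beta)$, $u^{*\prime}Z_i'(y_i - X_i\beta)$ and $u^{*\prime}Z_i'Z_iu^*$) are introduced here for illustration, and the Newton-Raphson weights are obtained by differentiating $Q$ twice with respect to $\ln\sigma^2_{e_i}$, which may differ from the weights used in the paper's own system.

```python
import numpy as np

def cm_step_delta(P, n, a, b, tau, delta, n_inner=50, tol=1e-10):
    """First CM-step: Newton-Raphson on Q over delta, tau held fixed.

    With Q_i = -0.5 * [n_i ln s2_i + a_i/s2_i - 2 tau b_i/s_i + tau^2 c_i],
    v_i is dQ/d ln(s2_i) and w_i is minus the second derivative.
    """
    delta = np.asarray(delta, dtype=float).copy()
    for _ in range(n_inner):          # n_inner=1 gives a gradient-type variant
        sigma = np.exp(0.5 * (P @ delta))
        v = 0.5 * (a / sigma**2 - tau * b / sigma - n)
        w = 0.5 * (a / sigma**2 - 0.5 * tau * b / sigma)
        step = np.linalg.solve(P.T @ (w[:, None] * P), P.T @ v)
        delta += step
        if np.max(np.abs(step)) < tol:
            break
    return delta

def cm_step_tau(sigma, b, c):
    """Second CM-step: pooled scaled regression coefficient over strata."""
    return np.sum(b / sigma) / np.sum(c)
```

Calling `cm_step_delta` with `n_inner=1` performs a single inner iteration, which corresponds to the hybrid scheme discussed above and carries the same caveat about monotonic convergence.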

The elements to compute at the E-step can be expressed as functions of the sums $X_i'y_i$, $Z_i'y_i$, the sums of squares $y_i'y_i$ within strata, and GLS-BLUP solutions of Henderson's mixed model equations and of their accuracy (Henderson, 1984), ie, the system [11]. Thus, deleting $[t]$ for the sake of simplicity, one has the expressions [12a-c], where $\hat\beta$ and $\hat u^*$ are solutions of the mixed model equations for $\beta$ and $u^*$, and

$$C = \begin{bmatrix} C_{\beta\beta} & C_{\beta u} \\ C_{u\beta} & C_{uu} \end{bmatrix}$$

is the partitioned inverse of the coefficient matrix. Expressions in [12a-c] can easily accommodate grouped data (see Appendix B).
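A hedged Python sketch of such an E-step follows. It builds mixed model equations for the scaled parameterization $T_i = (X_i, \tau\sigma_{e_i}Z_i)$, inverts the coefficient matrix, and returns three conditional moments per stratum. The function name `e_step_moments`, the moment names `a`, `b`, `c` and the dense inversion are assumptions made for illustration; they follow from standard results on quadratic forms under a normal posterior with mean $(\hat\beta', \hat u^{*\prime})'$ and covariance $C$, and are not copied from the paper's expressions.

```python
import numpy as np

def e_step_moments(y, X, Z, sigma_e, tau, A_inv):
    """Solve the mixed model equations and return E-step moments per stratum.

    y, X, Z are lists over strata; sigma_e holds residual standard deviations.
    """
    p, q = X[0].shape[1], Z[0].shape[1]
    M = np.zeros((p + q, p + q))
    rhs = np.zeros(p + q)
    for y_i, X_i, Z_i, s in zip(y, X, Z, sigma_e):
        T_i = np.hstack([X_i, tau * s * Z_i])   # columns for beta and u*
        M += T_i.T @ T_i / s**2
        rhs += T_i.T @ y_i / s**2
    M[p:, p:] += A_inv                          # prior precision of u*
    C = np.linalg.inv(M)                        # partitioned inverse
    sol = C @ rhs
    beta, u = sol[:p], sol[p:]
    C_bb, C_bu, C_uu = C[:p, :p], C[:p, p:], C[p:, p:]
    a, b, c = [], [], []
    for y_i, X_i, Z_i in zip(y, X, Z):
        r = y_i - X_i @ beta
        a.append(r @ r + np.trace(X_i.T @ X_i @ C_bb))            # E[(y-Xb)'(y-Xb)]
        b.append(u @ Z_i.T @ r - np.trace(Z_i.T @ X_i @ C_bu))    # E[u*'Z'(y-Xb)]
        c.append(u @ Z_i.T @ Z_i @ u + np.trace(Z_i.T @ Z_i @ C_uu))  # E[u*'Z'Zu*]
    return beta, u, C, np.array(a), np.array(b), np.array(c)
```

From these moments, the expected residual sum of squares of stratum $i$ is `a - 2*tau*sigma_e*b + tau**2 * sigma_e**2 * c`, so the dispersion parameters can be updated without revisiting the raw data within the same E-step.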

The close connection between the system of equations [7] for residual parameters and formula [12] given in Foulley et al (1990) can be observed. There is also a remarkable similarity between formula [9] for the ratio and formula [7] in Foulley and Quaas (1995). This means that the computations can be implemented with very little change in the code used previously. True or gradient EM could also have [...]

Extension to several u-components

Formulae [7], [8ab] and [9] can easily be generalized to a mixed model including several ($k = 1, 2, \ldots, K$) independent u-components,

$$y_i = X_i\beta + \sum_{k=1}^{K} \sigma_{u_{ik}} Z_{ik} u_k^* + e_i \qquad [13]$$

with $\tau_k = \sigma_{u_{ik}}/\sigma_{e_i}$ constant over strata $i$.

Letting $\gamma = (\delta', \tau')'$ as previously, but now with $\tau = \{\tau_k\}$ being a vector of ratios of standard deviations, the $Q$ function to be maximized has the same form as in [4] with $e_i$ expressed from [13]. One can perform the CM-steps using either i) the sequence $\delta, \tau_1, \tau_2, \ldots, \tau_k, \ldots, \tau_K$, ie, each $\tau_k$ one by one, the remaining ones being held constant, or ii) the sequence $\delta$, and $\tau$ as a whole with all the $\tau_k$'s maximized jointly.

In both cases, the algorithm for computing $\delta$ is formally the same as in [7] with only a slight change in the definition of the elements of $W_{\delta\delta}$, $v_\delta$ being unchanged [...].

If the conditional maximization of the $\tau_k$'s takes place one by one (case i), formula [9] still applies for each of them. Otherwise (case ii), one has to solve the following system: [...]
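For the joint update of all ratios, one plausible form of the system follows from setting $\partial Q/\partial\tau_k = 0$ for every $k$ with $e_i = y_i - X_i\beta - \sigma_{e_i}\sum_\ell \tau_\ell Z_{i\ell}u_\ell^*$, which gives a $K \times K$ linear system. The Python sketch below solves it; the array names `B` and `Cmom` and the function name are hypothetical, and the system is derived here from the model definition rather than transcribed from the paper.

```python
import numpy as np

def cm_step_tau_joint(sigma, B, Cmom):
    """Joint CM-step for the vector of ratios tau = (tau_1, ..., tau_K).

    sigma : (I,)     residual standard deviations per stratum
    B     : (I, K)   B[i, k]       = E[u_k*' Z_ik' (y_i - X_i beta)]
    Cmom  : (I, K, K) Cmom[i, k, l] = E[u_k*' Z_ik' Z_il u_l*]
    Solves sum_l (sum_i Cmom[i,k,l]) tau_l = sum_i B[i,k] / sigma_i.
    """
    lhs = Cmom.sum(axis=0)
    rhs = (B / sigma[:, None]).sum(axis=0)
    return np.linalg.solve(lhs, rhs)
```

With a single u-component ($K = 1$) the system collapses to the scalar pooled-regression update for the ratio.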

An indirect approach

The original model with a constant $\tau$ ratio specified in [1-3] can be viewed as a special case of a more general model with, as previously, $\ln \sigma^2_{e_i} = p_i'\delta$, but also with a linear structure on log-ratios [...].

Letting $\gamma = (\delta', \lambda')'$ here, the sequence of the CM-steps is [...].

The algorithm for $\delta$ is the same as in [7]. The algebra for $\lambda$ is shown in the Appendix, and leads to a system that can be written in a form similar to that for $\delta$ [...]. For practical reasons, one may also wish to limit the number of inner iterations (index $\ell$), even to only one, in order to reduce the volume of computation, but this ECM gradient algorithm should be applied carefully. Further empirical simplifications for the elements of [22] can be proposed along the same lines as in Foulley et al (1990).

Again, these results can be extended to a model with several independent random factors ($k = 1, 2, \ldots, K$) by setting [...]. Actually, if the CM-steps are performed for each vector $\lambda_k$ separately, the same formulae as in [20], [21] and [22] apply: just replace $\tau_i$, $Z_i$, $u^*$ by $\tau_{ik}$, $Z_{ik}$, $u_k^*$ and [...].

ML estimation

It may be interesting in some instances to use ML rather than REML for estimating variance components (see Discussion). The ECM procedure developed in this paper can be easily adapted to obtain ML parameter estimates. $\beta$ is now part of the [...] respect to the distribution of $u^*$ given $y$, $\gamma = \gamma^{[t]}$, and $\beta = \beta^{[t]}$.

Maximization with respect to $\beta$ can be based on the equation $\partial Q/\partial\beta = 0$, ie, [...].

One can proceed as previously, ie, run two CM-steps for the dispersion parameters based on the same E-step so as to obtain $\delta^{[t+1]}$ and $\tau^{[t+1]}$ (or $\lambda^{[t+1]}$), and then perform an additional CM-step for computing $\beta^{[t+1]}$ based on [23], ie, [...].

Alternatively, it may be advantageous to perform the CM-step for $\beta$ and the [...] jointly by solving Henderson's mixed model equations in $\beta^{[t+1]}$ and $u^{*[t+1]} = E(u^* \mid y, \delta^{[t+1]}, \tau^{[t+1]})$ based on $\delta^{[t+1]}$ and $\tau^{[t+1]}$.

Formulae for the two CM-steps do not change. The only additional modification results from taking the conditional expectation of components of $e_i'e_i$ given $y$, $\gamma = \gamma^{[t]}$, $\beta = \beta^{[t]}$ instead of $y$, $\gamma = \gamma^{[t]}$. Formulae in [12] reduce to [...], where $M_{uu}$ is the u by u block of the coefficient matrix [11]. Note that the trace terms inside those formulae have disappeared or have been greatly simplified owing to conditioning with respect to $\beta = \beta^{[t]}$. More generally, for models [13] involving several u-components, [25c] becomes [...], where $(M_{uu}^{-1})_{k\ell}$ is the block pertaining to random factors $k$ and $\ell$ in the inverse of the random part of the coefficient matrix.

Numerical example

The procedures presented in this paper are illustrated with a small data set obtained from simulation. Data were generated according to a cross-classified model having two (environmental) fixed factors (A = 2 levels; B = 3 levels) and one (genetic) random factor (S = 9 levels). The genetic contribution consists of sire and maternal grand sire effects, the latter being assumed to have half the value of the first one:

$$y_{ijk\ell m} = \mu + a_i + b_j + \sigma_{s_{ij}} s_k^* + \tfrac{1}{2}\sigma_{s_{ij}} s_\ell^* + e_{ijk\ell m}$$

where $\mu$ is a general mean, $a_i$ the effect of environmental factor A ($i = 1, 2$), $b_j$ the effect of environmental factor B ($j = 1, 2, 3$), $s_k^*$ the standardized contribution of male $k$ as a sire, $\tfrac{1}{2}s_\ell^*$ the standardized contribution of male $\ell$ as a maternal grand sire, and $e_{ijk\ell m}$ the residual term.

Values chosen for the fixed effects were (using a full-rank parameterization): $\mu + a_1 + b_1 = 100$; $a_2 - a_1 = 20$; $b_2 - b_1 = -10$; $b_3 - b_1 = -20$. The vector $s^* = \{s_k^*\}$ of sire effects is assumed to be $N(0, A)$ with elements of the relationship matrix $A$ shown at the bottom of table I.

Residual variances were obtained from $\ln \sigma^2_{e_{ij}} = \mu^* + a_i^* + b_j^*$ with a base line value $\sigma^2_{e_{11}} = \exp(\mu^* + a_1^* + b_1^*) = 400$, and multiplicative adjustment factors: $\exp(a_2^* - a_1^*) = 2$; $\exp(b_2^* - b_1^*) = 1/2$ and $\exp(b_3^* - b_1^*) = 3/2$. The ratio $\tau = \sigma_{s_{ij}}/\sigma_{e_{ij}}$ of the square root of the sire to the residual variance was taken as constant over A x B cells and set to $8.75^{-1/2}$ (heritability equal to 0.41).
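The simulation settings can be checked with a few lines of Python. The residual variances per A x B cell follow from the base line value and the multiplicative factors quoted above; the heritability formula $h^2 = 4\tau^2/(1 + \tau^2)$ is the usual sire-model expression and is an assumption here, since the text only quotes the resulting value. The function and constant names are illustrative.

```python
BASE_VARIANCE = 400.0                  # residual variance in cell (A=1, B=1)
A_FACTOR = {1: 1.0, 2: 2.0}            # exp(a_i* - a_1*)
B_FACTOR = {1: 1.0, 2: 0.5, 3: 1.5}    # exp(b_j* - b_1*)
TAU = 8.75 ** -0.5                     # sigma_s / sigma_e, constant over cells

def residual_variance(i, j):
    """Residual variance of cell (i, j) under the log-linear structure."""
    return BASE_VARIANCE * A_FACTOR[i] * B_FACTOR[j]

def sire_variance(i, j):
    """Sire variance of cell (i, j): tau^2 times the residual variance."""
    return TAU ** 2 * residual_variance(i, j)

def heritability(tau):
    """Sire-model heritability, assuming h2 = 4 sigma_s^2 / (sigma_s^2 + sigma_e^2)."""
    return 4.0 * tau ** 2 / (1.0 + tau ** 2)

for i in (1, 2):
    for j in (1, 2, 3):
        print(i, j, residual_variance(i, j), round(sire_variance(i, j), 2))
print(round(heritability(TAU), 2))
```

The residual variance ranges from 200 to 1200 across the six cells while the heritability stays at the same value in each of them, which is the feature the constant-ratio model is designed to capture.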

There were 267 observations distributed among 18 different AB x sire x maternal grand sire subclasses. The data structure is displayed in table I as well as cell size [...]

Tests of hypotheses about the location parameters $\beta$, the residual dispersion parameters $\delta$ and the ratios $\tau_{ij}$ were carried out via the likelihood ratio statistic as described in previous studies (Foulley et al, 1990, 1992; San Cristobal et al, 1993; Meyer et al, 1993; Foulley and Quaas, 1995). Formulae by Quaas (1992) were used to compute maximized likelihood functions ($L_{max}$). Results can be arranged as an analysis of variance (or deviance) table: see table II for hypothesis testing about $\beta$, and table III for residual ($\delta$) and ratio ($\lambda$) parameters. Note also that the test statistic for $\beta$ relies on $-2L_{max}$ evaluated from the ML estimates of all parameters, whereas a maximized residual likelihood can be better employed for $\delta$ and $\lambda$.

Interaction effects on location parameters are consistently rejected under different assumptions for the other parameters. The hypothesis of residual variance homogeneity is strongly rejected, as are single factor descriptions of heterogeneity. The assumption of a constant ratio $\tau$ turns out to be a reasonable one. The test results eventually agree with the simulation model; they support the practical conclusion that the $\mu$ + A + B model is the most appropriate to account for variation both in location and in log-residual variances, the ratio $\tau$ being constant.
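The mechanics of such a likelihood ratio comparison can be sketched in Python using only the standard library. The chi-square tail probability is computed from the closed-form series for integer degrees of freedom; the function names are illustrative, and the $-2L_{max}$ values in the usage lines are made-up numbers, not entries of tables II or III.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    z = math.sqrt(x)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(z / math.sqrt(2.0)) + 2.0 * phi * series

def lr_test(minus2L_reduced, minus2L_full, df):
    """LR statistic and P-value for a reduced model nested in a full model."""
    stat = minus2L_reduced - minus2L_full
    return stat, chi2_sf(stat, df)

# Hypothetical values of -2 L_max for two nested models differing by 2 parameters.
stat, p_value = lr_test(1234.5, 1228.5, 2)
```

The degrees of freedom are the difference in the number of estimable parameters between the two models, and both fits must use the same likelihood (ML or residual) for the difference to be meaningful.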

The estimation procedure for $\delta$ and $\tau$ (or $\lambda$) is illustrated in table IV for this model and an alternative one using both standard and residual maximum likelihood methods of estimation. ML and REML estimates of residual variances do not differ very much; on the contrary, the ML estimates of the ratio $\tau$ turn out to be, as expected, lower than the REML ones, the values of the latter being close to the true value.

DISCUSSION AND CONCLUSION

The main purpose of this paper was to extend the general structural approach to heteroskedasticity in mixed models proposed by Foulley et al (1990, 1992) to the case of homogeneous ratios of u to e variance components. In a sire by environment interaction, this is equivalent to postulating homogeneous intra-class correlations or heritabilities. This seems to be a reasonable assumption in practice, or at least serves as a suitable compromise between the existence of heteroskedasticity and parsimony of models. Less restrictive assumptions might also be investigated (Quaas, 1995, pers comm). This paper also provides a generalization of LR tests of this assumption to unbalanced data and complex model structures: see the [...] (1992) on a one-way random balanced design, and that by Robert et al (1995a,b) for heterogeneous variances due to a single classification.

The EM algorithm turns out to be a convenient and powerful tool for solving variance component estimation problems. The ECM algorithm allows us to simplify the estimating equations, in particular the ECM gradient version. The advantage of this algorithm was especially clear here in the case of the indirect approach. A few examples of this for the mixed model have been already mentioned (Meng and Rubin, 1993, example 1; Walker, 1996). It offers great flexibility in defining the sequence of the conditional maximization steps, all the alternatives of which have [...] generated by the EM algorithm are strikingly natural (see Appendix B), thus giving a flavour of simplicity to the whole procedure. It also makes it possible to switch from REML to ML or vice versa with very little change in implementing computations (Foulley et al, 1994).

Some authors such as Leonard (1975), Denis (1983) and Anderson (1984) in statistics and Shaw (1987) in quantitative genetics have advocated the use of ML rather than REML procedures to estimate variance components. Although the interest of ML versus REML in that case remains questionable, ML estimates remain mandatory for hypothesis testing about $\beta$ via the LR statistic (see the numerical example). Bayesian point estimators can also be envisioned via EM (Foulley et al, 1992; Gianola et al, 1992; San Cristobal et al, 1993; Weigel and Gianola, 1993; Foulley and Quaas, 1995).

In addition, as already pointed out by Denis et al (1996), a LR test about $\beta$ requires $\delta$ and $\tau$ (or $\lambda$) being the same for the null hypothesis and its alternative; the same rule holds in hypothesis testing about $\delta$ (or $\lambda$) by keeping the other [...] and difficult problem of joint modelling of means and variances, which is related to such issues as the Behrens-Fisher problem and multi-stage hypothesis testing, and which needs further consideration.

Another area that deserves caution and further development is that of estimability. Difficulties are expected when all the cells contributing to an element $j$ of $\delta$ or $\lambda$ (or to a linear combination of them) have a weight tending towards zero. This may arise due to i) pure overparameterization problems, or ii) parameter values becoming extreme (eg, ratios $\tau_i$ tending to zero, implying elements of $\lambda$ being infinite). This last phenomenon is similar to what happens in the analysis of binary and ordinal data with latent variable models (Misztal et al, 1989; Fahrmeir and Tutz, 1994). Such difficulties can be avoided by reparameterization (i), or by setting lower bounds to the diagonal elements of the system of equations to solve (ii).

Finally, asymptotic accuracy can also be worked out numerically within the EM framework via the so-called SEM algorithm (Meng and Rubin, 1991). Although it only requires the code of the complete data variance covariance matrix and of the EM or ECM outputs, the burden of calculations is then heavier, so that we suggest that it be restricted to the simplest models. Other computing alternatives should also be considered, eg, the average information-restricted maximum likelihood algorithm (AI-REML) of Gilmour et al (1995).

ACKNOWLEDGMENTS

The author is grateful to Elinor Thompson (INRA, Jouy-en-Josas) and Dr Max Rothschild (ISU, Ames) for the English revision of the manuscript and to the two anonymous referees for their valuable comments, especially about some technicalities of the EM theory.

REFERENCES

Anderson TW (1984) An Introduction to Multivariate Statistical Analysis. J Wiley and Sons, New York

Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B 39, 1-38

Denis JB (1983) Extension du modèle additif d'analyse de variance par modélisation multiplicative des variances. Biometrics 39, 849-856

Denis JB, Piepho HP, van Eeuwijk A (1996) Mixed models for genotype by environment tables with an emphasis on heteroskedasticity. Technical Report, Laboratoire de Biométrie

DeStefano AL (1994) Identifying and quantifying sources of heterogeneous residual and sire variances in dairy production data. PhD thesis, Cornell University, Ithaca, New York

Fahrmeir L, Tutz G (1994) Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Verlag, Berlin

Foulley JL, Gianola D, San Cristobal M, Im S (1990) A method for assessing extent and [...]

Foulley JL, San Cristobal M, Gianola D, Im S (1992) Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models. Comput Stat Data Anal 13, 291-305

Foulley JL, Hébert D, Quaas RL (1994) Inferences on homogeneity of between family components of variance and covariance among environments in balanced cross-classified designs. Genet Sel Evol 26, 117-136

Foulley JL, Quaas RL (1995) Heterogeneous variances in Gaussian linear mixed models. Genet Sel Evol 27, 211-228

Garrick DJ, Pollak EJ, Quaas RL, Van Vleck LD (1989) Variance heterogeneity in direct and maternal weight traits by sex and percent purebred for Simmental sired calves. J Anim Sci 67, 2515-2528

Gianola D, Foulley JL, Fernando RL, Henderson CR, Weigel KA (1992) Estimation of heterogeneous variances using empirical Bayes methods: theoretical considerations. J Dairy Sci 75, 2805-2823

Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440-1450

Henderson CR (1984) Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario, Canada

Lange K (1995) A gradient algorithm locally equivalent to the EM algorithm. J R Statist Soc B 57, 425-437

Laird NM, Lange N, Stram D (1987) Maximum likelihood computations with repeated measures: application to the EM algorithm. J Am Statist Assoc 82, 97-105

Leonard T (1975) A Bayesian approach to the linear model with unequal variances. Technometrics 17, 95-102

Meuwissen THE, De Jong G, Engel B (1996) Joint estimation of breeding values and heterogeneous variances of large data files. J Dairy Sci 79, 310-316

Misztal I, Gianola D, Foulley JL (1989) Computing aspects of a non linear method of sire evaluation for categorical data. J Dairy Sci 72, 1557-1568

Meng XL, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Stat Assoc 86, 899-909

Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267-278

Meyer K, Carrick J, Donnelly BJP (1993) Genetic parameters for growth traits of Australian beef cattle from a multibreed selection experiment. J Anim Sci 71, 2614-2622

Patterson HD, Thompson R (1971) Recovery of interblock information when block sizes are unequal. Biometrika 58, 545-554

Quaas RL (1992) REML Note book, Mimeo, Cornell University, Ithaca, New York

Robert C, Foulley JL, Ducrocq V (1995a) Inference on homogeneity of intra-class correlations among environments using heteroskedastic models. Genet Sel Evol 27, 51-65

Robert C, Foulley JL, Ducrocq V (1995b) Estimation and testing of constant genetic and intra-class correlation coefficients among environments. Genet Sel Evol 27, 125-134

San Cristobal M, Foulley JL, Manfredi E (1993) Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding. Genet Sel Evol 25, 3-30

Searle SR, Casella G, McCulloch CE (1992) Variance Components. J Wiley and Sons, New York

Visscher PM (1992) On the power of likelihood ratio tests for detecting heterogeneity of intra-class correlations and variances in balanced half-sib designs. J Dairy Sci 75, 1320-1330

Visscher PM, Thompson R, Hill WG (1991) Estimation of genetic and environmental variances for fat yield in individual herds and an investigation into heterogeneity of variance between herds. Livest Prod Sci 28, 273-290

Visscher PM, Hill WG (1992) Heterogeneity of variance and dairy cattle breeding. Anim Prod 55, 321-329

Walker S (1996) An EM algorithm for non linear random effects model. Biometrics 52, 934-944

Weigel KA, Gianola D (1993) A computationally simple Bayesian method for estimation of heterogeneous within-herd phenotypic variances. J Dairy Sci 76, 1455-1465

Weigel KA, Gianola D, Yandel BS, Keown JF (1993) Identification of factors causing heterogeneous within-herd variance components using a structural model for variances. J Dairy Sci 76, 1466-1478

Zangwill W (1969) Non Linear Programming: A Unified Approach. Prentice Hall, Englewood Cliffs

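APPENDIX A: Algebra for the estimating equations

The $Q$ function to be maximized is (in condensed notation)

$$Q = \text{const} - \frac{1}{2}\sum_{i=1}^{I}\left[n_i \ln\sigma^2_{e_i} + \sigma^{-2}_{e_i} E(e_i'e_i)\right] \qquad [A1]$$

with

$$\ln \sigma^2_{e_i} = p_i'\delta \qquad [A2]$$

$$e_i = y_i - X_i\beta - \tau\sigma_{e_i} Z_i u^* \qquad [A3]$$

Letting [...] $= \partial(2Q)/\partial \ln \sigma^2_{e_i}$ [...]

Second derivatives: from [A4a], one has [...] or, alternatively [...]

Finally, the non-linear system to solve reduces to [...]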

Derivatives with respect to $\tau$ (ratio)
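The derivation behind the ratio update can be sketched as follows. It assumes that $Q$ has the form $\text{const} - \frac12\sum_i[n_i\ln\sigma^2_{e_i} + \sigma^{-2}_{e_i}E(e_i'e_i)]$ with $e_i = y_i - X_i\beta - \tau\sigma_{e_i}Z_iu^*$, and it is a reconstruction from the model definitions rather than the appendix's own wording.

```latex
\begin{align*}
\frac{\partial e_i}{\partial \tau} &= -\sigma_{e_i} Z_i u^*, \\
\frac{\partial Q}{\partial \tau}
  &= -\frac{1}{2}\sum_{i=1}^{I} \sigma_{e_i}^{-2}\,
     E\!\left(2\, e_i' \frac{\partial e_i}{\partial \tau}\right)
   = \sum_{i=1}^{I} \sigma_{e_i}^{-1}\, E\!\left(u^{*\prime} Z_i' e_i\right), \\
0 &= \sum_{i=1}^{I} \sigma_{e_i}^{-1}\,
     E\!\left[u^{*\prime} Z_i' (y_i - X_i\beta)\right]
   - \tau \sum_{i=1}^{I} E\!\left(u^{*\prime} Z_i' Z_i u^*\right), \\
\tau &= \frac{\sum_{i=1}^{I} \sigma_{e_i}^{-1}\,
        E\!\left[u^{*\prime} Z_i' (y_i - X_i\beta)\right]}
       {\sum_{i=1}^{I} E\!\left(u^{*\prime} Z_i' Z_i u^*\right)}.
\end{align*}
```

Because $Q$ is quadratic in $\tau$ once $\delta$ is fixed, this CM-step has a closed form and needs no inner iteration, unlike the step for $\delta$.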

Additional derivatives for the exact EM

From the second expression in [A7] it follows immediately [...]

One has also to express [...]

Now, from [A7] [...] where $W_{\tau\delta}$ is an ($I \times 1$) vector defined by [...]

The system for true EM (or gradient EM without inner iterations) is then [...]

Extension to K independent random factors

$Q$ in [A1] remains formally unchanged with $e_i$ in [A3] being now

$$e_i = y_i - X_i\beta - \sigma_{e_i}\sum_{k=1}^{K} \tau_k Z_{ik} u_k^*$$

Based on this expression of the residual, formulae [A4ab] still hold. [...] or, alternatively [...]

As far as $\tau$ is concerned, [A6] becomes [...] leading to the following system [...]

Derivatives with respect to $\lambda$ (parameters of the log-ratio)

$Q$ and the model for $\ln \sigma^2_{e_i}$ are the same as in [A1] and [A2] but the vector of residuals is defined as [...] or, after some algebra [...]

Furthermore, [...]

This can be easily generalized to several independent random factors if conditional maximization is performed factor by factor. The system [A18] applies to each factor $k$ with [...]

APPENDIX B: Formulae [12] for grouped data

In some instances (see, eg, the example in table I) data can be grouped so that the $n_i$ observations within a stratum $i$ share the same covariates [...]

Substituting $X_i$ by its expression in [B1] gives [...] where $y_{ij}$ is the $j$th element of $y_i$, and [...]
