HAL Id: hal-00894022
https://hal.archives-ouvertes.fr/hal-00894022
Submitted on 1 Jan 1994
Inferences on homogeneity of between-family
components of variance and covariance among
environments in balanced cross-classified designs
JL Foulley 1, D Hébert 2, RL Quaas 3

1 Institut National de la Recherche Agronomique, Station de Génétique Quantitative et Appliquée, Centre de Recherches de Jouy-en-Josas, 78352 Jouy-en-Josas Cedex;
2 Domaine Expérimental Agronomie d'Auzeville, Centre de Recherches de Toulouse, BP 27, 31326 Castanet Tolosan Cedex, France;
3 Cornell University, Department of Animal Science, Ithaca, NY 14853, USA

Original article (Received 4 August 1993; accepted 29 November 1993)
Summary - Estimation and testing of homogeneity of between-family components of variance and covariance among environments are investigated for balanced cross-classified designs. The variance-covariance structure of the residuals is assumed to be diagonal and heteroskedastic. The testing procedure for homogeneity of family components is based on the ratio of maximized restricted likelihoods for the reduced (hypothesis of homogeneity) and saturated models. An expectation-maximization (EM) algorithm is proposed for calculating restricted maximum likelihood (REML) estimates of the residual and between-family components of variance and covariance. The EM formulae to implement this are iterative and use the classical analysis of variance (ANOVA) statistics, ie the between- and within-family sums of squares and cross-products. They can be applied both to the saturated and reduced models and guarantee the solutions to be in the parameter space. Procedures presented in this paper are illustrated with the analysis of 5 vegetative and reproductive traits recorded in an experiment on 20 full-sib families of black medic (Medicago lupulina L) tested in 3 environments. Application to pure maximum likelihood procedures, extension to unbalanced designs and comparison with approaches relying on alternative models are also discussed.

genotype x environment interaction / heteroskedasticity / expectation-maximization / restricted maximum likelihood / likelihood ratio test
INTRODUCTION
There is a great deal of interest today in quantitative and applied genetics in heterogeneous variances. Ignoring such heterogeneity, as is usually done, may substantially affect the reliability of genetic evaluation and thus reduce the efficiency of selection (Hill, 1984; Visscher and Hill, 1992).

There is concern not only about estimating dispersion parameters for heteroskedastic models, but also about testing hypotheses on the real degree of heterogeneity which can be expected from experimental results. In this respect, Visscher (1992) investigated the statistical power of the likelihood ratio test in balanced half-sib designs for detecting heterogeneity of phenotypic variance and intra-class correlation between environments. In that approach, the (family) correlation between environments ($\rho$) is assumed to be equal to 1, and heterogeneity of between-family components of covariance among environments is only due to scaling of variances.

The aim of this paper is to extend that approach to the case of true genotype by environment interactions ($\rho \neq 1$). Our attention will be focused on: i) cross-classified balanced designs; and ii) the null hypothesis involving homogeneity of between-family components of variance and covariance between environments. This variance-covariance structure has been widely used for analyzing family data recorded in different environments, in particular due to its close link with a 2-factor classification model (ie family and environment) with interaction (Mallard et al, 1983; Foulley and Henderson, 1989). Moreover, even for balanced designs, the estimation of the 2 parameters involved in this simple structure via maximum likelihood procedures has no analytical solution in the general case when no assumption is made about the residual variances. This motivated the proposal made in this study to use the expectation-maximization (EM) algorithm (Dempster et al, 1977).

THEORY
Generalities

Let us assume that the records from the balanced cross-classified layout family (or genotype) x environment can be written as:

$$y_{ijk} = \mu_i + b_{ij} + e_{ijk} \tag{1}$$

where $y_{ijk}$ is the performance of the $k$th progeny (or individual) ($k = 1, 2, \ldots, n$) of the $j$th family (or genotype) ($j = 1, 2, \ldots, s$) evaluated in the $i$th environment ($i = 1, 2, \ldots, p$); $\mu_i$ is the mean of the $i$th environment; $b_{ij}$ is the random effect of the $j$th family in the $i$th environment, assumed normally distributed, such that $\mathrm{Var}(b_{ij}) = \sigma^2_{B_i}$, $\mathrm{Cov}(b_{ij}, b_{i'j}) = \sigma_{B_{ii'}}$ for $i \neq i'$, and $\mathrm{Cov}(b_{ij}, b_{i'j'}) = 0$ for $j \neq j'$ and any $i$ and $i'$; and $e_{ijk}$ is a residual effect pertaining to the $k$th progeny in the subclass $ij$, assumed $NID(0, \sigma^2_{W_i})$, viz, normally and independently distributed with mean zero and variance $\sigma^2_{W_i}$.

Using vector notation, ie $y_{jk} = \{y_{ijk}\}$, $\mu = \{\mu_i\}$, $b_j = \{b_{ij}\}$ and $e_{jk} = \{e_{ijk}\}$ for $i = 1, 2, \ldots, p$, the model [1] can alternatively be written as:

$$y_{jk} = \mu + b_j + e_{jk} \tag{2}$$

where $b_j \sim N(0, \Sigma_B)$ and $e_{jk} \sim N(0, \Sigma_W)$, with $\Sigma_B = \{\sigma_{B_{ii'}}\}$ standing for the $(p \times p)$ matrix of between-family components of variance and covariance between environments and $\Sigma_W = \mathrm{Diag}\{\sigma^2_{W_i}\}$ for the $(p \times p)$ diagonal matrix of residual components of variance.
Actually, this approach consists of considering the expression of the trait in different environments $(i, i')$ as that of 2 genetically related traits with a coefficient of correlation $\rho_{ii'} = \sigma_{B_{ii'}}/(\sigma_{B_i}\sigma_{B_{i'}})$ (Falconer, 1952).
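In a given environment ($i$), this 1-way linear model generates the classical ANOVA statistics, ie the between-family ($S_{B_i}$, $B_i$) and within-family ($S_{W_i}$, $W_i$) sums of squares and mean squares, respectively, whose distributions are proportional to chi-squares:

$$S_{B_i} = (s-1)B_i = n\sum_{j=1}^{s}(\bar y_{ij.} - \bar y_{i..})^2 \sim (\sigma^2_{W_i} + n\sigma^2_{B_i})\,\chi^2_{s-1} \tag{3a}$$

$$S_{W_i} = s(n-1)W_i = \sum_{j=1}^{s}\sum_{k=1}^{n}(y_{ijk} - \bar y_{ij.})^2 \sim \sigma^2_{W_i}\,\chi^2_{s(n-1)} \tag{3b}$$

Due to the cross-classified structure of the design, one also has to consider a sum ($S_{B_{ii'}}$) and a mean ($B_{ii'}$) between-family cross-product for each $(i, i')$ combination:

$$S_{B_{ii'}} = (s-1)B_{ii'} = n\sum_{j=1}^{s}(\bar y_{ij.} - \bar y_{i..})(\bar y_{i'j.} - \bar y_{i'..}) \tag{4}$$

If we let $\bar y_{j.} = \{\bar y_{ij.}\}$ and $\bar y_{..} = \{\bar y_{i..}\}$, then the matrix $S_B = \{S_{B_{ii'}}\}$ with elements from [3a] and [4], such that:

$$S_B = (s-1)B = n\sum_{j=1}^{s}(\bar y_{j.} - \bar y_{..})(\bar y_{j.} - \bar y_{..})' \tag{5a}$$

has a Wishart distribution, denoted $W(\Gamma, s-1)$, with parameters $(s-1)$ and

$$\Gamma = \Sigma_W + n\Sigma_B \tag{5b}$$

thus generalizing to a matrix of between-family sums of squares and cross-products the $(\sigma^2_{W_i} + n\sigma^2_{B_i})\,\chi^2_{s-1}$ distribution arising in [3a].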
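The ANOVA statistics defined above can be computed in a few lines. The following is a minimal sketch, assuming the records are stored in a NumPy array of shape `(p, s, n)` (environment, family, progeny); the function name `anova_statistics` and this array layout are illustrative choices, not part of the original paper.

```python
import numpy as np

def anova_statistics(y):
    """Between-family mean squares and cross-products (B) and
    within-family mean squares (W) for a balanced layout.

    y has shape (p, s, n): environment i, family j, progeny k
    (this storage layout is an assumption of the sketch).
    """
    p, s, n = y.shape
    fam_means = y.mean(axis=2)                         # ybar_ij.  shape (p, s)
    env_means = fam_means.mean(axis=1, keepdims=True)  # ybar_i..  shape (p, 1)
    dev = fam_means - env_means
    S_B = n * dev @ dev.T                              # (p, p) sums of squares and cross-products
    S_W = ((y - fam_means[:, :, None]) ** 2).sum(axis=(1, 2))  # (p,) within sums of squares
    B = S_B / (s - 1)
    W = S_W / (s * (n - 1))
    return B, W

# Simulated data with the dimensions of the example (p = 3, s = 20, n = 2).
rng = np.random.default_rng(0)
y_demo = rng.normal(size=(3, 20, 2))
B_demo, W_demo = anova_statistics(y_demo)
```

Only `B` and the diagonal `W` are returned because the restricted likelihood depends on the data through these statistics alone.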
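In the 1-dimensional case, the set of $S_{B_i}$ and $S_{W_i}$ are independent, location invariant sufficient statistics for $\sigma^2_{B_i}$ and $\sigma^2_{W_i}$; similarly the matrices $S_B$ and $S_W = \mathrm{Diag}\{S_{W_i}\}$ have the same property for $\Sigma_B$ and $\Sigma_W$. Hence, one can write the density of [5] as

$$p(S_B \mid \Gamma) \propto |\Gamma|^{-(s-1)/2}\,|S_B|^{(s-p-2)/2}\exp\left[-\tfrac{1}{2}\mathrm{tr}(\Gamma^{-1}S_B)\right] \tag{6a}$$

Similarly,

$$p(S_W \mid \Sigma_W) \propto \prod_{i=1}^{p}(\sigma^2_{W_i})^{-s(n-1)/2}\,S_{W_i}^{[s(n-1)/2]-1}\exp\left[-S_{W_i}/(2\sigma^2_{W_i})\right] \tag{6b}$$

Using [6a] and [6b] in the expression for the log likelihood,

$$L(\Sigma_B, \Sigma_W; y) = Ct - \tfrac{1}{2}\left[(s-1)\ln|\Gamma| + \mathrm{tr}(\Gamma^{-1}S_B) + s(n-1)\ln|\Sigma_W| + \mathrm{tr}(\Sigma_W^{-1}S_W)\right]$$

where $Ct$ is a constant. This leads to:

$$-2L + 2Ct = (s-1)\left[\ln|\Gamma| + \mathrm{tr}(\Gamma^{-1}B)\right] + s(n-1)\left[\ln|\Sigma_W| + \mathrm{tr}(\Sigma_W^{-1}W)\right] \tag{7}$$

with $W = \mathrm{Diag}\{W_i\} = S_W/s(n-1)$ and $\mathrm{tr}(.)$ = the trace operator.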
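As an illustration, the function in [7] can be evaluated directly from the mean squares. This is a minimal sketch: the name `neg2_restricted_loglik`, the argument order and the demonstration values are assumptions made here, and the additive constant is dropped.

```python
import numpy as np

def neg2_restricted_loglik(B, W, Sigma_B, sigma2_W, s, n):
    """Minus twice the restricted log likelihood, up to an additive constant.

    B        : (p, p) between-family mean squares and cross-products
    W        : (p,)   within-family mean squares
    Sigma_B  : (p, p) between-family (co)variance components
    sigma2_W : (p,)   residual variances (diagonal of Sigma_W)
    """
    B = np.asarray(B, dtype=float)
    W = np.asarray(W, dtype=float)
    sigma2_W = np.asarray(sigma2_W, dtype=float)
    Gamma = np.diag(sigma2_W) + n * np.asarray(Sigma_B, dtype=float)
    sign, logdet = np.linalg.slogdet(Gamma)
    if sign <= 0 or np.any(sigma2_W <= 0):
        return np.inf  # crude guard: point treated as outside the parameter space
    between = (s - 1) * (logdet + np.trace(np.linalg.solve(Gamma, B)))
    within = s * (n - 1) * np.sum(np.log(sigma2_W) + W / sigma2_W)
    return between + within

# Hypothetical mean squares (not taken from the paper's tables).
B_demo = np.array([[10.0, 2.0, 1.0], [2.0, 8.0, 1.0], [1.0, 1.0, 9.0]])
W_demo = np.array([2.0, 3.0, 1.0])
s_demo, n_demo = 20, 2
at_anova = neg2_restricted_loglik(
    B_demo, W_demo, (B_demo - np.diag(W_demo)) / n_demo, W_demo, s_demo, n_demo)
elsewhere = neg2_restricted_loglik(
    B_demo, W_demo, np.eye(3), W_demo, s_demo, n_demo)
```

With these values the function is smallest at $\Gamma = B$, $\Sigma_W = W$, which is the saturated-model solution when $B - W$ is positive definite.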
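Notice that maximization of [7] yields REML estimators of $\Sigma_B$ and $\Sigma_W$, because the marginally sufficient statistics $S_B$ and $S_W$ are used in the log-likelihood function.

Under the saturated model ($\sigma^2_{W_i} \neq \sigma^2_{W_{i'}}$, and $\sigma_{B_{ii'}} \neq \sigma_{B_{i''i'''}}$ for any $i$, $i'$, $i''$ and $i'''$), the partial derivatives with respect to $\sigma_{B_{ii'}}$ and $\sigma^2_{W_i}$ of minus twice the log likelihood ($-2L$) are:

$$\frac{\partial(-2L)}{\partial\sigma_{B_{ii'}}} = (s-1)\,\mathrm{tr}\left[\Gamma^{-1}(\Gamma - B)\Gamma^{-1}\frac{\partial\Gamma}{\partial\sigma_{B_{ii'}}}\right] \tag{8a}$$

$$\frac{\partial(-2L)}{\partial\sigma^2_{W_i}} = (s-1)\,\mathrm{tr}\left[\Gamma^{-1}(\Gamma - B)\Gamma^{-1}\frac{\partial\Sigma_W}{\partial\sigma^2_{W_i}}\right] + s(n-1)\frac{\sigma^2_{W_i} - W_i}{\sigma^4_{W_i}} \tag{8b}$$

$\partial\Gamma/\partial\sigma_{B_{ii'}}$ is a $(p \times p)$ matrix having $n$ as the $(i, i')$ element (and, by symmetry, as the $(i', i)$ element) and 0 elsewhere, so that the equation [8a] = 0 gives $\hat\Gamma = B$. Similarly, $\partial\Sigma_W/\partial\sigma^2_{W_i}$ has 1 as the $i$th diagonal element and 0 elsewhere. Given that $\Sigma_W$ and $W$ are diagonal matrices and that $\hat\Gamma = B$, the solutions to equations [8a] = 0 and [8b] = 0 are

$$\hat\Sigma_W = W, \qquad \hat\Sigma_B = (B - W)/n \tag{9}$$

provided that $B - W$ is positive definite. The maximum of the log-likelihood function is then (apart from a constant):

$$-2L_{\max} = (s-1)\ln|B| + s(n-1)\ln|W| \tag{10}$$

Otherwise, REML estimates of $\Sigma_B$ and $\Sigma_W$ are no longer identical to ANOVA estimates and require the use of another algorithm for their calculation (see Appendix A).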
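The saturated-model solution in [9] and its admissibility condition can be checked numerically. The sketch below is illustrative only; the function name and the two small sets of mean squares are hypothetical.

```python
import numpy as np

def saturated_anova_estimates(B, W, n):
    """ANOVA (= REML when admissible) estimates under the saturated model.

    Returns Sigma_B = (B - W)/n, the residual variances W, and a flag telling
    whether Sigma_B is positive definite (ie inside the parameter space).
    """
    B = np.asarray(B, dtype=float)
    W = np.asarray(W, dtype=float)
    Sigma_B = (B - np.diag(W)) / n
    admissible = bool(np.all(np.linalg.eigvalsh(Sigma_B) > 0))
    return Sigma_B, W.copy(), admissible

# Hypothetical mean squares: the first set is admissible, the second is not.
Sigma_B_ok, _, ok = saturated_anova_estimates(
    [[10.0, 2.0, 1.0], [2.0, 8.0, 1.0], [1.0, 1.0, 9.0]], [2.0, 3.0, 1.0], n=2)
Sigma_B_bad, _, bad = saturated_anova_estimates(
    [[3.0, 2.9], [2.9, 3.0]], [1.0, 1.0], n=2)
```

In the second case the implied correlation between environments exceeds 1, which is the situation where an iterative algorithm such as EM is needed instead.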
The null hypothesis consists of assuming the homogeneity of the between-family components of variance ($\forall i$, $\sigma^2_{B_i} = \sigma^2_B$) and covariance ($\forall i \neq i'$, $\sigma_{B_{ii'}} = C_B$) as postulated in many analyses of genotype by environment experiments (Dickerson, 1962; Yamada, 1962; Mallard et al, 1983). The approach presented in this paper allows us to test this simplified structure of $\Sigma_B$ against Falconer's saturated model for any structure of the residual variances. The null hypothesis ($H_0$) considered here can be written as:

$$\Sigma_B = (\sigma^2_B - C_B)I_p + C_B J_p \tag{11}$$

where $I_p$ = identity matrix of order $p$ and $J_p$ = $(p \times p)$ matrix of ones.

Under $H_0$, REML estimation of $\Sigma_B$ and $\Sigma_W$ becomes much more complex. Here $\partial\Gamma/\partial\sigma^2_B = nI_p$ and $\partial\Gamma/\partial C_B = n(J_p - I_p)$ result in the following equations:

$$\mathrm{tr}\left[\Gamma^{-1}(\Gamma - B)\Gamma^{-1}\right] = 0 \tag{12a}$$

$$1_p'\,\Gamma^{-1}(\Gamma - B)\Gamma^{-1}\,1_p = 0 \tag{12b}$$

where $1_p$ = $(p \times 1)$ vector of ones.

Since $\Gamma^{-1} - \Gamma^{-1}B\Gamma^{-1} \neq 0$ in general, the REML solution for the residual components ([8b]) is no longer $\hat\Sigma_W = W$, and the system of equations [8b] (see also [B11]), [12a] and [12b] has no analytical solutions in the general case. This was the reason motivating our search for another approach for computing REML solutions to $\Sigma_B$ and $\Sigma_W$ under $H_0$.
An EM diagonalization approach

The expectation-maximization approach is a very efficient concept in maximum likelihood estimation (Quaas, 1992). The basic principle is to treat the unobservable random variables $b_{ij}$ and $e_{ijk}$ as missing data.

Actually, the EM algorithm will not be applied directly to the model described in [1] and [2] but after a spectral decomposition of $\Sigma_B$ according to its eigenvalues and vectors, ie:

$$\Sigma_B = U\Delta U' \tag{13}$$

In this formula, $\Delta = \mathrm{Diag}\{\delta_i\}$ is the $(p \times p)$ matrix of eigenvalues $\delta_i$, with $\delta_i$ repeated as many times as its multiplicity order, and $U = (U_1, U_2, \ldots, U_i, \ldots, U_p)$ is the $(p \times p)$ matrix of the corresponding $p$ normed eigenvectors $U_i$ of $\Sigma_B$ ($U'U = I_p$).

Under the special form shown in [11], $\Sigma_B$ has only 2 distinct eigenvalues:

$$\delta_1 = \sigma^2_B + (p-1)C_B \tag{14a}$$

$$\delta_2 = \sigma^2_B - C_B \tag{14b}$$

with multiplicity orders 1 and $(p-1)$ respectively. Moreover, the matrix $U$ of eigenvectors does not depend on the values of $\delta_1$ and $\delta_2$, $U'$ being the Helmert matrix of order $p$ (see for example Searle, 1982, p 71 and 322, for more details about such matrices). For instance, for $p = 3$,

$$U' = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} \end{pmatrix}$$

Due to the invariance property of (RE)ML estimators, the one-to-one transformation in [14a] and [14b] allows us to change the parameterization from $(\sigma^2_B, C_B)$ to $(\delta_1, \delta_2)$, or more conveniently to $(\delta_1, T)$ where $T = \delta_1 + (p-1)\delta_2$; the back transformation is:

$$\sigma^2_B = T/p, \qquad C_B = (p\,\delta_1 - T)/[p(p-1)]$$

From the spectral decomposition of $\Sigma_B$, the model in [2] can be written as:

$$y_{jk} = \mu + Uf_j + e_{jk}$$

where $U$ is defined as before and the vector $f_j = \{f_{ij}\}$ is such that $f_j \sim N(0, \Delta)$.

Using the Dempster et al (1977) terminology, a complete data set $x$ can be constructed from $\mu$, $f_j$, $e_{jk}$ for $j = 1, 2, \ldots, s$ and $k = 1, 2, \ldots, n$, whereas the incomplete data set is the vector $y$ of observations.
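The spectral decomposition and the reparameterization described above can be verified numerically. The following sketch builds a Helmert matrix for any order $p$ and checks that it diagonalizes a matrix of the form [11]; the helper names (`helmert`, `to_eigen`, `from_eigen`) and the numerical values are illustrative assumptions.

```python
import numpy as np

def helmert(p):
    """Helmert matrix of order p (rows are orthonormal; first row is constant)."""
    H = np.zeros((p, p))
    H[0, :] = 1.0 / np.sqrt(p)
    for r in range(1, p):
        H[r, :r] = 1.0 / np.sqrt(r * (r + 1))
        H[r, r] = -r / np.sqrt(r * (r + 1))
    return H

def to_eigen(sigma2_B, C_B, p):
    """(sigma2_B, C_B) -> (delta_1, delta_2, T) as in [14a], [14b]."""
    delta1 = sigma2_B + (p - 1) * C_B
    delta2 = sigma2_B - C_B
    return delta1, delta2, delta1 + (p - 1) * delta2

def from_eigen(delta1, T, p):
    """Back transformation (delta_1, T) -> (sigma2_B, C_B)."""
    return T / p, (p * delta1 - T) / (p * (p - 1))

# Hypothetical values of the two parameters under H0.
p, sigma2_B, C_B = 3, 4.0, 1.5
Sigma_B = (sigma2_B - C_B) * np.eye(p) + C_B * np.ones((p, p))
U = helmert(p).T                       # U' is the Helmert matrix
delta1, delta2, T = to_eigen(sigma2_B, C_B, p)
diagonalized = U.T @ Sigma_B @ U       # should equal Diag(delta1, delta2, delta2)
```

Because `U` is the same for every value of the parameters under $H_0$, it can be computed once before iterating.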
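Let us first consider the case of $\Sigma_B$. If the $f_j$'s were known, sufficient statistics for $\delta_1$ and $T$ would be $\sum_j f_{1j}^2$ and $\sum_j f_j'f_j$, and REML estimates would then be obtained by equating the expectation of these sufficient statistics, ie $s\delta_1$ and $sT$, to their calculated values (M step). Actually, these sufficient statistics are not directly observable and the EM algorithm proceeds first by estimating them by taking their conditional expectation given the observed data set (E step). Since such an estimation depends on the value of the unknown parameters, the procedure is iterative and consists of implementing the 2 usual steps:

E step: at iteration $[t]$, calculate

$$A^{[t]} = E\Big(\sum_{j=1}^{s} f_jf_j' \,\Big|\, y, \theta = \theta^{[t]}\Big) \tag{17}$$

M step: compute $\delta_1^{[t+1]}$ and $T^{[t+1]}$ from the following equations:

$$s\,\delta_1^{[t+1]} = (A^{[t]})_{11} \tag{18a}$$

$$s\,T^{[t+1]} = \mathrm{tr}(A^{[t]}) \tag{18b}$$

As shown in Appendix A, the $(p \times p)$ matrix $A^{[t]}$ can be expressed as:

$$A^{[t]} = sC^{[t]} + n\Delta^{[t]}U'(\Gamma^{[t]})^{-1}\left[(s-1)B + \Gamma^{[t]}\right](\Gamma^{[t]})^{-1}U\Delta^{[t]} \tag{19}$$

where $U$, $B$ and $\Gamma$ are defined as above (see [13], [5a] and [5b] respectively) and $C^{[t]}$ is the matrix of variance of prediction errors of $\hat f_j^{[t]}$, the best predictor of $f_j$ at iteration $[t]$ (see Appendix A), such that:

$$C^{[t]} = \Delta^{[t]} - n\Delta^{[t]}U'(\Gamma^{[t]})^{-1}U\Delta^{[t]} \tag{20}$$

Similarly, sufficient statistics for $\Sigma_W$ under the complete data set $x$ are $\sum_j\sum_k e_{ijk}^2$ for $i = 1, 2, \ldots, p$, and the E and M steps are as follows:

For the E step, at iteration $[t]$, calculate:

$$\Omega^{[t]} = E\Big(\sum_{j=1}^{s}\sum_{k=1}^{n} e_{jk}e_{jk}' \,\Big|\, y, \theta = \theta^{[t]}\Big) \tag{21}$$

using the following formula based on the same reasoning as previously (see Appendix A):

$$\Omega^{[t]} = S_W + (I_p - M^{[t]})\left[(s-1)B + \Gamma^{[t]}\right](I_p - M^{[t]})' + ns\,UC^{[t]}U' \tag{22}$$

where $S_W$ is the matrix of within-family sums of squares and cross-products (only the diagonal elements of $\Omega^{[t]}$ are needed when $\Sigma_W$ is diagonal) and $M^{[t]} = n\,UC^{[t]}U'(\Sigma_W^{[t]})^{-1}$.

For the M step compute the next value of $\Sigma_W$ from:

$$\sigma^{2\,[t+1]}_{W_i} = (\Omega^{[t]})_{ii}/ns \tag{23}$$

Formulae [25] and [24a]-[24b] define the E and M steps, respectively, of an EM algorithm written directly in terms of the $\sigma^2_B$ and $C_B$ parameters, $P$ denoting the conditional expectation of $\sum_j b_jb_j' = U(\sum_j f_jf_j')U'$ given the data. Notice that in this scheme $\mathrm{tr}(P)/p$ (average diagonal element of $P$) and $[(1'P1) - \mathrm{tr}(P)]/p(p-1)$ (average off-diagonal element of $P$) behave as sufficient statistics for $\sigma^2_B$ and $C_B$ with respect to the complete data set.

Formulae for the residual components are unchanged with $UC^{[t]}U' = \Sigma_W^{[t]}(\Gamma^{[t]})^{-1}\Sigma_B^{[t]}$ and $M^{[t]} = n\Sigma_B^{[t]}(\Gamma^{[t]})^{-1}$. For the saturated model, the formulae to apply are the same for $\Sigma_W$ and, simply, $\Sigma_B^{[t+1]} = P^{[t]}/s$ for $\Sigma_B$.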
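The E and M steps above can be assembled into a short iterative routine for the reduced model. This is a sketch under stated assumptions rather than the authors' program: the function name `em_reml_h0`, the starting values and the stopping rule are choices made here. The matrix `A` is the conditional expectation of $\sum_j f_jf_j'$, with the mean vector integrated out as in Appendix A. The residual E step uses $S_W + (I - M)[(s-1)B + \Gamma](I - M)' + ns\,UCU'$ with $M = n\Sigma_B\Gamma^{-1}$, a form derived here from the same conditioning argument.

```python
import numpy as np

def helmert(p):
    """Helmert matrix of order p (its transpose gives the eigenvectors U)."""
    H = np.zeros((p, p))
    H[0, :] = 1.0 / np.sqrt(p)
    for r in range(1, p):
        H[r, :r] = 1.0 / np.sqrt(r * (r + 1))
        H[r, r] = -r / np.sqrt(r * (r + 1))
    return H

def em_reml_h0(B, W, s, n, max_iter=10000, tol=1e-12):
    """EM-REML under H0: Sigma_B = (sigma2_B - C_B) I + C_B J, Sigma_W diagonal.

    B : (p, p) between-family mean squares and cross-products
    W : (p,)   within-family mean squares
    Returns (sigma2_B, C_B, sigma2_W).
    """
    B = np.asarray(B, dtype=float)
    W = np.asarray(W, dtype=float)
    p = B.shape[0]
    U = helmert(p).T
    S_W = s * (n - 1) * W                      # within-family sums of squares
    # Starting values inside the parameter space (an arbitrary choice).
    sigma2_W = W.copy()
    sigma2_B = max(np.mean(np.diag(B) - W) / n, 1e-3 * np.mean(W))
    C_B = 0.5 * sigma2_B
    for _ in range(max_iter):
        delta = np.r_[sigma2_B + (p - 1) * C_B, np.full(p - 1, sigma2_B - C_B)]
        Delta = np.diag(delta)
        Sigma_B = U @ Delta @ U.T
        Gamma = np.diag(sigma2_W) + n * Sigma_B
        Gamma_inv = np.linalg.inv(Gamma)
        K = (s - 1) * B + Gamma
        DUt = Delta @ U.T                      # Delta U'
        # E step for the family components.
        C = Delta - n * DUt @ Gamma_inv @ DUt.T
        A = s * C + n * DUt @ Gamma_inv @ K @ Gamma_inv @ DUt.T
        # E step for the residual components (diagonal elements only).
        R = np.eye(p) - n * Sigma_B @ Gamma_inv          # I - M
        omega = S_W + np.diag(R @ K @ R.T) + n * s * np.diag(U @ C @ U.T)
        # M step.
        delta1_new = A[0, 0] / s
        T_new = np.trace(A) / s
        new = np.r_[T_new / p,
                    (p * delta1_new - T_new) / (p * (p - 1)),
                    omega / (n * s)]
        old = np.r_[sigma2_B, C_B, sigma2_W]
        sigma2_B, C_B, sigma2_W = new[0], new[1], new[2:]
        if np.max(np.abs(new - old)) < tol:
            break
    return sigma2_B, C_B, sigma2_W

# Hypothetical mean squares built so that B = W + n * Sigma_B holds exactly
# with sigma2_B = 4, C_B = 1.5; the REML solution under H0 is then known.
W_demo = np.array([2.0, 3.0, 1.0])
B_demo = np.diag(W_demo) + 2 * (2.5 * np.eye(3) + 1.5 * np.ones((3, 3)))
sigma2_B_hat, C_B_hat, sigma2_W_hat = em_reml_h0(B_demo, W_demo, s=20, n=2)
```

In this constructed case the saturated-model solution already satisfies $H_0$, so the iterations should return $\sigma^2_B = 4$, $C_B = 1.5$ and $\Sigma_W = W$, which gives a simple check of the formulae.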
Testing procedures

Hypotheses of interest concern the vector $\theta$ of parameters involved in the matrices of between-family ($\Sigma_B$) and within-family ($\Sigma_W$) components of variance and covariance between environments. The theory of the generalized likelihood ratio can be applied to that purpose, as already proposed by Foulley et al (1990, 1992), Shaw (1991) and Visscher (1992) among others.

Let $H_0: \theta \in \Theta_0$ be the null hypothesis and $H_1: \theta \in \Theta - \Theta_0$ its alternative, where $\Theta$ refers to the complete parameter space and $\Theta_0$, a subset of it pertaining to $H_0$. Under $H_0$, the statistic

$$\lambda(y) = -2\max_{\theta \in \Theta_0} L(\theta; y) + 2\max_{\theta \in \Theta} L(\theta; y) \tag{26}$$

where $L(\theta; y)$ is defined as in [7], has an asymptotic chi-square distribution with $r$ degrees of freedom, $r$ being the difference in the numbers of estimable parameters involved in $\Theta$ and $\Theta_0$ (Mood et al, 1974). Here $\Theta$ contains $p(p+3)/2$ parameters corresponding to $p$ residual components of variance and $p(p+1)/2$ between-family components of variance and covariance between environments whilst $\Theta_0$ has $p + 2$ parameters only ($p$ residual components, $\sigma^2_B$ and $C_B$), so that $r = [p(p+1)/2] - 2$.

In the Neyman-Pearson approach of hypothesis testing, $H_0$ is rejected at the $\alpha$ level if the calculated value of $\lambda(y)$ exceeds a critical value $\lambda_c$ such that $\Pr(\chi^2_r > \lambda_c) = \alpha$. However, the likelihood ratio statistic $\lambda(y)$ in [26] can also be interpreted as the difference in degree of fit via maximum likelihood procedures by 2 models: a reduced model (R) with parameter vector $\theta \in \Theta_0$ and a full model (F) with $\theta \in \Theta$ encompassing both the null hypothesis and its alternative. In the theory of significance testing (Kempthorne and Folks, 1971), this statistic is also used as a measure of strength of evidence against the reduced model or the null hypothesis. The lower the probability under $H_0$ of exceeding this statistic evaluated from the data (also referred to as the P-value or significance level or size of the test), the stronger the evidence against $H_0$.
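The testing procedure just described reduces to a difference of two maximized $-2L$ values and a chi-square tail probability. The sketch below uses only the Python standard library; the function names and the numerical inputs are hypothetical, and the tail probability is obtained from the standard recurrence for the regularized incomplete gamma function at integer and half-integer arguments.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-z), 1.0               # Q(1, z)
    else:
        q, a = math.erfc(math.sqrt(z)), 0.5    # Q(1/2, z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lrt_homogeneity(neg2L_reduced, neg2L_saturated, p):
    """Likelihood ratio statistic, degrees of freedom and P-value (p >= 2)."""
    lam = neg2L_reduced - neg2L_saturated
    r = p * (p + 1) // 2 - 2
    return lam, r, chi2_sf(lam, r)

# Hypothetical maximized values of -2L for p = 3 environments (r = 4).
lam_demo, r_demo, p_value_demo = lrt_homogeneity(109.487729, 100.0, p=3)
```

For $p = 3$, as in the example that follows, the test has 4 degrees of freedom.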
Example

Data used here to illustrate the procedures are from an experiment carried out in Montpellier (south of France) on 20 full-sib families of black medic (Medicago lupulina L) tested in 3 different environments (control, harvesting and competition treatments). The experimental design was described in detail by Hébert (1991). Thirty-six traits were recorded and the variable used was the mean of the 5 plants cultivated in each replicate, so that $p = 3$, $s = 20$ and $n = 2$.

Basic ANOVA statistics for the between-family and within-family sums of squares and cross-products are given in table I for a subset of 5 traits.

Firstly, the null assumption that the diagonal terms of $\Sigma_W$ were equal was tested via Bartlett's test based on ANOVA mean squares statistics. P values were 0.007, 0.08, $1.4 \times 10^{-7}$, $8 \times 10^{-\ldots}$ and 0.04, so that this assumption can be reasonably rejected (except perhaps for trait 2).

Test statistics about $\Sigma_B$ and estimates of $\Sigma_B$ and $\Sigma_W$ under both the reduced and saturated models are given in table II. P-values for vegetative yield traits, represented here by dry matter weight (trait No 3) and dry matter weight/max plant size diameter (trait No 4), were very low, indicating a large heterogeneity in genetic variation between environments, with full-sib variances substantially reduced in the harvesting ($i = 1$) and competition ($i = 3$) environments compared with the control ($i = 2$). In contrast, the harvesting and competition environments do not generate a meaningful level of stress compared with the control for the expression of genetic variation of days to 1st ripe pod (trait No 2) and relative pod weight (trait No 5). These 3 environments then behave as 'exchangeable', as statisticians would say. In this example, genetic correlations between environments were rather high and it would have been interesting to test for some traits (eg No 1 and 5) the assumption that these correlations are equal to unity using Visscher's (1992) procedure.

DISCUSSION
This paper describes a further contribution to the solution of the problem of testing homogeneity of between-family components of variance and covariance between environments in the case of balanced cross-classified designs. The testing procedure is based on the likelihood ratio test as already advocated by Shaw (1987) in quantitative genetics. This study extends that of Visscher (1992), which was restricted to the case of pure scaling effects between environments (ie all genetic correlations between environments equal to one).

The choice of an EM algorithm for computing REML estimates of $\Sigma_B$ and $\Sigma_W$ under the null hypothesis allows us to make explicit the equations of the iterative process to implement via formulae based on the usual ANOVA statistics. This algorithm does not require any constraint on the value of these ANOVA statistics (eg $B - W$ can be non-positive definite) provided the starting values for $\Sigma_B$ are within the parameter space. A simple reason for this is that the E phase under the restricted model involves the conditional expectation (given the data) of sums of squares, eg $\sum_j f_{ij}^2$ and $\sum_j\sum_k e_{ijk}^2$, as estimators of variance. Because $E(\sum_j f_jf_j' \mid y, \Delta = \Delta^{[t]})$ is always positive definite (Foulley et al, 1987), this property of the EM algorithm is also true under the saturated model; it can then be used to provide REML estimates of $\Sigma_B$ and $\Sigma_W$ when ANOVA estimators are not permissible (eg for traits 1, 3, 4 and 5 in the example).
Some authors such as Anderson (1984) and Shaw (1987) advocate the use of ML rather than REML procedures to test hypotheses about variance covariance matrices. Our EM algorithm can be easily adapted to obtain such ML estimates of $\Sigma_B$ and $\Sigma_W$. It suffices to replace $(s-1)B + \Gamma$ in formulae [19] and [22] for $A$ and $\Omega$ respectively by $(s-1)B$, corresponding to the change in the conditional expectation (given the data) of $n\sum_j(\bar y_{j.} - \mu)(\bar y_{j.} - \mu)'$ according to whether $\mu$ is considered as a parameter of interest (ML) or a nuisance factor to be integrated out (REML).
This algorithm can also retrieve the usual ANOVA estimates for $\Sigma_B$ and $\Sigma_W$ using the 2-way crossed-mixed model:

$$y_{ijk} = \mu + h_i + s_j + hs_{ij} + e_{ijk} \tag{27}$$

involving fixed environmental effects ($h_i$), random family effects [$s_j \sim NIID(0, \sigma^2_s)$], random interactions [$hs_{ij} \sim NIID(0, \sigma^2_{hs})$] and residuals [$e_{ijk} \sim NIID(0, \sigma^2_e)$]. In fact, Foulley and Henderson (1989) showed that this is an equivalent model for a simplified version of the multiple trait model in [1] restricted to $\Sigma_B = (\sigma^2_B - C_B)I_p + C_BJ_p$ and $\Sigma_W = \sigma^2_W I_p$ with $\sigma^2_B = \sigma^2_s + \sigma^2_{hs}$, $C_B = \sigma^2_s$ and $\sigma^2_W = \sigma^2_e$.

Notice however that this simplified multiple trait model differs from that considered throughout this study (see [11]) not only by the assumption of homoskedastic residual variances but also by its restriction to a positive covariance ($C_B$) between environments.

For instance, for trait 1 (days to flowering), EM-REML estimates of variance and covariance components obtained from the algorithm described in [17] to [22] under this restricted model can easily be checked against the ANOVA estimates: $\hat\sigma^2_s = 79.89$; $\hat\sigma^2_{hs} = 0.97$; and $\hat\sigma^2_e = 26.39$. Here again, the EM algorithm provides estimates within the parameter space, which is not always the case with ANOVA estimators, as shown for instance with trait 5: $\hat\sigma^2_B = 79.50$, $\hat C_B = 79.43$ and $\hat\sigma^2_W = 23.23$ with EM versus $\hat\sigma^2_s = 79.65$, $\hat\sigma^2_{hs} = -1.02$ and $\hat\sigma^2_e = 24.07$ with ANOVA.

The EM algorithm is a first-order procedure and therefore has close relationships with a maximization procedure based on zeroed first derivatives. As shown in Appendix B in the case of $\Sigma_B$, the difference between the formulae to implement in the 2 iterative procedures consists of replacing $(s-1)B + \Gamma^{[t]}$ in [19] by $sB$ in the 1st derivative algorithm ($B$ being a sufficient statistic for $\Gamma$ in the saturated model). Again, the use of EM guarantees staying in the parameter space whereas there is no obvious proof of that for the 1st derivative algorithm. Moreover, the EM approach turns out to be easier to understand and to use than the other one, which relies mainly on algebraic tricks. As far as REML estimates of $\Sigma_W$ are concerned, a functional iteration algorithm based on first derivatives was also proposed in Appendix B due to the lack of an obvious analogue of the EM formulae.

Finally, this EM reasoning can be extended to an unbalanced structure of data and to additional nuisance fixed effects cross-classified with family effects, ie to an extended version of model [1] such as

$$y_{ijk} = x_{ijk}'\beta + b_{ij} + e_{ijk} \tag{28}$$

where $\beta$ is a vector of fixed effects including the $i$th classification for environment and effects of other factors to be adjusted for, and $x_{ijk}'$ is the corresponding incidence (row) vector. Under that model, the E and M steps defined in [17] and [18a] and [18b] for $\Sigma_B$ and in [21] and [23] for $\Sigma_W$ are still valid. However, the E-statistics in [17], [21] and [24b] are not evaluated as functions of ANOVA statistics but directly from the numerical values of the BLUP and the variance of prediction errors of the $f_j$ or $b_j$'s. In this situation, also, the EM algorithm should be applied systematically to both the reduced and saturated model for $\Sigma_W$.

Another approach would be to write [28] under its equivalent mixed-model form (for $C_B > 0$) in terms of $h_i$, $s_j$ and $hs_{ij}$, defined as in [27], with $e_{ijk} \sim NID(0, \sigma^2_{W_i})$. Under such a mixed-model structure, one can then use the methods developed by Foulley et al (1990, 1992) and San Cristobal et al (1993) for calculating REML estimates of variances in the presence of heterogeneous residual components. However, the procedure derived in this paper remains definitely more general; for instance, it can also be easily applied to a non-diagonal structure of $\Sigma_W$ using formulae [17] to [22] unchanged and [23] slightly modified into $\Sigma_W^{[t+1]} = \Omega^{[t]}/ns$.
This paper deals with a null hypothesis of constant between-family variance and covariance. In some instances, a more appropriate null hypothesis would be a constant between-family correlation ($\rho$) between environments ($\sigma_{B_{ii'}} = \rho\,\sigma_{B_i}\sigma_{B_{i'}}$) and/or a constant intraclass correlation [$\sigma^2_{B_i} = t(\sigma^2_{B_i} + \sigma^2_{W_i})$]. Testing procedures for these assumptions will be reported in a separate article.
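The correspondence between the 2-way crossed-mixed model and the simplified multiple trait model discussed earlier ($\sigma^2_B = \sigma^2_s + \sigma^2_{hs}$, $C_B = \sigma^2_s$, $\sigma^2_W = \sigma^2_e$) can be illustrated with the ANOVA estimates quoted for traits 1 and 5. The sketch below is illustrative; the function name and the strict-positivity admissibility rule are assumptions made here.

```python
def multitrait_from_mixed(sigma2_s, sigma2_hs, sigma2_e, p):
    """Map 2-way mixed-model components to (sigma2_B, C_B, sigma2_W) and
    check whether the implied Sigma_B lies inside the parameter space."""
    sigma2_B = sigma2_s + sigma2_hs
    C_B = sigma2_s
    delta1 = sigma2_B + (p - 1) * C_B   # eigenvalue with multiplicity 1
    delta2 = sigma2_B - C_B             # eigenvalue with multiplicity p - 1
    admissible = delta1 > 0 and delta2 > 0 and sigma2_e > 0
    return {"sigma2_B": sigma2_B, "C_B": C_B, "sigma2_W": sigma2_e,
            "delta1": delta1, "delta2": delta2, "admissible": admissible}

# ANOVA estimates quoted in the text (p = 3 environments).
trait1 = multitrait_from_mixed(79.89, 0.97, 26.39, p=3)
trait5 = multitrait_from_mixed(79.65, -1.02, 24.07, p=3)
```

For trait 5 the negative interaction component makes the second eigenvalue negative, ie the implied correlation between environments exceeds 1, whereas the EM estimates quoted in the text ($\hat\sigma^2_B = 79.50$, $\hat C_B = 79.43$) stay inside the parameter space.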
ACKNOWLEDGMENTS
The authors are grateful to I Olivieri (INRA, Montpellier) for stimulating discussions which motivated this study, to J Ruane for the English revision of the manuscript and to the anonymous referees for their valuable comments. Special thanks are expressed to C Robert and M San Cristobal who read the manuscript thoroughly and checked the numerical example.
REFERENCES
Anderson TW (1984) An Introduction to Multivariate Statistical Analysis. J Wiley and Sons, New York
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B 39, 1-38
Dickerson GE (1962) Implications of genetic-environmental interaction in animal breeding. Anim Prod 4, 47-63
Falconer DS (1952) The problem of environment and selection. Amer Nat 86, 293-298
Foulley JL, Henderson CR (1989) A simple model to deal with sire by treatment interactions when sires are related. J Dairy Sci 72, 167-172
Foulley JL, Im S, Gianola D, Hoeschele I (1987) Empirical Bayes estimation of parameters for n polygenic binary traits. Genet Sel Evol 19, 197-224
Foulley JL, Gianola D, San Cristobal M, Im S (1990) A method for assessing extent and sources of heterogeneity of residual variances in mixed linear models. J Dairy Sci 73, 1612-1624
Foulley JL, San Cristobal M, Gianola D, Im S (1992) Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models. Comput Stat Data Anal 13, 291-305
Hébert D (1991) Plasticité phénotypique et interaction génotype milieu chez Medicago lupulina. Thèse Doctorat Sciences, Univ Sciences Techniques du Languedoc, Montpellier
Hill WG (1984) On selection among groups with heterogeneous variance. Anim Prod 39, 473-477
Kempthorne O, Folks L (1971) Probability, Statistics and Data Analysis. The Iowa State University Press, Ames (IA)
Mallard J, Masson JP, Douaire M (1983) Interaction génotype x milieu et modèle mixte. I. Modélisation. Genet Sel Evol 15, 379-394
Meyer K (1990) Present status of knowledge about statistical procedures and algorithms to estimate variance and covariance components. In: 4th World Congress on Genetics Applied to Livestock Production, Edinburgh, 23-27 July 1990, vol 13 (WG Hill, R Thompson, JA Wooliams, eds), 407-418
Mood A, Graybill FA, Boes DC (1974) Introduction to the Theory of Statistics. McGraw-Hill Inc, London
Quaas RL (1992) REML NoteBook. Mimeo, Dep Anim Sci, Cornell Univ, Ithaca (NY)
San Cristobal M, Foulley JL, Manfredi E (1993) Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an ...
Searle SR (1982) Matrix Algebra Useful for Statistics. J Wiley and Sons, New York
Shaw RG (1987) Maximum likelihood approaches applied to quantitative genetics of natural populations. Evolution 41, 812-826
Shaw RG (1991) The comparison of quantitative genetic parameters between populations. Evolution 45, 143-151
Visscher PM (1992) On the power of likelihood ratio tests for detecting heterogeneity of intra-class correlations and variances in balanced half-sib designs. J Dairy Sci 75, 1320-1330
Visscher PM, Hill WG (1992) Heterogeneity of variance and dairy cattle breeding. Anim Prod 55, 321-329
Yamada Y (1962) Genotype by environment interaction and genetic correlation of the same trait under different environments. Jap J Genet 37, 498-509

APPENDIX
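An EM algorithm for REML estimation of $\Sigma_B$ and $\Sigma_W$ (part A)

$\Sigma_B$ and $\Sigma_W$ under $H_0$

An explicit formula for $A$ can be obtained by successively conditioning and deconditioning the expression in [17] with respect to the mean vector $\mu$, ie:

$$A^{[t]} = E_{\mu}\Big[E\Big(\sum_{j=1}^{s} f_jf_j' \,\Big|\, y, \mu, \theta = \theta^{[t]}\Big) \,\Big|\, y, \theta = \theta^{[t]}\Big] \tag{A1}$$

where $\theta$ stands for the vector of parameters involved in the matrices of between-family ($\Sigma_B$) and within-family ($\Sigma_W$) components of variance and covariance between environments and $\theta^{[t]}$ is the current estimate of $\theta$ at iteration $[t]$.

Now, conditionally on $y$, $\mu$ and $\theta = \theta^{[t]}$, the expectation of $\sum_{j=1}^{s} f_jf_j'$ can be decomposed into:

$$E\Big(\sum_{j} f_jf_j' \,\Big|\, y, \mu, \theta\Big) = \sum_{j}\hat f_j\hat f_j' + \sum_{j} C_j \tag{A2}$$

This decomposition is especially helpful because it allows us to introduce the usual statistics of Gaussian models, ie the conditional mean $\hat f_j = E(f_j \mid y, \mu, \theta)$ and its prediction error variance $C_j = \mathrm{Var}(f_j \mid y, \mu, \theta)$:

$$\hat f_j = n\Delta U'\Gamma^{-1}(\bar y_{j.} - \mu) \tag{A3a}$$

$$C_j = C = \Delta - n\Delta U'\Gamma^{-1}U\Delta \tag{A3b}$$

The next step is to specify the expression for:

$$E\Big[n\sum_{j}(\bar y_{j.} - \mu)(\bar y_{j.} - \mu)' \,\Big|\, y, \theta\Big]$$

with respect to the distribution of $\mu \mid y, \theta$. On account of assumptions made in [1], the distribution of this random variable is $N[\bar y_{..}, (\Sigma_B/s) + (\Sigma_W/ns)]$, so that:

$$E\Big[n\sum_{j}(\bar y_{j.} - \mu)(\bar y_{j.} - \mu)' \,\Big|\, y, \theta\Big] = (s-1)B + \Gamma \tag{A4}$$

The formula for the variance of prediction error in [A3b] does not depend on $\mu$, so that the expression of [A1] reduces to:

$$A^{[t]} = sC^{[t]} + n\Delta^{[t]}U'(\Gamma^{[t]})^{-1}\left[(s-1)B + \Gamma^{[t]}\right](\Gamma^{[t]})^{-1}U\Delta^{[t]} \tag{A5}$$

where $C^{[t]}$ is defined as in [A3b] using $\theta^{[t]}$ as a current estimate of $\theta$ in $\Sigma_W$ and $\Sigma_B$, the matrices $B$ and $U$ being constant over rounds of iteration.

The next values of the unknown components of $\Sigma_B$ (ie $\sigma^2_B$ and $C_B$ or $\delta_1$ and $T$) are computed at the M step from the diagonal elements of $A^{[t]}$ ([18a] and [18b]).

The diagonal terms of the matrix $\Sigma_W$ under the complete data set $x$ have the diagonal elements of $\sum_j\sum_k e_{jk}e_{jk}'$ as sufficient statistics, since the $e_{jk}$ are independently distributed as $N(0, \Sigma_W)$.

Because this statistic is not observable, it is replaced by its conditional expectation given the data $y$ and $\theta = \theta^{[t]}$. As for B-components, this expectation is calculated after conditioning and deconditioning with respect to the mean vector $\mu$, ie:

$$\Omega = E_{\mu}\Big[\sum_{j}\sum_{k}\hat e_{jk}\hat e_{jk}' + ns\,\mathrm{Var}(e_{jk} \mid y, \mu, \theta) \,\Big|\, y, \theta\Big] \tag{A8}$$

where

$$\hat e_{jk} = y_{jk} - \mu - U\hat f_j \tag{A9a}$$

$$\mathrm{Var}(e_{jk} \mid y, \mu, \theta) = UCU' \tag{A9b}$$

and $\hat f_j$ and $C$ are as before.

From [A9a], the quadratic $\sum_{jk}\hat e_{jk}\hat e_{jk}'$ can be written as

$$\sum_{jk}\hat e_{jk}\hat e_{jk}' = \sum_{jk}(y_{jk} - \mu)(y_{jk} - \mu)' - MQ - QM' + MQM' \tag{A10}$$

with $M = nUCU'\Sigma_W^{-1}$ and $Q = n\sum_j(\bar y_{j.} - \mu)(\bar y_{j.} - \mu)'$.

The next step is to take the expectation of each element in the right-hand side with respect to the distribution of $\mu \mid y, \theta$. Then

$$E\Big[\sum_{jk}(y_{jk} - \mu)(y_{jk} - \mu)' \,\Big|\, y, \theta\Big] = (ns-1)T + \Gamma \tag{A11}$$

where $T$ designates the matrix of total mean squares and mean cross-products, $T = \sum_{jk}(y_{jk} - \bar y_{..})(y_{jk} - \bar y_{..})'/(ns-1)$. Similarly,

$$E(MQ \mid y, \theta) = M\left[(s-1)B + \Gamma\right] \tag{A12}$$

$$E(MQM' \mid y, \theta) = M\left[(s-1)B + \Gamma\right]M' \tag{A13}$$

Placing [A11], [A12] and [A13] in [A8] and noting that the conditional variance in [A9b] does not depend on $\mu$, gives

$$\Omega = (ns-1)T + \Gamma - MK - KM' + MKM' + ns\,UCU', \qquad K = (s-1)B + \Gamma \tag{A14}$$

In fact, $\Omega$ is evaluated conditionally on $\theta = \theta^{[t]}$, ie by taking $\Gamma = \Gamma^{[t]}$, $C = C^{[t]}$ and $M = M^{[t]}$ in [A14]. The next value of $\sigma^2_{W_i}$ is then obtained as in [23].

$\Sigma_B$ and $\Sigma_W$ under the saturated model

Actually, the EM algorithm described previously can be easily accommodated to deal with the saturated model. This is especially helpful when the ANOVA estimates of $\Sigma_B$ fall outside the parameter space.

Nothing is changed with respect to $\Sigma_W$, which has the same diagonal structure with $p$ different elements in both situations. As far as $\Sigma_B$ is concerned, a sufficient statistic under the complete data set is now the $(p \times p)$ matrix $\sum_j b_jb_j' = U(\sum_j f_jf_j')U'$. However, for a given $U$, the general expression of the conditional expectation of $\sum_j f_jf_j'$ given the data was already derived (see [19] and [A5]), so that the E step remains the same.

Because all the elements of $A$ are now required, the changes to implement at the M step are the following: compute the next value $\Sigma_B^{[t+1]}$ of $\Sigma_B$ (saturated model) by

$$\Sigma_B^{[t+1]} = U^{[t]}A^{[t]}U^{[t]\prime}/s$$

where $A^{[t]}$ is obtained from [A5] with $U^{[t]}$ being the matrix of normed eigenvectors of $\Sigma_B^{[t]}$. Notice that here $U$ is updated at each iteration from the equation $\Sigma_B^{[t]}U^{[t]} = U^{[t]}\Delta^{[t]}$.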
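Algorithms based on first derivatives (part B)

Between-family components $\Sigma_B$

Using the spectral decomposition in [13], ie $\Gamma = nU\Delta U' + \Sigma_W$, we obtain

$$\frac{\partial\Gamma}{\partial\delta_i} = n\sum_{l=1}^{r_i} U_{il}U_{il}' \tag{B1}$$

where $U_{i1}, U_{i2}, \ldots, U_{il}, \ldots, U_{ir_i}$ are the $r_i$ normed eigenvectors of $\Sigma_B$ corresponding to the eigenvalue $\delta_i$ with multiplicity order $r_i$. Remember that under the reduced model, $r_1 = 1$ and $r_2 = p - 1$ for $\delta_1$ and $\delta_2$ defined in [14a] and [14b], respectively.

Substituting $\partial\Gamma/\partial\delta_i$ by its expression [B1] in the equation $\partial(-2L)/\partial\delta_i = 0$ leads to:

$$\sum_{l=1}^{r_i} U_{il}'\,\Gamma^{-1}(\Gamma - B)\Gamma^{-1}\,U_{il} = 0 \tag{B3}$$

Let $L = \sqrt{n}\,U\Delta^{1/2}$ with $L$ partitioned in the same way as $U$, ie $L_{il} = \sqrt{n\delta_i}\,U_{il}$, where $(A)_{il,il}$ stands for the $il$th diagonal element of the $A$ matrix. Now: [...] and with [...]. Then, from [...] and [...]. Furthermore: [...] or [...] so that [...] and [...].

Using [B7] and [B8], the system [B3] reduces to

$$r_i\,\delta_i^{[t+1]} = \sum_{l=1}^{r_i}\left(C^{[t]} + n\Delta^{[t]}U'(\Gamma^{[t]})^{-1}\,B\,(\Gamma^{[t]})^{-1}U\Delta^{[t]}\right)_{il,il} \tag{B9}$$

The EM procedure described in [18a], [18b] and [19] can be alternatively written

$$r_i\,\delta_i^{[t+1]} = \sum_{l=1}^{r_i}\left(C^{[t]} + n\Delta^{[t]}U'(\Gamma^{[t]})^{-1}\,\frac{(s-1)B + \Gamma^{[t]}}{s}\,(\Gamma^{[t]})^{-1}U\Delta^{[t]}\right)_{il,il} \tag{B10}$$

The parallel between [B9] and [B10] is straightforward. Here $B$ replaces $[(s-1)B + \Gamma]/s$, since these 2 quantities have the same expectation $\Gamma$ under the saturated model.

Residual components $\Sigma_W$

$\partial\Gamma$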