HAL Id: hal-00894050
https://hal.archives-ouvertes.fr/hal-00894050
Submitted on 1 Jan 1994
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
A reparameterization to improve numerical optimization
in multivariate REML (co)variance component
estimation
E Groeneveld
To cite this version:
Original
article
A
reparameterization
to
improve
numerical
optimization
in
multivariate
REML
(co)variance
component
estimation
E
Groeneveld
Institute
of
AnimalHusbandry
and AnimalBehaviour,
FederalAgricultural
ResearchCenter,
Höltystr
10, 31535Neustadt, Germany
(Received
28February
1994;accepted
15 June1994)
Summary - Multivariate restricted maximum likelihood
(REML)
(co)variance
componentestimation
using
numericaloptimization
on the basis ofDownhill-Simplex
(DS)
orquasi-Newton
(QN)
procedures
suffers from theproblem
of undefined ’covariance matrices’ as areproduced by
theoptimizers.
Sofar,
thisproblem
has been dealt withby assigning
’bad’ function values. For thisprocedure
towork,
it isimplied
that the information this ’bad’ function value conveys is sufficient to avoidgoing
in the same direction in thefollowing
optimization
step. To a limiteddegree
DS can cope with this situation. On the other handQN usually
breaks down if this situation occurs toofrequently.
This contributionanalyzes
theproblem
and proposes a reparameterization of the covariance matrices to solve it. Asa
result,
fasterconverging QN optimizers
can beused,
asthey
nolonger
suffer from lack of robustness. Four real data sets wereanalyzed using
a multivariate modelestimating
between 17 and 30(co)variance
componentssimultaneously. Optimizing
on theCholesky
factor instead of on the
(co)variance
components themselves reduced thecomputing
timeby
a factor of 2.5 to more than250,
whencomparing
the robust modified DSoptimizer
operating
on theoriginal
covariance matrices to aQN
optimizerusing
reparameterized covariance matrices.multivariate REML
/
optimization/
quasi-Newton/
Downhill-Simplex/
reparame-terizationRésumé - Un reparamétrage pour améliorer l’optimisation numérique dans une
estimation REML multivariate de composantes de variance-covariance. L’estimation
du ma!imum de vraisemblance restreinte
(REML)
des composantes de variance-covarianceà l’aide des
procédures numériques d’optimisation Simple!-Descendant
(SD)
ou quasi-Newton(QN)
se heurte à ladifficulté
résultant de laproduction
de matrices de covariancesnon
définies. Jusqu’à présent,
cettedifficulté
a été résolue en attribuant une vraisemblancearbitraitement mauvaise à de telles matrices, de
façon
à éviter de revenir dans cetteconverge
plus lorsque
cette situation sereproduit
trop souvent. Cette note propose unreparamétrage
pour résoudre leproblème.
Il devient ainsipossible
d’utiliser laprocédure
QN
dont la convergence estrapide
et la robustesse assurée.Quatre fichiers
de donnéesont été
analysés
pour estimer simultanément de 17 à 30 composantes de(co)-variance.
L’optimisation
dufacteur
deCholesky
au lieu des composantes elles-mêmes réduit le temps de calcul d’unfacteur
compris entre 2,5 etplus
de500,
quand
oncompare la procédure
QN
avecreparamétrage
des matrices de covariance à laprocédure
SDmodifiée appliquée
aux matrices de covariance
d’origine.
REML multivariate
/
optimisation/
quasi-Newton/
Simplex-Descendant/
repara-métrage
THE PROBLEM
In restricted maximum likelihood
(REML)
(Patterson
andThompson,
1971),
maximization of the likelihood is doneusing
either the EMalgorithm
(Dempster
etal,
1977)
orprocedures
that do notrequire
explicit
derivatives. Aproblem
specific
tothe latter class of
optimizers
is addressed in this paper. Graser et al(1987)
proposed
a
sampling technique
thatspanned
thecomplete
parameter
space for asingle
traitanalysis.
Meyer
(1989)
used aDowhnill-Simplex
(DS)
andquasi-Newton
(QN)
technique,
Kovac(1992)
modified the DSprocedure
and alsoexpanded
Powell’s method ofconjugate gradients.
All these authors had to deal with
problems
arising
fromparameters
andposed
by optimizers
that lie outside theparameter
space.Despite
thisproblem,
a host ofREML estimates have been
reported
with avarying
number of simultaneous traits(Groeneveld,
1991; Meyer, 1991;
Tixier-Boichard etal, 1992;
Ducos etal, 1993;
Spilke
etal, 1993;
Mielenz etal,
1994).
Consider a mixed linear model with one random effect and no
missing
values asspecified
inequations
[1-5].
With a residual and an additivegenetic
effect thelog
likelihood inequation
[6]
must be maximized. Thecomputational
procedure
requires setting
up andsolving
the mixed-modelequations
(MME).
where:
y - vector of observations
Z - incidence matrix for the animal effect
j3 -
vector of unknownparameters
for fixed effectsu - vector of unknown
parameters
for the animal effecte - vector of residuals
A -
relationship
matrix of order number of animals and their known ancestorsRo -
residual(co)variance
among traitsGo -
covariance matrix for additivegenetic
effects among traits - Kroneckerproduct
This
requires
a set of covariance matrices for both the residual and the additivegenetic
components
R
o
,
Go.
Their inverses are used insetting
up the MME.After
setting
up the MME thesystem
ofequations
may be solvedby Cholesky
factorization and backward substitution.
where
LV - value
proportional
to thelogarithm
of the likelihood functionb° -
solution vector of the MME C*
- inverse of the coefficient matrix of the MME W -
(XIZ)
n a
- number of animals
n - number of observations
In
DS,
acomplete
vertex iscomputed
before theoptimization begins.
The DSprocedure
used here follows Kovac(1992),
andis,
thus,
very different from theoriginal
DS asproposed
by
Nelder and Mead(1965).
Initially,
itperforms frequent
restarts,
terminating
the iteration atincreasing
accuracy. Thisprocedure
alleviated the well-knownproblem
of the DS toget
stuck atsuboptimal points.
InQN,
gradients
arerequired. They
may either besupplied
in theiranalytical
form orapproximated
using
finite differences(Schnabel
etal,
1982).
REML is an iterative
procedure
and so theR
°
andGo
matrices have to be validfor each round when the MME are set up.
Thus,
initially,
valid covariance matrices have to beprovided
to thealgorithm.
For a 2-trait model with 1 additivegenetic
component
the residual and additivegenetic
covariance matricesR
o
andGo
of dimension 2 have to be estimatedamounting
to anoptimization
in a 6-dimensionalparameter
space.However,
in thefollowing
iteration,
new sets of(co)variance
areprovided
by
the
optimizers.
For both DS andQN,
the covariance matrices are an unstructuredvector of
parameters,
in this case a vector of 6 values. The constraints of covariancematrices,
ie thateigenvalues
have to bepositive
(equation
[7]),
are therefore unknown to theoptimizers.
Given thisbackground
it is notsurprising
that theoutcome of an
optimization
step,
ie a new set of(co)variances,
is notguaranteed
to meet therequirements
of covariance matrices. Inpractical
terms,
this means thatthe determinant of the coefficient matrix
generated
on the basis of these covariancematrices will become less than zero, thus
aborting
the whole process, because thelog
of anegative
number cannot be taken.Groeneveld,
1994).
Furthermore,
when the true correlation between the traits ishigh,
the ’covariance matrices’proposed by
theoptimizers
are morelikely
to lie outside theparameter
space, and the true covariance matrices to be located closeto the
edge
of theparameter
space.An obvious
(at
least in the context ofDS)
solution to the treatment of undefined covariance matrices is toassign
a ’bad’ likelihoodvalue,
shouldnegative
eigenvalues
occur. This will tell theoptimizer
(DS)
to avoid this direction insubsequent
optimization
steps.
While thisprocedure
may workreasonably
well withsampling-based
optimization algorithms,
itproduces major problems
withQN,
whichrequires
the function to be continuous and differentiable. If the condition occursduring
the proces of
approximating
thegradients
by
finitedifferencing,
assigning
a ’bad’likelihood will result in a nonsensical
gradient.
When usedduring
thefollowing
optimization
step
obviously
likewise nonsensical directions are chosen. Inshort,
assigning
’bad’ likelihood values when undefined covariance matrices occur willoften result in
aborting
theQN
optimization
step.
We can thus observe that the
optimizers
that have super linear convergenceproperties
(and
are thus much faster thansampling-based
procedures
like DS(Dennis
andTorczon,
1991))
failincreasingly
as thedimensionality
of theproblem
increases.
Thus,
theproblem
of undefinedparameters
arising during optimization
leads to the
paradoxical
situation that an efficient class ofoptimizers
canonly
beused with confidence on small
problems
with 1 or 2traits,
wherecomputing
time is notimportant
andefficiency
ofoptimization
not anissue,
whereas inefficientsampling-based
optimizers,
which are at bestonly
linearly converging,
must be used onlarger
problems.
A SOLUTION
Part of the
problem
can be solvedby
performing
a constrainedoptimization.
This isrelatively
easy for the variances in which the constraint isonly
thatpositive
numbersbe chosen.
However,
notechnique
seems to be available toimpose
a set of constraints such that nonegative
eigenvalues A
occur for a subset of thedimensionality
of theoptimization
space asgiven
inequation
(7!.
Instead,
we propose toperform
theoptimization
on theCholesky
factor of the covariance matrices. Let:Thus,
instead ofoptimizing
on theR
o
,
itsCholesky
factorC
r
is usedapplying
the same
operation
to all covariance matrices in the model.Operationally,
thisimplies
thefollowing
steps:
1)
Usersupplies
initial covariance matrices.2)
The
Cholesky
factorization isperformed
on all covariance matrices.4)
The function SMME is calledby
theoptimizer.
This function sets up and solves MME andcomputes
the likelihood valuepassing
theCholesky
factor asparameters.
5)
SMMEcomputes
theoriginal
R
o
andGo
from thefactors,
sets up and solves the MME andcomputes
the likelihood value.6)
Control refers back to theoptimizer
(Step 4)
to have it decide on the nextstep
based on the last LV and the current set of factors.
This process results in a matrix that
always
has theproperties
of a covariancematrix, irrespective
of the values that theoptimizers
may come up with. This isbecause:
As a
result,
undefined ’covariance’ matrices cannot occur.A
special
case arises when certain covariances are not estimable. This mayhap-pen with residual covariances when measurements on different traits are
mutually
exclusive,
a situationfrequently
occurring
injoint
analyses
of data from 2 testen-vironments.
During
optimization,
the non-definedcomponent
isskipped, thereby
reducing
itsdimensionality.
The currentimplementation
of thereparameterization
computes
theCholesky
factor on the basis of thecomplete
covariance matrix with zeros inserted for the undefinedparameters.
Although
the values of theCholesky
factordepend
on the zero inserted for undefinedconvariances,
the sameoptimum
has
always
been reached foroptimization
on thereparameterized
and on theoriginal
scale.
RESULTS
Four mutivariate runs are
given
to assess the effect ofoptimizing
on theCholesky
factors. The
timings
listed refer to a Hewlett Packard 7100computer system
with the 99 MHz processor.Run 1 was an
analysis
of a selectionexperiment
in chickens with 5traits,
2 fixedeffects,
year andbarn,
and the animalcomponent.
As no traits weremissing
acanonical transformation was
performed,
which reduced the number of numericaloperations
dramatically.
Thirty
components
must be estimatedsimultaneously.
The data set waskindly
supplied
by
CHagger
(Swiss
Federal Institute ofTechnology).
Run 2 was ananalysis
of 3 meatquality
traits on around 2 000pigs.
Not all records werecomplete.
There were 6 class effects and 1 covariable in themodel,
which was identical for all traits. Randomcomponents
were common litter andanimal, resulting
in 18components
to be estimated(Dietl
etal,
1993).
Run 3 was an
analysis
of 4 048 station test records from swinecomprising
the 4 traitsdaily
gain,
feed conversionefficiency,
valuablecuts,
and a meatquality
parameter.
There were 2covariables,
4 fixed classeffects,
1 random littercomponent
and the random correlated animal effect. The covariableweight
at the end of testthe field or in test
stations,
but not in bothenvironments,
thecorresponding
residual covariance
component
was not estimable. This results in 17 variances and(co)variances
to be estimated. All 4 models included the fullrelationship
amonganimals. A summary
description
of the runs isgiven
table I.Table II shows results from DS and
QN
using
thisprocedure.
The number offunction evaluations declines
substantially
for both.The
general
picture
shows a much smaller number of function evaluations for theQN optimizer
compared
with DS. This is to beexpected
asQN
optimizers
have asuper linear rate of convergence, ie the number of iterations decreases for a fixed
increase in accuracy as convergence is
approached. However, QN optimizers
aborted in 3 out of the 4 models whenoptimization
was done on theoriginal
(co)variance
matrices
attesting
to theproblems
of undefinedparameters
outlined above. Withoptimization
on thecomponents
of the(co)variance
matrices,
only
themodified DS succeeded in
locating
theoptimum
(DS
column in tableII).
Thecosts,
however,
werelarge,
particularly
for run 1(at
3.7 s per functionevaluation).
This
converged
at the number of rounds indicated with an accuracy of10-
6
on the distance between the worst and bestparameter
set in the vertex, but had still notquite
reached theoptimum. Interestingly,
apart
from the 8 557illegal
points
that wereproduced
by
the DSoptimizer,
a loss of rank of the coefficientsystem
occurred 23 120 times. This condition is encounteredduring
factorization when a zeropivot
is detected. In this case, factorization is abortedand, again,
are
partly
due tobadly
conditioned covariancematrices,
thatjust
about pass thetest for
positive
eigenvalues,
but still do not render the coefficient matrixpositive
definite. It is thus an effect of limited accuracy on
digital
computers
for these 2 tests.Obviously,
thisphenomenon
alsoproduced
alarge
amount of directionalmisinformation, wasting
a substantial amount of CPU time.Runs 1-3 had to cope with a
large
number of undefinedparameters
nearly
1undefined for the 3 or 4 that were within the
parameter
space,resulting
in asubstantial amount of
conflicting
directional information. The situation was betterin run
4,
where DSonly
left theparameter
space 13 times.QN
aborted in the first 3 runs after avarying
number of function evaluations because of a discontinuous function surface introducedby
the ’bad’ function valuesgiven
to undefinedpoints.
Only
run 4 wascompleted successfully,
despite
the occurrence of 96illegal
points.
With
optimization
on theCholesky
factor of the(co)variance
matrices thepicture
changes drastically.
The extent to which the DSoptimizer
benefited wasdependent
on the number ofillegal
points
prior
toreparameterization.
Accordingly,
run 1finished in less than a 20th of the time
converging
to the bestpoint,
while run 2and 3 finished twice as fast.
Only
run 4 did notbenefit,
which was to beexpected.
Infact,
thereparameterization
resulted in more function evaluations. Whether this is a chance result or an indication of a moregeneral
pattern,
cannot beconclusively
decided at thispoint. However, experience
from alarge
number of other runs hasshown a substantial
variability
in the number of function evaluation till convergencefor
seemingly
identical models in terms of number ofparameters.
It is therefore assumed that the observed slow down is morelikely
a chance result than ageneral
phenomenon.
QN
found the solution in all runs.Depending
on the number ofillegal
points
encountered
before,
thespeed
up was substantial. This wascomputed
as the ratioof the number of function evaluations of
QN
withCholesky
factor versus DS on theoriginal
scale. IfQN
and DS canonly
be used on the basis of theoriginal
covariancematrices,
only
DS willreliably
give
results.Thus,
this is considered as the referencepoint.
Run 1 was
particularly
impressive:
with DSoperating
on theoriginal
covariancematrices, optimization
wasbasically
impracticable
ascomputing
time was at around 3 weeks or more of CPU timeprohibitively high
(and
the bestpoint
was still notquite
reached).
QN,
on the otherhand,
solved theproblem
in less than 3 h. The other 3 runs showed aspeed-up
factor of between 2.5 and 15.7.CONCLUSIONS
Multivariate REML estimates for
general
statistical models suffer fromhigh
com-putational
demands and the rather lowdimensionality
in terms of number of traits thatthey
could handle. In most cases,only
bivariateanalyses
could be done withgeneral
modelsproducing suboptimal
covariance matrices ofhigher
order(Spilke
andGroeneveld,
1994).
The indicatedimprovement
inspeed
by
optimizing
on theCholesky
factor of the covariance matrices will have a 2-fold effect:firstly,
it willmuch more efficient
QN
algorithms,
which willconsiderably
extend the scope ofmultivariate REML
(co)variance
component
estimation. Mostimportantly,
it will allow users to increase thedimensionality
of the models that can behandled,
thushelping
close the gap between the number of traits used ingenetic
evaluation and(co)variance
component
estimation.This
analysis
was done with the VCE program, amultivariate,
multimodelREML
(co)variance
component
estimationpackage
(Groeneveld, 1994),
which isavailable for research purposes on the anonymous
ftp
server under the Internetnumber 192.103.38.1.
ACKNOWLEDGMENT
The valuable
suggestions
of the reviewers aregratefully acknowledged.
One reviewerreferred to a paper
by
Lindstrom and Bates(1988)
who alsosuggested reparameterization
of covariance matrices in mixed models forrepeated-measures
data. Their conclusion that&dquo;reparameterization
iskey
toensuring
consistent convergence of theNewton-Raphson
algorithm&dquo;
for this class of models agrees with ourfindings.
REFERENCES
Dempster APN,
LairdNM,
Rubin DB(1977)
Maximum likelihood fromincomplete
datavia the EM
algorithm.
J
R Stat Soc Ser B 1-38Dennis
JE,
Torczon V(1991)
Direct search methods onparallel
machines. SIAM J SciComput 1, 448
Dietl
G,
GroeneveldE,
Fiedler I(1993)
Genetic parameters of muscle structure traits inpigs.
In:44th
AnnualMeeting of
theEuropean
Associatiorefor
AnimalProduction,
Aarhus,
DenmarkDucos
A,
BidanelJ, Ducrocq
V, BoichardD,
Groeneveld E(1993)
Multivariate restricted maximum likelihood estimation ofgenetic
parameters forgrowth,
carcass and meatquality
traits in FrenchLarge
White and French LandracePigs.
Genet SelEvol 25,
475Graser
HU,
SmithSP,
Tier B(1987)
A derivative-freeapproach
forestimating
variancecomponents in animal models
by
restricted maximum likelihood. J Anim Sci 64, 1362Groeneveld E
(1991)
Simultaneous REML estimation of 60 covariance components inan animal model with
missing
valuesusing
aDownhill-Simplex algorithm.
In:42nd
Annual
Meeting of
theEuropean
Associationfor
AnimalProd!ctio!c, Berlin, Germany
Groeneveld E
(1994)
REML VCE - a multivariate multimodel restricted maximumlikelihood
(co)variance
component estimationpackage.
In:Proceedings of
an ECSymposium
onApplication of
Mixed Linear Models in the Predictionof
Genetic Meritin
Pigs
(E
Groeneveld,
ed) (in press)
Kovac M
(1992)
Derivative-free methods in covariance component estimation. Ph Dthesis,
University
of Illinois atUrbana-Champaign,
USALindstrom
MJ,
Bates DM(1988)
Newton-Raphson
and EMalgorithms
for linear mixed-effects models forrepeated-measures
data. J Amer Stat Assoc 83, 1014Meyer
K(1989)
Restricted maximum likelihood to estimate variance components for animal models with several random effectsusing
aderivative-free algorithm.
Genet Sel Evol 21, 317Mielenz
N,
GroeneveldE,
MullerJ, Spilke
J(1994)
Simultaneous estimation of covariances with REML and Henderson 3 in a selected chickenpopulation.
Br Poult Sci(in print)
NelderJA,
Mead R(1965)
Asimplex
method for function minimization.Computer
J7,
308
Patterson
HD,
Thompson
R(1971)
Recovery
of inter-block information when block sizesare
unequal.
Biometrika 58, 545Schnabel
R,
KoontzJ,
Weiss B(1982)
A ModularSystem
ofAlgorithms
for Unconstrained Minimization. TechnicalReport CU-CS-240-82, Comp
SciDept,
UnivColorado,
Boulder, CC,
USASpilke J,
Groeneveld E(1994)
Comparison
of four multivariate REML(co)variance
component estimation
packages.
In: 5th WorldCongress
on GeneticsApplied
toLivestock
Production, Guelph,
vol 22, 11-14Spilke J,
GroeneveldE,
Mielenz N(1993)
Monte-Carlo simulation invariance-covariance-component estimation in mixed linear
multiple
trait models. Arch Anim Breed 6, 679 Tixier-BoichardM.,
BoichardD,
BordasA,
Groeneveld E(1992)
Genetic parametersof residual feed intake in adult males and females from a