HAL Id: hal-00893806
https://hal.archives-ouvertes.fr/hal-00893806
Submitted on 1 Jan 1989
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Restricted maximum likelihood to estimate variance
components for animal models with several random
effects using a derivative-free algorithm
K. Meyer
To cite this version:
Original article

Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm

K. Meyer

Edinburgh University, Institute of Animal Genetics, West Mains Road, Edinburgh EH9 3JN, Scotland, UK

(received 21 March 1988, accepted 11 January 1989)
Summary - A method is described for the simultaneous estimation of variance components due to several genetic and environmental effects from unbalanced data by restricted maximum likelihood (REML). Estimates are obtained by evaluating the likelihood explicitly and using standard, derivative-free optimization procedures to locate its maximum. The model of analysis considered is the so-called Animal Model which includes the additive genetic merit of animals as a random effect, and incorporates all information on relationships between animals. Furthermore, random effects in addition to animals' additive genetic effects, such as maternal genetic, dominance or permanent environmental effects are taken into account. Emphasis is placed entirely upon univariate analyses. Simulation is employed to investigate the efficacy of three different maximization techniques and the scope for approximation of sampling errors. Computations are illustrated with a numerical example.

variance components - restricted maximum likelihood - animal model - additional random effects - derivative-free approach
Résumé (French abstract, in translation) - Use of restricted maximum likelihood and a derivative-free algorithm to estimate the variance components of a trait under an animal model with several random effects. A method is described for simultaneously estimating the variance components of a single trait due to the environment or to several genetic effects. The method accommodates unbalanced data and is based on restricted maximum likelihood ("REML"). The estimated components are obtained by explicit evaluation of the likelihood function, whose maximum is sought with general optimization techniques that do not require the calculation of derivatives. The model of analysis is an "animal model", in which the individual genetic value of animals is treated as a random effect and which takes all available pedigree information into account. Additional random effects (maternal genetic effects, dominance effects, permanent environmental effects) are also taken into account. Simulation is used to evaluate the efficacy of three maximization techniques and to determine approximately the distributions of the estimators. The calculations are illustrated with a numerical example.
INTRODUCTION

Over the last decade, restricted maximum likelihood (REML) has become the method of choice for estimating variance components in animal breeding and related disciplines trying to partition the phenotypic variation into genetic and other components. This has been facilitated not only by an increase in the general level of computational resources available, but by the development of numerous specialized algorithms, exploiting specific features of the data structure or model of analysis as well as utilizing a variety of numerical techniques.

So far, REML has found most practical use in the analysis of dairy cattle data under a "sire model". For this model, records of progeny are used only to obtain information on half of their sires' breeding value, while dams and relationships between females are ignored. Recently, interest has increased in more detailed models, in particular the conceptually simplest breeding value or "Animal Model" (AM) where each record is taken to provide information on the additive genetic merit of the animal measured. By including animals which do not have records themselves but are parents, this allows for all information on relationships to be taken into account.

A large proportion of REML applications have been restricted to models with one random factor (e.g. sires) apart from random residual errors, estimating two variance components only in a univariate analysis, or $p(p+1)$ for a multivariate analysis of $p$ traits. While algorithms for more complicated models have been described, they are by and large computationally demanding. Often they involve inversion of a matrix of size equal to the total number of levels of all random effects fitted. This can be prohibitive for practically sized data sets. Thus REML has found comparatively little use so far for models fitting several random effects.

Maximum likelihood estimation involves, by definition, location of the maximum of the likelihood function for a given set of data, model of analysis and parameters to be estimated.
Estimating variance components for unbalanced data generally requires iterative schemes. Standard textbooks on numerical analysis classify procedures to find the optimum (minimum or maximum) of a function according to the amount of information required from derivatives of the function. The so-called Newton methods utilize both first and second derivatives, i.e. geometrically speaking slope and curvature, and are thus quickest to converge. Methods relying on first derivatives only include steepest descent, conjugate gradient and Quasi-Newton procedures approximating second derivatives. Finally, there are derivative-free methods involving direct search strategies or numerical approximation of derivatives (see for example Gill et al., 1981).

In the main, REML algorithms currently employed in animal breeding fall into the first two categories. Fisher's Method of Scoring is a special case of the Newton procedures, requiring expected values of second derivatives of the log likelihood function ($\log \mathcal{L}$) to be evaluated. As these are often difficult to obtain, Expectation-Maximization (EM) type algorithms (Dempster et al., 1977), exploiting first derivative information, are used more widely.

A derivative-free REML algorithm has been suggested by Graser et al. (1987) for univariate analyses to estimate the additive genetic and error variance under [...] procedure was suitable for data from large selection experiments involving several thousand animals.

This paper describes the use of a derivative-free approach to estimate variance components by REML for AMs which include not only animals' additive genetic merit but also additional random effects, and thus cover a wide range of models suitable for the analysis of animal breeding data. Univariate analyses only are considered at present; extensions to multivariate situations will be discussed elsewhere.

CALCULATING THE LIKELIHOOD
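The Model

Let:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad [1]$$

denote the linear model of analysis with:
$\mathbf{y}$ the vector of $N$ observations,
$\mathbf{b}$ the vector of $NF$ fixed effects (including any linear or higher order covariables),
$\mathbf{X}$ the $N \times NF$ incidence or design matrix for fixed effects with column rank $NF^*$,
$\mathbf{u}$ the vector of all $NR$ random effects fitted,
$\mathbf{Z}$ the $N \times NR$ incidence matrix for random effects, and
$\mathbf{e}$ the vector of $N$ random residual errors.

Assume:

$$V(\mathbf{u}) = \mathbf{G}, \qquad V(\mathbf{e}) = \mathbf{R}, \qquad \mathrm{Cov}(\mathbf{u}, \mathbf{e}') = \mathbf{0}$$

which gives:

$$V(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$$

The mixed model equations (MME) pertaining to [1] are then (Henderson, 1973):

$$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix} \qquad [2]$$

or, in short, a system of linear equations with coefficient matrix $\mathbf{C}$ and right hand side $\mathbf{r}$. If $\mathbf{C}$ is not of full rank, as is often the case, estimates for $\mathbf{b}$ are not unique.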
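
As a minimal sketch of how the mixed model equations can be assembled and solved, the following Python fragment builds the coefficient matrix and right hand side for a toy data set. The data, the choice R = I and G = I, and the helper names `solve` and `build_mme` are illustrative assumptions rather than anything taken from the original analysis; X is assumed to be of full column rank so that a dense solver applies.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def build_mme(y, X, Z, Ginv):
    """Coefficient matrix C and right hand side r of the MME, assuming R = I."""
    W = [xr + zr for xr, zr in zip(X, Z)]            # rows of [X Z]
    k, nf = len(W[0]), len(X[0])
    C = [[sum(w[i] * w[j] for w in W) for j in range(k)] for i in range(k)]
    for i in range(len(Ginv)):                       # add G^-1 to the random-effects block
        for j in range(len(Ginv)):
            C[nf + i][nf + j] += Ginv[i][j]
    r = [sum(w[i] * yi for w, yi in zip(W, y)) for i in range(k)]
    return C, r


# Hypothetical data: three records, an overall mean as the only fixed effect,
# and one random effect level per record.
y = [1.0, 2.0, 3.0]
X = [[1.0], [1.0], [1.0]]
Z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Ginv = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C, r = build_mme(y, X, Z, Ginv)
solutions = solve(C, r)   # ordered as [mean, u_1, u_2, u_3]
```

For data sets of practical size a dense solver of this kind would not be appropriate; the sketch is only meant to make the block structure of [2] concrete.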
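
The Likelihood

REML operates on the likelihood of linear functions of the data vector with expectations zero, so-called error contrasts, or, equivalently, on the part of the likelihood (of the data vector) which is independent of fixed effects. This results in the loss in degrees of freedom due to fitting of fixed effects being taken into account (Patterson & Thompson, 1971). For $\mathbf{y} \sim N(\mathbf{X}\mathbf{b}, \mathbf{V})$, the log likelihood is (e.g. Harville 1977):

$$\log \mathcal{L} = -\tfrac{1}{2}\left[\mathrm{const} + \log|\mathbf{V}| + \log|\mathbf{X}^{*\prime}\mathbf{V}^{-1}\mathbf{X}^*| + (\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})\right] \qquad [3]$$

where $\mathbf{X}^*$ (of order $N \times NF^*$) is a full rank submatrix of $\mathbf{X}$. Using matrix equalities, this can be rewritten as:

$$\log \mathcal{L} = -\tfrac{1}{2}\left[\mathrm{const} + \log|\mathbf{R}| + \log|\mathbf{G}| + \log|\mathbf{C}^*| + \mathbf{y}'\mathbf{P}\mathbf{y}\right] \qquad [4]$$

where $\mathbf{C}^*$ is the coefficient matrix in [2] with $\mathbf{X}$ replaced by $\mathbf{X}^*$, and $\mathbf{P}$ is a matrix:

$$\mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{V}^{-1}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{V}^{-1} \qquad [5]$$

Calculation of the first two terms required in [4] depends on the specific structure of $\mathbf{R}$ and $\mathbf{G}$ in a given analysis. The latter two, however, can be determined in a general fashion, as suggested by Graser et al. (1987), by Gaussian Elimination (as described in most Numerical Analysis textbooks, or by Smith & Graser (1986)) applied to the mixed model array: the coefficient matrix in [2] augmented by the right hand side and a quadratic in the data vector.

Calculation of $\mathbf{y}'\mathbf{P}\mathbf{y}$ and $\log|\mathbf{C}^*|$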
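
The mixed model array for [1] is:

$$\mathbf{M} = \begin{bmatrix} \mathbf{y}'\mathbf{R}^{-1}\mathbf{y} & \mathbf{y}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{y}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \end{bmatrix} \qquad [6]$$

"Absorbing" rows and columns pertaining to random effects into the rest of the matrix then gives:

$$\begin{bmatrix} \mathbf{y}'\mathbf{V}^{-1}\mathbf{y} & \mathbf{y}'\mathbf{V}^{-1}\mathbf{X} \\ \mathbf{X}'\mathbf{V}^{-1}\mathbf{y} & \mathbf{X}'\mathbf{V}^{-1}\mathbf{X} \end{bmatrix}$$

and eliminating rows and columns for fixed effects correspondingly yields $\mathbf{y}'\mathbf{P}\mathbf{y}$, the weighted sum of squared residuals required to evaluate $\log \mathcal{L}$. Absorption is most easily carried out by Gaussian elimination: repeated absorption of one row and column at a time. This will also allow $\log|\mathbf{C}^*|$ to be determined simultaneously.

Subdivide $\mathbf{M}$ of size $K \times K$ ($K = NF + NR + 1$) with elements $m_{ij}$ and column vectors $\mathbf{m}_i$ into rows 1 to $K-1$, and row $K$:

$$\mathbf{M} = \begin{bmatrix} \mathbf{M}_{K-1} & \mathbf{m}_K \\ \mathbf{m}_K' & m_{KK} \end{bmatrix}$$

Partitioned matrix results then give

$$|\mathbf{M}| = m_{KK}\,|\mathbf{M}^*_{K-1}|$$

with

$$\mathbf{M}^*_{K-1} = \mathbf{M}_{K-1} - \mathbf{m}_K\mathbf{m}_K'/m_{KK} = \{m_{ij} - m_{iK}m_{jK}/m_{KK}\} = \{m^*_{ij}\}$$

the matrix resulting when "absorbing" row and column $K$, or "pivoting" on $m_{KK}$.

Repeated use of this result shows that the required determinant is then simply the sum of the log of pivots ($\log m^*_{ii}$, $i = 2, \ldots, K$) arising when absorbing all rows and columns of $\mathbf{M}$ into the first row, as required to evaluate $\mathbf{y}'\mathbf{P}\mathbf{y}$.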
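
The absorption scheme described above can be sketched in a few lines of Python. The function name `absorb`, the dense list-of-lists storage and the toy array are assumptions made for illustration; a practical implementation would rely on sparse storage, and the treatment of zero pivots for a rank-deficient X is deliberately left out.

```python
import math


def absorb(M):
    """Absorb rows/columns K, K-1, ..., 2 of the symmetric array M into row 1.

    Returns (y'Py, sum of log pivots); the latter equals log|C| for the
    coefficient matrix C held in rows/columns 2..K.
    """
    m = [row[:] for row in M]
    log_det = 0.0
    for k in range(len(m) - 1, 0, -1):
        pivot = m[k][k]
        if pivot <= 0.0:
            raise ValueError("non-positive pivot: array is not positive definite")
        log_det += math.log(pivot)
        for i in range(k):
            f = m[i][k] / pivot
            if f != 0.0:
                for j in range(k):
                    m[i][j] -= f * m[k][j]
    return m[0][0], log_det


# Hypothetical mixed model array for y = [1, 2, 3], an overall mean, one random
# effect level per record and R = G = I.
# Row/column order: data vector, fixed effect, three random effects.
M = [
    [14.0, 6.0, 1.0, 2.0, 3.0],
    [6.0, 3.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 2.0, 0.0],
    [3.0, 1.0, 0.0, 0.0, 2.0],
]
ypy, log_det_c = absorb(M)
```

For this array V = 2I, so that y'Py is half the corrected sum of squares of y, i.e. 1, and |C| = 12; the elimination reproduces both values.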
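
If $\mathbf{X}$ is not of full rank, $\mathbf{M}$ has to be set up replacing $\mathbf{X}$ by $\mathbf{X}^*$ or, equivalently, absorptions have to be carried out [...]

Univariate analyses

Results presented so far hold for any model of form [1]. Consider now a univariate analysis with identically and independently distributed errors, i.e.

$$\mathbf{R} = \sigma_E^2\,\mathbf{I} \qquad [8]$$

For given values of the other variance components, the error variance can be estimated directly in this case, from the residual sums of squares as (see Harville, 1977; or Graser et al., 1987)

$$\hat{\sigma}_E^2 = \mathbf{y}'\mathbf{P}\mathbf{y}\,/\,(N - NF^*) \qquad [9]$$

with $\mathbf{y}'\mathbf{P}\mathbf{y}$ evaluated for $\sigma_E^2$ factored out of $\mathbf{R}$ and $\mathbf{G}$.

Let the other parameters to be estimated, i.e. (co)variances of the random effects fitted, be denoted by $\sigma_i$ with $i = 1, \ldots, p-1$, and $p$ the total number of components with $\sigma_p = \sigma_E^2$.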

As discussed by Harville & Callanan (1988), a function of REML estimates of a set of parameters is also the REML estimate of this function. Hence, instead of maximizing $\log \mathcal{L}$ with respect to the $p$ components $\sigma_i$, we can reparameterize to $\sigma_E^2$ and $p-1$ functions $f_i(\sigma_i, \sigma_E^2)$ of the other components and the error variance. An obvious choice is to express the $\sigma_i$ as a proportion ($\lambda_i = \sigma_i/\sigma_E^2$) of the latter, so that having found REML estimates of $\sigma_E^2$ and the $\lambda_i$ we can estimate $\hat{\sigma}_i = \hat{\lambda}_i\hat{\sigma}_E^2$.

Furthermore, for fixed values of $\lambda_i$, $\log \mathcal{L}$ attains its maximum with respect to $\sigma_E^2$ at the REML estimate of $\sigma_E^2$. This allows estimation to be conducted in two steps: Firstly, a "concentrated" likelihood is maximized with respect to the $\lambda_i$ only, which yields REML estimates $\hat{\lambda}_i$. Secondly, $\hat{\sigma}_E^2$ is obtained (from [9]) for the $\hat{\lambda}_i$ (Harville & Callanan, 1988). The advantage of this approach is that it reduces the dimension of the numerical search for the maximum of $\log \mathcal{L}$ by one. As the number of iterates and likelihoods to be evaluated to find the maximum of $\log \mathcal{L}$ usually increases substantially with the number of parameters to be estimated, this can lead to a considerable saving in computational resources required.
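
The two-step scheme can be illustrated with a small Python sketch for a model with animals as the only random effect. Everything named here (`absorb`, `concentrated`, the toy records, the grid search) is an illustrative assumption; in particular, the crude grid stands in for the optimization procedures discussed later, X is assumed to be of full column rank, and constant terms such as log|A| are dropped from the function value.

```python
import math


def absorb(M):
    """Gaussian elimination of rows K..2 into row 1; returns (quadratic, sum of log pivots)."""
    m = [row[:] for row in M]
    log_det = 0.0
    for k in range(len(m) - 1, 0, -1):
        log_det += math.log(m[k][k])
        for i in range(k):
            f = m[i][k] / m[k][k]
            for j in range(k):
                m[i][j] -= f * m[k][j]
    return m[0][0], log_det


def concentrated(y, X, Z, Ainv, lam):
    """Return (error variance estimate, -2 log L up to a constant) for lam = sigma_A^2 / sigma_E^2."""
    W = [[yi] + xr + zr for yi, xr, zr in zip(y, X, Z)]      # rows of [y X Z]
    n, nf, nr, k = len(y), len(X[0]), len(Z[0]), len(W[0])
    M = [[sum(w[i] * w[j] for w in W) for j in range(k)] for i in range(k)]
    for i in range(nr):                                       # G^-1 with sigma_E^2 factored out
        for j in range(nr):
            M[1 + nf + i][1 + nf + j] += Ainv[i][j] / lam
    ypy, log_c = absorb(M)
    s2e = ypy / (n - nf)                                      # second step, cf. [9]
    return s2e, (n - nf) * math.log(s2e) + nr * math.log(lam) + log_c


# Hypothetical data: two unrelated animals (A = I) with two records each
y = [1.0, 2.0, 4.0, 6.0]
X = [[1.0]] * 4
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
Ainv = [[1.0, 0.0], [0.0, 1.0]]
grid = [0.1 * i for i in range(1, 200)]                       # crude one-dimensional search
lam_hat = min(grid, key=lambda lam: concentrated(y, X, Z, Ainv, lam)[1])
s2e_hat = concentrated(y, X, Z, Ainv, lam_hat)[0]
var_a_hat = lam_hat * s2e_hat
```

Because the toy data are balanced, the result can be checked against the analysis of variance estimates (within-animal mean square 1.25, between-animal component 5.5), which the grid search recovers up to its step size.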

From [8] it follows immediately that:

$$\log|\mathbf{R}| = N\log\sigma_E^2$$

$\log|\mathbf{G}|$ depends on the random effects fitted. For the simplest model with animals as the only random effect, as considered by Graser et al. (1987):

$$\log|\mathbf{G}| = NA\,\log\sigma_A^2 + \log|\mathbf{A}|$$

where $\sigma_A^2$ is the additive genetic variance, $\mathbf{A}$ the numerator relationship matrix between animals, $\mathbf{a}$ the vector of (direct) genetic effects for animals, and $NA$ denotes the number of animals. Since $\log|\mathbf{A}|$ does not depend on the parameters to be estimated, it is a constant and does not need to be calculated in order to maximize $\log \mathcal{L}$. The inverse of $\mathbf{A}$ is required in [6] (for $\mathbf{G}^{-1}$) though, but this can be set up efficiently from a list of pedigree information, following rules described, for instance, by Quaas (1976).

Let $\mathbf{c}$ of length $NC$ denote a vector of such effects to be included in the model of analysis, with $V(\mathbf{c}) = \sigma_C^2\,\mathbf{I}$. This gives:

$$\log|\mathbf{G}| = NA\,\log\sigma_A^2 + \log|\mathbf{A}| + NC\,\log\sigma_C^2 \qquad [12]$$

In other cases, the model of analysis may involve two random effects for each animal. Let $\mathbf{m}$, of length $NA$, denote the second animal effect and assume each element has variance $\sigma_M^2$. If there are repeated records per animal for a trait, $\mathbf{m}$ represents the permanent effects due to animals, excluding additive genetic effects. These are usually assumed to be uncorrelated with any other effects in the model, so that

$$\log|\mathbf{G}| = NA\,(\log\sigma_A^2 + \log\sigma_M^2) + \log|\mathbf{A}| \qquad [13]$$

If $\mathbf{m}$ had variance $\sigma_M^2\,\mathbf{D}$, [13] would be augmented by $\log|\mathbf{D}|$. As with $\log|\mathbf{A}|$, this term is constant and does not need to be evaluated. Note though that $\mathbf{G}^{-1}$, and consequently $\mathbf{D}^{-1}$, is required in [6]. A typical example for this kind of structure is a model where $\mathbf{m}$ stands for dominance effects, $\sigma_M^2$ for the respective variance and $\mathbf{D}$ for the dominance covariance matrix among animals.
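
For other traits, for example measures of reproductive performance, we distinguish between a direct and a maternal (or paternal) additive-genetic component, allowing for a covariance between the two. In that situation, there may not be a record supplying information on $\mathbf{m}$ for each animal, but information is acquired indirectly via links arising from the genetic covariance and relationships. With $\sigma_{AM}$ denoting the covariance between $\mathbf{a}$ and $\mathbf{m}$ and $r_{AM}$ the corresponding correlation,

$$\mathbf{G} = \begin{bmatrix} \sigma_A^2 & \sigma_{AM} \\ \sigma_{AM} & \sigma_M^2 \end{bmatrix} \otimes \mathbf{A}$$

and partitioned matrix results give

$$\log|\mathbf{G}| = NA\left[\log\sigma_A^2 + \log\sigma_M^2 + \log(1 - r_{AM}^2)\right] + 2\log|\mathbf{A}| \qquad [14]$$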

For all models discussed so far, computational requirements to determine the part of $\log|\mathbf{G}|$ which depends on the parameters to be estimated are trivial. This results from random effects being either uncorrelated, so that $\mathbf{G}$ is blockdiagonal ([12] and [13]), or $\mathbf{G}$ being the direct product of a matrix of parameters and a matrix describing correlations amongst levels of random effects as in [14]. Extensions to other models are straightforward, as long as $\mathbf{G}$ can be partitioned into blocks of such structure. For example, fitting permanent environmental effects ($\mathbf{c}$) as well as direct and maternal additive genetic effects, [14] would be augmented simply by ($NC\,\log\sigma_C^2$), provided $\mathbf{c}$ was uncorrelated to $\mathbf{a}$ and $\mathbf{m}$. Table I summarizes $\log \mathcal{L}$ for 10 models which may arise in the analysis of animal breeding data, with up to 3 random effects and involving up to 5 (co)variance components.
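
To make the structure of these terms concrete, the parameter-dependent part of log|G| for the cases covered by [12] to [14] can be collected in one small function. This is a sketch only; the function name `log_det_g_part` and its argument names are hypothetical, and the constant terms in log|A| (or log|D|) are omitted because they do not affect the location of the maximum.

```python
import math


def log_det_g_part(n_animals, var_a, var_m=None, cov_am=0.0, n_c=0, var_c=None):
    """Parameter-dependent part of log|G|; constant terms in log|A| or log|D| are omitted."""
    if var_m is None:
        # animals as the only effect with covariance structure A
        total = n_animals * math.log(var_a)
    else:
        # two effects per animal; cov_am = 0 gives the uncorrelated case [13],
        # a non-zero covariance the direct product structure of [14]
        total = n_animals * math.log(var_a * var_m - cov_am ** 2)
    if n_c:
        # additional uncorrelated effect c with variance var_c, as in [12]
        total += n_c * math.log(var_c)
    return total


# Hypothetical values: 100 animals, direct and maternal effects with covariance -5
direct_maternal = log_det_g_part(100, var_a=50.0, var_m=20.0, cov_am=-5.0)
# The same model with 40 levels of an uncorrelated permanent environmental effect
with_c = log_det_g_part(100, var_a=50.0, var_m=20.0, cov_am=-5.0, n_c=40, var_c=10.0)
```

Writing the direct-maternal term as a function of the covariance rather than the correlation is a design choice of the sketch; both forms are algebraically identical.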

Otherwise, $\mathbf{G}$ (or [...] using techniques as described above for $\log|\mathbf{C}^*|$. For instance, if $\mathbf{G}$ contained a block of form [...] the contribution to $\log|\mathbf{G}|$ would be [...]

Table I (notes): Assume $V(\mathbf{a}) = \sigma_A^2\,\mathbf{A}$, $\mathrm{Cov}(\mathbf{a}, \mathbf{c}') = \mathrm{Cov}(\mathbf{m}, \mathbf{c}') = \mathbf{0}$ and $V(\mathbf{c}) = \sigma_C^2\,\mathbf{I}$ for all models. Terms are assumed to be the result of Gaussian Eliminations performed for $\mathbf{M}$ with $\sigma_E^2$ factored out.

Computational Considerations

Typically, the augmented coefficient matrix $\mathbf{M}$ is very large but also very sparse. Hence use of sparse matrix techniques, storing the non-zero elements of $\mathbf{M}$ only, is advantageous and allows matrices of order of thousands to be handled. Since $\mathbf{M}$ is symmetric, only the lower (or upper) triangle is required. One form of sparse matrix storage, described in standard text books such as Knuth (1973), is a so-called "linked list". Such linked lists, one list for each row of $\mathbf{M}$ in conjunction with a vector pointing to the first element in each row, are well suited, and allow the Gaussian Elimination steps required to evaluate $\mathbf{y}'\mathbf{P}\mathbf{y}$ and $\log|\mathbf{C}^*|$ to be carried out efficiently.

In setting up $\mathbf{M}$, the order of equations can be of vital importance as it affects the "fill-in" during the absorption process, i.e. the number of additional non-zero off-diagonal elements arising. For computational efficiency this should be kept as small as possible. There is extensive literature concerned with numerical operations on sparse matrices. Tewarson (1973), for example, discusses techniques for the choice of pivot in each Gaussian Elimination step which yields the least local fill-in, and also considers the scope of a priori column permutations. A number of strategies for re-ordering matrices exists, often utilizing graph theory; see for instance Duff et al. (1986). Such general techniques, making little or no assumptions about the matrix structure, can be computationally expensive. This may be prohibitive for situations where the direct solution of a large sparse system of equations is required a few times only, but may be worthwhile for our application where numerous likelihood evaluations are to be performed. Future research should consider this topic. In the meantime, critical inspection of the data and relationship structure with their implications for the pattern of off-diagonal elements in the mixed model array, and judicious ordering of effects may achieve a large proportion of the potential benefits from general reordering algorithms.
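
The effect of the order of elimination on fill-in can be illustrated with a purely symbolic sketch, in which only the pattern of non-zero off-diagonal elements is tracked. The function name `fill_in` and the toy pattern (one parent linked to four progeny that are not linked to each other) are assumptions chosen for illustration and do not represent a full relationship structure.

```python
def fill_in(pattern, order):
    """Count new off-diagonal non-zeros (one per symmetric pair) created
    when rows/columns are absorbed in the given order."""
    nbrs = {v: set(ns) for v, ns in pattern.items()}
    created = 0
    for v in order:
        live = nbrs.pop(v)
        for a in live:
            nbrs[a].discard(v)
        live = sorted(live)
        # absorbing v links every pair of rows that both had a non-zero in column v
        for i, a in enumerate(live):
            for b in live[i + 1:]:
                if b not in nbrs[a]:
                    nbrs[a].add(b)
                    nbrs[b].add(a)
                    created += 1
    return created


progeny = ["p1", "p2", "p3", "p4"]
pattern = {"parent": set(progeny)}
pattern.update({p: {"parent"} for p in progeny})

parent_first = fill_in(pattern, ["parent"] + progeny)   # absorb the parent before its progeny
progeny_first = fill_in(pattern, progeny + ["parent"])  # absorb the progeny first
```

In this pattern, absorbing the parent first links all pairs of progeny (six new elements), whereas absorbing the progeny first creates none.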

A standard strategy in attempting to minimize fill-in is to process rows with the fewest off-diagonals first. Graser et al. (1987) therefore suggested selection of pivots corresponding to the youngest animals first. For the models with several random effects for each animal, these should be assigned to successive rows. In other cases, it may be possible to exploit additional features of the data structure. For data from a multi-generation selection experiment with selection within families, for example, grouping of animals according to female "founders" appears preferable to a grouping according to generation. On the other hand, if animals are nested within contemporary (fixed) groups, it may be advantageous to order equations so that animals directly follow their group effects.

For $\mathbf{R}$ of form [8], $\sigma_E^{-2}$ is usually factored from [6]. In this case, calculations to determine $\mathbf{y}'\mathbf{P}\mathbf{y}$ and $\log|\mathbf{C}^*|$ as described above do not yield the terms required in [4] directly, but ($\mathbf{y}'\mathbf{P}\mathbf{y}\,\sigma_E^2$) and ($\log|\mathbf{C}^*| + (NF^* + NR)\log\sigma_E^2$), which has to be borne in mind when assembling the likelihood.

MAXIMIZING THE LIKELIHOOD

Choice of a strategy to locate the maximum of the likelihood function, or equivalently the minimum of $-2\log \mathcal{L}$, is determined by several considerations. Firstly, [...] more demanding than any calculations required by the optimization procedure as such. Hence, a method which requires a small number of function evaluations in each iterate is desirable. Secondly, the procedure should be robust, i.e. cope with starting values for parameters considerably different from the eventual estimates, and should be little affected by problems of numerical accuracy, yielding sufficiently precise estimates of the minimum even for very flat functions. Thirdly, constraints on the parameter space should be accommodated and, preferably, not require extra function values or reduce the speed of convergence.

The suitability of three different approaches was examined using simulated data for models 1, 2, 4 and 8 as specified in Table I. Records were sampled according to the model of analysis for one or several generations (up to four), each comprising a given number of full-sib families (ranging from 25 to 800) of variable size (2 to 10), with dams nested within sires and each sire mated to a specified number of dams (1 to 5). Error variances were estimated directly, while all other components were expressed as a proportion of the phenotypic variance, i.e., $\theta_A$, $\theta_M$, $\theta_C$ and $\theta_{AM}$ for $\sigma_A^2$, $\sigma_M^2$, $\sigma_C^2$ and $\sigma_{AM}$, respectively. Obviously, $\theta_A$ is the heritability and $\theta_C$ what is commonly referred to as "$c^2$ effect". As described above, this reduced the dimension of search to 1, 2, 3 and 4 for Models 1, 2, 4 and 8, respectively.

This parameterization, rather than expressing components as a proportion of the error variance ($\lambda_i$), was chosen since it allowed checks for parameter estimates out of bounds more readily and, for the limited cases examined, as it appeared to be more robust against bad starting values.

Quadratic approximation

For a model with animals as the only random effect, Graser et al. (1987) fitted a quadratic function in $r = \sigma_A^2/\sigma_E^2$ to the log likelihood, predicting the maximum of $\log \mathcal{L}$ as the maximum of this function. For one parameter, this required function values for 3 different $r$ values per approximation. Having calculated $\log \mathcal{L}$ for 3 initial points, each iterate then involved one function evaluation, for $r^*$ which maximized the quadratic function of the previous step. This value and those pertaining to the two $r$ values either side closest to $r^*$ were then utilized in determining the next quadratic approximation to $\log \mathcal{L}$. As reported by Graser et al. (1987), simulations for this model showed rapid convergence. A bad initial guess for $r$ generally did not affect the estimation procedure greatly, as long as the three points in the initial approximation spanned a sufficiently large range. Though the number of iterates and likelihood evaluations required tended to increase, the same maximum of $\log \mathcal{L}$ as for "good" starting values was attained without increasing computational demands excessively.
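
A one-dimensional search of this kind might be sketched as follows. The function names, the stopping rule based on the distance between the predicted optimum and the points already held, and the test function are assumptions made for illustration; they are not claimed to reproduce the implementation of Graser et al. (1987).

```python
def parabola_vertex(pts):
    """Stationary point of the quadratic through three (r, value) pairs."""
    (x0, f0), (x1, f1), (x2, f2) = pts
    num = (x1 - x0) ** 2 * (f1 - f2) - (x1 - x2) ** 2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    return x1 - 0.5 * num / den


def quadratic_search(f, r_init, tol=1e-8, max_iter=50):
    """Locate an optimum of f by repeated quadratic approximation in one parameter."""
    pts = sorted((r, f(r)) for r in r_init)        # three initial function values
    r_new = pts[1][0]
    for _ in range(max_iter):
        r_new = parabola_vertex(pts)
        if min(abs(r_new - r) for r, _ in pts) < tol:
            break                                   # prediction no longer moves
        evaluated = sorted(pts + [(r_new, f(r_new))])   # one new evaluation per iterate
        i = [r for r, _ in evaluated].index(r_new)
        lo = max(0, min(i - 1, len(evaluated) - 3))
        pts = evaluated[lo:lo + 3]                  # r_new and its closest neighbours
    return r_new


# Hypothetical, exactly quadratic "log likelihood" with its maximum at r = 2
r_star = quadratic_search(lambda r: 5.0 - (r - 2.0) ** 2, [0.5, 1.0, 3.0])
```

For an exactly quadratic function the first approximation already predicts the optimum, so the search stops after a single additional evaluation.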

This approach extends to the case of multiple parameters. For $\mathbf{t}$, with elements $\theta_i$, denoting the vector of parameters with respect to which $\log \mathcal{L}$ is to be maximized, and $\log \mathcal{L}(\mathbf{t})$ the corresponding log likelihood, the quadratic approximation is:

$$\log \mathcal{L}(\mathbf{t}) = q_0 + \mathbf{q}'\mathbf{t} + \mathbf{t}'\mathbf{Q}\mathbf{t}$$

For $p$ parameters, a total of $z = 1 + p(p+3)/2$ different values of $\mathbf{t}$ and $\log \mathcal{L}(\mathbf{t})$ are required in each iterate to set up and solve a system of $z$ equations for the intercept $q_0$, the vector of linear coefficients $\mathbf{q}$ and the symmetric matrix of quadratic coefficients $\mathbf{Q}$. This number increases rapidly with the number of parameters, e.g. $z = 6$, 10, 15 and 21 for $p = 2$, 3, 4 and 5, respectively.
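
The system of z equations can be set up and solved as sketched below. The parameterization of the quadratic as an intercept, a linear term and a symmetric quadratic form, the helper names and the example points are assumptions for illustration; the points must be in general position for the system to be determinate.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def n_points(p):
    """Number of function values needed: z = 1 + p(p + 3)/2."""
    return 1 + p * (p + 3) // 2


def design_row(t):
    p = len(t)
    return [1.0] + list(t) + [t[i] * t[j] for i in range(p) for j in range(i, p)]


def fit_quadratic(points, values):
    """Return intercept q0, linear coefficients q and symmetric matrix Q."""
    p = len(points[0])
    coef = solve([design_row(t) for t in points], values)
    q0, q = coef[0], coef[1:1 + p]
    Q = [[0.0] * p for _ in range(p)]
    k = 1 + p
    for i in range(p):
        for j in range(i, p):
            Q[i][j] = Q[j][i] = coef[k] if i == j else coef[k] / 2.0
            k += 1
    return q0, q, Q


def stationary_point(q, Q):
    """Solve q + 2 Q t = 0 for the predicted optimum t*."""
    return solve([[2.0 * x for x in row] for row in Q], [-x for x in q])


# Hypothetical two-parameter surface with its maximum at (1, 0)
def surface(t):
    return 3.0 + 2.0 * t[0] - t[1] - t[0] ** 2 - 2.0 * t[1] ** 2 + t[0] * t[1]

points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q0, q, Q = fit_quadratic(points, [surface(t) for t in points])
t_star = stationary_point(q, Q)
```

If the z points are nearly dependent, the pivots in `solve` become very small and the predicted optimum is unreliable, which is the kind of numerical problem that can arise with such a fit.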

For one parameter, choice of the point to be replaced in each iterate was straightforward. In the multi-dimensional case, however, it was less obvious. Two strategies were explored. After $z$ initial points had been obtained, the first involved, as for $p = 1$, in the regular case one function evaluation per iterate, i.e. calculation of $\log \mathcal{L}(\mathbf{t}^*)$ for $\mathbf{t}^*$ from the last iterate. This new point was added to the set of $z$ points which formed the basis to predict $\mathbf{t}^*$ in the previous step. The worst of the resulting set of $z + 1$ points was then eliminated, and a new vector $\mathbf{t}^*$ determined. If the quadratic approximation failed, i.e. if $\log \mathcal{L}(\mathbf{t}^*)$ was lower than all $z$ function values in the set, $\mathbf{t}^*$ was replaced by $(\mathbf{t}^* + \mathbf{t}_m)/2$, where $\mathbf{t}_m$ was the parameter vector with highest function value in the set. If necessary, this was repeated until the replacement was successful. Hence, each iterate increased the average likelihood of the $z$ current points.

The second strategy comprised $z$ function evaluations per iterate. Given a vector of starting values $\mathbf{t}_0$ ($\mathbf{t}^*$ from the previous iterate), $p$ vectors $\mathbf{t}_i$ were derived by multiplying the $i$-th element of $\mathbf{t}_0$ by a factor reflecting a chosen step size, 1.10 for steps of 10% in this case. Following a scheme described by Nelder & Mead (1965), further parameter vectors were then determined as $(\mathbf{t}_i + \mathbf{t}_j)/2$ for $i < j = 0, \ldots, p$. This yielded the required total of $z$ grid points and subsequent estimate $\mathbf{t}^*$. For both strategies, all vectors $\mathbf{t}$ were checked for elements out of the parameter space, and if necessary these were set to their respective bounds.

The quadratic approximation performed well for Model 2, though, for the limited number of examples considered, it was not consistently better than the two alternative procedures studied, in terms of the number of likelihood evaluations required. For Models 4 and 8, however, where the data structure was such that only a small proportion of animals had direct information on the second genetic effect, problems of numerical accuracy occurred. Often the system of $z$ equations to be solved was indeterminate or almost so. Typically this yielded non-positive definite estimates of $\mathbf{Q}$ and useless predictions of $\mathbf{t}^*$. For the second strategy, an alternative approach, slightly more robust, was tried. This consisted of estimating elements of $\mathbf{q}$ and $\mathbf{Q}$ by numerical differentiation, i.e. as forward-difference approximations to the first and second derivatives of $\log \mathcal{L}$, respectively.

On the whole, quadratic approximation of the likelihood function involving multiple parameters appeared to be unsuitable as a general search procedure. For a one-dimensional search, however, it performed consistently best among the 3 strategies examined.

Quasi-Newton

Procedures which do not require second derivatives of the function to be minimized, but approximate the Hessian matrix (= matrix of second derivatives) are referred to as Quasi-Newton methods. This approximation is usually performed iteratively, using vectors of changes in gradients (= first derivatives) and estimates between iterates. While most Quasi-Newton procedures require first derivatives, some are derivative-free, approximating the vector of gradients using finite differences. These have been found to show quadratic convergence and are recommended as the derivative-free methods to be used for smooth functions with continuous derivatives. Further details are beyond the scope of this paper and can be found in standard textbooks, for instance Gill et al. (1981).
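
The finite-difference step on which derivative-free Quasi-Newton procedures rely can be sketched as follows. This is a generic forward-difference approximation, with a hypothetical function name and step-size rule; it is not a description of the internals of any particular library routine.

```python
def forward_gradient(f, t, rel_step=1e-6):
    """Approximate the gradient of f at t by forward differences.

    Requires one function value at t plus one extra evaluation per parameter.
    """
    f0 = f(t)
    grad = []
    for i in range(len(t)):
        h = rel_step * max(abs(t[i]), 1.0)   # step scaled to the size of the parameter
        shifted = list(t)
        shifted[i] += h
        grad.append((f(shifted) - f0) / h)
    return grad


# Hypothetical smooth function of two parameters
gradient = forward_gradient(lambda t: t[0] ** 2 + 3.0 * t[1], [1.0, 2.0])
```

The step size trades truncation error against rounding error; for a very flat function the differences between function values may be dominated by rounding, which limits the accuracy attainable.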

Statistical library subroutines to find the minimum of a function using a Quasi-Newton algorithm, namely NAG routine E04JBF and IMSL routine ZXMIN, have been applied to $-2\log \mathcal{L}$. These routines require the user to supply a subroutine to evaluate the function to be minimized, passing the number and vector of parameters as arguments and returning the function value. In addition, starting values for the parameters, the maximum number of iterates or function calls allowed and some criteria of accuracy of evaluation required and rounding errors tolerated have to be specified.

E04JBF provided the facility to constrain parameters between fixed upper and lower bounds (the IMSL equivalent ZXMWD was not tested), i.e. 0 to 1 for $\theta_A$, $\theta_M$ and $\theta_C$, and -1 to 1 for $\theta_{AM}$. However, to impose these constraints, the routine required function values, setting all parameters simultaneously to their upper or lower limits. Obviously, this violated other restrictions on the parameter space in the genetic context, i.e. that the sum of components is bounded correspondingly ($0 < \theta_A + \theta_M + \theta_C + \theta_{AM} \le 1$), and that the absolute value of the genetic correlation has a maximum of unity ($\theta_{AM}^2 \le \theta_A\theta_M$). Consequently, $\log \mathcal{L}$ could often not be determined for the required constellation since $\mathbf{M}$ became negative definite or $\mathbf{y}'\mathbf{P}\mathbf{y}$ assumed a negative value. Hence minimization was carried out unconstrained. Techniques to implement more complicated constraints exist, and further research should investigate their suitability for the kind of models which are of interest in animal breeding.
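
The restrictions on the parameter space listed above can be collected in a single feasibility check, sketched here in Python. The function name and the convention of passing zero for components not fitted are assumptions for illustration.

```python
def in_parameter_space(theta_a, theta_m=0.0, theta_c=0.0, theta_am=0.0):
    """Check the simple bounds and the joint restrictions on the proportions."""
    # simple bounds on each parameter
    if not (0.0 <= theta_a <= 1.0 and 0.0 <= theta_m <= 1.0 and 0.0 <= theta_c <= 1.0):
        return False
    if not -1.0 <= theta_am <= 1.0:
        return False
    # the components together cannot exceed the phenotypic variance
    total = theta_a + theta_m + theta_c + theta_am
    if not 0.0 < total <= 1.0:
        return False
    # the genetic correlation cannot exceed unity in absolute value
    return theta_am ** 2 <= theta_a * theta_m


feasible = in_parameter_space(0.50, 0.20, 0.0, -0.05)
all_upper_limits = in_parameter_space(1.0, 1.0, 1.0, 1.0)
```

The second call shows why evaluating the function with all parameters at their upper limits is problematic: each simple bound is respected, yet the point lies outside the admissible region.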

Unless a parameter vector was encountered for which $-2\log \mathcal{L}$ could not be evaluated, the Quasi-Newton algorithms performed well for all models examined. Each iterate required $p$ function evaluations to approximate the vector of first derivatives. The number of iterates performed depended on user-specified criteria of accuracy and maximum number of function evaluations allowed. If likelihood functions were very flat, routines would stop before the minimum of $-2\log \mathcal{L}$ was determined as accurately as desired, flagging problems of numerical accuracy.

Figure 1 illustrates the typical pattern of changes in likelihood and estimates observed for an analysis under Model 4 for a "good" and a "bad" initial guess of parameter values. The simulated data for this example comprised 2 generations with 100 full-sib families of size 4 to 8 and 25 half-sib families each. Records were sampled for population values of $\sigma_P^2 = 100$ (phenotypic variance), $\theta_A = 0.50$, $\theta_M = 0.20$ and $\theta_{AM} = -0.05$. Starting values used were the population values (Set I) and $\theta_A = 0.30$, $\theta_M = 0.15$ and $\theta_{AM} = 0.05$ (Set II), respectively. For Set I, ZXMIN required 93 likelihood evaluations, for a specified accuracy of 6 significant digits. For Set II, however, the routine used 204 function calls before it considered the minimum of $-2\log \mathcal{L}$ to be found, although Figure 1 suggests that likelihood [...]

Simplex

The Simplex method of Nelder & Mead (1965) is generally advocated as the derivative-free procedure to use if the multivariate function to be minimized is discontinuous, though, initially, it was developed with the maximization of a likelihood function in mind. It relies on a comparison of function values without attempting to utilize any statistics related to derivatives of the function. Such optimization techniques are generally referred to as direct search procedures. While they have often been developed by heuristic approaches without proof of convergence, they have been found to be effective in practice (Swann, 1972).

The Simplex or Polytope method, as some authors prefer to call it to avoid confusion with the Simplex technique used in Linear Programming, was initially suggested by Spendley et al. (1962). It operates on a set of parameter vectors and their pertaining function values, which form the vertices of a simplex in the parameter space, hence its name. As reviewed by Swann (1972), it is based on the concepts of "evolutionary operations", developed to optimize productivity of industrial plants, in which the parameter space is searched following some geometric configuration. The design which requires the least number of points, and hence makes most efficient use of the function values calculated, is a regular simplex. This is defined simply as a set of mutually equidistant points, $n + 1$ for a simplex of dimension $n$. For two dimensions, for example, the regular simplex is an equilateral triangle. A useful property of a regular simplex is that a new simplex can be formed from the existing simplex by addition of a single new point.

The search proceeds as follows. To begin, a simplex of specified size is set up, including the point representing an initial guess for the optimum, and corresponding function values are obtained. The aim in each iterate then is to replace the worst point, i.e. for a minimization problem the point with the highest function value. The new point, defining the next simplex, is chosen so as to preserve the geometric shape in a direction away from the discarded point, but passing through the center [...] repeated until the simplex straddles the optimum. Reducing the size of the simplex, search then recommences with the new, smaller design, or terminates if the simplex becomes smaller than a specified limit (Swann 1972).
Subsequently,
theSimplex
method has been modifiedby
Nelder & Mead(1965).
Abandoning
theregularity
ofdesign,
theirprocedure
allows thesimplex
to rescaleitself
automatically
in eachiterate,
changing
shape
and sizeaccording
to thelandscape
of the surface searched. Thisadaptability
is achievedby
a combinationof so-called
reflection,
expansion
and contraction steps.Consider a function
F=f(t)
to be minimized withrespect
to pindependent
variables,
and letto
denot the guess for theoptimum
to start with. The initialsimplex
thencomprises
to
and p otherpoints
t
i
(i
=1,
..., p)
obtainedby
modifying
one co-ordinate of
to
at a timeby
a chosenstep
size. Each iteratebegins by ordering
andrenumbering
thepoints
in thesimplex according
to thepertaining
function values. Thefollowing
threepoints
are identified:to
with function valueF
o
is nowthe
currently
bestpoint
in thesimplex
(F
o
<F.t,
i =1,
..., p),
tp
with function valueFp
is the worstpoint
(Fp
>Fi,
i =0, ..., p-1)
which is to bereplaced
in thisiterate,
andtp-,
is the next to worstpoint
(Fp_
1
<Fp
andF!,_1
>Fi,
i =0, ..., p -
2).
Next,
the center of thepoints
to remain in thesimplex
is obtained as:i.e.
by averaging
each coordinate over the set ofpoints.
A trialpoint
is thengenerated by
&dquo;reflecting&dquo; tp
towards the center:where the reflection coefficient a is a
positive
constant. Thecorresponding
functionvalue
F,.
iscompared
withF
o
toFp
to determine furthersteps
in the iterate. Threepossible
outcomes aredistinguished.
Firstly, $t_r$ can be better than the worst point but not better than the best point ($F_0 < F_r < F_p$). In this case, $t_r$ replaces $t_p$ and a new iterate is started.

Secondly, if $t_r$ is the new best point ($F_r < F_0$), it is assumed that the direction of search has been a good one and search is "expanded" in this direction by examining a point

$$t_e = \gamma\,t_r + (1 - \gamma)\,\bar{t}$$

with corresponding function value $F_e$, $\bar{t}$ being the center of the points remaining in the simplex. The expansion coefficient $\gamma$ is a positive constant. If $F_e$ is less than $F_r$, the expansion has been successful and $t_e$ replaces $t_p$. Otherwise $t_e$ is discarded and $t_r$ is substituted for $t_p$ to complete the iterate.

Thirdly, $t_r$ may be worse than any of the remaining $p$ points ($F_r > F_{p-1}$).
This leads to "contraction" of the simplex, obtaining a new point $t_c$ with pertaining function value $F_c$, as:

$$t_c = \beta\,t_r + (1 - \beta)\,\bar{t}$$

if $F_r$ is less than $F_p$, and as

$$t_c = \beta\,t_p + (1 - \beta)\,\bar{t}$$

otherwise, where $\bar{t}$ is the center of the points remaining in the simplex. The contraction coefficient $\beta$ is a constant in the range from 0 to 1. If $F_c$ is less than the smaller of $F_r$ and $F_p$, the contraction has been successful and $t_c$ replaces $t_p$ to end the iterate. If not, the complete simplex is shrunk by moving

$$t_i \rightarrow (t_i + t_0)/2$$

for $i = 1, \ldots, p$, before a new iterate is started. Suggested values for $\alpha$, $\beta$ and $\gamma$ are 1.0, 0.5 and 2.0, respectively (Nelder & Mead 1965).
Full computational details together with a flow diagram can be found in Nelder & Mead (1965), and a FORTRAN implementation for example in O’Neill (1971).
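
The reflection, expansion, contraction and shrinking steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used for the analyses reported here: the function name `nelder_mead_step`, its arguments and the list-based representation of the simplex are hypothetical choices made for the example. Where the first and third outcomes overlap, the sketch accepts the reflected point whenever it is no worse than the next to worst point, as in the usual formulation of the method.

```python
def nelder_mead_step(f, simplex, fvals, alpha=1.0, beta=0.5, gamma=2.0):
    """Carry out one iterate on a simplex of p + 1 points (illustrative sketch).

    f       : function to be minimized, taking a list of p values
    simplex : list of p + 1 points, each a list of p values
    fvals   : function values pertaining to the points in `simplex`
    Returns the updated (simplex, fvals).
    """
    # Order and renumber the points: index 0 is the best, index p the worst.
    order = sorted(range(len(simplex)), key=lambda i: fvals[i])
    simplex = [list(simplex[i]) for i in order]
    fvals = [fvals[i] for i in order]
    p = len(simplex) - 1
    t0, tp = simplex[0], simplex[p]
    f0, f_next_worst, f_worst = fvals[0], fvals[p - 1], fvals[p]

    # Center of the p points which remain in the simplex.
    center = [sum(pt[k] for pt in simplex[:p]) / p for k in range(len(t0))]

    def combine(a, x, b, y):
        """Return a*x + b*y, coordinate by coordinate."""
        return [a * xi + b * yi for xi, yi in zip(x, y)]

    # Reflection of the worst point through the center.
    tr = combine(1.0 + alpha, center, -alpha, tp)
    fr = f(tr)

    if fr < f0:
        # New best point: try to expand further in the same direction.
        te = combine(gamma, tr, 1.0 - gamma, center)
        fe = f(te)
        if fe < fr:
            simplex[p], fvals[p] = te, fe
        else:
            simplex[p], fvals[p] = tr, fr
    elif fr <= f_next_worst:
        # Neither best nor worst: accept the reflected point.
        simplex[p], fvals[p] = tr, fr
    else:
        # Reflected point worse than all remaining points: contract.
        base = tr if fr < f_worst else tp
        tc = combine(beta, base, 1.0 - beta, center)
        fc = f(tc)
        if fc < min(fr, f_worst):
            simplex[p], fvals[p] = tc, fc
        else:
            # Contraction failed: shrink all points towards the best point.
            for i in range(1, p + 1):
                simplex[i] = combine(0.5, simplex[i], 0.5, t0)
                fvals[i] = f(simplex[i])
    return simplex, fvals
```

The function would be called repeatedly, passing back the returned simplex and function values, until a convergence criterion chosen by the user is met.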

Since differences in function values are not utilized in establishing the direction of search, restrictions on the parameter space can be imposed easily by assigning a very large positive function value to parameter vectors out of bounds. As pointed out by Nelder & Mead (1965), this will be followed automatically by a contraction step which will eventually keep estimates within their bounds. The convergence criterion suggested by Nelder & Mead (1965) is the variance, or standard deviation, of function values in the simplex, rather than the more conventionally used change in estimates between iterates. The rationale for this was that in statistical problems such as maximum likelihood estimation, the curvature of the surface near the minimum reflects the amount of information available on the parameters. If the surface is flat, sampling errors are large and it is not worthwhile to determine the minimum with great
accuracy.

For REML estimation of multiple variance components, the Simplex method proved robust and easy to use. For a given vector of starting values, $t_0$, the initial simplex was made up of $t_0$ and $p$ vectors $t_i$, obtained by multiplying the $i$-th element of $t_0$ by 1.20. A factor this large, corresponding to a step size of 20%, was chosen to ensure quick convergence even for bad starting values. This implied, however, that for starting values close to the estimates, the procedure would search unnecessarily in the wrong direction. A smaller step size may be sufficient and perhaps more efficient, i.e. yield estimates with fewer likelihood evaluations. For cases where parameter vectors out of bounds were not a problem, the Quasi-Newton algorithm performed generally somewhat better than the Simplex method. The variance of function values in the Simplex had to be less than $10^{-8}$ or even $10^{-9}$ for the Simplex to find the minimum of $-2 \log \mathcal{L}$ with the same accuracy as ZXMIN (for an accuracy level of 5 or 6 significant digits).
In terms of changes in parameter estimates, however, a limit of $10^{-5}$ to $10^{-6}$ generally appeared sufficient.

Figure 2 illustrates the convergence behaviour of the Simplex method for the same analyses as depicted in Figure 1. Oscillations in likelihood and estimates clearly reflect the "trial and error" mechanism of this approach. For both sets of starting values, 88 likelihood evaluations were required before the variance of function values in the Simplex dropped below the specified limit of $10^{-9}$.
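
The three practical ingredients mentioned above (an initial simplex built by multiplying one element of the starting vector at a time by 1.20, a very large function value for parameter vectors out of bounds, and a convergence check on the variance of function values) might be coded along the following lines. This is a sketch only: the names `initial_simplex`, `penalised`, `has_converged` and the constant `BIG` are hypothetical, and the use of the sample variance (divisor $p$ for $p + 1$ points) is an assumption.

```python
from statistics import variance

BIG = 1.0e30  # hypothetical "very large positive function value"


def initial_simplex(t0, factor=1.20):
    """Return t0 plus p points, each with one element of t0 multiplied by `factor`."""
    simplex = [list(t0)]
    for i in range(len(t0)):
        point = list(t0)
        point[i] *= factor  # a starting value of exactly zero would not move
        simplex.append(point)
    return simplex


def penalised(f, in_bounds):
    """Wrap f so that parameter vectors out of bounds receive the value BIG."""
    def wrapped(t):
        return f(t) if in_bounds(t) else BIG
    return wrapped


def has_converged(fvals, limit=1.0e-8):
    """True if the variance of the function values in the simplex is below `limit`."""
    return variance(fvals) < limit
```

Because the penalty is applied only to the function value, the search itself needs no modification; note, however, that a starting value of zero would give a degenerate simplex under the multiplicative rule.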
SAMPLING VARIANCES
The matrix of approximate, large sample covariances among variance component estimates is given by the inverse of the information matrix. This in turn can be approximated by the inverse of the Hessian matrix, i.e. the matrix of second derivatives of the log likelihood function with respect to the parameters to be estimated. A quadratic approximation of $\log \mathcal{L}$ and Quasi-Newton procedures provide an approximate Hessian matrix ($Q$) as a by-product. Using the Simplex method this has to be obtained explicitly. Nelder & Mead (1965) described an [...] using forward differences (see e.g. Gill et al., 1981).
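
As an illustration of the finite difference route, the sketch below approximates the matrix of second derivatives of a function by forward differences and converts it into approximate sampling covariances. It is one common forward-difference scheme, not necessarily the one used in this study. The function names, the relative step size argument `rel_step` and the small Gauss-Jordan inversion routine are hypothetical. The factor 2 assumes that the function supplied is $-2 \log \mathcal{L}$, so that the information matrix is one half of its Hessian.

```python
def forward_difference_hessian(f, t, rel_step=0.001):
    """Approximate the second derivatives of f at t using forward differences."""
    p = len(t)
    # Step sizes proportional to the parameter values (absolute if a value is zero).
    h = [rel_step * abs(x) if x != 0.0 else rel_step for x in t]
    f0 = f(t)

    def shifted(*indices):
        point = list(t)
        for i in indices:
            point[i] += h[i]
        return f(point)

    f1 = [shifted(i) for i in range(p)]
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            fij = shifted(i, j)  # for i == j the point is moved by 2 * h[i]
            hess[i][j] = hess[j][i] = (fij - f1[i] - f1[j] + f0) / (h[i] * h[j])
    return hess


def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sampling_covariances(minus2logl, estimates, rel_step=0.001):
    """Approximate covariances: twice the inverse Hessian of -2 log L."""
    hess = forward_difference_hessian(minus2logl, estimates, rel_step)
    return [[2.0 * v for v in row] for row in invert(hess)]
```

The resulting matrix is only meaningful if the approximate Hessian of $-2 \log \mathcal{L}$ is positive definite at the estimates, which should be checked before any diagonal element is interpreted as a sampling variance.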

While the mechanics for approximating second derivatives are straightforward, doing so can be problematic in practice. As noted when examining the quadratic approximation of $\log \mathcal{L}$ for multiple parameters, $Q$ was often not positive definite, whether derived by least-squares as described by Smith & Graser (1986) or using finite differences, and consequently yielded inappropriate predictions of the parameters maximizing $\log \mathcal{L}$. Similarly, the approximate Hessian produced by Quasi-Newton routines did not necessarily give meaningful estimates of sampling variances. Indeed, a large proportion of the literature on Quasi-Newton algorithms is concerned with procedures to deal with bad or non positive definite approximations
to the matrix of second derivatives.

Simulation was employed to examine the sampling distribution of variance component estimates and their predicted variances for models 1, 2 and 4. Data were sampled for one to four generations consisting of 25 to 800 full-sib families of size 2 to 10 each. Dams were nested within sires with each sire mated to 1 to 5 dams. Variance component estimates were obtained using the Simplex method with population parameters as starting values and a convergence criterion of $V(-2 \log \mathcal{L}) < 10^{-8}$. Second derivatives were approximated as described by Nelder & Mead (1965) using a step size of 0.1% of the estimates. Approximation of sampling variances worked well for model 1, i.e. estimating the additive genetic and error variance only. It was satisfactory for model 2 which included an additional environmental component due to full-sib families (litters), but it failed for analyses under model 4, including a maternal genetic effect as an additional random effect for each animal.
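
Comparisons of this kind rest on simple summary statistics over replicates. The sketch below shows how empirical sampling variances, empirical correlations between estimates, and the mean of the predicted variances could be computed from replicate results. The function names and the layout of the inputs (one list of values per parameter, one value per replicate) are assumptions made for the example and do not describe the program used here.

```python
from math import sqrt


def empirical_variance(values):
    """Variance of estimates across replicates (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def empirical_correlation(x, y):
    """Correlation between the estimates of two parameters across replicates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_variances(estimates, predicted_variances):
    """Return (empirical variance, mean predicted variance) for one parameter."""
    mean_predicted = sum(predicted_variances) / len(predicted_variances)
    return empirical_variance(estimates), mean_predicted
```

A mean predicted variance close to the empirical variance would indicate that the approximation of second derivatives works for the design in question; a large discrepancy, or negative predicted values, would indicate that it does not.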

Table II summarizes empirical and predicted sampling variances and empirical correlations between estimates for a variety of designs for model 2. Results clearly emphasize the need to ensure that the data provide sufficient information to [...] high negative sampling correlation between $\sigma_A^2$ and $\sigma_C^2$ or, equivalently, between heritability and $c^2$ effect. In terms of the likelihood surface, this implied a maximum along a flat ridge, i.e. an area where for a constant sum of the two parameters the value of the likelihood changed very little with changes in the parameter values. If each sire was mated to only one dam, i.e. there were no half-sib families, there was little scope to partition the within family variance into its genetic and environmental components. This yielded a likelihood surface of a shape which did not allow sampling variances to be approximated.
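
The lack of information without half-sib families can be made explicit with the standard covariances between relatives. The expressions below are a sketch under assumptions that go beyond the text: a single generation of records, unrelated and non-inbred parents without records, purely additive gene action, and an error variance denoted here by $\sigma_E^2$.

```latex
\begin{align*}
\mathrm{Var}(y) &= \sigma_A^2 + \sigma_C^2 + \sigma_E^2 \\
\mathrm{Cov}(\text{full sibs}) &= \tfrac{1}{2}\sigma_A^2 + \sigma_C^2 \\
\mathrm{Var}(y) - \mathrm{Cov}(\text{full sibs}) &= \tfrac{1}{2}\sigma_A^2 + \sigma_E^2
\end{align*}
```

Under these assumptions the records supply only two distinct quantities, the covariance between full sibs and the variance within families, for three parameters. Replacing $(\sigma_A^2, \sigma_C^2, \sigma_E^2)$ by $(\sigma_A^2 + 2\delta, \sigma_C^2 - \delta, \sigma_E^2 - \delta)$ leaves both quantities unchanged for any $\delta$. With several dams per sire, the covariance between paternal half sibs, $\tfrac{1}{4}\sigma_A^2$, provides the additional information needed.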
Including data for a second generation provided the covariance between parents and their offspring as an additional source of information, thus allowing a considerably better discrimination between $\sigma_A^2$ and $\sigma_C^2$ as evidenced by the decreased (absolute value) sampling correlation between the two components. While for one generation sampling variances decreased with increasing number of dams per sire, the reverse generally held for data spanning several generations.

These relationships are also illustrated in Figure 3 for analyses under Model 4. Considering one generation only (top row), no animal had records expressing both its direct and maternal genetic effects. Hence the sampling correlation between $\sigma_A^2$ and $\sigma_M^2$ was large and negative. This was accompanied by little variation in estimates of $\sigma_{AM}$, indicating that this was determined by the genetic covariance structure among animals only. For data including 3 generations (bottom row) sufficient comparisons between and within generations were available to yield estimates of $\sigma_A^2$ and $\sigma_M^2$ virtually independent of each other, while estimates of $\sigma_{AM}$ were highly variable and showed a strong negative association with $\sigma_M^2$.
NUMERICAL EXAMPLE
Table III contains simulated data for 282 animals in two generations. Each generation consists of 18 full-sib families of size 6 to 10, where each sire is mated to 3 dams. Parents of generation one did not have records, which introduced 24 base animals, yielding 306 animals in total. Records were sampled according to Model 8 (see Table I) for population values of $\sigma_A^2 = 40$, $\sigma_M^2 = 15$, $\sigma_{AM} = -5$, $\sigma_C^2 = 10$ and $\sigma_E^2 = 50$, a phenotypic mean of 200 and a fixed effect of 20 for generation 2. Data were analysed under Models 1, 2, 3, 4, 7 and 8, as described in Table I. Analyses
were carried out using the Simplex method to find the maximum of the likelihood. Two sets of starting values were chosen, namely the population values as a good (Set I: $\theta_A = 0.40$, $\theta_M = 0.15$, $\theta_{AM} = -0.05$ and $\theta_C = 0.10$) and $\theta_A = 0.10$, $\theta_M = 0.30$, $\theta_{AM} = 0.10$, $\theta_C = 0.20$ (= Set II) as a bad initial guess. For models other than 8, the appropriate subset of parameters was used. Estimates for two values for the minimum variance of function values in the Simplex are summarized in Table IV. Characteristics of the augmented coefficient matrix M and intermediate results for the first likelihood evaluated in each run are given to help with the validation of computer programs. Gaussian elimination steps are not dependent on the model of analysis, hence the example given by Graser et al. (1987) should suffice as an illustration. Numerical