HAL Id: hal-00894158
https://hal.archives-ouvertes.fr/hal-00894158
Submitted on 1 Jan 1997
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
An ’average information’ restricted maximum likelihood
algorithm for estimating reduced rank genetic
covariance matrices or covariance functions for animal
models with equal design matrices
K Meyer
To cite this version:
Original
article
An
’average
information’ restricted
maximum
likelihood
algorithm
for
estimating
reduced rank
genetic
covariance matrices
or
covariance functions for animal
models
with
equal design
matrices
K
Meyer
Animal Genetics and
Breeding Unit, University of New England,
Armidale,
NSW 2351, Australia(Received
21May 1996; accepted
17January
1997)
Summary - A
quasi-Newton
restricted maximum likelihoodalgorithm
thatapproximates
the Hessian matrix with the average of observed andexpected
information is described for the estimation of covariance components or covariance functions under a linear mixedmodel. The
computing
strategy outlined relies on sparse matrix tools and automatic differentiation of a matrix, and does notrequire
inversion oflarge,
sparse matrices. For thespecial
case of a model withonly
one random factor andequal design
matrices for alltraits,
calculations to evaluate the
likelihood,
first and’average’
second derivatives can be carriedout trait
by trait, collapsing computational requirements
of a multivariateanalysis
to those of a series of univariateanalyses.
This is facilitatedby
a canonicaldecomposition
of thecovariance matrices and
corresponding
transformation of the data to new, uncorrelated traits. The rank of the estimatedgenetic
covariance is determinedby
the number of non-zeroeigenvalues
of the canonicaldecomposition,
and thus can be reducedby fixing
anumber of
eigenvalues
at zero. This limits the number of univariateanalyses
needed tothe
required
rank. It isparticularly
useful for the estimation of covariance function whena
potentially large
number ofhighly
correlated traits can be describedby
a low orderpolynomial.
REML
/
average information/
covariance components/
reduced rank/
covariance function/
equal design matricesRésumé - Algorithme de maximum de vraisemblance restreint, basé sur l’« informa-tion moyenne », pour estimer les matrices de covariance génétique ou les fonctions de covariance de rang partiel, dans les modèles animaux avec matrices d’incidence identiques. On décrit un
algorithme
de maximum de vraisemblance restreint de typequasi-Newton,
quiapproche
la matrice Hessienne par la moyenne del’information
observéeles outils de traitement des matrices creuses et sur la
différentiation
automatique d’unematrice, sans nécessiter d’inversions de
grandes
matrices creuses. Dans le casparticulier
d’un modèle avec un seul
facteur
aléatoire et une matrice d’incidenceidentique
pourtous les
caractères,
les calculs de lavraisemblance,
de ses dérivéespremières
et secondes« moyennes» peuvent être
effectués
caractère parcaractère,
ce qui ramène les besoins decalcul liés à une
analyse
multivariate au niveau de ceux d’une séried’analyses
univariates.Ceci est rendu
possible
par ladécomposition
canonique des matrices de covariance à partirde la
transformation
des données en caractères nouveaux, non corrélés entre eux. Le rangde la matrice de covariance
génétique
estimée est déterminé par le nombre de valeurs propres non nulles de ladécomposition canonique,
et donc peut être réduitquand
onfixe
à zéro certaines valeurs propres. Le nombred’analyses
univariates est ainsiégal
au rang.Ceci est
particulièrement
utile pour l’estimation de lafonction
de covariance, qui décrit les covariances entre un trèsgrand
nombre de caractères très corrélés par l’intermédiaired’un
polynôme
d’ordreinférieur.
REML
/
information moyenne/
composantes de covariance/
fonction de covariance/
rang partielINTRODUCTION
Estimation of
(co)variance
components
by
restricted maximum likelihood(REML)
fitting
an animal model to date ismainly
carried outusing
a derivative-free(DF)
algorithm
asinitially
proposed
by
Graser et al(1987).
While this has been found to be slow to converge, especially for multi-trait andmulti-parameter analyses,
itdoes not
require
the inverse of alarge
matrix and can beimplemented efficiently
using
sparse matrixstorage
and factorisationtechniques,
making
itcomputationally
feasible for models
involving
tens of thousands of animals.Recently
there has been renewed interest inalgorithms
utilising
derivatives of the likelihood function to locate its maximum. This has been furtheredby
technicaladvances,
making
computations
faster andallowing
larger
andlarger
matrices to be stored.Moreover,
therediscovery
of Takahashi et al’s(1973)
algorithm
to invertlarge sparse
matrices has removed most of the constraints onalgorithms
imposed
previously by
the need to invertlarge
matrices.In
particular,
’average
information’(AI)
REML,
aquasi-Newton
algorithm,
which
requires
first derivatives of the likelihood butreplaces
second derivatives with the average of the observed andexpected information,
describedby
Johnson andThompson
(1995)
has been found to becomputationally highly advantageous
over DF
procedures.
It is well
recognised
that for several correlatedtraits,
most information available is contained in a subset of the traits or linear combinations thereof. This subsetis the smaller the
higher
the correlations between traits. Moretechnically,
severaleigenvalues
of thecorresponding
covariance matrix between traits are very smallor zero. If a modified covariance matrix were obtained
by setting
all smalleigen-values to zero and
backtransforming
to theoriginal
scale(using
theeigenvectors
corresponding
to non-zeroeigenvalues),
it would have reduced rank.There has been interest in reduced rank covariance matrices in several areas.
covariance matrix and
exploiting
a transformation to canonical scale.Kirkpatrick
and Heckman
(1989)
introduced theconcept
of ’covariancefunctions’, expressing
the covariance between traits as ahigher
orderpolynomial
function.Polynomials
can be fitted to full or reduced order. In the latter case, theresulting
covariancematrix has reduced
rank, ie,
a number of zeroeigenvalues
(Kirkpatrick
etal,
1990).
The covariance function(CF)
model wasdeveloped
with theanalysis
of ’traits’with
potentially infinitely
manyrepeated,
or almostrepeated
records inmind,
where the
phenotype
orgenotype
of individuals is describedby
a function ratherthan a finite number of measurements
(Kirkpatrick
andHeckman,
1989).
Atypical
example
is thegrowth
curve of an animal.Hence,
in essence, CFs are theinfinite-dimensional
equivalent
of covariance matrices.Analysis
under a CF modelimplies
that coefficients of the CF are estimated rather than individual covariances as under
the usual
multivariate,
’finite’ linearmodel;
seeKirkpatrick
et al(1990)
for furtherdetails.
While it is
possible
tomodify
an estimated covariance matrix to reduce its rank(as
doneby
Kirkpatrick
etal,
1990,
1994),
it would bepreferable
toimpose
restrictions on the rank of covariance matrices’directly’
during
(REML)
estimation.Ideally,
this could be achievedby
increasing
the order of fit(ie,
rankallowed)
sequentially
until an additional non-zeroeigenvalue
does notsignificantly
increase the likelihood.Conceptually,
this could beimplemented
simply by
reparameterising,
to theeigenvalues
andcorresponding
eigenvectors
of a covariancematrix,
andfixing
therequired
number ofeigenvalues
at zero. Practicalapplications
of suchreparameter-isations,
however,
have been restricted tosimple
animal models withequal
design
matrices for alltraits;
see Jensen and Mao(1988)
for a review. Forthese,
acanon-ical
decomposition
of thegenetic
and residual covariance matrixtogether yields
atransformation to uncorrelated variables with unit residual
variance,
leaving
the number ofparameters
to be estimatedunchanged
(for
fullrank).
Meyer
and Hill(1997)
described how REML estimates of CFs or, moreprecisely,
their coefficients could be obtained
using
a DFalgorithm through
asimple
repa-rameterisation of the variance
component
model.However, they
found it slow toconverge for orders of fit
greater
than three or four.Moreover,
for simulated data sets the DFalgorithm
failed to locate the maximum of the likelihoodaccurately
in severalinstances,
especially
if CFs were fitted to ahigher
order than simulated.This paper reviews an AI-REML
algorithm
for thegeneral,
multivariate case,presenting
acomputing
strategy
that does notrequire
sparse matrix inversion.Sub-sequently,
simplifications
for thespecial
case of asimple
animal model withequal
design
matrices for all traits are considered. Additional reductions incomputational
requirements
are shown for the estimation of reduced rankgenetic
covariancema-trices or reduced order CFs.
THE GENERAL CASE Model of
analysis
with y,
13,
u and edenoting
the vector ofobservations,
fixedeffects,
randomeffects and residual errors,
respectively,
and X and Z are the incidence matricespertaining
to(3
and u. LetV(u)
=G,
V(e)
= R andCov(u,e’)
=0,
so thatV(y)
= V =ZGZ’
+
R.For an animal
model,
ualways
includes the vector of animals’ additivegenetic
effects(a).
Inaddition,
it may contain other randomeffects,
such as animals’maternal
genetic
effects,
permanent
environmental effects due to the animal orits
dam,
or common environmental effects such as litter effects.Let
E
A
={a
Ai
j}
’
denote the t x t matrix of additivegenetic
covariances. Foru = a this
gives
G
=E
A
® A where A is the numeratorrelationship
matrix and 0denotes the direct matrix
product.
If other random effects arefitted,
G isexpanded
correspondingly;
seeMeyer
(1991)
for a more detaileddescription.
Assuming y is
orderedaccording
to traits within animalswhere N is the number of animals that have
records,
and2::
+
denotes the direct matrix sum(Searle, 1982).
LetE
E
={!E!!
be
the matrix of residual covariances between traits. Fort traits,
there are a total of W =2t -
1 possible
combinations oftraits recorded
(assuming
single
records pertrait),
eg, W = 3 for t = 2. For animali with combination of traits w,
R
i
isequal
to!Ew’
the submatrix ofE
E
obtainedby
deleting
rows and columnspertaining
tomissing
records.Average
information REMLAssuming
a multivariate normaldistribution, ie,
y NN(Xb, V),
thelog
of the REML likelihood(G)
is(eg,
Harville,
1977)
where X* denotes a full-rank submatrix of
X,
andLet e denote the vector of
parameters
to be estimated withelements O
i
for i =1, ... , p.
Derivatives ofThe latter is
commonly
called the observedinformation.
It hasexpectation
For V linear
in 9,
a2V/8Bi8B!
=0,
and theaverage of observed
[5]
andexpected
[6]
information
is(Johnson
andThompson,
1995)
The
right
hand side of[7]
is(except
for a scalefactor)
equal
to the second derivative ofy’Py
withrespect
to B
i
and0j ,
ie,
the average information isequal
to the datapart
of the observed information.REML estimates of e can then be obtained
by substituting
the average infor-mation matrix for the Hessian matrix in a suitableoptimisation
scheme which usesinformation from second derivatives of the function to be
maximised;
seeMeyer
and Smith
(1996)
for a detailed discussion ofNewton-Raphson-type algorithms
in this context.Calculation of the
log
likelihoodCalculation
of log G
pertaining
to[1]
has been described in detailby Meyer
(1991).
It relies onrewriting
[3]
as(Graser
etal, 1987; Meyer,
1989)
where C is the coefficient matrix in the mixed model
equations
(MME)
for[1] (or
a full rank submatrix
thereof).
The first two
components
oflog L
canusually
be evaluatedindirectly, requiring
only
thelog
determinants of matrices of sizeequal
to the maximum number of records or effects fitted per animal. For u = awhere
N
A
denotes the number of animals in theanalysis
(including
parents
withoutrecords).
log
IAI
is a constant and can be omitted for thepurpose
ofmaximising
log
,C.Similarly,
withN
w
denoting
the number of animalshaving
records forcombination of traits w
The other two terms in
[8],
log
ICI and y’Py,
can be determined in ageneral
way for all models of form
!1!.
Let M(of
size M xM)
denote the mixed modelright
hand sides(r)
and aquadratic
in the data vectorA
Cholesky
decomposition
of Mgives
M =LL’,
with L a lowertriangular
matrix with
elements l
ij
(l
ij
= 0for j
>i),
andFactorisation of M for
large
scale animal modelanalyses
iscomputationally
feasiblethrough
the use of sparse matrixtechniques;
see, forinstance,
George
and Liu(1981).
Calculation of first derivatives
Differentiating
[8]
gives
partial
first derivativesAnalogously
to the calculation oflog £
the first two terms in[14]
canusually
be determined
indirectly
while the other two terms can be evaluatedextending
theCholesky
factorisation of the MMM(Meyer
andSmith,
1996).
Let
D!
=å’5:.
A/
å()
i
be a matrix whose elements are 1 ifB
i
isequal
to the klth element ofE
A
and zero otherwise.Further,
let6
ki
denote Kronecker’sDelta, ie,
6
kl
= 1 for k = and zerootherwise,
andQA
l
denotes the klth element ofEA
1
.
Foroj
=a A kl
Similarly,
withD!
=BE
Ew/
8Bi
anda’lL
the klth element ofY.
l
¿
while all other first derivatives of
log
!G!
andlog
!R!
are zero.Smith
(1995)
describes aprocedure
for automatic differentiation of theCholesky
decomposition.
In essence, it is an extension of theCholesky
factorisation whichgives
notonly
theCholesky
factor of a matrix but also itsderivatives, provided
the
corresponding
derivatives of theoriginal
matrix can bespecified.
Inparticular,
It involves
computation
of a lowertriangular
matrix F. This is initialised to10f(L)Ial
ijl
.
Oncompletion
of the backwardsdifferentiation,
F contains thederivatives of
f(L)
withrespect
to the elements of M. Smith(1995)
states that the calculation of F(not
including
the work needed tocompute
L)
requires
about twice as much work as one likelihood evaluation. Once F has been determined firstderivatives of
f (L)
can be obtained one at a time astr(F<9M/<9!t),
ie,
only
onematrix F is
required.
Meyer
and Smith(1996)
describe a REMLalgorithm utilising
thistechnique
to determine first and(observed)
second derivatives oflog
G for the case consideredhere. For
f (L)
=log
I C +
y’Py,
the scalar is a function of thediagonal
elements of L(see
[12]
and[13]).
Hence,
{8f(L)/8l
ij
}
is adiagonal
matrix with elementsn¡¡
1
for
i =1, ... , M -
1 and21,!,1,!
in row M.The non-zero derivatives of M have the same structure as the
corresponding
part
(data
versuspedigree)
part
of MAs outlined
above,
R isblockdiagonal
for animals.Hence,
matricesaR -
1
/ alJ
Ekl
have submatrices
-E-’Do!E-1
ie,
derivatives of M withrespect
to residual(co)variances
can beset up
in the same way as the ’datapart’
of M.The
strategy
outlined for the calculation of first derivatives oflog L
does notrequire
the inverse of the coefficient matrix C. Incontrast,
Johnson andThompson
(1994, 1995)
and Gilmour et al(1995)
for the univariate case, and Madsen et al(1994)
and Jensen et al(1995)
for the multivariate case deriveexpressions
for8logG/aB
i
based on[4],
whichrequire
selected elements ofC-
1
.
Their scheme is
computationally
feasibleowing
to the sparse matrix inversion method of Takahashi et al(1973).
Misztal(1994)
claimed that each sparse matrix inversion took about two to three times aslong
as one likelihoodevaluation,
ie,
computational requirements
for both alternatives to calculate first derivatives oflog C appear
comparable.
Calculation of the average information Define
For
B
i
=where
In
is anidentity
matrix of size n,Z.,,,
the submatrix of Z and amthe subvector of a for trait m,ie, b
i
issimply
aweighted
sum of solutions for animals in the data.For O
i
=(J&dquo; Ek¡
and 6 =
y -
Xb -
Zu the vector of residuals for[1]
with subvectors8
m
for m =1,...,
tExtension to models
fitting
additional random effects such as litter effects ormaternal
genetic
effects isstraightforward;
see, forinstance,
Jensen et al(1995)
forcorresponding expressions.
Using
!19!, [6]
can be rewritten asJohnson and
Thompson
(1995)
calculated vectorsPb
j
as the residuals fromrepeatedly solving
the mixed modelequations pertaining
to[1]
with yreplaced
by
b
j
for j
=1, ... , !.
Oncompletion,
[22]
could be evaluated assimple
vectorproducts.
Alternatively,
define a matrix B =[bi
I b
2
I ...
bp!.
Then consider the mixed model matrix with yreplaced by
B, ie,
with the last row and column(for
right
handsides)
expanded to p
rows and columnsFactoring
M
B
or,equivalently,
’absorbing’
C into the last p rows and columns ofM
B
then overwritesB’R-
1
B
with B’PB which has elements{b’ iPb
j
}
(Smith,
1994 perscomm).
With theCholesky
factorisation of Calready
determined(to
calculatelog
G),
this iscomputationally undemanding.
EQUAL
DESIGN MATRICESFor a
simple
animal model with all traits recorded at the same orcorresponding
times,
design
matrices areequal, ie,
[1]
can be rewritten asMeyer
(1985)
described a method ofscoring
algorithm
for this case,exploiting
a canonical transformation to reduce a t-variateanalysis
tot corresponding
univariateanalyses.
For
E
E
positive
definite andE
A
positive
semi-definite,
there exists a matrixQ
Transforming
the data tothen
yields
t new, ’canonical’ traits which are uncorrelated and have unit residualvariance. This makes the
corresponding
coefficient matrix in the MMEblockdiag-onal for
traits, ie,
Meyer
(1991)
described how thelog
likelihood(on
theoriginal
scale)
in this case can becomputed
traitby
trait as the sum of univariate likelihoods on the canonicalscale
plus
anadjustment
for the transformation(last
term in!29!)
with
y*
the subvectorof y
*
for trait i andP*
the ithdiagonal
block of theprojection
matrix on the canonical scale P*
which,
likeC
*
,
isblockdiagonal
for traits.Terms
required
in[29]
can be calculatedby
setting
up andfactoring,
as describedabove,
univariate MMM(on
the canonicalscale),
Mi ,
of sizeM
o
=(M -
1)/t
+ 1each
Moreover,
all first derivatives oflog
G as well the average informationmatrix,
both on the canonical
scale,
can be determined traitby
trait.First derivatives on the canonical scale
Consider the
parameterisation
ofMeyer
(1985)
where0
*
,
the vector ofparameters
on the canonical
scale,
has elementsAg
and wi! for i< j =
1, ... , t,
ie, parameters
are the(co)variances
on the canonical scale.The
log
likelihood on the canonical scale can be accumulated traitby trait,
On the
original
scale,
L and F have the samesparsity
structure(Smith, 1995).
However,
whileAy
=Wij = 0 for
given E
A
andE
E
,
thecorresponding
derivatives and estimates arenot,
unless the maximum of the likelihood has been attained.Hence,
while theoff-diagonal
blocks ofLC
are zero, thecorresponding
blocks ofF *
=
8f (L
*
)/c7M
*
are not.It can be shown that both the
diagonal
blocks of F*corresponding
toL*
F*
and the row vectorscorresponding
to1*’, fi
are
identical to those obtainedby
backwards differentiation ofL!.
In otherwords,
first derivatives withrespect
to the variancecomponents
on the canonical scale(!ii
andWii
)
can be obtained traitby
trait from univariateanalyses.
Calculation of derivatives withrespect
to!2!
andWij
,
however, requires
theoff-diagonal
blocks of F*corresponding
to traits i andj,
FC_! .
Fortunately,
as outlined in theA
PP
endix,
matricesFC.!
can be determinedindirectly
from termsarising using
theCholesky decomposition
and backwards differentiation for individual traits on the canonical scale.From
[17]
and!18!,
firstderivatives
off (L
*
)
=log
I C*
+
y
*
’P
*
y
*
are thenwith
MM
F
the Mthdiagonal
element of F*. Forf (L
*
)
=*
1
1
g1C
0
+
y
*
’P
*
y
*
,
F
M
M
= 1.
Other terms
required
to determine the first derivatives on the canonical scale arewhere G* and R* are the canonical scale
equivalents
to G andR,
respectively.
Average
information on the canonical scaleFor G* = A ®
A,
R* =,
tN
I
and thus V* =Var(y
*
)
blockdiagonal
fortraits,
[20]
ie,
vectorsbi
are zeroexcept
for subvectors for traits k andl,
bi
k
= si andbi
l
= sk , with Sjstanding
in turn forA!Zoa!
and8) .
With P*
blockdiagonal
fortraits,
0g
=).,kl
or uki(k x l)
and 0) =
A
mn
or umn(m <
n),
thisgives
Hence,
calculation of the average information on the canonical scalerequires
all termss’P*s
j
for i! j
=1,...,
2t and k =1, t.
These can be obtained traitby
trait,
analogously
to theprocedure
described above for thegeneral
case. After theCholesky
factorisationof Mk has
been carriedout,
solutiona* for
animals’ additivegenetic
effects and residualsek
for
trait k areobtained, storing
theCholesky
factorL*.
Define a matrix S of size N x 2t with columnsequal
to vectors si. S is the canonical scaleequivalent
to B above. Once all columns of S have beenevaluated,
set up a matrix
- vi 1
-for each trait. This is the MMM for trait k on the canonical scale with
yk
replaced by
S. The matrix S’S has elementss’s
j
which are the sum of squaresand
crossproducts
of the vectors of(weighted)
solutions and residuals.Absorbing
Ck
intoS’S
(using
the stored matrixL*)
then overwrites this matrix withS’PkS
which has elementsS!PkSj’
After all t traits have beenprocessed
bi!P*b!,
(twice)
the averageinformation,
can be ’assembled’according
to!38!.
Derivatives on the
original
scaleLet
H
*
,
with elements2 bi ! P* b! ,
andg
*
,
with elements810g £* /80g,
denote the average information matrix and vector ofgradients
on the canonicalscale,
respectively. Corresponding
terms on theoriginal
scale are thenwhere J with elements
<9!/c)!
is the Jacobian matrix of e withrespect
toe
*
.
From
[25]
and[26],
J has non-zero elementsAlternative
parameterisation
The above
parameterisation requires
switching
between theoriginal
and canonical scale(or
accumulating
thetransformations).
Alternatively,
asperformed
by
Meyer
(1991)
for a derivative-freealgorithm,
0
*
can be defined to have t elementsA!
(i
=1, ... , t)
and t
2
elementsq2!
(i, j
=1, ... , t),
ie,
the’genetic’
variances onthe canonical scale and the elements of the transformation matrix
Q.
This allows estimation to be carried out on the canonical scale.Moreover,
as outlinedby Meyer
(1991)
for a derivative-freealgorithm,
evaluation oflog
G forgiven
Aii
requires
scalar calculations
involving
the elements ofQ only,
ie, maximising
the conditionallog,C
(for
given
Àii
)
withrespect
to qij iscomputationally
inexpensive.
This allowedthe maximum of
log G
to be located in atwo-step
procedure
withcomputational
requirements equivalent
to those of at parameter
(rather
thant(t
+1) )
analysis.
While derivatives withrespect
to q2!(not shown)
require
theoff-diagonal
blocks ofF,
derivatives withrespect
toA,
do not(see
!32!).
Hence a nestedmaximisation,
using
an average informationstep
to estimateparameters
A
zz
and a derivative-freeprocedure
to maximize G withrespect
to q2! forgiven A
zz
appears computationallyadvantageous
in this case.REDUCED RANK COVARIANCE MATRICES
Forcing
the estimate of thegenetic
covariance matrixE
A
to have reduced rank(k
A
<t),
results in a number of zeroeigenvalues
on the canonical scale. In thiscase, several terms
contributing
tolog
G or its derivatives are constant ordepend
on the canonical transformation(Q)
only.
Thusthey
need to be evaluatedonly
once per analysis, effectively reducing the
computational
requirements
per round of iteration to those for univariateanalyses
for thek
A
non-zeroeigenvalues.
Likelihood: for
Aii
!0,
contributions tolog
G[29]
becomeWhile the former is a
constant,
determinedby
the data structureonly,
the latterdepends
on the canonical transformation.However,
as[45]
shows,
it can be calculated for any Q from thecorresponding
residual sums of squares(SS)
andcrossproducts
(CP)
on theoriginal
scale. Let Y denote a matrix of size N x t with columnsequal
to vectors of observations y2. Bothlog
IX’X
O
and
quadratics
ym (I - Xo(XoXo)-Xo)
yn can then be determinedby factoring
the matrixthe
eigenvalues
for both canonical traits(A
ii
andA
jj
)
are non-zero. This reduces thenumber of
Cholesky
decompositions
(of
matricesMZ )
andcorresponding
backwards differentiations(to
obtain matricesF*
and
vectorsf2 )
to be carried out to the number of non-zeroeigenvalues,
k
A
.
Moreover,
only
k
A
(k
A
-1)/2
insteadof t(t-1)/2
off-diagonal
blocksFc
*
.. ’J
need to be evaluated. This can reducecomputational
requirements
perround
of iterationdramatically.
Average
information:
as shownabove,
the average information matrix can be constructed from residual SS and CP in the vectors of random effects solutions and residuals. ForA
kk
=0,
ak
= 0. Residuals on the canonical scale are then a linear combination of residuals on the
original scale,
ek
=!m-1
q
km8
m
with8
m
=y&dquo;, -Xo(3&dquo;!.
Again
the latter need to be evaluatedonly
once. Termsrequired
for the AI can then be obtained asbefore,
replacing
Ms
k
in[39]
withif
A
kk
= 0. Formultiple eigenvalues equal
tozero,
[47]
only
needs to be evaluatedonce
(per iterate).
COVARIANCE FUNCTIONS
The
’infinitely-dimensional’
modelConsider
’repeated’
measurements taken at a number of ages(or equivalent),
withpotentially infinitely
many records. The covariance between records taken at ages l and m can beexpressed
aswhere
7is the covariance function and K with elements
K
ij
is thepertaining
matrix ofcoefficients,
and a&dquo;, is the mth age, standardised to the interval for which thepolynomials
are defined. 4! is a matrix oforthogonal polynomial
functions evaluated at thegiven
ages with elements!2! _
<!(0t),
thejth polynomial
evaluated for agei,
and k denotes the order ofpolynomial
fit.Conceptually
k = oo, but inpractice
k <
t for observations att ages.
Kirkpatrick
et al(1990, 1994)
suggested
the use of the so-calledLegendre polynomials
(see
Abramowitz andStegun,
1965)
which span the interval from -1 to 1.Note that
polynomials
include a scalarterm, ie,
for t records a full order fit involves terms to the power0, ... , t -
1. From[48],
a covariance matrix can be rewritten asLet
E
A
=4l
A
K
A4
l
Q
for an order of fitk
A
,
andK
A
the matrix of coefficients for thecorresponding
covariance function A.Further, partition
the error covariance matrix intocomponents
owing
topermanent
andtemporary
environmentaleffects,
E
E
=E
R
+E,.
Under the ’finitemodel’,
theseusually
cannot bedisentangled
unless there are
repeated
records for the same age. Assume the latterrepresent
independent
’measurementerrors’, ie, E,
=Diag
{Œ;i
}
’
and that the former can bedescribed
by
aCF, R,
ie, E
R
=£
4l
K
R
R
4l
with order of fitk
R
.
Fitting
measurementerrors
separately together
with R to the order t - 1yields
anequivalent
model toa full order fit for
E
E
.
Hence the maximum fork
R
is t - 1 rather than t. General caseEstimates of the elements of the coefficient matrices of the covariance functions
(and
the measurementerrors)
can be estimatedby
REMLusing algorithms
for the multivariate estimation of covariancecomponents
together
with asimple
reparameterisation.
Likelihood: as outlined
by Meyer
and Hill(1997),
log L
[9]
can be rewritten asUnder the CF
model,
the vector ofparameters
to be estimated is rlwith elementsK
A
,,
for i# j = 1, ... k
A
,
KR!!
for i ! j
=
1, ... k
R
andŒ;i
for
i =1,...,
t. For rCFs to be
estimated,
it has minimumlength
r + t(fitting
all
CFs to order 1 andassuming
allŒ;i
to
bedistinct)
and maximumlength
rt(t+1)/2 (fitting
all CFs to fullorder).
&dquo;
Derivatives: note that the elements of matrices 1)
only
depend
on the ages at which records were taken and thepolynomial
function chosen. For any vector tl a
corresponding
vector of covariancecomponents
(8)
can be calculatedusing
[49].
First and’average’
second derivatives oflog
,C can thus be determined withrespect
to the elements of 8 as described above(section
on estimation of covariancecomponents
in thegeneral
cases)
and then be transformed to the ’covariance functionscale’, using
where J with elements
OO
i/
0
77j
is the Jacobian matrix of 0 withrespect
to 71. Fromfor
77
i
=<7!
and 0
j
=aEmm’
Note that a reduced order fit for ,A
(k
A
<t)
implies
agenetic
covariance matrixE
A
of reduced rank. This may lead tocomputational
problems
whenapplying
’standard’
methodology
to factor the MMM(such
as theCholesky decomposition
and its automatic
differentiation)
as described above. Forpractical applications,
this can be overcome
by
setting
the t -k
A
zeroeigenvalues
to anoperational
zero,ie,
a small non-zero value such as10-
5
or10-
6
.
As for the estimation of covariance
components,
extensions to other modelsincluding
additional random effects arestraightforward.
Equal
design
matricesAssuming
measurement errors aregreater
than zero,E
E
=E
R
+E
E
ispositive
definite,
regardless
of the order of fit for R.Hence,
the transformation to canonical scale exists and thelog
likelihood can be determined as for the ’finitemodel’,
summing
terms from univariateanalyses
on the canonical scale as described above. For the estimation of covariancecomponents,
the canonical transformation doubled as a tool to reducecomputational requirements
(allowing
calculations to be carried out for one trait at atime)
and as areparameterisation,
ie,
the likelihoodwas maximised with
respect
to theeigenvalues
and elements ofeigenvectors
of the canonical
decomposition
ofE
A
andE
E
.
For the CF model there arethree matrices to be
considered, K
A
andK
R
,
orE
A
andE
R
derived fromthem,
andE
E
.
Decompositions
diagonalising
several matrices exist. Lin and Smith(1990),
forinstance,
used the commonprincipal
components
algorithm
ofFlury
and Constantine
(1985)
as anequivalent
to the canonicaldecomposition
for amultivariate mixed model with several random effects.
However,
thisrequired
the matrices to be transformed to bepositive definite,
and is thus not suitable for thisapplication.
Hence
only
the firstparameterisation
describedabove, namely
0
*
having
ele-mentsA
ij
and Wijfor i < j
=1, ... ,
t
is suitable for the estimation ofCFs.
First derivatives and average information on the canonical scale can then be determinedand transformed back to the
original
scale as described above. The Jacobian J in this case has non-zero elementsDISCUSSION
A
strategy
has been outlined forcomputing
thelog
likelihood in a multivariate mixedmodel, together
with its first and’average’
second derivatives withrespect
to covariancecomponents
or the coefficients of covariance functions. These can be used in a(modified)
Newton-type
estimationprocedure
(Marquardt, 1963),
together
with an additional transformation to ensure estimatesto be within the
parameter
space; seeMeyer
and Smith(1996)
for details.For a model with
only
one random factor andequal design
matrices for alltraits,
calculations can be carried out traitby trait,
reducing
computational
requirements
to those of a series of univariateanalyses.
This is feasiblethrough
a canonicaldecomposition
of thegenetic
and residual covariance matrices and acorresponding,
linear transformation of the data to new, uncorrelated variables.
In that case,
fitting
a covariance function to less than full order orforcing
es-timated
genetic
covariance matrices to be of reduced rank isequivalent
tofixing
eigenvalues
on the canonical scale at zero. Most termspertaining
to zeroeigen-values then
only
need to be calculated once peranalysis, resulting
in considerable reductions ofcomputational requirements.
In essence, it reduces workrequired
in each round of an iterative solution scheme to thatequivalent
tok
A
univariateanalyses.
This isparticularly
useful for acomparatively large
number ofhighly
cor-related measurements where a covariance function of low order suffices to describe
the data
adequately.
Incontrast,
foranalyses
where a transformation to canonical scale is notfeasible,
thecomplete
t-variate MMM needs to be set up, factored anddifferentiated in each round of
iteration,
even if CFs areonly
fitted to orderk
A
.
Making
computational
demandsproportional
to the order of fit for thegenetic
CF(or matrix)
encourages an’upwards’
strategy:
increasing
the order of fit oneby
one allows a likelihood ratio test to beperformed
at eachstep.
The minimum number ofparameters
describing
the data is found when the likelihood ceasesto increase
significantly
when the order of fit is increased. It is envisioned that estimation of reduced order covariance matrices or functions will become the standardprocedure
forhigh-dimensional
multivariateanalyses.
ACKNOWLEDGEMENT
This work was
supported
under grant UNE35 of the Australian Meat ResearchCorporation
(MRC).
REFERENCES
Abramowitz
M,
Stegun
IA(1965)
Handbookof Mathematical
Functions.Dover,
New YorkFlury B,
Constantine G(1985)
Algorithm
AS 211: The F-Gdiagonalization algorithm.
Appl Stat,
177-180Gilmour
AR, Thompson R,
Cullis BR(1995)
Average
information REML: an efficientalgorithm
for variance parameter estimation in linear mixed models. Biometrics 51,1440-1450
Graser
HU,
SmithSP,
Tier B(1987)
A derivative-freeapproach
forestimating
variance components in animal modelsby
restricted maximum likelihood. J Anim Sci64,
1362-1370Graybill,
FA(1969)
Introduction to Matrices withApplications
in Statistics. WadsworthPublishing Company, Inc, Belmont,
CaliforniaHarville,
DA(1977)
Maximum likelihoodapproaches
to variance component estimation and relatedproblems.
J Amer Stat Ass72,
320-338Jensen
J,
Mao IL(1988)
Transformationalgorithms
inanalysis
ofsingle
trait and of multitrait models withequal design
matrices and one random factor per trait: a review.J Anim Sci 66, 2750-2761
Jensen
J, Mantysaari E,
MadsenP, Thompson
R(1995)
Restricted Maximum Likelihoodestimation of
(co)variance
components in multivariate linear modelsusing
average of observed andexpected
information. In: 2!cdEuropean Workshop
on AdvancedBiometrical Methods in Animal
Breeding, Salzburg,
12-20June, 1995, Mimeo,
17 p JohnsonDL, Thompson
R(1994)
Restricted maximum likelihood estimation of variancecomponents for univariate animal models
using
sparse matrixtechniques
and average information. Proc 5th WorldCongr
GenetAppl
LivestProd, University of Guelph,
Guelph,
vol 18, 410-413Johnson
DL, Thompson
R(1995)
Restricted maximum likelihood estimation of variance components for univariate animal modelsusing
sparse matrixtechniques
and averageinformation. J
Dairy
Sci78,
449-456Kirkpatrick M,
Heckman N(1989)
Aquantitative genetic
model forgrowth, shape
and other infinite-dimensional characters. J Math Biol27,
429-450Kirkpatrick M,
LofsvoldD,
Bulmer M(1990)
Analysis
of theinheritance,
selection and evolution ofgrowth trajectories.
Genetics124,
979-993Kirkpatrick M,
HillWG, Thompson
R(1994)
Estimating
the covariance structure of traitsduring growth
andaging,
illustrated with lactations indairy
cattle. Genet Res64, 57-69
LinCY,
Smith S.P.(1990)
Transformation of multi-trait to unitrait mixed modelanalysis
of data with
multiple
random effects. JDairy
Sci73,
2494-2502Lindstrom
MJ,
Bates DM(1988)
Newton-Raphson
and EMalgorithms
for linear mixed-effects models forrepeated-measures
data. J Amer Stat Ass 83, 1014-1022Madsen
P,
Jensen J,Thompson
R(1994)
Estimation of(co)variance
componentsby
REMLin multivariate mixed linear models
using
average of observed andexpected
information. Proc 5th WorldCongr
GenetAppl
LivestProd,
University
of Guelph, Guelph,
Vol 22, 19-22Marquardt
DW(1963)
Analgorithm
for least squares estimation of nonlinear parameters.SIAM J 11, 431-441
Meyer
K(1985)
Maximum likelihood estimation of variance components for a multivariate mixed model withequal design
matrices. Biometrics41,
153-166Meyer
K(1989)
Restricted maximum likelihood to estimate variance components for animal models with several random effectsusing
a derivative-freealgorithm.
Genet SelectEvol 21,
317-340Meyer
K(1991)
Estimating
variances and covariances for multivariate animal modelsby
restricted maximum likelihood. Genet Selct Evol 23, 67-83Meyer K,
Smith SP(1996)
Restricted maximum likelihood estimation for animal modelsusing
derivatives of the likelihood. Genet Select Evol28,
23-49Meyer K,
Hill WG(1997)
Estimation ofgenetic
andphenotypic
covariance functions forlongitudinal
or’repeated’
recordsby
restricted maximum likelihood. Livest Prod SciMisztal I
(1994)
Comparison
ofcomputing properties
of derivative and derivative-freealgorithms
in variance component estimationby
REML. J Anim Breed Genet 111,346-355
Searle SR
(1982)
MatrixAlgebra Useful for
Statistics. JohnWiley &
Sons, New York Smith SP(1995)
Differentiation of theCholesky algorithm.
JComp Graph
Stat 4, 134-147 TakahashiK, Fagan J,
Chin MS(1973)
Formation of sparse busimpedance
matrix and itsapplication
to short circuitstudy.
Proc 8th Inst PICAConf Minneapolis, Mineapolis,
63-69;
citedby
Smith(1995)
APPENDIX
Calculation of
off diagonal
blocks of FSmith
(1995)
gives
pseudo-code
for backwards differentiation of theCholesky
factor of asymmetric
matrix. This can beadapted
tocalculating
a selected submatrixonly
for the case where the
corresponding
part
of theCholesky
factor has zero elements.Calculating
theoff-diagonal
block of F* for traits i andj,
Fc
i
,
indirectly,
requires
components
from univariateanalyses
on the canonicalscale,
namely
L
C.
with elementsLk
l
and1*
with elementsl!,
theparts
of theCholesky
factor ofM*
corresponding
to the coefficient matrix and vector ofright
handsides,
respectively,
andf!
with elements/!,
the ’F matrix’analogue
to the vector ofright
hand sides fortrait j.
F*e.. ’1
with elementsF!!
can then be calculated as follows:(1)
initialiseFèij
tofl1:’,
ie,
fork,
l =1,..., Mo - 1,
F!l
:= fkl;&dquo;;
(2)
calculate columns ofF!,2!
one at atime,
starting
with the lastcolumn, ie,
fork = Mo - 1, ... ,
I(a)
adjust
for columnsalready
evaluated, ie,
for m =M
o
-
1, ... ,
k + 1 and(b)
divideby
thepivot,
ie,
for l =1,...,
Mo - 1, F
lk
:=-F
lk
/ L
%
k
where ‘:_’ indicates that the left hand side of the
equation
is overwritten with thequantity
on theright.
As noted
by
Smith(1995),
only
selected elements of F(in
thegeneral
case)
need to beevaluated,
adjustments only occurring
for non-zero elements of theCholesky
factor. Hence F has the same
sparsity
structure as L. Similar considerations hold in this case,reducing
computational
requirements
inpractical applications
substantially.
Forinstance,
columns ofF*c.. ’J
are notrequired
foradjustments
to other columns if thecorresponding
row ofL*
has no non-zerooff-diagonal
elements. Hence in these columns
only
elementscorresponding
topotentially
non-zero elements in the derivatives of M* need to be evaluated. Other redundancies could be
perceived.
For aparticular
analysis
itmight
be worthcarrying
out asymbolic
factorisation of the MMM for all traits on theoriginal
scale to determine thesparsity
structure of the ’full L’ and thus the minimum number of elements ofFc..
which need to be evaluated. This would have to be carried outonly
once perNumerical
example
Consider the
example
given
by
Meyer
(1991),
consisting
of two traits measuredon 284 mice. Table I summarises
starting values,
intermediate results for the first iterate and estimates over rounds of iterations.Convergence
is reachedrapidly
eventhough
starting
values for covariances were far from the eventualestimates,
the AIalgorithm performing
similar to analgorithm using
observed orexpected
secondwith
eigenvalues
All
= 1.96406 andA
22
= 0.50389.Quadratics
on the canonical scale(y2 !Pi Yi )
are 314.77078 and 375.36955 for the two traits and thecorrespond-ing
log ICil
are 241.85744 and554.08056,
respectively.
With 329 animals in theanalysis
andr(X
o
)
=2,
thisgives
log
Gomitting
the termtlog
IAI,
of -1172.3761(cf
[29]).
First derivatives of
log
IG
*
I
and
log
]R
*
calculated
according
to[34]
and[35]
are
given
in table I. Termsy!’yt 2
( i ! j
=
1, 2)
andf2 !r!
(
i, j
=1, 2)
are42285.252,
50449.049 and64091.809,
and-83940.962, -101237.320,
-101089.518 and-127432.879, respectively.
Application
of[32]
and[33]
thengives
the first derivatives(canonical
scale)
of f (L
*
),
withcorresponding
derivativesðlog£/8()i =
-
1/2
(8 log
IG*I/8()i
+8 log
IR
*
I/8()i
+af (L
*
)/8Bi )
as shown in table I. From[42]
and
(43],
the Jacobian isand