HAL Id: hal-00894207
https://hal.archives-ouvertes.fr/hal-00894207
Submitted on 1 Jan 1998
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Estimating covariance functions for longitudinal data
using a random regression model
Karin Meyer
To cite this version:
Original
article
Estimating
covariance functions
for
longitudinal
data
using
a
random
regression
model
Karin
Meyer
Institute of
Cell,
Animal andPopulation Biology, Edinburgh University,
West MainsRoad, Edinburgh
EH93JT, Scotland,
UK(Received
13August 1997; accepted
31 March1998)
Abstract - A method is described to estimate
genetic
and environmental covariance functions for traits measuredrepeatedly
per individualalong
some continuousscale,
such as
time,
directly
from the databy
restricted maximum likelihood. It relieson the
equivalence
of a covariance function and a randomregression
model.By
regressing
onrandom, orthogonal polynomials
of the continuous scalevariable,
the coefficients of covariance functions can be estimated as the covariances among theregression
coefficients. Aparameterisation
is described which allows the rank of estimated covariance matrices and functions to berestricted,
thusfacilitating
ahighly
parsimonious description
of the covariance structure. Theprocedure
and the type of results which can be obtained are illustrated with anapplication
to matureweight
records of beef cows.@
Inra/Elsevier,
Pariscovariance functions
/
genetic parameters/
longitudinal data/
restricted maximum likelihood/
random regression modelRésumé - Estimation des fonctions de covariance de données en
séquence
àpar-tir d’un modèle à coefficients de régression aléatoires. On décrit une méthode d’estimation des fonctions de covariance
génétique
et nongénétiques
pour descarac-tères mesurés
plusieurs
fois par individu lelong
d’une échellecontinue,
comme letemps. Elle
s’appuie
directement sur les données àpartir
du maximum de vraisem-blance restreint, en considérantl’équivalence
entre fonction de covariance et modèle derégression
aléatoire. Les coefficientsfigurant
dans les fonctions de covariance peuventêtre estimés comme des covariances entre les coefficients de
régression
desobserva-tions par rapport à des
polynômes orthogonaux
de la variabletemporelle.
On décritun
paramétrage qui
permet de diminuer le rang des matrices et des fonctions deco-variances, rendant ainsi
possible
une bonnedescription
de la structure de covariance*
Correspondence
andreprints:
Animal Genetics andBreeding Unit, University
of NewEngland, Armidale,
NSW 2351, Australia.avec peu de
paramètres.
Laprocédure
et le type de résultats qui peuvent être obtenus sont illustrés par unexemple
concernant lespoids
vifs adultes de vaches allaitantes.©
Inra/Elsevier,
Parisfonction de covariance / paramètres génétiques
/
données en séquence/
maxi-mum de vraisemblance restreint
/
modèle à régression aléatoire1. INTRODUCTION
Covariance functions have been
recognized
as a suitable alternative to the conventional multivariate mixed model to describegenetic
andphenotypic
vari-ation forlongitudinal
data,
i.e.typically
data with many,’repeated’
measure-ments per individual recorded over time.
They
areespecially
suited for traitswhich are
changing
with time so thatrepeated
measurements do notcompletely
represent
the same trait. Theexample
considered here forth isgrowth
of ananimal with
weights
taken at a number of ages, but theconcept
isreadily
applicable
to other characters and other continuous scales or ’meta-meters’. In essence, covariance functions are the ’infinite-dimensional’equivalent
tocovariance matrices in a
traditional,
’finite’ multivariateanalysis
[15].
As the nameindicates,
a covariance function(CF)
describes the covariance betweenrecords taken at certain ages as a function of these ages. A suitable function
is a
higher
orderpolynomial.
Thisimplies
that whenfitting
a CFmodel,
weneed to estimate the coefficients of the
polynomial
instead of the covariancecomponents
in a finite-dimensionalanalysis.
The number of coefficientsrequired
is determined
by
the order of fit of thepolynomials.
A
finite-dimensional,
multivariateanalysis
isequivalent
to ’full fit’ CFanalysis
where the order of fit isequal
to the number of agesmeasured,
i.e. thecovariance matrices for the ages in the data
generated
by
the estimated CFsare
equal
to the estimates that would have been obtained in aconventional,
multivariate
analysis.
Inpractice,
however,
a reduced order fit often suffices. This reduces the number ofparameters
to be estimated and thussampling
errors,
resulting
in asmoothing
of the estimated covariance structure.Kirkpatrick
et al.[15, 16]
modelled CFsusing
orthogonal
polynomials
ofage,
choosing Legendre polynomials.
Let E denote a covariance matrix of sizeq x q, and 4i of size q x k the matrix of
orthogonal polynomials
evaluated atthe
given
ages with elements!2! _ !!(ti),
thejth polynomial
for the ith aget
i
.
The order of fit of the CF isgiven by
k <
q. This allows the covariance matrix to be rewritten as E= 4iK4i’
with K =f
k
i
j
a
matrix ofcoefficients,
and
gives
CFpowers of
t
m
and A the matrix ofpolynomial
coefficients. Thisgives
the mthrow of 4) as
4!!
=t
m
A.
For instance for k = 3 andLegendre polynomials
and
Thus
equation
(1)
can be rewritten as)
i
, t
m
(t
0
=t!AKA’t!
=t
mo
t§,
i.e.the coefficient matrix !2 with elements c!2! is obtained from K
by including
theterms of the
polynomial
chosen.Kirkpatrick
et al.[15]
described ageneralized
least-squares procedure
todetermine the coefficients of a CF from an estimated covariance matrix.
Often, however,
this is not available orcomputationally
expensive
to obtain.Meyer
and Hill[24]
showed that the coefficients of CFs can be estimateddirectly
from the databy
restricted maximum likelihood(REML)
through
asimple
reparameterization
ofexisting,
’finite-dimensional’ multivariate REMLalgorithms.
For thespecial
case of asimple
animal model withequal
design
matrices
computational requirements
were restricted to the order of fit of thegenetic
CF. In thegeneral
case,however,
theirapproach required
a multivariatemixed model matrix
proportional
to the number of ages in the data to be set up andfactored,
even for a reduced order fit. Thisseverely
limitedpractical
applications, especially
for data with records at ’allages’.
Polynomial
regressions
have been used to describe thegrowth
of animals for along
time[35],
butonly
recently
has there been interest in randomregression
(RR)
models. These haveby
andlarge
beenignored
in animalbreeding applications
sofar, although they
are common in other areas; see,for
instance,
Longford
[19]
for ageneral
exposition.
RRs in a linear mixedmodel context have been considered
by
Henderson!9!.
Jennrich and Schluchter[12]
included the ’random coefficients’ model in their treatment of REML and maximum likelihood estimation for unbalancedrepeated
measures modelswith structured covariance matrices. Recent
applications
include thegenetic
evaluation ofdairy
cattleusing
testday
records([10,
11,
14]
Van der Werfet al.
unpublished),
and thedescription
ofgrowth
curves inpigs
[2]
and beef cattle!33!.
This paper describes an alternative
procedure
for the estimation ofcovari-ance functions to that
proposed by Meyer
and Hill[24],
which overcomes thelimitations discussed above. It is shown that the CF model is
equivalent
to aRR model with
polynomials
of age asindependent variables,
and that REMLestimates of the coefficients of the CF can be obtained as covariances among the
regression
coefficients. A mechanism is described to restrict the rank of the estimated covariance matrices(of regression coefficients)
and thus theCFs,
2. ESTIMATION OF COVARIANCE FUNCTIONS
2.1. Model of
analysis
2.1.1. Finite-dimensional model
Consider an animal model
with y2! the observation for animal i at time
j,
a2! and rij thecorresponding
additivegenetic
andpermanent
environmental effects due to theanimal,
respectively,
Eij the measurement error(or
temporary
environmentaleffect)
pertaining
to y2! and F some fixed effects.Furthermore,
lett
ij
denote the age(or
equivalent)
at which y2! isrecorded,
and assume there are qi records foranimal i and a total
of
q different
ages in the data.Commonly,
under a ’finite-dimensional’ model ofanalysis,
datarepresented
by equation
(2)
areanalysed
eitherassuming
measures at different ages aredifferent
traits,
i.e.carrying
out aq-dimensional,
multivariateanalysis,
orfitting
the so-calledrepeatability
model,
i.e.assuming
aij =a
i and rij =i for
all j
=1, ... ,
q
i and
carrying
out a univariateanalysis.
In theformer, fully
parametric
case, covariance matrices are taken to be unstructured.Fitting
a covariance function
model, however,
weimpose
some structure on the covariance matrices. Thisimplies
theassumption
that the series of(up
to)
q measurementsrepresents
k different ’traits’ orvariates,
with1 <_
k <qdenoting
the order of fit of the covariance function. 2.1.2. Random
regression
modelAs shown
below,
the covariance function model isequivalent
to a ’randomregression’
modelfitting
functions of age(or
equivalent)
as covariables..
Kirkpatrick
et al.[15, 16]
used the well-knownLegendre polynomials
(see,
for instance Abramowitz andStegun
[1])
infitting
covariance functions. These have a range of —1 to 1.Let tij
denote thejth
age for animal i standardizedto this
interval,
and let0
,(t* )
be the mthLegendre polynomial
evaluated fort!..
We can then rewriteequation
(2)
as a RR modelwith aim and !y2.&dquo;,
representing
the mth additivegenetic
andpermanent
environmental random
regression
coefficients for animali,
respectively,
andk
A
andk
R
denoting
therespective
orders of fit.This formulation
(3)
implies
that the vectorof
q breeding
values in a’finite-dimensional’,
multivariateanalysis
isreplaced
by
the vector ofk
A
additivegenetic,
randomregression
coefficients.Note, however,
that withk
A
chosenemployed
as an effective tool to reduce the number of traits to be handled(and
breeding
values to bereported)
for ’traits’ measured over a continuous timescale such as
weights
(e.g.
birth, weaning, yearling,
final and matureweight)
in beef cattle or test
day
records fordairy
cows.Moreover,
the RR model(3)
yields
adescription
of the animal’sgenetic potential
for thecomplete
timeperiod
considered,
forinstance,
an estimate of thegrowth
or lactation curve.2.1.3. Covariance structure
The covariance between two records for the same animal is then
Generally
measurement errors are assumed to be i.i.d. with varianceQE
,
so thatCov(e2!,e2!!)
=a2
for j
= j’
and 0otherwise,
but otherassumptions,
such asheterogeneous
variances orautoregressive
errors, arereadily
accommodated.Clearly,
the first two terms inequation
(4)
are CF with the covariancesbetween random
regression
coefficientsequal
to the coefficients of thecorre-sponding
covariance functions(24!,
seeequation
(1)
above,
i.e. the RR modelis
equivalent
to a CF model.Conversely,
the RR modelprovides
an alternativestrategy
to estimate CFs. While the REMLalgorithm
describedby Meyer
and Hill[24]
required
mixed modelequations
of sizeproportional
to the totalnum-ber of ages q to be set up and factored in the
general
case,requirements
under theequivalent
randomregression
model areproportional
to the orders offit,
k
A
andk
R
.
Hence,
thisapproach
offersconsiderably
more scope to handle datacoming
in ’at allages’
and should beespecially advantageous
fork
A
ork
R
« q.2.1.4. Fixed effects
In
fitting
a RR model it isgenerally
assumed thatsystematic
differencesin age are taken into account
by
the fixed effects in the model ofanalysis.
Inmost cases, these include a fixed
regression
of the same form as the randomregression
(e.g. (9, 11, 12!),
which can bethought
of asmodelling
thepopulation
trajectory,
while the randomregressions
for each animalrepresent
individuals’ deviations from this curve.2.2. REML estimation
Considering
allanimals, equation
(3)
can be written in matrix form aswith y the vector of N observations measured on
N
D
animals,
b the vectorcoefficients
(N
A
>
N
D
denoting
the total number of animals in theanalysis,
including
parents
withoutrecords),
y the vector ofk
R
xN
D
permanent
environmental random
regression coefficients,
e the vector of N measurementerrors, and
X,
Z* andZ
z
denoting
thecorresponding ’design’
matrices. HereZD
is the non-zeropart
of Z*(for
k
A
=h
R
),
i.e. thepart
of Z*corresponding
to animals in the data. Thesuperscript
’*’ marks matricesincorporating
orthogonal polynomial
coefficients.Assuming
y is ordered foranimals,
ZD
isblockdiagonal,
the block for animal i is ofdimension q
i
xk
R
,
and has elements
øm(tij).
Note that each observationgives
rise tok
R
(or
k
A
for
Z
*
)
non-zero elements rather than asingle
element of 1 in theusual,
finite-dimensionalmodel,
i.e. thedesign
matrices areconsiderably
denser than in the latter case.Let
K
A
with elementsK
Amt
=Cov(am, a
l
)
andK
R
with elementsK
Rml
=
Cov(
7
,,,,,y
i
)
denote the coefficient matrices for the additivegenetic
andper-manent environmental covariance functions A and
R, respectively.
In termsof
analysis,
this isanalogous
totreating
RR coefficients as correlated ’traits’.Assume that the fixed
part
of the model accounts forsystematic
ageeffects,
so that a N
N(0, K
A
0A)
and y -N(O,
K
R
0I
ND
)
’
and that a and y areuncorrelated. For
generality,
letV(
E
)
=R,
but assume R isblockdiagonal
for animals with blocksequal
to submatrices of the q xq matrix Sg.
The mixed model matrixpertaining
toequation
(5)
is thenwhere A is the numerator
relationship
matrix betweenanimals,
IN
is anidentity
matrix of sizeN,
and Q9 denotes the direct matrixproduct.
M* hasN
F
+kAN
A
+k
R
N
D
+ 1 rows and columns(with
N
F
being
the total number of levels of fixed effectsfitted),
i.e. its size and thuscomputational requirements
are
proportional
to the order of fit of the CFs. For R =o, 6 21, o!
can be be factored from
M
*
,
resulting
in a matrix which can be set up as for a univariateanalysis.
Estimates of the distinct elements of
K
A
andK
R
and theparameters
determining
E
E
can be obtainedby REML,
applying
existing procedures
for multivariate
analyses
under a ’finite’ model. This may involve asimple,
derivative-free
algorithm
!21)
or, moreefficiently,
a methodutilizing
information from derivatives of thelikelihood,
such as Johnson andThompson’s
[13]
’average
information’algorithm;
see Madsen et al.[20]
orMeyer
[22]
for adescription
of the latter in the multivariate case.While true measurement errors are
generally
assumed to bei.i.d.,
there may be cases in which we need to allow forheterogeneous
variances or correlationsbetween
’temporary’
environmental effects. This may, to someextent,
com-pensate
forsuboptimal
orders of fit forpermanent
environmental orgenetic
autocorrelation p for measurement errors
following
astationary
timeseries,
forwhich
V(y)
is non-linear and for which derivatives are thus notstraightforward
to evaluate. In theseinstances,
atwo-step
procedure combining
aderivative-free search
(e.g.
aquadratic
approximation)
for the ’difficult’parameter(s)
with an average information
algorithm
to maximizelog G
withrespect
to the ’linear’parameters
can beenvisaged.
A similarstrategy
has beenemployed
by Thompson
[32]
inestimating
theregression
on maternalphenotype
as wellas additive
genetic
and environmentalcomponents
of variance.Alternatively,
estimation may be carried out in aBaysian
frameworkusing
a Monte Carlobased
technique,
see Varona et al.[33]
for anapplication
in a linear RR model.Calculation of the
log
likelihood(G)
requires
factoring
M* to calculate thelog
determinant of the coefficient matrix(log [C
*
[)
and the residual sums ofsquares
(y’P
*
y)
(see
Meyer
[21]
fordetails).
The likelihood is thenFor i.i.d. measurement errors, the error variance can be estimated
directly
asQE
=y’P
*
y/(N -
r(X)),
as for univariateanalyses.
2.2.1. Extensions to other models
So far
only
the case of asimple,
’univariate’ animal model has beenconsidered. More
complicated models,
however,
arereadily
accommodatedin the framework described. For
instance,
additional random effects such asmaternal
genetic
effects or litter effects can be taken into accountanalogously
by modelling
each as a series of randomregression
coefficients. Correlations between randomeffects,
e.g. non-zero direct-maternalgenetic
covariances,
canbe modelled
by allowing
for covariances between therespective regression
coefficients,
which thenyield
a CFdescribing
the covariance between randomeffects over time.
Similarly,
’multivariate’ CF[24]
for series of measurements for different traits(e.g.
height
andweight
measured at differenttimes)
can be estimatedsimply
by fitting
sets of RR coefficients for each trait andallowing
for covariances betweencorresponding
sets for different traits. Anexpectation-maximization
type
algorithm
for a bivariateanalysis
under a RR model hasrecently
beendescribed
by
Shah et al.!30!.
As mentionedabove,
avariety
ofassumptions
about the structure of the
within-individual,
temporary
environmentalcovari-ance matrices can be
accommodated;
see, forinstance,
Wolfinger
[36]
for adescription
of somecommonly
used models.2.3. Reduced rank covariance functions
For q correlated
measurements,
the informationsupplied
(or
most ofit)
can
generally
be summarized as a set of k Gq linear
combinations. Thesecan be determined
by
asingular
valuedecomposition
of thecorresponding
eigenvalues
with the remainder(q-k)
being
small or zero.Setting
the latter tozero and
backtransforming
(by
pre- andpostmultiplying
thediagonal
matrix ofeigenvalues
with the matrix ofeigenvectors
and itstranspose,
respectively)
thenyields
amodified,
reduced rank covariance matrix. Inestimating
covariancematrices,
this could be used to reduce the number ofparameters
to be estimated and thussampling
variation. Aparameterization
to the elements of theeigenvalue decomposition
andsetting
eigenvalues
k +1, ... ,
q and
thecorresponding
eigenvectors
to zero would achieve this but reduce the numberof
parameters
to be estimatedonly
for k <q/2.
Though
notperceived
for thisexplicit
purpose, the’symmetric
coefficients’ CF model of[15]
provides
analternative way of
estimating
reduced rank covariance matrices[22].
As outlined
by Kirkpatrick
et al.[15],
there is anequivalent
to theeigen-value
decomposition
of covariance matrices for covariancefunctions,
with acorresponding
interpretation.
Estimates of theeigenvalues
of a CF fitted toorder k are
simply
theeigenvalues
of thecorresponding,
estimated matrix of coefficients(K).
Similarly,
estimates of theeigenfunctions
of aCF,
theinfinite-dimensional
equivalent
toeigenvectors,
can be obtained from theeigenvectors
of K. Let vi denote the ith
eigenvector
of K with elements Vij and0;(t
*
)
thejth
orderLegendre
polynomial.
The itheigenfunction
of the CF is then[15]
Note that
0;(t
*
)
is not evaluated for anyparticular
age, but includespolynomi-als of the standardized age t*.
Hence, *
i
is acontinuous, polynomial
functionin t*. As discussed
by
Kirkpatrick
et al.[15],
eigenfunctions
ofgenetic
CFare
especially
ofinterest,
asthey
represent
possible
deformations of the mean(growth)
trajectory
which can be effectedby selection,
while thecorrespond-ing
eigenvalues
describe the amount ofgenetic
variation in that direction. Inparticular,
theeigenfunction
associated with thelargest eigenvalue
gives
the direction in which the meantrajectory
willchange
mostrapidly.
Fitting
a CF to order krequires
k(k
+
1)/2
coefficients,
i.e. covariances between randomregression
coefficients,
to beestimated,
andgives
estimates of the first keigenfunctions
andeigenvalues
of the CF. In someinstances,
one or severaleigenvalues
of the CF may be close to zero or smallcompared
to the other
eigenvalues.
Thisimplies
that werequire
a kth order fit tomodel the
shape
of the(growth)
curveadequately,
but that a subset of mdirections
(= eigenfunctions)
suffices. In otherwords,
wemight
obtain a moreparsimonious
fit of the CFby
estimating
a reduced rank coefficientmatrix,
forcing k -
meigenvalues
of K to be zero.Consider the
Cholesky decomposition
ofK, pivoting
on thelargest diagonal
ith element of
D, di,
can beinterpreted
as the conditional variance of variablei,
given
variables1, ... ,
i -1. Areparameterization
to the non-zerooff-diagonal
elements of L and the
diagonal
elements of D has been advocated for REML estimation of covariancecomponents
to remove constraints on theparameter
space or
improve
rate of convergence in an iterative estimation scheme[6,
18,
25!.
Otherparameterizations
in thiscontext,
based on theeigenstructure
of thecovariance
matrix,
have been consideredby
Pinheiro and Bates!26!.
An alternative form of the
Cholesky
decomposition
is K =L
*
L
*
’
whereL *
has
diagonal
elements 1*i
=!2.
L* is ofteninterpreted
asK
1/2
.
Theeigenvalues
of the power of a matrix areequal
to the power of theeigenvalues
of thematrix,
and theeigenvalues
of atriangular
matrix areequal
to itsdiagonal
elements(5!.
Hence,
the estimate of K can be forced to have rank mby
assuming
elements
d,,,,
+1
tod
k
inequation
(9)
are zero(elements d
i
are assumed to bein
descending
order).
Thisyields
a modified matrixThe
vectors l
i
corresponding
to the zeroi
d
are then notneeded,
i.e.K
+
is describedby
km —m(m — 1)/2
parameters,
m elementsd
i
and(k 1)m
-m(m — 1)/2
elements ofl
ij
(j
>i)
of thel
i
.
Clearly,
this is notequivalent
tofitting
a(full rank)
CF to the order m(which
would involvem(m
+1)/2
parameters) -
for instance for k = 4 and m = 2 we fit a cubicregression
assuming
there areonly
twoindependent
directions in which thetrajectory
is
likely
tochange,
while for k = m = 2 we fit a linearregression.
Strictly speaking,
equation
(6)
has to be of full rank.Hence,
forpractical
computations, d
i
are set to a smallpositive
value(e.g.
10-
4
).
Alternatively,
aREML
algorithm
which allows for a semipositive
definite covariance matrixof random effects could be
employed,
c.f. Harville[8]
orFrayley
and Burns!3!.
Obviously,
thisparameterization
can also be used to estimate reduced rank covariance matrices forfinite-dimensional,
multivariateanalyses.
3. APPLICATION
3.1. Material and methods
Meyer
and Hill[24]
fitted covariance functions toJanuary
weights
of 913 beefcows,
weighed
from 2 to 6 years of age, 2 795 records in total with up tofive records per cow available. Their
analysis
used age atweighing
in yearsand fitted measurement errors and fixed effects for each age
separately.
Thesedata were
re-analysed
using
the randomregression
model andfitting
age atweighing
in months.Analyses
were carried outusing
program DxMRR[23],
employing
a derivative-freealgorithm
to maximizelog
G.There were a total of 22 ages in the
data,
ranging
from 19 to 70 months.Figure
1gives
the meanweight
and number of records for each age class.Anal-yses were carried out
fitting
aseparate
measurement error variancecomponent
weighing
subclasses(86
levels),
year of birth effects(16
levels)
and a cubicre-gression
on age atweighing.
The model for fixed effects was’univariate’,
i.e.the effects were assumed to be similar for cows of all ages. Additive
genetic
andpermanent
environmental covariance functions were fitted to the sameorder
throughout
(k
A
= k
R
=k).
Orders of fit consideredranged
from 1 to 6.In
addition,
the usual’repeatability
model’ wasfitted,
i.e. a CF model withk = 1 and a
single
measurement errorvariance,
assumed to be the same for allages.
For each order of
fit,
the number of non-zeroeigenvalues
allowed for each coefficient matrix to be estimated was set to the same value(r)
forK
A
andK
R
,
considering
values of r < k of 1 to 3. In severalinstances, analyses
resulted in estimates ofK
R
with one smalleigenvalue.
In these cases, the rank ofK
R
was reduced
by
one. In every case, thisyielded
a furtherimprovement
inlog
G when
continuing
theanalysis,
i.e. earlier convergence had been to a falsemaximum as the search
procedure
had become ’stuck’ at the bounds of theparameter
space.In
general, analyses
took a considerable time to converge,markedly longer
for an order of fit of k than for a
comparable k-variate,
finite-dimensionalanalysis. Furthermore,
several restarts wererequired
for eachanalysis
beforelikelihoods stabilized.
Convergence
wasespecially
slow whenattempting
to estimate’unnecessary’
parameters,
i.e. an order of fit or rank of CF with one3.2. Results
3.2.1. Likelihoods
Maximum likelihood values from all
analyses together
witheigenvalues
of the estimated coefficient matrices and estimates of the measurement errorvariances are summarized in table 7.
Clearly,
arepeatability
model(first line)
wasinappropriate
in this case, estimates ofo,2
(for
k =1)
being considerably
higher
for older cows than for2-year-old
cows.Forcing
the rank r of anestimated coefficient matrix - and thus CF - to be 1 is
equivalent
to theassumption
that allcorresponding
correlations have a value ofunity.
For r = 1and k >
1,
thehigher
order coefficients then modelheterogeneity
of variances. For all orders offit, increasing
r from 1 to 2yielded significant
increases inlog
G.Increasing
r further to3, however,
did not raiselog
Gsignificantly,
increases of 5.33(k
=5)
and 5.75(k
=6)
being
outweighed by
the numberof additional
parameters
(6
and8,
respectively).
For r =2,
log G
increasedsignificantly
until k reached 4. Estimated CFs for this model werewith
t
i
denoting
the ith standardized age. Whenfitting polynomials,
however,
we should also examine an order of fit of k +
2,
i.e. the next order of fit with the sametype
(even
versusuneven)
ofexponent,
as anodd-degree polynomial
is
likely
to contribute little when aneven-degree polynomial
fits the data(4!.
Indeed,
whileadding
aquartic
coefficient(k = 5)
did not increaselog
Gsignificantly
over a cubicpolynomial
(k
=4),
fitting
aquintic
term(k
=6)
did.
Fitting
years rather than months of age as meta-meter for theCFs, Meyer
and Hill[24]
foundlog G
not to increasesignificantly beyond
k = 2. As shown infigure
1,
there wasonly
a smallspread
of ages within each year, i.e. thechange
in scale wasexpected
to have little effect on estimates. The modelemployed by
Meyer
and Hill!24!,
however,
wasmultivariate,
i.e. fittedseparate
fixed effects for each year of age.Presumably,
thestronger
trend in covariances observed inthis
analysis
can be attributed to less variationbeing
removedby
fixed effects. Inaddition,
likelihood ratio tests were carried outconsidering
full rank CFs and the associated number ofparameters.
3.2.2.
Phenotypic
variationestimates of
QE
over years. Estimates for k > 3 weresimilar,
the cubic term for k > 4causing
estimates for6-year-old
cows to risesharply.
As shown in tableI,
this wasaccompanied
by
estimates ofa;
5
of zero, i.e.presumably
to some extentdue to a restriction
imposed
on theparameter
space,forcing
QE
>
0.Except
for these last age
classes,
estimatesagreed closely
with those from afinite-dimensional multivariate
analysis treating
records at different years of age asseparate
traits,
which were40.0,
60.5,
65.8,
67.3 and 71.0(kg)
for average ages of20.3, 32.3, 44.3,
56.2 and 68.2months, respectively
[24].
3.2.3.
Eigenfunctions
As for
phenotypic
standarddeviations,
estimates of the firsteigenvalue
(table I!
andcorresponding
eigenfunction
ofA
(figure
3)
were similar for orders of fit k > 3. Values of theeigenfunction
for individual ages wereroughly
proportional
to thegenetic
variance for the age, as determined from thecorresponding
estimate ofgenetic
CF,A.
Positive valuesthroughout
imply
that selection for increasedweight
at any age islikely
to increaseweight
atall other ages. Estimates for the second
eigenvalue
and-function, however,
changed considerably
from k =3,
4 to k =5, 6,
inparticular
for r = 3. Thiswas
accompanied
by
asignificant
increase inlog
G from k = 4 to k = 5 andfrom k = 5 to k = 6 for r = 3 but not r = 2.
Eigenvalues
of estimates of A andR up to orders of fit of k = 4 were similar to those
reported by
Meyer
and Hill!24!,
suggesting
that the third sizeableeigenvalues
appearing
ork )
5might
be an artefact of the order of thepolynomial
function and the ’univariate’ fixed3.2.4. Genetic covariances
Figure
4 shows genetic
covariances obtained from estimates of ,A for k =3,
6(r
=3),
evaluated for therange of ages
spanned
by
data. For k = 3 and k =4,
the
resulting
surfaces were verysmooth, clearly following
aquadratic
and cubicfunction,
while fork >
5 surfaces became’wiggly’, following
the data moreclosely.
Apeak
ingenetic
variance at 4 years of age with asubsequent
declinewas consistent with estimates of
672,
1277, 2 221, 722
and 1988(kg
2
)
for ages2,
3, 4,
5 and 6 years,respectively,
from afinite-dimensional,
multivariateanalysis
[24].
Corresponding
genetic
correlations for k = 4 and 6 are shown infigure
5.For k =
4,
the surface was smooth with a’plateau’
close tounity
for agesfrom 3 years
onwards,
and correlations betweenweights
at 2 years and laterages
decreasing
withincreasing
time between measurements. This agrees withour
biological expectations
for the trait under consideration. Incontrast,
correlations for k = 5
(not
shown)
and k =6,
fluctuatedconsiderably,
especially
at the
edges
of thesurface, greatly
magnifying sampling
variation in thegenetic
covariances for these orders offit,
exhibited infigure
lf.
3.2.5. Permanent environmental variation
As shown in
figure
6,
estimates of 7Zproduced
permanent
environmental varianceswhich,
except
forweights
at 6 years of age, increasedconsiderably
lessover time than their
genetic
counterparts.
Moreover,
surfaces wereconsiderably
smoother for
higher
orders of fit(k
>4).
For the last ages in thedata,
estimates of thepermanent
environmental variances increaseddramatically,
and
causing
the increase in estimatedphenotypic
standard deviation observed above. While observation in the four age classes concerned were more variablethan in the earlier years, this appears to be
spurious
and canonly
be attributed to smaller numbers of records and thelarge
influence datapoints
at the4. DISCUSSION
There is a wealth of statistical literature on
modelling
of’repeated
records’in
general,
and structured covariance matrices inparticular
(see,
forinstance,
Lindsey
[17]
or Vonesh and Chinchilli[34]
and their extensivebibliographies).
Applications
in animalbreeding,
however,
have untilrecently
been limited tothe extremes of a
simple repeatability
model or anunstructured,
multivariatemodel. An added
complication
ofquantitative genetic analyses,
not sharedby
otherfields,
is the fact that we want topartition
the betweensubject
variation into itsgenetic
and environmentalcomponents.
Covariance function and random
regression
models enable us to model the covariance structure of such records moreadequately,
alleviating
theproblems
associated with anoversimplification
(repeatability model)
or anover-parameterization
(multivariate
model)
for traits whichchange gradually
along
some continuous
scale,
such as time.Moreover,
each source ofvariation, genetic
orenvironmental,
can be modelled. While aregression
model isusually thought
of in the context of
modelling
trends in means, covariance functions have beenperceived
to describe and smoothdispersion
matrices. Whenregression
coefficients are treated as
random, however, they implicitly impose
a covariance structure among the observations. Thus the twoapproaches
converge andequivalent
to aspecial
class of randomregression
models. While not commonpractice,
covariance functions can be formulated for the sources of variation modelledby
randomregression
coefficients in any RR model.Regression models,
fixed orrandom, require
someassumptions
about theparametric
form of theregression equation.
Regressing
onorthogonal
polyno-mials of time
(or
any othermeta-meter)
in order to fitKirkpatrick
et al.’s[15]
CFmodel,
only
invokes theassumption
that thetrajectory
to be modelled canbe described
by
suchpolynomials.
As shownabove,
in a maximum likelihood framework we can let the data determine the order ofpolynomial
fitrequired
and the rank of the CF which suffices.
Likelihood ratio tests carried out to determine the order of fit of covariance functions
required
should account for the fact that we are at theboundary
of theparameter
space and thus have ’non-standard’ conditions where the likelihoodratio test criterion A does not follow a
X
Z
distribution!29!,
whentesting
whetheradditional random
regression
coefficientsmight
have zero variance. Stram and Lee[31]
considered tests of variancecomponents
forlongitudinal
data in amixed model.
They
showed that thelarge
sample
distribution of A is a 50:50 mixture ofx2
distributions with
q and
q + 1degrees
offreedom,
respectively,
when we test the
hypothesis
that a matrix D is a q x qpositive
definite matrixagainst
thehypothesis
that it is a(q
+1)
x(q
+1)
matrix. Hence ’naive’tests,
assuming A
has ax2
distribution
withdegrees
of freedomequal
to the number ofparameters
tested(q+1)
are too conservative. Stram and Lee[31]
arguethough
that for suchsimple
tests(of
one additionalparameter),
resulting
biases arelikely
to be small. Inpractical applications,
the errorprobability
a has been’doubled’ to account for this
conservatism,
i.e. A has been contrastedagainst
X!+1,2Ü’
instead ofX
q
+
,,,,,
[27].
Alternatively,
we can decide on an order of fit apriori
or decideonly
touse the first n
eigenfunction
of the CF to describe the covariance structure.This choice may
depend
on thecomputational
resourcesavailable,
or may beinfluenced
by
theknowledge
thathigher
orderpolynomials
are notorious formagnifying
smallsampling
errors andproducing ’wiggly’
functions. Even if this choice does not maximize thelikelihood,
we arelikely
to obtain moreappropriate
estimates than under thesimple repeatability
model when thisclearly
does notapply,
both in terms of estimates ofgenetic
parameters
andprediction
ofbreeding
values. Further research isrequired
todevelop guidelines
as how to choose the ’best’ model for
particular
situations.Great care has to be taken when
defining
andinterpreting
RR models. As shown in the numericalexample,
datapoints
at the extremes of theindependent
variable can have abig
influence on estimates ofregression
coefficients, yielding
quite
erratic estimates of CF or measurement error variances. Various authorsestimated
genetic
parameters
for testday
milkyields
ofdairy
cowsfitting
arandom
regression
model. In most cases,resulting heritability
estimates werehigh
at thebeginning
and end of lactation and low in the middle oflactation,
in marked contrast to
previous
estimates under a ’finite-dimensional’model,
and thus
regarded
withjustified
scepticism
([10, 14];
Van der Werf etal.,
unpublished).
Thisemphasizes
thesensitivity
of RR models to the effects and orders of CFs fitted. It has to be stressed thatjudicious
modelling
of fixed and randomeffects,
careful choice of the orders of fit ofCFs,
and meticuluousAs
emphasized
by
Kirkpatrick
et al.!15!,
many families of functions can beemployed
in the estimation of CF - while the choice oforthogonal
function does not affect the estimate of the CF at the ages in thedata,
it does affectinterpolation. Following
theirchoice,
we have usedLegendre polynomials.
These, however,
generate
aweight
function withcomparatively
heavy
emphasis
on records at the outer
parts
of the interval for whichthey
are defined(compared
to,
say,weights following
a kernel from a Normaldistribution)
(!7!;
section3.3).
Other functionsmight
thus be more suitable.Alternatively,
some
scaling
of the observed ages can beenvisaged,
forinstance,
alogarithmic
transformation should reduce the
impact
of few observations on very old animals.5. CONCLUSIONS
Random
regression
modelsprovide
a valuable tool to modelrepeated
records in animalbreeding adequately,
especially
if traits measuredchange
gradually.
They
allow covariance functions to be formulated which describegenetic
and environmental covariances among records over time.Moreover, they
impose
a structure on covariance matrices.
Fitting
regressions
onorthogonal
polyno-mials of time
(or equivalent)
we can estimategenetic
covariance functions assuggested by Kirkpatrick
et al.!15!,
whoseeigenvalues
andeigenfunctions
pro-vide aninsight
into the way selection islikely
to affect the meantrajectory
of the records considered and can be used to characterize differences between
populations,
e.g. breeds of animals.6. SOFTWARE
A program is available for the estimation of covariance functions
by
REML,
fitting
a randomregression
animalmodel,
as described above. ’DxMRR’ iswritten in Fortran 90 and is
part
of the DFREMLpackage,
version3.0;
seeMeyer
[23]
for further details. This is contained on the CD-ROM of the Sixth WorldCongress
on GeneticsApplied
to LivestockProduction,
or can be downloaded from the DFREML home page athttp:
//agbu.une.edu.au/&dquo;kmeyer/dfreml.html.
ACKNOWLEDGEMENTS
This work was
supported through
grants from the BritishBiotechnology
andBiological
Sciences Research Council(BBSRC)
and the Australian Meat ResearchCouncil
(MRC).
REFERENCES
[1]
AbramowitzM., Stegun LA.,
Handbook of MathematicalFunctions, Dover,
NewYork,
1965.[2]
AndersonS.,
PedersenB.,
Growth and food intake curves forgroup-housed
[3]
Frayley C.,
BurnsP.J., Large-scale
estimation of variance and covariancecomponents, SIAM J. Sci.
Comp.
16(1995)
192-209.[4]
Graybill F.A.,
An Introduction to Linear Statistical Models. Volume I,McGraw-Hill, Inc.,
NewYork,
1961.[5]
Graybill F.A.,
Matrices withApplications
inStatistics,
2nded.,
WadsworthInc.,
1983.[6]
Groeneveld E., Areparameterisation
toimprove
numericaloptimisation
inmultivariate REML
(co)variance
component estimation, Genet. Sel. Evol. 26(1994)
537-545
[7]
HardleW., Applied Nonparametric Regression, Cambridge University Press,
1990.[8]
HarvilleD.A.,
Maximum likelihoodapproaches
to variance component estima-tion and relatedproblems,
J. Am. Stat. Assoc. 72(1977)
320-338.[9]
Henderson C.R.Jr.,
Analysis
of covariance in the mixed model:Higher-level,
non-homogeneous
and randomregressions,
Biometrics 38(1982)
623-640.[10]
JamrozikJ.,
SchaefferL.R.,
Estimates ofgenetic
parameters for a testday
model with randomregressions
forproduction
of first lactationHolsteins,
J.Dairy
Sci. 80,
1997)
762-770.[11]
JamrozikJ.,
Schaeffer L.R., DekkersJ.C.M.,
Genetic evaluation ofdairy
cattleusing
testday yields
and randomregression model,
J.Dairy
Sci. 80(1997)
1217-1226.
[12]
JennrichR.L,
SchluchterM.D.,
Unbalancedrepeated-measures
models with structured covariance matrices, Biometrics 42(1986)
805-820.[13]
JohnsonD.L., Thompson
R., Restricted Maximum Likelihood estimation of variance components for univariate animal modelsusing
sparse matrixtechniques
andaverage
information,
J.Dairy
Sci. 78(1995)
449-456.[14]
KettunenA., Mdntysaari E.,
StrandenL,
P6s6J.,
Genetic parameters for testday
milkyields
of FinnishAyrshires
with randomregression model,
J.Dairy
Sci. 80Suppl. 1,
197(1997)
(abstr.).
[15]
Kirkpatrick M.,
LofsvoldD.,
BulmerM., Analysis
of theinheritance,
selec-tion and evolution of
growth trajectories,
Genetics 124(1990)
979-993.[16]
Kirkpatrick M.,
HillW.G., Thompson R., Estimating
the covariance struc-ture of traitsduring growth
andaging,
illustrated with lactations indairy cattle,
Genet. Res. 64
(1994)
57-69.[17]
Lindsey J.K.,
Models forRepeated
Measurements, Oxford Statistical ScienceSeries,
ClarendonPress, Oxford,
1993.[18]
LindstromM.J.,
BatesD.M., Newton-Raphson
and EMalgorithms
for linear mixed-effects models forrepeated-measures data,
J. Am. Stat. Assoc. 83(1988)
1014-1022.
[19]
Longford N.T.,
Random CoefficientModels,
Oxford Statistical ScienceSeries,
Clarendon