Estimating covariance functions for longitudinal data using a random regression model


HAL Id: hal-00894207

https://hal.archives-ouvertes.fr/hal-00894207

Submitted on 1 Jan 1998


Karin Meyer


Original article

Institute of Cell, Animal and Population Biology, Edinburgh University, West Mains Road, Edinburgh EH9 3JT, Scotland, UK

(Received 13 August 1997; accepted 31 March 1998)

Abstract - A method is described to estimate genetic and environmental covariance functions for traits measured repeatedly per individual along some continuous scale, such as time, directly from the data by restricted maximum likelihood. It relies on the equivalence of a covariance function and a random regression model. By regressing on random, orthogonal polynomials of the continuous scale variable, the coefficients of covariance functions can be estimated as the covariances among the regression coefficients. A parameterisation is described which allows the rank of estimated covariance matrices and functions to be restricted, thus facilitating a highly parsimonious description of the covariance structure. The procedure and the type of results which can be obtained are illustrated with an application to mature weight records of beef cows. © Inra/Elsevier, Paris

covariance functions / genetic parameters / longitudinal data / restricted maximum likelihood / random regression model

Résumé - Estimation of covariance functions for longitudinal data from a model with random regression coefficients. A method is described to estimate genetic and non-genetic covariance functions for traits measured several times per individual along a continuous scale, such as time. It works directly from the data by restricted maximum likelihood, exploiting the equivalence between a covariance function and a random regression model. The coefficients of the covariance functions can be estimated as covariances among the coefficients of regressions of the observations on orthogonal polynomials of the time variable. A parameterisation is described which allows the rank of the covariance matrices and functions to be reduced, thus making a good description of the covariance structure possible with few parameters. The procedure and the type of results which can be obtained are illustrated with an example concerning mature live weights of beef cows. © Inra/Elsevier, Paris

covariance function / genetic parameters / longitudinal data / restricted maximum likelihood / random regression model

* Correspondence and reprints: Animal Genetics and Breeding Unit, University of New England, Armidale, NSW 2351, Australia.

1. INTRODUCTION

Covariance functions have been recognized as a suitable alternative to the conventional multivariate mixed model to describe genetic and phenotypic variation for longitudinal data, i.e. typically data with many, 'repeated' measurements per individual recorded over time. They are especially suited for traits which are changing with time so that repeated measurements do not completely represent the same trait. The example considered henceforth is growth of an animal with weights taken at a number of ages, but the concept is readily applicable to other characters and other continuous scales or 'meta-meters'. In essence, covariance functions are the 'infinite-dimensional' equivalent to covariance matrices in a traditional, 'finite' multivariate analysis [15].

As the name indicates, a covariance function (CF) describes the covariance between records taken at certain ages as a function of these ages. A suitable function is a higher order polynomial. This implies that when fitting a CF model, we need to estimate the coefficients of the polynomial instead of the covariance components in a finite-dimensional analysis. The number of coefficients required is determined by the order of fit of the polynomials. A finite-dimensional, multivariate analysis is equivalent to a 'full fit' CF analysis where the order of fit is equal to the number of ages measured, i.e. the covariance matrices for the ages in the data generated by the estimated CFs are equal to the estimates that would have been obtained in a conventional, multivariate analysis. In practice, however, a reduced order fit often suffices. This reduces the number of parameters to be estimated and thus sampling errors, resulting in a smoothing of the estimated covariance structure.

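Kirkpatrick et al. [15, 16] modelled CFs using orthogonal polynomials of age, choosing Legendre polynomials. Let $\Sigma$ denote a covariance matrix of size $q \times q$, and $\Phi$ of size $q \times k$ the matrix of orthogonal polynomials evaluated at the given ages, with elements $\phi_{ij} = \phi_j(t_i)$, the jth polynomial for the ith age $t_i$. The order of fit of the CF is given by $k \le q$. This allows the covariance matrix to be rewritten as $\Sigma = \Phi K \Phi'$ with $K = \{K_{ij}\}$ a matrix of coefficients, and gives CF

$$\mathcal{F}(t_l, t_m) = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \phi_i(t_l)\, \phi_j(t_m)\, K_{ij} \qquad (1)$$

Let $\mathbf{t}_m = (1, t_m, t_m^2, \ldots, t_m^{k-1})$ denote the vector of powers of $t_m$ and $\Lambda$ the matrix of polynomial coefficients. This gives the mth row of $\Phi$ as $\phi_m = \mathbf{t}_m \Lambda$. For instance, for k = 3 and normalized Legendre polynomials

$$\phi_m = \left(0.7071,\; 1.2247\, t_m,\; -0.7906 + 2.3717\, t_m^2\right)$$

and

$$\Lambda = \begin{pmatrix} 0.7071 & 0 & -0.7906 \\ 0 & 1.2247 & 0 \\ 0 & 0 & 2.3717 \end{pmatrix}$$

Thus equation (1) can be rewritten as $\mathcal{F}(t_l, t_m) = \mathbf{t}_l \Lambda K \Lambda' \mathbf{t}_m' = \mathbf{t}_l \Omega\, \mathbf{t}_m'$, i.e. the coefficient matrix $\Omega$ with elements $\omega_{ij}$ is obtained from K by including the terms of the polynomial chosen.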
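As a numerical illustration of equation (1), the following sketch builds $\Phi$, $\Lambda$ and the implied covariance matrix $\Sigma = \Phi K \Phi'$. It assumes normalized Legendre polynomials, $\phi_j(t) = \sqrt{(2j+1)/2}\,P_j(t)$. The function names, the ages and the coefficient matrix `K` are hypothetical values chosen for illustration, not estimates from this paper.

```python
import numpy as np
from numpy.polynomial import legendre


def normalized_legendre(t, k):
    """Phi (len(t) x k): normalized Legendre polynomials of order 0..k-1 at ages t in [-1, 1]."""
    t = np.asarray(t, dtype=float)
    P = legendre.legvander(t, k - 1)              # columns P_0(t), ..., P_{k-1}(t)
    scale = np.sqrt((2 * np.arange(k) + 1) / 2.0)  # normalizing constants (assumed)
    return P * scale


def coefficient_matrix(k):
    """Lambda (k x k) such that Phi = T @ Lambda, where row m of T is (1, t_m, t_m^2, ...)."""
    lam = np.zeros((k, k))
    for j in range(k):
        unit = np.zeros(j + 1)
        unit[j] = 1.0
        power_coeffs = legendre.leg2poly(unit)     # coefficients of P_j, lowest power first
        lam[: j + 1, j] = np.sqrt((2 * j + 1) / 2.0) * power_coeffs
    return lam


# Hypothetical standardized ages and coefficient matrix (k = 3).
ages = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
K = np.array([[4.0, 1.0, 0.2],
              [1.0, 2.0, 0.1],
              [0.2, 0.1, 0.5]])

Phi = normalized_legendre(ages, 3)
Sigma = Phi @ K @ Phi.T                 # covariance matrix implied by the CF at these ages

Lam = coefficient_matrix(3)
Omega = Lam @ K @ Lam.T                 # CF coefficients on the scale of powers of t
T = np.vander(ages, 3, increasing=True)
```

Evaluating `T @ Omega @ T.T` gives the same matrix as `Sigma`, which is the equivalence between the two forms of the CF stated above.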

Kirkpatrick et al. [15] described a generalized least-squares procedure to determine the coefficients of a CF from an estimated covariance matrix. Often, however, such a matrix is not available or computationally expensive to obtain. Meyer and Hill [24] showed that the coefficients of CFs can be estimated directly from the data by restricted maximum likelihood (REML) through a simple reparameterization of existing, 'finite-dimensional' multivariate REML algorithms. For the special case of a simple animal model with equal design matrices, computational requirements were restricted to the order of fit of the genetic CF. In the general case, however, their approach required a multivariate mixed model matrix proportional to the number of ages in the data to be set up and factored, even for a reduced order fit. This severely limited practical applications, especially for data with records at 'all ages'.

Polynomial regressions have been used to describe the growth of animals for a long time [35], but only recently has there been interest in random regression (RR) models. These have by and large been ignored in animal breeding applications so far, although they are common in other areas; see, for instance, Longford [19] for a general exposition. RRs in a linear mixed model context have been considered by Henderson [9]. Jennrich and Schluchter [12] included the 'random coefficients' model in their treatment of REML and maximum likelihood estimation for unbalanced repeated measures models with structured covariance matrices. Recent applications include the genetic evaluation of dairy cattle using test day records ([10, 11, 14]; Van der Werf et al., unpublished), and the description of growth curves in pigs [2] and beef cattle [33].

This paper describes an alternative procedure for the estimation of covariance functions to that proposed by Meyer and Hill [24], which overcomes the limitations discussed above. It is shown that the CF model is equivalent to a RR model with polynomials of age as independent variables, and that REML estimates of the coefficients of the CF can be obtained as covariances among the regression coefficients. A mechanism is described to restrict the rank of the estimated covariance matrices (of regression coefficients) and thus the CFs, [...]

2. ESTIMATION OF COVARIANCE FUNCTIONS

2.1. Model of analysis

2.1.1. Finite-dimensional model

Consider an animal model

$$y_{ij} = F + a_{ij} + r_{ij} + \varepsilon_{ij} \qquad (2)$$

with $y_{ij}$ the observation for animal i at time j, $a_{ij}$ and $r_{ij}$ the corresponding additive genetic and permanent environmental effects due to the animal, respectively, $\varepsilon_{ij}$ the measurement error (or temporary environmental effect) pertaining to $y_{ij}$, and F some fixed effects. Furthermore, let $t_{ij}$ denote the age (or equivalent) at which $y_{ij}$ is recorded, and assume there are $q_i$ records for animal i and a total of q different ages in the data.

Commonly, under a 'finite-dimensional' model of analysis, data represented by equation (2) are analysed either assuming measures at different ages are different traits, i.e. carrying out a q-dimensional, multivariate analysis, or fitting the so-called repeatability model, i.e. assuming $a_{ij} = a_i$ and $r_{ij} = r_i$ for all $j = 1, \ldots, q_i$ and carrying out a univariate analysis. In the former, fully parametric case, covariance matrices are taken to be unstructured. Fitting a covariance function model, however, we impose some structure on the covariance matrices. This implies the assumption that the series of (up to) q measurements represents k different 'traits' or variates, with $1 \le k \le q$ denoting the order of fit of the covariance function.

2.1.2. Random regression model

As shown below, the covariance function model is equivalent to a 'random regression' model fitting functions of age (or equivalent) as covariables.

Kirkpatrick et al. [15, 16] used the well-known Legendre polynomials (see, for instance, Abramowitz and Stegun [1]) in fitting covariance functions. These have a range of $-1$ to 1. Let $t^*_{ij}$ denote the jth age for animal i standardized to this interval, and let $\phi_m(t^*_{ij})$ be the mth Legendre polynomial evaluated for $t^*_{ij}$. We can then rewrite equation (2) as a RR model

$$y_{ij} = F + \sum_{m=0}^{k_A-1} \alpha_{im}\, \phi_m(t^*_{ij}) + \sum_{m=0}^{k_R-1} \gamma_{im}\, \phi_m(t^*_{ij}) + \varepsilon_{ij} \qquad (3)$$

with $\alpha_{im}$ and $\gamma_{im}$ representing the mth additive genetic and permanent environmental random regression coefficients for animal i, respectively, and $k_A$ and $k_R$ denoting the respective orders of fit.

This formulation (3) implies that the vector of q breeding values in a 'finite-dimensional', multivariate analysis is replaced by the vector of $k_A$ additive genetic, random regression coefficients. Note, however, that with $k_A$ chosen [...] employed as an effective tool to reduce the number of traits to be handled (and breeding values to be reported) for 'traits' measured over a continuous time scale such as weights (e.g. birth, weaning, yearling, final and mature weight) in beef cattle or test day records for dairy cows. Moreover, the RR model (3) yields a description of the animal's genetic potential for the complete time period considered, for instance, an estimate of the growth or lactation curve.

2.1.3. Covariance structure

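The covariance between two records for the same animal is then

$$\mathrm{Cov}(y_{ij}, y_{ij'}) = \sum_{m=0}^{k_A-1} \sum_{l=0}^{k_A-1} \phi_m(t^*_{ij})\, \phi_l(t^*_{ij'})\, \mathrm{Cov}(\alpha_{im}, \alpha_{il}) + \sum_{m=0}^{k_R-1} \sum_{l=0}^{k_R-1} \phi_m(t^*_{ij})\, \phi_l(t^*_{ij'})\, \mathrm{Cov}(\gamma_{im}, \gamma_{il}) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ij'}) \qquad (4)$$

Generally measurement errors are assumed to be i.i.d. with variance $\sigma^2_\varepsilon$, so that $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ij'}) = \sigma^2_\varepsilon$ for $j = j'$ and 0 otherwise, but other assumptions, such as heterogeneous variances or autoregressive errors, are readily accommodated.

Clearly, the first two terms in equation (4) are CFs with the covariances between random regression coefficients equal to the coefficients of the corresponding covariance functions [24] (see equation (1) above), i.e. the RR model is equivalent to a CF model.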
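A small worked case may make the equivalence concrete. With $k_A = k_R = 1$ and i.i.d. measurement errors, only the constant polynomial enters, and the covariance between two records of the same animal collapses to that of the repeatability model. The value $\phi_0 = 1/\sqrt{2}$ used below assumes normalized Legendre polynomials, and $\delta_{jj'}$ (equal to 1 for $j = j'$ and 0 otherwise) is notation introduced here for illustration.

```latex
\begin{aligned}
\mathrm{Cov}(y_{ij}, y_{ij'})
  &= \phi_0^2\,\mathrm{Var}(\alpha_{i0}) + \phi_0^2\,\mathrm{Var}(\gamma_{i0})
     + \delta_{jj'}\,\sigma^2_\varepsilon \\
  &= \tfrac{1}{2}\,K_{A_{00}} + \tfrac{1}{2}\,K_{R_{00}}
     + \delta_{jj'}\,\sigma^2_\varepsilon
\end{aligned}
```

Under these assumptions the usual additive genetic and permanent environmental variances correspond to $K_{A_{00}}/2$ and $K_{R_{00}}/2$, and the covariance no longer depends on the ages at recording.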

Conversely, the RR model provides an alternative strategy to estimate CFs. While the REML algorithm described by Meyer and Hill [24] required mixed model equations of size proportional to the total number of ages q to be set up and factored in the general case, requirements under the equivalent random regression model are proportional to the orders of fit, $k_A$ and $k_R$. Hence, this approach offers considerably more scope to handle data coming in 'at all ages' and should be especially advantageous for $k_A$ or $k_R \ll q$.

2.1.4. Fixed effects

In fitting a RR model it is generally assumed that systematic differences in age are taken into account by the fixed effects in the model of analysis. In most cases, these include a fixed regression of the same form as the random regression (e.g. [9, 11, 12]), which can be thought of as modelling the population trajectory, while the random regressions for each animal represent individuals' deviations from this curve.

2.2. REML estimation

Considering all animals, equation (3) can be written in matrix form as

$$y = Xb + Z^* \alpha + Z_D^* \gamma + \varepsilon \qquad (5)$$

with y the vector of N observations measured on $N_D$ animals, b the vector of fixed effects, $\alpha$ the vector of $k_A \times N_A$ additive genetic random regression coefficients ($N_A \ge N_D$ denoting the total number of animals in the analysis, including parents without records), $\gamma$ the vector of $k_R \times N_D$ permanent environmental random regression coefficients, $\varepsilon$ the vector of N measurement errors, and X, $Z^*$ and $Z_D^*$ denoting the corresponding 'design' matrices. Here $Z_D^*$ is the non-zero part of $Z^*$ (for $k_A = k_R$), i.e. the part of $Z^*$ corresponding to animals in the data. The superscript '*' marks matrices incorporating orthogonal polynomial coefficients. Assuming y is ordered for animals, $Z_D^*$ is blockdiagonal; the block for animal i is of dimension $q_i \times k_R$ and has elements $\phi_m(t^*_{ij})$. Note that each observation gives rise to $k_R$ (or $k_A$ for $Z^*$) non-zero elements rather than a single element of 1 in the usual, finite-dimensional model, i.e. the design matrices are considerably denser than in the latter case.
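The structure of the design matrix for animals with records can be sketched as follows. This is a minimal illustration that assumes ages are standardized linearly onto $[-1, 1]$ and that normalized Legendre polynomials are used; both are assumptions of the sketch. The function names and the two animals' ages are hypothetical, while the age range of 19 to 70 months is that of the application described later.

```python
import numpy as np
from numpy.polynomial import legendre


def standardize_age(age, age_min, age_max):
    """Map ages linearly onto [-1, 1] (assumed standardization)."""
    return -1.0 + 2.0 * (np.asarray(age, dtype=float) - age_min) / (age_max - age_min)


def design_block(ages, k, age_min, age_max):
    """Block for one animal: q_i x k matrix with elements phi_m(t*_ij)."""
    t_star = standardize_age(ages, age_min, age_max)
    P = legendre.legvander(t_star, k - 1)
    return P * np.sqrt((2 * np.arange(k) + 1) / 2.0)


def block_diagonal(blocks):
    """Stack per-animal blocks into a block-diagonal design matrix."""
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    Z = np.zeros((n_rows, n_cols))
    row = col = 0
    for b in blocks:
        Z[row:row + b.shape[0], col:col + b.shape[1]] = b
        row += b.shape[0]
        col += b.shape[1]
    return Z


# Hypothetical ages at weighing (months) for two animals, order of fit k_R = 3.
records = [[19, 31, 43], [20, 32, 44, 56, 70]]
k_R = 3
blocks = [design_block(a, k_R, age_min=19, age_max=70) for a in records]
Z_D = block_diagonal(blocks)   # 8 records x (2 animals * 3 coefficients)
```

Each row of `Z_D` carries up to `k_R` non-zero entries for the animal concerned, in contrast to the single 1 per record of a finite-dimensional model.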

Let $K_A$ with elements $K_{A_{ml}} = \mathrm{Cov}(\alpha_m, \alpha_l)$ and $K_R$ with elements $K_{R_{ml}} = \mathrm{Cov}(\gamma_m, \gamma_l)$ denote the coefficient matrices for the additive genetic and permanent environmental covariance functions $\mathcal{A}$ and $\mathcal{R}$, respectively. In terms of analysis, this is analogous to treating RR coefficients as correlated 'traits'. Assume that the fixed part of the model accounts for systematic age effects, so that $\alpha \sim N(0, K_A \otimes A)$ and $\gamma \sim N(0, K_R \otimes I_{N_D})$, and that $\alpha$ and $\gamma$ are uncorrelated. For generality, let $V(\varepsilon) = R$, but assume R is blockdiagonal for animals with blocks equal to submatrices of the $q \times q$ matrix $\Sigma_\varepsilon$.

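The mixed model matrix pertaining to equation (5) is then

$$M^* = \begin{pmatrix}
X'R^{-1}X & X'R^{-1}Z^* & X'R^{-1}Z_D^* & X'R^{-1}y \\
Z^{*\prime}R^{-1}X & Z^{*\prime}R^{-1}Z^* + K_A^{-1} \otimes A^{-1} & Z^{*\prime}R^{-1}Z_D^* & Z^{*\prime}R^{-1}y \\
Z_D^{*\prime}R^{-1}X & Z_D^{*\prime}R^{-1}Z^* & Z_D^{*\prime}R^{-1}Z_D^* + K_R^{-1} \otimes I_{N_D} & Z_D^{*\prime}R^{-1}y \\
y'R^{-1}X & y'R^{-1}Z^* & y'R^{-1}Z_D^* & y'R^{-1}y
\end{pmatrix} \qquad (6)$$

where A is the numerator relationship matrix between animals, $I_N$ is an identity matrix of size N, and $\otimes$ denotes the direct matrix product. $M^*$ has $N_F + k_A N_A + k_R N_D + 1$ rows and columns (with $N_F$ being the total number of levels of fixed effects fitted), i.e. its size and thus computational requirements are proportional to the order of fit of the CFs. For $R = \sigma^2_\varepsilon I_N$, $\sigma^2_\varepsilon$ can be factored from $M^*$, resulting in a matrix which can be set up as for a univariate analysis.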
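The dependence of the size of $M^*$ on the order of fit can be tabulated with a few lines of code. The function name is hypothetical, and the values used for $N_F$ and $N_A$ are made up for illustration; only $N_D = 913$ (the number of cows with records in the application described later) is taken from the paper.

```python
def mixed_model_matrix_order(n_fixed, n_animals, n_with_records, k_a, k_r):
    """Number of rows (= columns) of M*: N_F + k_A * N_A + k_R * N_D + 1."""
    return n_fixed + k_a * n_animals + k_r * n_with_records + 1


N_F = 105    # hypothetical number of fixed effect levels
N_A = 1500   # hypothetical number of animals including parents without records
N_D = 913    # animals with records

# Order of M* for equal orders of fit k_A = k_R = k, k = 1, ..., 6.
orders = {k: mixed_model_matrix_order(N_F, N_A, N_D, k, k) for k in range(1, 7)}
for k, order in orders.items():
    print(f"k = {k}: M* has {order} rows and columns")
```

Each additional order of fit adds $N_A + N_D$ equations, so the growth is linear in k and independent of the number of distinct ages q in the data.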

Estimates of the distinct elements of $K_A$ and $K_R$ and the parameters determining $R$ can be obtained by REML, applying existing procedures for multivariate analyses under a 'finite' model. This may involve a simple, derivative-free algorithm [21] or, more efficiently, a method utilizing information from derivatives of the likelihood, such as Johnson and Thompson's [13] 'average information' algorithm; see Madsen et al. [20] or Meyer [22] for a description of the latter in the multivariate case.

While true measurement errors are generally assumed to be i.i.d., there may be cases in which we need to allow for heterogeneous variances or correlations between 'temporary' environmental effects. This may, to some extent, compensate for suboptimal orders of fit for permanent environmental or genetic [...] autocorrelation $\rho$ for measurement errors following a stationary time series, for which V(y) is non-linear and for which derivatives are thus not straightforward to evaluate. In these instances, a two-step procedure combining a derivative-free search (e.g. a quadratic approximation) for the 'difficult' parameter(s) with an average information algorithm to maximize $\log \mathcal{L}$ with respect to the 'linear' parameters can be envisaged. A similar strategy has been employed by Thompson [32] in estimating the regression on maternal phenotype as well as additive genetic and environmental components of variance. Alternatively, estimation may be carried out in a Bayesian framework using a Monte Carlo based technique; see Varona et al. [33] for an application in a linear RR model.

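Calculation of the log likelihood ($\log \mathcal{L}$) requires factoring $M^*$ to calculate the log determinant of the coefficient matrix ($\log |C^*|$) and the residual sums of squares ($y'P^*y$) (see Meyer [21] for details). The likelihood is then

$$\log \mathcal{L} = -\tfrac{1}{2} \left[ \mathrm{const} + N_A \log |K_A| + k_A \log |A| + N_D \log |K_R| + \log |R| + \log |C^*| + y'P^*y \right] \qquad (7)$$

For i.i.d. measurement errors, the error variance can be estimated directly as $\hat{\sigma}^2_\varepsilon = y'P^*y / (N - r(X))$, as for univariate analyses.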

2.2.1. Extensions to other models

So far only the case of a simple, 'univariate' animal model has been considered. More complicated models, however, are readily accommodated in the framework described. For instance, additional random effects such as maternal genetic effects or litter effects can be taken into account analogously by modelling each as a series of random regression coefficients. Correlations between random effects, e.g. non-zero direct-maternal genetic covariances, can be modelled by allowing for covariances between the respective regression coefficients, which then yield a CF describing the covariance between random effects over time.

Similarly, 'multivariate' CFs [24] for series of measurements for different traits (e.g. height and weight measured at different times) can be estimated simply by fitting sets of RR coefficients for each trait and allowing for covariances between corresponding sets for different traits. An expectation-maximization type algorithm for a bivariate analysis under a RR model has recently been described by Shah et al. [30]. As mentioned above, a variety of assumptions about the structure of the within-individual, temporary environmental covariance matrices can be accommodated; see, for instance, Wolfinger [36] for a description of some commonly used models.

2.3. Reduced rank covariance functions

For q correlated measurements, the information supplied (or most of it) can generally be summarized as a set of $k \le q$ linear combinations. These can be determined by a singular value decomposition of the corresponding [...] eigenvalues with the remainder ($q - k$) being small or zero. Setting the latter to zero and backtransforming (by pre- and postmultiplying the diagonal matrix of eigenvalues with the matrix of eigenvectors and its transpose, respectively) then yields a modified, reduced rank covariance matrix. In estimating covariance matrices, this could be used to reduce the number of parameters to be estimated and thus sampling variation. A parameterization to the elements of the eigenvalue decomposition and setting eigenvalues $k + 1, \ldots, q$ and the corresponding eigenvectors to zero would achieve this but reduce the number of parameters to be estimated only for $k < q/2$. Though not conceived for this explicit purpose, the 'symmetric coefficients' CF model of Kirkpatrick et al. [15] provides an alternative way of estimating reduced rank covariance matrices [22].

As outlined by Kirkpatrick et al. [15], there is an equivalent to the eigenvalue decomposition of covariance matrices for covariance functions, with a corresponding interpretation. Estimates of the eigenvalues of a CF fitted to order k are simply the eigenvalues of the corresponding, estimated matrix of coefficients (K). Similarly, estimates of the eigenfunctions of a CF, the infinite-dimensional equivalent to eigenvectors, can be obtained from the eigenvectors of K. Let $v_i$ denote the ith eigenvector of K with elements $v_{ij}$ and $\phi_j(t^*)$ the jth order Legendre polynomial. The ith eigenfunction of the CF is then [15]

$$\psi_i(t^*) = \sum_{j=0}^{k-1} v_{ij}\, \phi_j(t^*) \qquad (8)$$

Note that $\phi_j(t^*)$ is not evaluated for any particular age, but includes polynomials of the standardized age $t^*$. Hence, $\psi_i$ is a continuous, polynomial function in $t^*$. As discussed by Kirkpatrick et al. [15], eigenfunctions of genetic CFs are especially of interest, as they represent possible deformations of the mean (growth) trajectory which can be effected by selection, while the corresponding eigenvalues describe the amount of genetic variation in that direction. In particular, the eigenfunction associated with the largest eigenvalue gives the direction in which the mean trajectory will change most rapidly.
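The calculation of eigenvalues and eigenfunctions from a coefficient matrix can be sketched numerically. The example assumes normalized Legendre polynomials and uses a hypothetical coefficient matrix `K`; neither the values nor the function name come from the paper.

```python
import numpy as np
from numpy.polynomial import legendre


def normalized_legendre(t, k):
    """Normalized Legendre polynomials of order 0..k-1 evaluated at t in [-1, 1]."""
    P = legendre.legvander(np.asarray(t, dtype=float), k - 1)
    return P * np.sqrt((2 * np.arange(k) + 1) / 2.0)


# Hypothetical coefficient matrix of a CF fitted to order k = 3.
K = np.array([[4.0, 1.0, 0.2],
              [1.0, 2.0, 0.1],
              [0.2, 0.1, 0.5]])

# Eigenvalues of the CF are the eigenvalues of K; sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Evaluate each eigenfunction on a grid of standardized ages:
# column i of Psi holds psi_i(t*) = sum_j v_ij * phi_j(t*).
t_star = np.linspace(-1.0, 1.0, 11)
Phi = normalized_legendre(t_star, K.shape[0])
Psi = Phi @ eigvecs
```

On the grid, `Psi @ np.diag(eigvals) @ Psi.T` reproduces `Phi @ K @ Phi.T`, i.e. the CF is the sum of eigenvalue-weighted products of its eigenfunctions.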

Fitting a CF to order k requires $k(k+1)/2$ coefficients, i.e. covariances between random regression coefficients, to be estimated, and gives estimates of the first k eigenfunctions and eigenvalues of the CF. In some instances, one or several eigenvalues of the CF may be close to zero or small compared to the other eigenvalues. This implies that we require a kth order fit to model the shape of the (growth) curve adequately, but that a subset of m directions (= eigenfunctions) suffices. In other words, we might obtain a more parsimonious fit of the CF by estimating a reduced rank coefficient matrix, forcing $k - m$ eigenvalues of K to be zero.

Consider the Cholesky decomposition of K, pivoting on the largest diagonal element,

$$K = L D L' = \sum_{i=1}^{k} d_i\, l_i l_i' \qquad (9)$$

with L a lower triangular matrix with columns $l_i$ and D a diagonal matrix. [...] The ith element of D, $d_i$, can be interpreted as the conditional variance of variable i, given variables $1, \ldots, i-1$. A reparameterization to the non-zero off-diagonal elements of L and the diagonal elements of D has been advocated for REML estimation of covariance components to remove constraints on the parameter space or improve rate of convergence in an iterative estimation scheme [6, 18, 25]. Other parameterizations in this context, based on the eigenstructure of the covariance matrix, have been considered by Pinheiro and Bates [26].

An alternative form of the Cholesky decomposition is $K = L^* L^{*\prime}$ where $L^*$ has diagonal elements $l^*_{ii} = \sqrt{d_i}$. $L^*$ is often interpreted as $K^{1/2}$. The eigenvalues of the power of a matrix are equal to the power of the eigenvalues of the matrix, and the eigenvalues of a triangular matrix are equal to its diagonal elements [5].

Hence, the estimate of K can be forced to have rank m by assuming elements $d_{m+1}$ to $d_k$ in equation (9) are zero (elements $d_i$ are assumed to be in descending order). This yields a modified matrix

$$K^+ = \sum_{i=1}^{m} d_i\, l_i l_i' \qquad (10)$$

The vectors $l_i$ corresponding to the zero $d_i$ are then not needed, i.e. $K^+$ is described by $km - m(m-1)/2$ parameters, m elements $d_i$ and $(k-1)m - m(m-1)/2$ elements $l_{ij}$ ($j > i$) of the $l_i$. Clearly, this is not equivalent to fitting a (full rank) CF to the order m (which would involve $m(m+1)/2$ parameters): for instance, for k = 4 and m = 2 we fit a cubic regression assuming there are only two independent directions in which the trajectory is likely to change, while for k = m = 2 we fit a linear regression.
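The parameter count and the construction of a reduced rank coefficient matrix can be checked with a short sketch. It assumes a unit lower triangular L and a diagonal D with elements in descending order; the numerical values and the function names are hypothetical.

```python
import numpy as np


def n_parameters(k, m):
    """Number of parameters describing a rank-m coefficient matrix of order k."""
    return k * m - m * (m - 1) // 2


def reduced_rank_matrix(L, d, m):
    """K+ = sum over the first m columns l_i of L of d_i * l_i l_i'."""
    L = np.asarray(L, dtype=float)
    d = np.asarray(d, dtype=float)
    return (L[:, :m] * d[:m]) @ L[:, :m].T


# Hypothetical Cholesky factors for k = 4 (unit lower triangular L, descending d).
L = np.array([[1.0,  0.0, 0.0, 0.0],
              [0.5,  1.0, 0.0, 0.0],
              [0.2,  0.3, 1.0, 0.0],
              [0.1, -0.4, 0.6, 1.0]])
d = np.array([4.0, 1.5, 0.2, 0.05])

K_full = reduced_rank_matrix(L, d, 4)   # full rank: L D L'
K_plus = reduced_rank_matrix(L, d, 2)   # rank forced to m = 2

print("rank of K+:", np.linalg.matrix_rank(K_plus))
print("parameters for k = 4, m = 2:", n_parameters(4, 2))
print("parameters for k = m = 2:", n_parameters(2, 2))
```

For k = m the count reduces to $k(k+1)/2$, the number of distinct elements of a full rank coefficient matrix, while for k = 4 and m = 2 the cubic regression is described by fewer parameters than the full rank fit of the same order.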

Strictly speaking, equation (6) has to be of full rank. Hence, for practical computations, the $d_i$ assumed to be zero are set to a small positive value (e.g. $10^{-4}$). Alternatively, a REML algorithm which allows for a positive semi-definite covariance matrix of random effects could be employed, cf. Harville [8] or Frayley and Burns [3]. Obviously, this parameterization can also be used to estimate reduced rank covariance matrices for finite-dimensional, multivariate analyses.

3. APPLICATION

3.1. Material and methods

Meyer and Hill [24] fitted covariance functions to January weights of 913 beef cows, weighed from 2 to 6 years of age, 2 795 records in total with up to five records per cow available. Their analysis used age at weighing in years and fitted measurement errors and fixed effects for each age separately. These data were re-analysed using the random regression model and fitting age at weighing in months. Analyses were carried out using program DxMRR [23], employing a derivative-free algorithm to maximize $\log \mathcal{L}$.

There were a total of 22 ages in the data, ranging from 19 to 70 months. Figure 1 gives the mean weight and number of records for each age class. Analyses were carried out fitting a separate measurement error variance component [...] weighing subclasses (86 levels), year of birth effects (16 levels) and a cubic regression on age at weighing. The model for fixed effects was 'univariate', i.e. the effects were assumed to be similar for cows of all ages. Additive genetic and permanent environmental covariance functions were fitted to the same order throughout ($k_A = k_R = k$). Orders of fit considered ranged from 1 to 6. In addition, the usual 'repeatability model' was fitted, i.e. a CF model with k = 1 and a single measurement error variance, assumed to be the same for all ages.

For each order of fit, the number of non-zero eigenvalues allowed for each coefficient matrix to be estimated was set to the same value (r) for $K_A$ and $K_R$, considering values of $r \le k$ of 1 to 3. In several instances, analyses resulted in estimates of $K_R$ with one small eigenvalue. In these cases, the rank of $K_R$ was reduced by one. In every case, this yielded a further improvement in $\log \mathcal{L}$ when continuing the analysis, i.e. earlier convergence had been to a false maximum as the search procedure had become 'stuck' at the bounds of the parameter space. In general, analyses took a considerable time to converge, markedly longer for an order of fit of k than for a comparable k-variate, finite-dimensional analysis. Furthermore, several restarts were required for each analysis before likelihoods stabilized. Convergence was especially slow when attempting to estimate 'unnecessary' parameters, i.e. an order of fit or rank of CF with one [...]

3.2. Results

3.2.1. Likelihoods

Maximum likelihood values from all analyses together with eigenvalues of the estimated coefficient matrices and estimates of the measurement error variances are summarized in table I. Clearly, a repeatability model (first line) was inappropriate in this case, estimates of $\sigma^2_\varepsilon$ (for k = 1) being considerably higher for older cows than for 2-year-old cows. Forcing the rank r of an estimated coefficient matrix, and thus CF, to be 1 is equivalent to the assumption that all corresponding correlations have a value of unity. For r = 1 and k > 1, the higher order coefficients then model heterogeneity of variances. For all orders of fit, increasing r from 1 to 2 yielded significant increases in $\log \mathcal{L}$. Increasing r further to 3, however, did not raise $\log \mathcal{L}$ significantly, increases of 5.33 (k = 5) and 5.75 (k = 6) being outweighed by the number of additional parameters (6 and 8, respectively).
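The comparison of ranks 2 and 3 can be retraced as a likelihood ratio test. The sketch below assumes a test at the 5 % level against a standard chi-square distribution with degrees of freedom equal to the number of additional parameters; the paper does not state the level used, and this reference distribution ignores any boundary effects. The critical values are the tabulated 95 % points of the chi-square distribution, and the function name is hypothetical.

```python
# Tabulated 95 % points of the chi-square distribution for 6 and 8 degrees of freedom.
CHI2_CRITICAL_5PCT = {6: 12.592, 8: 15.507}


def likelihood_ratio_test(delta_log_l, extra_parameters):
    """Return the test statistic 2 * (increase in log L) and whether it exceeds the critical value."""
    statistic = 2.0 * delta_log_l
    return statistic, statistic > CHI2_CRITICAL_5PCT[extra_parameters]


# Increases in log L when raising the rank from 2 to 3, with the additional parameters.
comparisons = [(5, 5.33, 6), (6, 5.75, 8)]
results = {}
for k, delta, extra in comparisons:
    results[k] = likelihood_ratio_test(delta, extra)
    stat, significant = results[k]
    print(f"k = {k}: statistic = {stat:.2f}, df = {extra}, significant = {significant}")
```

Under these assumptions neither statistic reaches its critical value, in line with the conclusion that a third non-zero eigenvalue was not supported for these orders of fit.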

For r = 2, $\log \mathcal{L}$ increased significantly until k reached 4. Estimated CFs for this model were [...] with $t_i$ denoting the ith standardized age. When fitting polynomials, however, we should also examine an order of fit of k + 2, i.e. the next order of fit with the same type (even versus uneven) of exponent, as an odd-degree polynomial is likely to contribute little when an even-degree polynomial fits the data [4]. Indeed, while adding a quartic coefficient (k = 5) did not increase $\log \mathcal{L}$ significantly over a cubic polynomial (k = 4), fitting a quintic term (k = 6) did.

Fitting years rather than months of age as meta-meter for the CFs, Meyer and Hill [24] found $\log \mathcal{L}$ not to increase significantly beyond k = 2. As shown in figure 1, there was only a small spread of ages within each year, i.e. the change in scale was expected to have little effect on estimates. The model employed by Meyer and Hill [24], however, was multivariate, i.e. fitted separate fixed effects for each year of age. Presumably, the stronger trend in covariances observed in this analysis can be attributed to less variation being removed by fixed effects. In addition, likelihood ratio tests were carried out considering full rank CFs and the associated number of parameters.

3.2.2. Phenotypic variation

[...] estimates of $\sigma^2_\varepsilon$ over years. Estimates for $k \ge 3$ were similar, the cubic term for $k \ge 4$ causing estimates for 6-year-old cows to rise sharply. As shown in table I, this was accompanied by estimates of $\sigma^2_{\varepsilon_5}$ of zero, i.e. presumably to some extent due to a restriction imposed on the parameter space, forcing $\sigma^2_\varepsilon \ge 0$. Except for these last age classes, estimates agreed closely with those from a finite-dimensional multivariate analysis treating records at different years of age as separate traits, which were 40.0, 60.5, 65.8, 67.3 and 71.0 (kg) for average ages of 20.3, 32.3, 44.3, 56.2 and 68.2 months, respectively [24].

3.2.3.

_{Eigenfunctions}

As for phenotypic standard deviations, estimates of the first eigenvalue (table I) and corresponding eigenfunction of A (figure 3) were similar for orders of fit k > 3. Values of the eigenfunction for individual ages were roughly proportional to the genetic variance for the age, as determined from the corresponding estimate of the genetic CF, A. Positive values throughout imply that selection for increased weight at any age is likely to increase weight at all other ages. Estimates for the second eigenvalue and -function, however, changed considerably from k = 3, 4 to k = 5, 6, in particular for r = 3. This was accompanied by a significant increase in log L from k = 4 to k = 5 and from k = 5 to k = 6 for r = 3 but not r = 2. Eigenvalues of estimates of A and R up to orders of fit of k = 4 were similar to those reported by Meyer and Hill [24], suggesting that the third sizeable eigenvalues appearing for k ≥ 5 might be an artefact of the order of the polynomial function and the 'univariate' fixed


3.2.4. Genetic covariances

Figure 4 shows genetic covariances obtained from estimates of A for k = 3, ..., 6 (r = 3), evaluated for the range of ages spanned by data. For k = 3 and k = 4, the resulting surfaces were very smooth, clearly following a quadratic and cubic function, while for k ≥ 5 surfaces became 'wiggly', following the data more closely. A peak in genetic variance at 4 years of age with a subsequent decline was consistent with estimates of 672, 1277, 2221, 722 and 1988 (kg²) for ages 2, 3, 4, 5 and 6 years, respectively, from a finite-dimensional, multivariate analysis [24].

Corresponding genetic correlations for k = 4 and 6 are shown in figure 5. For k = 4, the surface was smooth with a 'plateau' close to unity for ages from 3 years onwards, and correlations between weights at 2 years and later ages decreasing with increasing time between measurements. This agrees with our biological expectations for the trait under consideration. In contrast, correlations for k = 5 (not shown) and k = 6 fluctuated considerably, especially at the edges of the surface, greatly magnifying sampling variation in the genetic covariances for these orders of fit, exhibited in figure 4.

3.2.5. Permanent environmental variation

As shown in figure 6, estimates of R produced permanent environmental variances which, except for weights at 6 years of age, increased considerably less over time than their genetic counterparts. Moreover, surfaces were considerably smoother for higher orders of fit (k > 4). For the last ages in the data, estimates of the permanent environmental variances increased dramatically,


and causing the increase in estimated phenotypic standard deviation observed above. While observations in the four age classes concerned were more variable than in the earlier years, this appears to be spurious and can only be attributed to smaller numbers of records and the large influence data points at the


4. DISCUSSION

There is a wealth of statistical literature on modelling of 'repeated records' in general, and structured covariance matrices in particular (see, for instance, Lindsey [17] or Vonesh and Chinchilli [34] and their extensive bibliographies). Applications in animal breeding, however, have until recently been limited to the extremes of a simple repeatability model or an unstructured, multivariate model. An added complication of quantitative genetic analyses, not shared by other fields, is the fact that we want to partition the between subject variation into its genetic and environmental components.

Covariance function and random regression models enable us to model the covariance structure of such records more adequately, alleviating the problems associated with an oversimplification (repeatability model) or an over-parameterization (multivariate model) for traits which change gradually along some continuous scale, such as time. Moreover, each source of variation, genetic or environmental, can be modelled. While a regression model is usually thought of in the context of modelling trends in means, covariance functions have been perceived to describe and smooth dispersion matrices. When regression coefficients are treated as random, however, they implicitly impose a covariance structure among the observations. Thus the two approaches converge and


equivalent to a special class of random regression models. While not common practice, covariance functions can be formulated for the sources of variation modelled by random regression coefficients in any RR model.
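The correspondence between random regression coefficients and a covariance function can be made concrete with a small numerical sketch. It assumes normalised Legendre polynomials as covariables and a coefficient covariance matrix K whose values are invented purely for illustration; the function names and the age range are likewise hypothetical. Under these assumptions, the covariance between records at ages t1 and t2 implied by the random regression is the double sum of phi_m(t1) K_mn phi_n(t2), i.e. the covariance function evaluated at those two ages.

```python
import math

def standardise(t, t_min, t_max):
    """Map an age t in [t_min, t_max] onto [-1, 1], where Legendre polynomials are defined."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)

def legendre_normalised(x, k):
    """Return the first k normalised Legendre polynomials evaluated at x in [-1, 1]."""
    p = [1.0, x]
    for n in range(1, k - 1):
        # Bonnet recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    # Scale each polynomial to unit norm on [-1, 1]
    return [math.sqrt((2 * j + 1) / 2.0) * p[j] for j in range(k)]

def covariance_function(t1, t2, K, t_min, t_max):
    """Covariance between records at ages t1 and t2 implied by coefficient covariances K."""
    k = len(K)
    phi1 = legendre_normalised(standardise(t1, t_min, t_max), k)
    phi2 = legendre_normalised(standardise(t2, t_min, t_max), k)
    return sum(phi1[m] * K[m][n] * phi2[n] for m in range(k) for n in range(k))

# Hypothetical covariance matrix among k = 3 random regression coefficients
K_example = [[1200.0, 300.0, -50.0],
             [300.0, 150.0, -20.0],
             [-50.0, -20.0, 30.0]]

ages = [24, 36, 48, 60, 72]  # hypothetical ages in months
cov = [[covariance_function(a, b, K_example, 24, 72) for b in ages] for a in ages]
for row in cov:
    print(" ".join(f"{c:9.1f}" for c in row))
```

The printed matrix is the covariance structure among records at the five ages; the same function can be evaluated at any intermediate age, which is what distinguishes a covariance function from a finite-dimensional covariance matrix.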

Regression models, fixed or random, require some assumptions about the parametric form of the regression equation. Regressing on orthogonal polynomials of time (or any other meta-meter) in order to fit Kirkpatrick et al.'s [15] CF model only invokes the assumption that the trajectory to be modelled can be described by such polynomials. As shown above, in a maximum likelihood framework we can let the data determine the order of polynomial fit required and the rank of the CF which suffices.

Likelihood ratio tests carried out to determine the order of fit of covariance functions required should account for the fact that we are at the boundary of the parameter space and thus have 'non-standard' conditions where the likelihood ratio test criterion $\Lambda$ does not follow a $\chi^2$ distribution [29], when testing whether additional random regression coefficients might have zero variance. Stram and Lee [31] considered tests of variance components for longitudinal data in a mixed model. They showed that the large sample distribution of $\Lambda$ is a 50:50 mixture of $\chi^2$ distributions with $q$ and $q + 1$ degrees of freedom, respectively, when we test the hypothesis that a matrix D is a $q \times q$ positive definite matrix against the hypothesis that it is a $(q + 1) \times (q + 1)$ matrix. Hence 'naive' tests, assuming $\Lambda$ has a $\chi^2$ distribution with degrees of freedom equal to the number of parameters tested ($q + 1$), are too conservative. Stram and Lee [31] argue though that for such simple tests (of one additional parameter), resulting biases are likely to be small. In practical applications, the error probability $\alpha$ has been 'doubled' to account for this conservatism, i.e. $\Lambda$ has been contrasted against $\chi^2_{q+1,2\alpha}$ instead of $\chi^2_{q+1,\alpha}$ [27].
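How a test against the 50:50 mixture might be evaluated is sketched below. This is an illustration only and not part of the analyses reported here: the function names and the example values are hypothetical, and the upper tail probability of the chi-square distribution is obtained from a standard recurrence for integer degrees of freedom, so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-square variable with integer df >= 0."""
    if x <= 0.0:
        return 1.0
    if df == 0:
        return 0.0  # chi-square with 0 df is a point mass at zero
    half = x / 2.0
    if df % 2 == 0:
        d, q = 2, math.exp(-half)              # closed form for 2 df
    else:
        d, q = 1, math.erfc(math.sqrt(half))   # closed form for 1 df
    while d < df:
        # Recurrence: sf(d + 2) = sf(d) + (x/2)^(d/2) exp(-x/2) / Gamma(d/2 + 1)
        q += half ** (d / 2.0) * math.exp(-half) / math.gamma(d / 2.0 + 1.0)
        d += 2
    return q

def naive_p_value(lrt, q):
    """p-value assuming the test criterion is chi-square with q + 1 df."""
    return chi2_sf(lrt, q + 1)

def mixture_p_value(lrt, q):
    """p-value under a 50:50 mixture of chi-square distributions with q and q + 1 df."""
    return 0.5 * chi2_sf(lrt, q) + 0.5 * chi2_sf(lrt, q + 1)

# Hypothetical example: test criterion of 4.0 when going from a 1 x 1 to a 2 x 2 matrix
lrt_example, q_example = 4.0, 1
print("naive  :", naive_p_value(lrt_example, q_example))
print("mixture:", mixture_p_value(lrt_example, q_example))
```

For any positive value of the criterion, the mixture p-value is no larger than the naive one, which is the sense in which the naive test is conservative.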

Alternatively, we can decide on an order of fit a priori or decide only to use the first n eigenfunctions of the CF to describe the covariance structure. This choice may depend on the computational resources available, or may be influenced by the knowledge that higher order polynomials are notorious for magnifying small sampling errors and producing 'wiggly' functions. Even if this choice does not maximize the likelihood, we are likely to obtain more appropriate estimates than under the simple repeatability model when this clearly does not apply, both in terms of estimates of genetic parameters and prediction of breeding values. Further research is required to develop guidelines on how to choose the 'best' model for particular situations.
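Restricting a CF to its first n eigenfunctions can be sketched numerically as follows. The coefficient matrix K is hypothetical, and the eigenvalues are found by simple power iteration with deflation, which is used here only to keep the example free of external libraries; it is not the estimation procedure described in this paper. Each eigenvector of K holds the coefficients of one eigenfunction with respect to the orthogonal polynomials, so keeping the leading n eigenpairs gives a CF of rank n.

```python
import math

def mat_vec(M, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def leading_eigenpairs(K, n, iters=500):
    """Return the n largest (eigenvalue, eigenvector) pairs of a symmetric
    positive semi-definite matrix K by power iteration with deflation."""
    k = len(K)
    work = [row[:] for row in K]
    pairs = []
    for _ in range(n):
        v = [1.0 / math.sqrt(k)] * k  # start vector, assumed not orthogonal to the target
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, mat_vec(work, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next one
        for i in range(k):
            for j in range(k):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def reduced_rank(K, n):
    """Rank-n approximation of K built from its n leading eigenpairs."""
    k = len(K)
    approx = [[0.0] * k for _ in range(k)]
    for lam, v in leading_eigenpairs(K, n):
        for i in range(k):
            for j in range(k):
                approx[i][j] += lam * v[i] * v[j]
    return approx

# Hypothetical covariance matrix among k = 3 random regression coefficients
K_example = [[1200.0, 300.0, -50.0],
             [300.0, 150.0, -20.0],
             [-50.0, -20.0, 30.0]]

pairs = leading_eigenpairs(K_example, 3)
trace = sum(K_example[i][i] for i in range(3))
for lam, _ in pairs:
    print(f"eigenvalue {lam:10.2f}  share of total {lam / trace:6.3f}")
K_rank1 = reduced_rank(K_example, 1)
```

In this invented example most of the variation is carried by the first eigenvalue, so a rank 1 description would already reproduce K closely; whether that holds for real data has to be judged from the estimates themselves.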

Great care has to be taken when defining and interpreting RR models. As shown in the numerical example, data points at the extremes of the independent variable can have a big influence on estimates of regression coefficients, yielding quite erratic estimates of CF or measurement error variances. Various authors estimated genetic parameters for test day milk yields of dairy cows fitting a random regression model. In most cases, resulting heritability estimates were high at the beginning and end of lactation and low in the middle of lactation, in marked contrast to previous estimates under a 'finite-dimensional' model, and thus regarded with justified scepticism ([10, 14]; Van der Werf et al., unpublished). This emphasizes the sensitivity of RR models to the effects and orders of CFs fitted. It has to be stressed that judicious modelling of fixed and random effects, careful choice of the orders of fit of CFs, and meticulous


As emphasized by Kirkpatrick et al. [15], many families of functions can be employed in the estimation of CF; while the choice of orthogonal function does not affect the estimate of the CF at the ages in the data, it does affect interpolation. Following their choice, we have used Legendre polynomials. These, however, generate a weight function with comparatively heavy emphasis on records at the outer parts of the interval for which they are defined (compared to, say, weights following a kernel from a Normal distribution) ([7]; section 3.3). Other functions might thus be more suitable. Alternatively, some scaling of the observed ages can be envisaged, for instance, a logarithmic transformation should reduce the impact of few observations on very old animals.
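The effect of such a transformation on the standardised age scale can be illustrated with a few invented ages. In the sketch below the ages, including one isolated record on a very old animal, and the function name are hypothetical. Ages are mapped onto the interval from -1 to 1 either directly or after taking logarithms, and the distance between the isolated old record and its nearest neighbour is compared on both scales.

```python
import math

def standardise(values):
    """Map values linearly onto [-1, 1], the interval on which Legendre polynomials are defined."""
    lo, hi = min(values), max(values)
    return [-1.0 + 2.0 * (v - lo) / (hi - lo) for v in values]

# Hypothetical ages in months; the last value represents a single very old animal
ages_months = [20, 32, 44, 56, 68, 120]

linear = standardise(ages_months)
logged = standardise([math.log(a) for a in ages_months])

gap_linear = linear[-1] - linear[-2]
gap_logged = logged[-1] - logged[-2]
print("gap to oldest record, linear scale:", round(gap_linear, 3))
print("gap to oldest record, log scale   :", round(gap_logged, 3))
```

On the logarithmic scale the oldest record sits closer to the bulk of the data, so it spans less of the interval over which the polynomials are fitted.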

5. CONCLUSIONS

Random regression models provide a valuable tool to model repeated records in animal breeding adequately, especially if traits measured change gradually. They allow covariance functions to be formulated which describe genetic and environmental covariances among records over time. Moreover, they impose a structure on covariance matrices. Fitting regressions on orthogonal polynomials of time (or equivalent) we can estimate genetic covariance functions as suggested by Kirkpatrick et al. [15], whose eigenvalues and eigenfunctions provide an insight into the way selection is likely to affect the mean trajectory of the records considered and can be used to characterize differences between populations, e.g. breeds of animals.

6. SOFTWARE

A program is available for the estimation of covariance functions by REML, fitting a random regression animal model, as described above. 'DxMRR' is written in Fortran 90 and is part of the DFREML package, version 3.0; see Meyer [23] for further details. This is contained on the CD-ROM of the Sixth World Congress on Genetics Applied to Livestock Production, or can be downloaded from the DFREML home page at http://agbu.une.edu.au/~kmeyer/dfreml.html.

ACKNOWLEDGEMENTS

This work was supported through grants from the British Biotechnology and Biological Sciences Research Council (BBSRC) and the Australian Meat Research Council (MRC).

REFERENCES

[1] Abramowitz M., Stegun I.A., Handbook of Mathematical Functions, Dover, New York, 1965.

[2] Anderson S., Pedersen B., Growth and food intake curves for group-housed


[3] Frayley C., Burns P.J., Large-scale estimation of variance and covariance components, SIAM J. Sci. Comp. 16 (1995) 192-209.

[4] Graybill F.A., An Introduction to Linear Statistical Models. Volume I, McGraw-Hill, Inc., New York, 1961.

[5] Graybill F.A., Matrices with Applications in Statistics, 2nd ed., Wadsworth Inc., 1983.

[6] Groeneveld E., A reparameterisation to improve numerical optimisation in multivariate REML (co)variance component estimation, Genet. Sel. Evol. 26 (1994) 537-545.

[7] Härdle W., Applied Nonparametric Regression, Cambridge University Press, 1990.

[8] Harville D.A., Maximum likelihood approaches to variance component estimation and related problems, J. Am. Stat. Assoc. 72 (1977) 320-338.

[9] Henderson C.R. Jr., Analysis of covariance in the mixed model: Higher-level, non-homogeneous and random regressions, Biometrics 38 (1982) 623-640.

[10] Jamrozik J., Schaeffer L.R., Estimates of genetic parameters for a test day model with random regressions for production of first lactation Holsteins, J. Dairy Sci. 80 (1997) 762-770.

[11] Jamrozik J., Schaeffer L.R., Dekkers J.C.M., Genetic evaluation of dairy cattle using test day yields and random regression model, J. Dairy Sci. 80 (1997) 1217-1226.

[12] Jennrich R.I., Schluchter M.D., Unbalanced repeated-measures models with structured covariance matrices, Biometrics 42 (1986) 805-820.

[13] Johnson D.L., Thompson R., Restricted Maximum Likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information, J. Dairy Sci. 78 (1995) 449-456.

[14] Kettunen A., Mäntysaari E., Strandén I., Pösö J., Genetic parameters for test day milk yields of Finnish Ayrshires with random regression model, J. Dairy Sci. 80 Suppl. 1 (1997) 197 (abstr.).

[15] Kirkpatrick M., Lofsvold D., Bulmer M., Analysis of the inheritance, selection and evolution of growth trajectories, Genetics 124 (1990) 979-993.

[16] Kirkpatrick M., Hill W.G., Thompson R., Estimating the covariance structure of traits during growth and aging, illustrated with lactations in dairy cattle, Genet. Res. 64 (1994) 57-69.

[17] Lindsey J.K., Models for Repeated Measurements, Oxford Statistical Science Series, Clarendon Press, Oxford, 1993.

[18] Lindstrom M.J., Bates D.M., Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data, J. Am. Stat. Assoc. 83 (1988) 1014-1022.

[19] Longford N.T., Random Coefficient Models, Oxford Statistical Science Series, Clarendon Press, Oxford, 1993.

[20] Madsen P., Jensen J., Thompson R., Estimation of (co)variance components by REML in multivariate mixed linear models using average of observed and expected information, Proc. 5th World Congr. Genet. Appl. Livest. Prod. Vol. 22 (1994) 19-22.

[21] Meyer K., Estimating variances and covariances for multivariate animal models by Restricted Maximum Likelihood, Genet. Sel. Evol. 23 (1991) 67-83.

[22] Meyer K., An "average information" Restricted Maximum Likelihood algorithm for estimating reduced rank genetic covariance matrices or covariance functions
