An 'average information' restricted maximum likelihood algorithm for estimating reduced rank genetic covariance matrices or covariance functions for animal models with equal design matrices

(1)

HAL Id: hal-00894158

https://hal.archives-ouvertes.fr/hal-00894158

Submitted on 1 Jan 1997

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

An ’average information’ restricted maximum likelihood

algorithm for estimating reduced rank genetic

covariance matrices or covariance functions for animal

models with equal design matrices

K Meyer

To cite this version:

(2)

Original

article

An

_’average

information’ restricted

maximum

likelihood

_algorithm

for

_estimating

reduced rank

_genetic

covariance matrices

or

covariance functions for animal

models

with

_{equal design}

matrices

K

_Meyer

Animal Genetics and

Breeding Unit, University of New England,

Armidale,

NSW 2351, Australia

(Received

21

_{May 1996; accepted}

17

_January

₁₉₉₇₎

Summary - A

quasi-Newton

restricted maximum likelihood

_algorithm

that

_approximates

the Hessian matrix with the average of observed and

expected

information is described for the estimation of covariance _componentsor covariance functions under a linear mixed

model. The

computing

strategy outlined relies on _sparsematrix tools and automatic differentiation of a _matrix,and does not

require

inversion of

_large,

_{sparse matrices.}For the

special

case of a model with

_only

one random factor and

_{equal design}

matrices for all

traits,

calculations to evaluate the

likelihood,

first and

’average’

second derivatives can be carried

out trait

_{by trait, collapsing computational requirements}

of a multivariate

_analysis

to those of a series of univariate

_analyses.

This is facilitated

_by

a canonical

_{decomposition}

of the

covariance matrices and

_{corresponding}

transformation of the data to new, uncorrelated traits. The rank of the estimated

_genetic

covariance is determined

_by

the number of non-zero

_eigenvalues

of the canonical

_{decomposition,}

and thus can be reduced

by fixing

a

number of

_eigenvalues

at zero. This limits the number of univariate

_analyses

needed to

the

required

rank. It is

_particularly

useful for the estimation of covariance function when

a

potentially large

number of

highly

correlated traits can be described

_by

a low order

polynomial.

REML

_/

average information

/

covariance components

/

reduced rank

_/

covariance function

_/

_equal_designmatrices

Résumé - Algorithme de maximum de vraisemblance _restreint,basé sur l’« informa-tion moyenne », pour estimer les matrices de covariance génétique ou les fonctions de covariance de rang partiel, dans les modèles animaux avec matrices d’incidence identiques. On décrit un

_algorithme

de maximum de vraisemblance restreint de _type

quasi-Newton,

qui

approche

la matrice Hessienne par la moyenne de

l’information

observée

(3)

les outils de traitement des matrices creuses et sur la

différentiation

automatique d’une

matrice, sans nécessiter d’inversions de

_grandes

matrices creuses. Dans le cas

_particulier

d’un modèle avec un seul

facteur

aléatoire et une matrice d’incidence

identique

pour

tous les

_caractères,

les calculs de la

_{vraisemblance,}

de ses dérivées

_premières

et secondes

« moyennes» peuvent être

_effectués

caractère par

caractère,

ce _quiramène les besoins de

calcul liés à une

_analyse

multivariate au niveau de ceux d’une série

_d’analyses

univariates.

Ceci est rendu

_possible

par la

décomposition

canonique des matrices de covariance à _partir

de la

_{transformation}

des données en caractères _nouveaux,non corrélés entre eux. Le _rang

de la matrice de covariance

_génétique

estimée est déterminé par le nombre de valeurs propres non nulles de la

décomposition canonique,

et donc peut être réduit

quand

on

fixe

à zéro certaines valeurs propres. Le nombre

d’analyses

univariates est ainsi

_égal

au _rang.

Ceci est

particulièrement

utile pour l’estimation de la

fonction

de covariance, qui décrit les covariances entre un très

grand

nombre de caractères très corrélés par l’intermédiaire

d’un

polynôme

d’ordre

inférieur.

REML

_/

information moyenne

/

composantes de covariance

/

fonction de covariance

_/

rang partiel

INTRODUCTION

Estimation of

_(co)variance

components

by

restricted maximum likelihood

_(REML)

fitting

an animal model to date is

_mainly

carried out

_using

a derivative-free

(DF)

algorithm

as

_initially

_proposed

_by

Graser et al

_(1987).

While this has been found to be slow to _{converge, especially}for multi-trait and

multi-parameter analyses,

it

does not

_require

the inverse of a

_large

matrix and can be

implemented efficiently

using

sparse matrix

storage

and factorisation

_techniques,

_making

it

_{computationally}

feasible for models

involving

tens of thousands of animals.

Recently

there has been renewed interest in

_algorithms

_utilising

derivatives of the likelihood function to locate its maximum. This has been furthered

by

technical

advances,

making

computations

faster and

_allowing

_larger

and

larger

matrices to be stored.

_Moreover,

the

_rediscovery

of Takahashi et al’s

₍₁₉₇₃₎

_algorithm

to invert

large sparse

matrices has removed most of the constraints on

_algorithms

_imposed

previously by

the need to invert

_large

matrices.

In

_particular,

_’average

information’

_(AI)

_REML,

a

_quasi-Newton

_algorithm,

which

_requires

first derivatives of the likelihood but

_replaces

second derivatives with the average of the observed and

expected information,

described

by

Johnson and

_Thompson

₍₁₉₉₅₎

has been found to be

_{computationally highly advantageous}

over DF

procedures.

It is well

_recognised

that for several correlated

_traits,

most information available is contained in a subset of the traits or linear combinations thereof. This subset

is the smaller the

_higher

the correlations between traits. More

_technically,

several

eigenvalues

of the

_{corresponding}

covariance matrix between traits are _verysmall

or zero. If a modified covariance matrix were obtained

_{by setting}

all small

eigen-values to zero and

backtransforming

to the

_original

scale

_(using

the

_eigenvectors

corresponding

to non-zero

eigenvalues),

it would have reduced rank.

There has been interest in reduced rank covariance matrices in several areas.

(4)

covariance matrix and

_exploiting

a transformation to canonical scale.

Kirkpatrick

and Heckman

₍₁₉₈₉₎

introduced the

_concept

of ’covariance

_{functions’, expressing}

the covariance between traits as a

_higher

order

_polynomial

function.

_Polynomials

can be fitted to full or reduced order. In the latter _case,the

resulting

covariance

matrix has reduced

rank, ie,

a number of zero

_eigenvalues

(Kirkpatrick

et

_al,

_1990).

The covariance function

_(CF)

model was

developed

with the

analysis

of ’traits’

with

_{potentially infinitely}

_many

_repeated,

or almost

_repeated

records in

mind,

where the

_phenotype

or

_genotype

of individuals is described

by

a function rather

than a finite number of measurements

_(Kirkpatrick

and

_Heckman,

_1989).

A

_typical

example

is the

_growth

curve of an animal.

_Hence,

in essence, CFs are the

infinite-dimensional

equivalent

of covariance matrices.

_Analysis

under a CF model

_implies

that coefficients of the CF are estimated rather than individual covariances as under

the usual

_{multivariate,}

’finite’ linear

_model;

see

_Kirkpatrick

et al

₍₁₉₉₀₎

for further

details.

While it is

_possible

to

_modify

an estimated covariance matrix to reduce its rank

_(as

done

_by

_Kirkpatrick

et

_al,

_1990,

_1994),

it would be

_preferable

to

_impose

restrictions on the rank of covariance matrices

_{’directly’}

_during

(REML)

estimation.

Ideally,

this could be achieved

_by

_increasing

the order of fit

_(ie,

rank

_allowed)

sequentially

until an additional non-zero

eigenvalue

does not

_{significantly}

increase the likelihood.

Conceptually,

this could be

_implemented

_{simply by}

_{reparameterising,}

to the

eigenvalues

and

_{corresponding}

_eigenvectors

of a covariance

_matrix,

and

fixing

the

required

number of

_eigenvalues

at zero. Practical

_applications

of such

reparameter-isations,

however,

have been restricted to

simple

animal models with

_equal

_design

matrices for all

_traits;

see Jensen and Mao

(1988)

for a review. For

these,

a

canon-ical

_{decomposition}

of the

_genetic

and residual covariance matrix

_{together yields}

a

transformation to uncorrelated variables with unit residual

_variance,

_leaving

the number of

_parameters

to be estimated

_unchanged

_(for

full

_rank).

Meyer

and Hill

₍₁₉₉₇₎

described how REML estimates of CFs _or,more

_precisely,

their coefficients could be obtained

_using

a DF

algorithm through

a

simple

repa-rameterisation of the variance

_component

model.

_{However, they}

found it slow to

converge for orders of fit

greater

than three or four.

Moreover,

for simulated data sets the DF

_algorithm

failed to locate the maximum of the likelihood

_accurately

in several

_instances,

_especially

if CFs were fitted to a

higher

order than simulated.

This _{paper reviews}an AI-REML

_algorithm

for the

_general,

multivariate _case,

presenting

a

_computing

_strategy

that does not

_require

_sparsematrix inversion.

Sub-sequently,

simplifications

for the

_special

case of a

_simple

animal model with

equal

design

matrices for all traits are considered. Additional reductions in

computational

requirements

are shown for the estimation of reduced rank

_genetic

covariance

ma-trices or reduced order CFs.

THE GENERAL CASE Model of

_analysis

(5)

with _y,

_13,

u and e

_denoting

the vector of

_{observations,}

fixed

_effects,

random

effects and residual errors,

respectively,

and X and Z are the incidence matrices

pertaining

to

₍₃

and u. Let

V(u)

=

G,

V(e)

= R and

Cov(u,e’)

=

0,

_sothat

V(y)

= V =

ZGZ’

+

R.

For an animal

model,

u

_always

includes the vector of animals’ additive

_genetic

effects

_(a).

In

_addition,

it _maycontain other random

_effects,

such as animals’

maternal

_genetic

_effects,

_permanent

environmental effects due to the animal or

its

_dam,

or common environmental effects such as litter effects.

Let

_E

_A

=

{a

Ai

j}

’

denote the t x t matrix of additive

_genetic

covariances. For

u = a this

_gives

G

=

E

A

_{® A}where A is the numerator

relationship

matrix and 0

denotes the direct matrix

product.

If other random effects are

fitted,

G is

_expanded

correspondingly;

see

_Meyer

(1991)

for a more detailed

_description.

_{Assuming y is}

ordered

_according

to traits within animals

where N is the number of animals that have

_records,

and

2::

+

denotes the direct matrix sum

_{(Searle, 1982).}

Let

_E

=

{!E!!

be

the matrix of residual covariances between traits. For

_{t traits,}

there are a total of W =

2t -

_{1 possible}

combinations of

traits recorded

_(assuming

_single

records _per

_trait),

_eg,W = 3 for t = 2. For animal

i with combination of traits w,

R

i

is

equal

to

_!Ew’

the submatrix of

E

obtained

by

deleting

rows and columns

_pertaining

to

_missing

records.

Average

information REML

Assuming

a multivariate normal

_{distribution, ie,}

_y_N

_{N(Xb, V),}

the

_log

of the REML likelihood

_(G)

is

_(eg,

_Harville,

₁₉₇₇₎

where X* denotes a full-rank submatrix of

_X,

and

Let e denote the vector of

parameters

to be estimated with

_{elements O}

_i

for i =

_{1, ... , p.}

Derivatives of

(6)

The latter is

_commonly

called the observed

_information.

It has

_expectation

For V linear

_{in 9,}

_a2V/8Bi8B!

=

_0,

and the

average of observed

[5]

and

expected

[6]

information

is

_(Johnson

and

_Thompson,

₁₉₉₅₎

The

_right

hand side of

_[7]

is

_(except

for a scale

_factor)

equal

to the second derivative of

_y’Py

with

_respect

_{to B}

_i

and

_{0j ,}

_ie,

_{the average information}is

_equal

to the data

part

of the observed information.

REML estimates of e can then be obtained

_{by substituting}

the average infor-mation matrix for the Hessian matrix in a suitable

_optimisation

scheme which uses

information from second derivatives of the function to be

_maximised;

see

_Meyer

and Smith

₍₁₉₉₆₎

for a detailed discussion of

_{Newton-Raphson-type algorithms}

in this context.

Calculation of the

_log

likelihood

Calculation

_{of log G}

_pertaining

to

_[1]

has been described in detail

_{by Meyer}

_(1991).

It relies on

_rewriting

[3]

as

_(Graser

et

_{al, 1987; Meyer,}

₁₉₈₉₎

where C is the coefficient matrix in the mixed model

_equations

_(MME)

for

_{[1] (or}

a full rank submatrix

thereof).

The first two

_components

of

_{log L}

can

_usually

be evaluated

_{indirectly, requiring}

only

the

_log

determinants of matrices of size

_equal

to the maximum number of records or effects fitted per animal. For u = a

where

_N

_A

denotes the number of animals in the

_analysis

_(including

_parents

without

records).

log

IAI

is a constant and can be omitted for the

_purpose

of

_maximising

log

,C.

_Similarly,

with

_N

_w

_denoting

the number of animals

_having

records for

combination of traits w

The other two terms in

_[8],

_log

_{ICI and y’Py,}

can be determined in a

_general

way for all models of form

!1!.

Let M

(of

size M x

M)

denote the mixed model

(7)

right

hand sides

_(r)

and a

_quadratic

in the data vector

A

_Cholesky

_{decomposition}

of M

_gives

M =

LL’,

with L a lower

_triangular

matrix with

_{elements l}

_ij

_(l

_ij

= 0

for j

_>

i),

and

Factorisation of M for

_large

scale animal model

_analyses

is

_{computationally}

feasible

_through

the use of _{sparse matrix}

_techniques;

see, for

instance,

George

and Liu

_(1981).

Calculation of first derivatives

Differentiating

[8]

gives

partial

first derivatives

Analogously

to the calculation of

_{log £}

the first two terms in

_[14]

can

_usually

be determined

_indirectly

while the other two terms can be evaluated

extending

the

Cholesky

factorisation of the MMM

_(Meyer

and

_Smith,

_1996).

Let

_D!

=

å’5:.

A/

å()

i

be a matrix whose elements are 1 if

B

i

is

_equal

to the klth element of

E

A

and zero otherwise.

Further,

let

6 ki

denote Kronecker’s

Delta, ie,

6 kl

= 1 for k = and zero

_otherwise,

and

QA

l

denotes the klth element of

EA

1 .

For

oj

=

a A kl

Similarly,

with

_D!

=

BE

Ew/

8Bi

and

a’lL

the klth element of

_Y.

_l

_¿

while all other first derivatives of

_log

_!G!

and

_log

_!R!

are zero.

Smith

₍₁₉₉₅₎

describes a

procedure

for automatic differentiation of the

Cholesky

decomposition.

In _essence,it is an extension of the

Cholesky

factorisation which

gives

not

_only

the

_Cholesky

factor of a matrix but also its

_{derivatives, provided}

the

_{corresponding}

derivatives of the

_original

matrix can be

specified.

In

particular,

(8)

It involves

_computation

of a lower

_triangular

matrix F. This is initialised to

10f(L)Ial

ijl

.

On

completion

of the backwards

differentiation,

F contains the

derivatives of

_f(L)

with

_respect

to the elements of M. Smith

₍₁₉₉₅₎

states that the calculation of F

_(not

_including

the work needed to

_compute

_L)

_requires

about twice as much work as one likelihood evaluation. Once F has been determined first

derivatives of

_{f (L)}

can be obtained one at a time as

tr(F<9M/<9!t),

_ie,

only

one

matrix F is

_required.

Meyer

and Smith

₍₁₉₉₆₎

describe a REML

algorithm utilising

this

technique

to determine first and

_(observed)

second derivatives of

_log

G for the case considered

here. For

_{f (L)}

=

log

I C +

y’Py,

the scalar is _afunction of the

diagonal

elements of L

_(see

_[12]

and

_[13]).

_Hence,

_{8f(L)/8l

_ij

_}

is a

_diagonal

matrix with elements

n¡¡

1 for

i =

_{1, ... , M -}

1 and

21,!,1,!

in row M.

The non-zero derivatives of M have the same structure as the

corresponding

_part

(data

versus

pedigree)

_part

of M

As outlined

_above,

R is

_{blockdiagonal}

for animals.

_Hence,

matrices

_{aR -}

₁

_{/ alJ}

_Ekl

have submatrices

_-E-’Do!E-1

_ie,

derivatives of M with

respect

to residual

(co)variances

can be

_{set up}

in the same _wayas the ’data

_part’

of M.

The

_strategy

outlined for the calculation of first derivatives of

_{log L}

does not

_require

the inverse of the coefficient matrix C. In

_contrast,

Johnson and

Thompson

(1994, 1995)

and Gilmour et al

₍₁₉₉₅₎

for the univariate _case,and Madsen et al

₍₁₉₉₄₎

and Jensen et al

₍₁₉₉₅₎

for the multivariate case derive

expressions

for

_8logG/aB

_i

based on

_[4],

which

_require

selected elements of

C-

1 .

Their scheme is

_{computationally}

feasible

_owing

to the _{sparse matrix inversion} method of Takahashi et al

_(1973).

Misztal

₍₁₉₉₄₎

claimed that each sparse matrix inversion took about two to three times as

_long

as one likelihood

_evaluation,

_ie,

computational requirements

for both alternatives to calculate first derivatives of

log C appear

comparable.

Calculation of the _averageinformation Define

For

_B

_i

=

(9)

where

_In

is an

_identity

matrix of size _n,

Z.,,,

the submatrix of Z and amthe subvector of a for trait m,

ie, b

i

is

simply

a

_weighted

sum of solutions for animals in the data.

For O

i

=

(J&dquo; Ek¡

and 6 =

y -

Xb -

Zu the vector of residuals for

[1]

with subvectors

8 m

for m =

_1,...,

_t

Extension to models

_fitting

additional random effects such as litter effects or

maternal

_genetic

effects is

_{straightforward;}

see, for

instance,

Jensen et al

₍₁₉₉₅₎

for

corresponding expressions.

Using

!19!, [6]

can be rewritten as

Johnson and

_Thompson

₍₁₉₉₅₎

calculated vectors

_Pb

_j

as the residuals from

repeatedly solving

the mixed model

_{equations pertaining}

to

_[1]

with _y

_replaced

by

b

j

for j

=

1, ... , !.

On

completion,

[22]

could be evaluated as

simple

vector

products.

Alternatively,

define a matrix B =

[bi

I b

2 I ...

bp!.

Then consider the mixed model matrix with _y

_{replaced by}

_{B, ie,}

with the last row and column

(for

right

hand

_sides)

_{expanded to p}

rows and columns

Factoring

M

B

or,

equivalently,

’absorbing’

C into the last p rows and columns of

M

B

then overwrites

B’R-

1 B

with B’PB which has elements

_{{b’ iPb}

_j

_}

_(Smith,

1994 pers

comm).

With the

Cholesky

factorisation of C

already

determined

(to

calculate

_log

_G),

this is

_{computationally undemanding.}

EQUAL

DESIGN MATRICES

For a

_simple

animal model with all traits recorded at the same or

_{corresponding}

times,

design

matrices are

equal, ie,

[1]

can be rewritten as

Meyer

(1985)

described a method of

_scoring

_algorithm

for this case,

exploiting

a canonical transformation to reduce a t-variate

_analysis

to

_{t corresponding}

univariate

analyses.

For

_E

_positive

definite and

_E

_A

_positive

_{semi-definite,}

there exists a matrix

Q

(10)

Transforming

the data to

then

_yields

t new, ’canonical’ traits which are uncorrelated and have unit residual

variance. This makes the

corresponding

coefficient matrix in the MME

blockdiag-onal for

_{traits, ie,}

Meyer

(1991)

described how the

log

likelihood

_(on

the

_original

_scale)

in this case can be

computed

trait

by

trait as the sum of univariate likelihoods on the canonical

scale

_plus

an

_adjustment

for the transformation

(last

term in

!29!)

with

_y*

the subvector

of y

*

for trait i and

_P*

the ith

_diagonal

block of the

projection

matrix on the canonical scale P*

which,

like

C

*

,

is

blockdiagonal

for traits.

Terms

_required

in

_[29]

can be calculated

by

_setting

_upand

factoring,

as described

above,

univariate MMM

_(on

the canonical

_scale),

_{Mi ,}

of size

M

o

=

(M -

1)/t

_{+ 1}

each

Moreover,

all first derivatives of

log

G as _{well the average information}

_matrix,

both on the canonical

scale,

can be determined trait

by

trait.

First derivatives on the canonical scale

Consider the

_{parameterisation}

of

_Meyer

₍₁₉₈₅₎

where

₀

_*

_,

the vector of

parameters

on the canonical

scale,

has elements

Ag

and _wi!for i

< j =

1, ... , t,

ie, parameters

are the

(co)variances

on the canonical scale.

The

_log

likelihood on the canonical scale can be accumulated trait

by trait,

(11)

On the

_original

_scale,

L and F have the same

_sparsity

structure

_{(Smith, 1995).}

However,

while

_Ay

=

Wij = 0 for

_{given E}

A

and

E

,

the

corresponding

derivatives and estimates are

_not,

unless the maximum of the likelihood has been attained.

Hence,

while the

_off-diagonal

blocks of

_LC

are _zero,the

_{corresponding}

blocks of

F *

=

_{8f (L}

_*

_)/c7M

_*

are not.

It can be shown that both the

_diagonal

blocks of F*

_{corresponding}

to

_L*

_F*

and the row vectors

_{corresponding}

to

_{1*’, fi}

are

identical to those obtained

_by

backwards differentiation of

_L!.

In other

words,

first derivatives with

respect

to the variance

_components

on the canonical scale

_(!ii

and

_Wii

₎

can be obtained trait

_by

trait from univariate

_analyses.

Calculation of derivatives with

respect

to

_!2!

and

Wij

,

however, requires

the

off-diagonal

blocks of F*

corresponding

to traits i and

j,

_{FC_! .}

Fortunately,

as outlined in the

_A

_PP

_endix,

matrices

_FC.!

can be determined

indirectly

from terms

_{arising using}

the

_{Cholesky decomposition}

and backwards differentiation for individual traits on the canonical scale.

From

_[17]

and

_!18!,

first

_derivatives

of

_{f (L}

_*

₎

=

log

I C*

+

y

*

’P

*

y

*

are then

with

_MM

_F

the Mth

_diagonal

element of F*. For

_{f (L}

_*

₎

=

*

1

1 g1C

0 +

y

*

’P

*

y

*

,

F

M

= 1.

Other terms

_required

to determine the first derivatives on the canonical scale are

where G* and R* are the canonical scale

_equivalents

to G and

R,

respectively.

Average

information on the canonical scale

For G* = A _®

A,

R* =

,

tN

I

and thus V* =

Var(y

*

)

blockdiagonal

for

_traits,

[20]

(12)

ie,

vectors

_bi

are zero

_except

for subvectors for traits k and

l,

_bi

_k

= _siand

bi

l

= _{sk ,} with _Sj

_standing

in turn for

_A!Zoa!

and

_{8) .}

With P*

_{blockdiagonal}

for

_traits,

_0g

=

).,kl

_or_u_ki

(k x l)

and 0) =

A

mn

or umn

(m <

n),

this

_gives

Hence,

calculation of the average information on the canonical scale

_requires

all terms

_s’P*s

_j

for i

_{! j}

=

1,...,

2t and k =

_{1, t.}

These _canbe obtained trait

by

trait,

analogously

to the

procedure

described above for the

_general

case. After the

Cholesky

factorisation

_{of Mk has}

been carried

out,

solution

_{a* for}

animals’ additive

genetic

effects and residuals

_ek

_for

trait k are

_{obtained, storing}

the

Cholesky

factor

L*.

Define a matrix S of size N x 2t with columns

_equal

to vectors si. S is the canonical scale

_equivalent

to B above. Once all columns of S have been

evaluated,

set _upa matrix

- vi 1

-for each trait. This is the MMM for trait k on the canonical scale with

yk

replaced by

S. The matrix S’S has elements

_s’s

_j

which are the sum of _squares

and

_{crossproducts}

of the vectors of

_(weighted)

solutions and residuals.

_Absorbing

Ck

into

S’S

_(using

the stored matrix

_L*)

then overwrites this matrix with

_S’PkS

which has elements

_S!PkSj’

After all t traits have been

_processed

_bi!P*b!,

_(twice)

the average

information,

can be ’assembled’

according

to

_!38!.

Derivatives on the

original

scale

Let

H

*

,

with elements

_{2 bi ! P* b! ,}

and

g

*

,

with elements

_{810g £* /80g,}

denote the average information matrix and vector of

gradients

on the canonical

scale,

respectively. Corresponding

terms on the

original

scale are then

where J with elements

_<9!/c)!

is the Jacobian matrix of e with

respect

to

e

*

.

From

_[25]

and

_[26],

J has non-zero elements

(13)

Alternative

_{parameterisation}

The above

_{parameterisation requires}

_switching

between the

_original

and canonical scale

_(or

_accumulating

the

_{transformations).}

_{Alternatively,}

as

_performed

_by

_Meyer

(1991)

for a derivative-free

algorithm,

0 *

can be defined to have t elements

_A!

(i

=

1, ... , t)

and t

2

elements

q2!

(i, j

=

1, ... , t),

ie,

the

’genetic’

variances _on

the canonical scale and the elements of the transformation matrix

_Q.

This allows estimation to be carried out on the canonical scale.

_Moreover,

as outlined

_{by Meyer}

(1991)

for a derivative-free

_algorithm,

evaluation of

_log

G for

_given

Aii

requires

scalar calculations

_involving

the elements of

_{Q only,}

_{ie, maximising}

the conditional

log,C

(for

given

Àii

)

with

respect

to _q_ij is

_{computationally}

_inexpensive.

This allowed

the maximum of

_{log G}

to be located in a

_two-step

_procedure

with

_{computational}

requirements equivalent

to those of a

_{t parameter}

_(rather

than

_t(t

+

_{1) )}

_analysis.

While derivatives with

respect

to _q2!

_{(not shown)}

_require

the

_off-diagonal

blocks of

F,

derivatives with

_respect

to

_A,

do not

_(see

_!32!).

Hence a nested

_{maximisation,}

using

an _averageinformation

_step

to estimate

_parameters

_A

_zz

and a derivative-free

procedure

to maximize G with

_respect

to _q2!for

_{given A}

_zz

_{appears computationally}

advantageous

in this case.

REDUCED RANK COVARIANCE MATRICES

Forcing

the estimate of the

_genetic

covariance matrix

_E

_A

to have reduced rank

(k

A

<

_t),

results in a number of zero

_eigenvalues

on the canonical scale. In this

case, several terms

contributing

to

_log

G or its derivatives are constant or

_depend

on the canonical transformation

_(Q)

_only.

Thus

_they

need to be evaluated

_only

once _{per analysis, effectively reducing}the

_{computational}

_requirements

per round of iteration to those for univariate

_analyses

for the

_k

_A

non-zero

_eigenvalues.

Likelihood: for

Aii

!

_0,

contributions _to

_log

G

[29]

become

While the former is a

_constant,

determined

_by

the data structure

_only,

the latter

_depends

on the canonical transformation.

_However,

as

[45]

shows,

it can be calculated for _{any Q}from the

_{corresponding}

residual sums of _squares

(SS)

and

crossproducts

(CP)

on the

_original

scale. Let Y denote a matrix of size N x t with columns

_equal

to vectors of observations y2. Both

log

IX’X

O

and

quadratics

ym (I - Xo(XoXo)-Xo)

yn can then be determined

_{by factoring}

the matrix

(14)

the

_eigenvalues

for both canonical traits

_(A

_ii

and

_A

_jj

₎

are non-zero. This reduces the

number of

_Cholesky

decompositions

(of

matrices

_{MZ )}

and

_{corresponding}

backwards differentiations

_(to

obtain matrices

_F*

_and

vectors

_{f2 )}

to be carried out to the number of non-zero

_eigenvalues,

_k

_A

_.

_Moreover,

_only

k

A

(k

A

-1)/2

instead

of t(t-1)/2

off-diagonal

blocks

_Fc

_*

_{.. ’J}

need to be evaluated. This can reduce

computational

requirements

per

round

of iteration

dramatically.

Average

information:

as shown

above,

the _averageinformation matrix can be constructed from residual SS and CP in the vectors of random effects solutions and residuals. For

_A

_kk

=

0,

ak

= 0. Residuals on the canonical scale are then a linear combination of residuals on the

_{original scale,}

_ek

₌

!m-1

_q

_km8

_m

with

8 m

=

y&dquo;, -Xo(3&dquo;!.

Again

the latter need _tobe evaluated

only

_once.Terms

required

for the AI can then be obtained as

_before,

_replacing

_Ms

_k

in

_[39]

with

if

_A

_kk

= 0. For

multiple eigenvalues equal

_to

zero,

[47]

only

needs to be evaluated

once

(per iterate).

COVARIANCE FUNCTIONS

The

_{’infinitely-dimensional’}

model

Consider

_{’repeated’}

measurements taken at a number of _ages

(or equivalent),

with

potentially infinitely

many records. The covariance between records taken at _ages l and m can be

expressed

as

where

7is the covariance function and K with elements

_K

_ij

is the

pertaining

matrix of

_{coefficients,}

and a&dquo;, is the mth age, standardised to the interval for which the

polynomials

are defined. 4! is a matrix of

_{orthogonal polynomial}

functions evaluated at the

_given

ages with elements

_{!2! _}

<!(0t),

the

jth polynomial

evaluated for age

i,

and k denotes the order of

_polynomial

fit.

_Conceptually

k = _oo,but in

practice

k <

t for observations at

_{t ages.}

_Kirkpatrick

et al

_{(1990, 1994)}

_suggested

the use of the so-called

_{Legendre polynomials}

_(see

Abramowitz and

_Stegun,

₁₉₆₅₎

which span the interval from -1 to 1.

Note that

_polynomials

include a scalar

_{term, ie,}

for t records a full order fit involves terms to the power

0, ... , t -

1. From

_[48],

a covariance matrix can be rewritten as

(15)

Let

_E

_A

=

4l

A

K

A4

l

Q

for _anorder of fit

k

A

,

and

_K

_A

the matrix of coefficients for the

_{corresponding}

covariance function A.

_{Further, partition}

the error covariance matrix into

_components

_owing

to

_permanent

and

_temporary

environmental

_effects,

E

=

_E

_R

₊

_E,.

Under the ’finite

_model’,

these

_usually

_cannotbe

_disentangled

unless there are

_repeated

records for the same _age.Assume the latter

_represent

independent

’measurement

_{errors’, ie, E,}

=

Diag

{&OElig;;i

}

’

and that the former can be

described

_by

a

CF, R,

_{ie, E}

R

=

£

4l

K

R

4l

with order of fit

k

R

.

_Fitting

_measurement

errors

_{separately together}

with R to the order t - 1

_yields

an

_equivalent

model to

a full order fit for

E

.

Hence the maximum for

k

R

is t - 1 rather than t. General case

Estimates of the elements of the coefficient matrices of the covariance functions

(and

the measurement

_errors)

can be estimated

_by

REML

_{using algorithms}

for the multivariate estimation of covariance

_components

_together

with a

simple

reparameterisation.

Likelihood: as outlined

_{by Meyer}

and Hill

_(1997),

_{log L}

[9]

can be rewritten as

Under the CF

_model,

the vector of

_parameters

to be estimated is rlwith elements

K

A

,,

for i

_{# j = 1, ... k}

_A

_,

_KR!!

_{for i ! j}

₌

_{1, ... k}

_R

and

_&OElig;;i

_for

i =

_1,...,

t. For r

CFs to be

estimated,

it has minimum

_length

r + t

_(fitting

all

CFs to order 1 and

assuming

all

_&OElig;;i

to

be

_distinct)

and maximum

_length

_{rt(t+1)/2 (fitting}

all CFs to full

_order).

&dquo;

Derivatives: note that the elements of matrices 1)

_only

_depend

on the _agesat which records were taken and the

_polynomial

function chosen. For _anyvector t

l a

_{corresponding}

vector of covariance

components

(8)

can be calculated

_using

[49].

First and

_{’average’}

second derivatives of

_log

,C can thus be determined with

respect

to the elements of 8 as described above

(section

on estimation of covariance

components

in the

_general

_cases)

and then be transformed to the ’covariance function

_{scale’, using}

where J with elements

_OO

_i/

₀

_77j

is the Jacobian matrix of 0 with

_respect

to ₇₁_.From

(16)

for

77 i

=

<7!

and 0

j

=

aEmm’

Note that a reduced order fit for ,A

(k

A

<

_t)

_implies

a

_genetic

covariance matrix

E

A

of reduced rank. This _maylead to

_{computational}

_problems

when

_applying

’standard’

_methodology

to factor the MMM

_(such

as the

Cholesky decomposition

and its automatic

_{differentiation)}

as described above. For

_{practical applications,}

this can be overcome

by

_setting

the t -

k

A

zero

eigenvalues

to an

operational

_zero,

ie,

a small non-zero value such as

10-

5

or

10-

6 .

As for the estimation of covariance

components,

extensions to other models

including

additional random effects are

_{straightforward.}

Equal

design

matrices

Assuming

measurement errors are

_greater

than _zero,

_E

=

E

R

₊

E

is

positive

definite,

regardless

of the order of fit for R.

_Hence,

the transformation to canonical scale exists and the

_log

likelihood can be determined as for the ’finite

_model’,

summing

terms from univariate

_analyses

on the canonical scale as described above. For the estimation of covariance

_components,

the canonical transformation doubled as a tool to reduce

_{computational requirements}

_(allowing

calculations to be carried out for one trait at a

time)

and as a

_{reparameterisation,}

_ie,

the likelihood

was maximised with

respect

to the

_eigenvalues

and elements of

_eigenvectors

of the canonical

decomposition

of

E

A

and

E

.

For the CF model there are

three matrices to be

_{considered, K}

_A

and

_K

_R

_,

or

_E

_A

and

_E

_R

derived from

them,

and

_E

_.

_{Decompositions}

_{diagonalising}

several matrices exist. Lin and Smith

(1990),

for

_instance,

used the common

_principal

_components

_algorithm

of

_Flury

and Constantine

₍₁₉₈₅₎

as an

_equivalent

to the canonical

decomposition

for a

multivariate mixed model with several random effects.

However,

this

required

the matrices to be transformed to be

_{positive definite,}

and is thus not suitable for this

application.

Hence

_only

the first

_{parameterisation}

described

_{above, namely}

0 *

_having

ele-ments

_A

_ij

and _Wij

_{for i < j}

=

_{1, ... ,}

_t

is suitable for the estimation of

CFs.

First derivatives and average information on the canonical scale can then be determined

and transformed back to the

_original

scale as described above. The Jacobian J in this case has non-zero elements

(17)

DISCUSSION

A

_strategy

has been outlined for

_computing

the

_log

likelihood in a multivariate mixed

_{model, together}

with its first and

_{’average’}

second derivatives with

_respect

to covariance

_components

or the coefficients of covariance functions. These can be used in a

(modified)

_Newton-type

estimation

_procedure

(Marquardt, 1963),

together

with an additional transformation to ensure estimates

to be within the

_parameter

_space;see

_Meyer

and Smith

₍₁₉₉₆₎

for details.

For a model with

_only

one random factor and

_{equal design}

matrices for all

_traits,

calculations can be carried out trait

_{by trait,}

_reducing

_{computational}

_requirements

to those of a series of univariate

_analyses.

This is feasible

_through

a canonical

decomposition

of the

_genetic

and residual covariance matrices and a

_{corresponding,}

linear transformation of the data to _new,uncorrelated variables.

In that case,

fitting

a covariance function to less than full order or

_forcing

es-timated

genetic

covariance matrices to be of reduced rank is

_equivalent

to

_fixing

eigenvalues

on the canonical scale at zero. Most terms

_pertaining

to zero

eigen-values then

_only

need to be calculated once _per

_{analysis, resulting}

in considerable reductions of

_{computational requirements.}

In _{essence, it}reduces work

_required

in each round of an iterative solution scheme to that

_equivalent

to

_k

_A

univariate

analyses.

This is

_particularly

useful for a

_{comparatively large}

number of

_highly

cor-related measurements where a covariance function of low order suffices to describe

the data

_adequately.

In

_contrast,

for

_analyses

where a transformation to canonical scale is not

_feasible,

the

_complete

t-variate MMM needs to be _{set up,}factored and

differentiated in each round of

_iteration,

even if CFs are

_only

fitted to order

_k

_A

_.

Making

computational

demands

_proportional

to the order of fit for the

_genetic

CF

_{(or matrix)}

_encouragesan

_{’upwards’}

_strategy:

_increasing

the order of fit one

_by

one allows a likelihood ratio test to be

_performed

at each

_step.

The minimum number of

_parameters

_describing

the data is found when the likelihood ceases

to increase

_{significantly}

when the order of fit is increased. It is envisioned that estimation of reduced order covariance matrices or functions will become the standard

_procedure

for

_{high-dimensional}

multivariate

_analyses.

ACKNOWLEDGEMENT

This work was

_supported

under _grantUNE35 of the Australian Meat Research

_Corporation

(MRC).

REFERENCES

Abramowitz

_M,

_Stegun

IA

₍₁₉₆₅₎

Handbook

of Mathematical

Functions.

_Dover,

New York

Flury B,

Constantine G

₍₁₉₈₅₎

_Algorithm

AS 211: The F-G

_{diagonalization algorithm.}

Appl Stat,

177-180

(18)

Gilmour

_{AR, Thompson R,}

Cullis BR

₍₁₉₉₅₎

_Average

information REML: an efficient

algorithm

for variance _parameterestimation in linear mixed models. Biometrics 51,

1440-1450

Graser

HU,

Smith

_SP,

Tier B

₍₁₉₈₇₎

A derivative-free

_approach

for

_estimating

variance components in animal models

_by

restricted maximum likelihood. J Anim Sci

_64,

1362-1370

Graybill,

FA

₍₁₉₆₉₎

Introduction to Matrices with

_Applications

in Statistics. Wadsworth

Publishing Company, Inc, Belmont,

California

Harville,

DA

₍₁₉₇₇₎

Maximum likelihood

approaches

to variance _componentestimation and related

_problems.

J Amer Stat Ass

_72,

320-338

Jensen

_J,

Mao IL

₍₁₉₈₈₎

Transformation

_algorithms

in

_analysis

of

_single

trait and of multitrait models with

_{equal design}

matrices and one random factor _pertrait: a review.

J Anim Sci 66, 2750-2761

Jensen

_{J, Mantysaari E,}

Madsen

_{P, Thompson}

R

₍₁₉₉₅₎

Restricted Maximum Likelihood

estimation of

_(co)variance

_componentsin multivariate linear models

_using

_average of observed and

_expected

information. In: 2!cd

_{European Workshop}

on Advanced

Biometrical Methods in Animal

_{Breeding, Salzburg,}

12-20

_{June, 1995, Mimeo,}

17 _p Johnson

_{DL, Thompson}

R

₍₁₉₉₄₎

Restricted maximum likelihood estimation of variance

components for univariate animal models

_using

sparse matrix

techniques

and _average information. Proc 5th World

_Congr

Genet

_Appl

Livest

_{Prod, University of Guelph,}

Guelph,

vol 18, 410-413

Johnson

DL, Thompson

R

₍₁₉₉₅₎

Restricted maximum likelihood estimation of variance components for univariate animal models

_using

_{sparse matrix}

_techniques

and average

information. J

_Dairy

Sci

_78,

449-456

Kirkpatrick M,

Heckman N

₍₁₉₈₉₎

A

_{quantitative genetic}

model for

_{growth, shape}

and other infinite-dimensional characters. J Math Biol

_27,

429-450

Kirkpatrick M,

Lofsvold

_D,

Bulmer M

₍₁₉₉₀₎

_Analysis

of the

inheritance,

selection and evolution of

growth trajectories.

Genetics

_124,

979-993

Kirkpatrick M,

Hill

_{WG, Thompson}

R

₍₁₉₉₄₎

_Estimating

the covariance structure of traits

during growth

and

_aging,

illustrated with lactations in

_dairy

cattle. Genet Res

_{64, 57-69}

Lin

_CY,

Smith S.P.

₍₁₉₉₀₎

Transformation of multi-trait to unitrait mixed model

_analysis

of data with

_multiple

random effects. J

_Dairy

Sci

_73,

2494-2502

Lindstrom

_MJ,

Bates DM

₍₁₉₈₈₎

_{Newton-Raphson}

and EM

_algorithms

for linear mixed-effects models for

_{repeated-measures}

data. J Amer Stat Ass 83, 1014-1022

Madsen

_P,

Jensen J,

Thompson

R

₍₁₉₉₄₎

Estimation of

_(co)variance

_components

_by

REML

in multivariate mixed linear models

_using

_averageof observed and

_expected

information. Proc 5th World

_Congr

Genet

_Appl

Livest

_Prod,

_University

_{of Guelph, Guelph,}

Vol 22, 19-22

Marquardt

DW

₍₁₉₆₃₎

An

algorithm

for least squares estimation of nonlinear _parameters.

SIAM J 11, 431-441

Meyer

K

₍₁₉₈₅₎

Maximum likelihood estimation of variance _componentsfor a multivariate mixed model with

equal design

matrices. Biometrics

_41,

153-166

Meyer

K

₍₁₉₈₉₎

Restricted maximum likelihood to estimate variance components for animal models with several random effects

_using

a derivative-free

_algorithm.

Genet Select

_{Evol 21,}

317-340

Meyer

K

₍₁₉₉₁₎

_Estimating

variances and covariances for multivariate animal models

_by

restricted maximum likelihood. Genet Selct Evol _23,67-83

Meyer K,

Smith SP

₍₁₉₉₆₎

Restricted maximum likelihood estimation for animal models

using

derivatives of the likelihood. Genet Select Evol

_28,

23-49

Meyer K,

Hill WG

₍₁₉₉₇₎

Estimation of

_genetic

and

_phenotypic

covariance functions for

longitudinal

or

_{’repeated’}

records

_by

restricted maximum likelihood. Livest Prod Sci

(19)

Misztal I

₍₁₉₉₄₎

_Comparison

of

_{computing properties}

of derivative and derivative-free

algorithms

in variance component estimation

_by

REML. J Anim Breed Genet 111,

346-355

Searle SR

₍₁₉₈₂₎

Matrix

_{Algebra Useful for}

Statistics. John

_{Wiley &}

_Sons,New York Smith SP

₍₁₉₉₅₎

Differentiation of the

Cholesky algorithm.

J

Comp Graph

Stat 4, 134-147 Takahashi

_{K, Fagan J,}

Chin MS

₍₁₉₇₃₎

Formation of sparse bus

impedance

matrix and its

application

to short circuit

_study.

Proc 8th Inst PICA

_{Conf Minneapolis, Mineapolis,}

63-69;

cited

by

Smith

₍₁₉₉₅₎

APPENDIX

Calculation of

_{off diagonal}

blocks of F

Smith

₍₁₉₉₅₎

_gives

_pseudo-code

for backwards differentiation of the

_Cholesky

factor of a

_symmetric

matrix. This can be

_adapted

to

_calculating

a selected submatrix

_only

for the case where the

_{corresponding}

part

of the

_Cholesky

factor has zero elements.

Calculating

the

_off-diagonal

block of F* for traits i and

_j,

_Fc

_i

_,

_indirectly,

requires

components

from univariate

_analyses

on the canonical

_scale,

_namely

_L

_C.

with elements

_Lk

_l

and

_1*

with elements

_l!,

the

parts

of the

_Cholesky

factor of

_M*

corresponding

to the coefficient matrix and vector of

_right

hand

_sides,

_{respectively,}

and

_f!

with elements

_/!,

the ’F matrix’

_analogue

to the vector of

right

hand sides for

_{trait j.}

_{F*e.. ’1}

with elements

_F!!

can then be calculated as follows:

(1)

initialise

_Fèij

to

fl1:’,

_ie,

for

_k,

l =

_{1,..., Mo - 1,}

_F!l

_{:= fkl;&dquo;;}

(2)

calculate columns of

_F!,2!

one at a

_time,

_starting

with the last

_{column, ie,}

for

k = Mo - 1, ... ,

I

(a)

adjust

for columns

_already

_{evaluated, ie,}

for m =

_M

_o

_-

_{1, ... ,}

k + 1 and

(b)

divide

_by

the

_pivot,

_ie,

for l =

_1,...,

_{Mo - 1, F}

_lk

:=

-F

lk

/ L

%

k

where ‘:_’ indicates that the left hand side of the

_equation

is overwritten with the

quantity

on the

right.

As noted

_by

Smith

_(1995),

_only

selected elements of F

_(in

the

_general

_case)

need to be

_evaluated,

_{adjustments only occurring}

for non-zero elements of the

_Cholesky

factor. Hence F has the same

_sparsity

structure as L. Similar considerations hold in this case,

reducing

computational

requirements

in

_{practical applications}

substantially.

For

_instance,

columns of

_{F*c.. ’J}

are not

_required

for

_adjustments

to other columns if the

_{corresponding}

row of

_L*

has no non-zero

_off-diagonal

elements. Hence in these columns

only

elements

_{corresponding}

to

_potentially

non-zero elements in the derivatives of M* need to be evaluated. Other redundancies could be

_perceived.

For a

_particular

_analysis

it

_might

be worth

_carrying

out a

symbolic

factorisation of the MMM for all traits on the

_original

scale to determine the

_sparsity

structure of the ’full L’ and thus the minimum number of elements of

Fc..

which need to be evaluated. This would have to be carried out

_only

once _per

(20)

Numerical

example

Consider the

_example

_given

_by

_Meyer

_(1991),

_consisting

of two traits measured

on 284 mice. Table I summarises

_{starting values,}

intermediate results for the first iterate and estimates over rounds of iterations.

Convergence

is reached

_rapidly

even

though

starting

values for covariances were far from the eventual

_estimates,

the AI

algorithm performing

similar to an

_{algorithm using}

observed or

_expected

second

(21)

with

_eigenvalues

_All

= 1.96406 and

A

22

= 0.50389.

Quadratics

on the canonical scale

_{(y2 !Pi Yi )}

are 314.77078 and 375.36955 for the two traits and the

correspond-ing

log ICil

are 241.85744 and

554.08056,

respectively.

With 329 animals in the

analysis

and

_r(X

_o

₎

=

_2,

this

_gives

log

G

_omitting

the _term

tlog

IAI,

of -1172.3761

(cf

[29]).

First derivatives of

_log

_IG

_*

_I

_and

_log

_]R

_*

_calculated

_according

to

_[34]

and

_[35]

are

_given

in table I. Terms

_{y!’yt 2}

_{( i ! j}

₌

_{1, 2)}

and

_{f2 !r!}

₍

_{i, j}

=

1, 2)

are

42285.252,

50449.049 and

_64091.809,

and

_{-83940.962, -101237.320,}

-101089.518 and

_{-127432.879, respectively.}

_Application

of

_[32]

and

_[33]

then

_gives

the first derivatives

_(canonical

_scale)

_{of f (L}

_*

_),

with

_{corresponding}

derivatives

_{ðlog£/8()i =}

-

1/2

(8 log

IG*I/8()i

+

8 log

IR

*

I/8()i

+

_{af (L}

_*

_{)/8Bi )}

as shown in table I. From

[42]

and

_(43],

the Jacobian is

and

_[41]

_yields

derivatives on the

_original

scale as shown

(table I).

The _average information on the canonical scale

_(upper

_triangle

in table

_I)

is calculated