HAL Id: hal-00894192
https://hal.archives-ouvertes.fr/hal-00894192
Submitted on 1 Jan 1998
Original article

Restricted maximum likelihood estimation of covariances in sparse linear models

Arnold Neumaier (a), Eildert Groeneveld (b)

(a) Institut für Mathematik, Universität Wien, Strudlhofgasse 4, 1090 Vienna, Austria
(b) Institut für Tierzucht und Tierverhalten, Bundesforschungsanstalt für Landwirtschaft, Höltystr. 10, 31535 Neustadt, Germany

(Received 16 December 1996; accepted 30 September 1997)
Abstract - This paper discusses the restricted maximum likelihood (REML) approach for the estimation of covariance matrices in linear stochastic models, as implemented in the current version of the VCE package for covariance component estimation in large animal breeding models. The main features are: 1) the representation of the equations in an augmented form that simplifies the implementation; 2) the parametrization of the covariance matrices by means of their Cholesky factors, thus automatically ensuring their positive definiteness; 3) explicit formulas for the gradients of the REML function for the case of large and sparse model equations with a large number of unknown covariance components and possibly incomplete data, using the sparse inverse to obtain the gradients cheaply; 4) use of model equations that make separate formation of the inverse of the numerator relationship matrix unnecessary. Many large-scale breeding problems were solved with the new implementation, among them an example with more than 250 000 normal equations and 55 covariance components, taking 41 h CPU time on a Hewlett Packard 755. © Inra/Elsevier, Paris

restricted maximum likelihood / variance component estimation / missing data / sparse inverse / analytical gradients

Résumé - Restricted maximum likelihood estimation of covariances in sparse linear systems. This paper discusses the restricted maximum likelihood (REML) approach for the estimation of covariance matrices in linear models, as applied by the VCE software in animal genetics. The main features are: 1) the representation of the equations in an augmented form that simplifies the computations; 2) the reparametrization of the variance-covariance matrices by means of their Cholesky factors, which ensures their positive definiteness; 3) explicit formulas for the gradients of the REML function in the case of large, sparse systems of equations with a large number of unknown covariance components and possibly missing data, using sparse inverses to obtain the gradients economically; 4) the use of model equations that dispense with the separate formation of the inverse of the relationship matrix. Large-scale genetic problems have been solved with the new version, among them an example with more than 250 000 normal equations and 55 covariance components, requiring 41 h of CPU time on a Hewlett Packard 755. © Inra/Elsevier, Paris

restricted maximum likelihood / variance component estimation / missing data / sparse inverse / analytical gradient

1. INTRODUCTION
Best linear unbiased prediction of genetic merit [25] requires the covariance structure of the model elements involved. In practical situations, these are usually unknown and must be estimated.
During recent years restricted maximum likelihood (REML) [22, 42] has emerged as the method of choice in animal breeding for variance component estimation [15-17, 34-36]. Initially, the expectation maximization (EM) algorithm [6] was used for the optimization of the REML objective function [26, 47]. In 1987 Graser et al. [14] introduced derivative-free optimization, which in the following years led to the development of rather general computing algorithms and packages [15, 28, 29, 34] that were mostly based on the simplex algorithm of Nelder and Mead [40]. Kovac [29] made modifications that turned it into a stable algorithm that no longer converged to noncritical points, but this did not improve its inherent inefficiency for increasing dimensions. Ducos et al. [7] used for the first time the more efficient quasi-Newton procedure, approximating gradients by finite differences. While this procedure was faster than the simplex algorithm, it was also less robust for higher-dimensional problems because the covariance matrix could become indefinite, often leading to false convergence. Thus, for lack of robustness and/or excessive computing time, often only subsets of the covariance matrices could be estimated simultaneously.
A comparison of different packages [45] confirmed the general observation of Gill [13] that simplex-based optimization algorithms suffer from lack of stability, sometimes converging to noncritical points while breaking down completely at more than three traits. On the other hand, the quasi-Newton procedure with optimization on the Cholesky factor as implemented in a general purpose VCE package [18] was stable and much faster than any of the other general purpose algorithms. While this led to a speed-up of between two for small problems and (for some examples) 200 times for larger ones as compared to the simplex procedure, approximating gradients on the basis of finite differences was still exceedingly costly for higher-dimensional problems [17].

It is well known that optimization algorithms generally perform better with analytic gradients if the latter are cheaper to compute than finite difference approximations.
In this paper we derive, in the context of a general statistical model, cheap analytical gradients for problems with a large number p of unknown covariance components, using sparse matrix techniques. With hardly any additional storage requirements, the cost of a combined function and gradient evaluation is only a small multiple (independent of p) of the cost of a function evaluation alone, a substantial advantage over finite difference gradients.
Misztal and Perez-Enciso [39] investigated the use of sparse matrix techniques in the context of an EM algorithm, which is known to have much worse convergence properties as compared to quasi-Newton (see also Thompson et al. [48] for an improvement in its space complexity), using an LDL^T factorization and the Takahashi inverse [9]; no results in a REML application were given. Recent papers by Wolfinger et al. [50] (based again on the W transformation) and Meyer [36] (based on the simpler REML objective formulation of Graser et al. [14]) also provide gradients (and even Hessians), but there a gradient computation needs a factor of O(p) more work and space than in our approach, where the complete gradient is found with hardly any additional space and with (depending on the implementation) two to four times the work for a function evaluation.
Meyer [37] used her analytic second derivatives in a Newton-Raphson algorithm for optimization. Because the optimization was not restricted to positive definite covariance matrix approximations (as it is in our algorithm), she found the algorithm to be markedly less robust than the (already not very robust) simplex algorithm, even for univariate models.
We test the usefulness of our new formulas by integrating them into the VCE covariance component estimation package for animal (and plant) breeding models [17]. Here the gradient routine is combined with a quasi-Newton optimization method and with a parametrization of the covariance parameters by the Cholesky factor that ensures definiteness of the covariance matrix. In the past, this combination was most reliable and had the best convergence properties of all techniques used in this context [45]. Meanwhile, VCE is being used widely in animal and even plant breeding.
In the past, the largest animal breeding problem ever solved ([21], using a quasi-Newton procedure with optimization on the Cholesky factor) comprised 233 796 linear unknowns and 55 covariance components and required 48 days of CPU time on a 100 MHz HP 9000/755 workstation. Clearly, speeding up the algorithm is of paramount importance. In our preliminary implementation of the new method (not yet optimized for speed), we successfully solved this (and an even larger problem of more than 257 000 unknowns) in only 41 h of CPU time, a speed-up factor of nearly 28 with respect to the finite difference approach.
The new VCE implementation is available free of charge from the ftp site ftp://192.108.34.1/pub/vce3.2/. It has been applied successfully throughout the world to hundreds of animal breeding problems, with comparable performance advantages [1-3, 19, 21, 38, 46, 49].
In section 2 we fix notation for linear stochastic models and mixed model equations, define the REML objective function, and review closed formulas for its gradient and Hessian. In sections 3 and 4 we discuss a general setting for practical large-scale modeling, and derive an efficient way for the calculation of REML function values and gradients for large and sparse linear stochastic models. All our results are completely general, not restricted to animal breeding. However, for the formulas used in our implementation, it is assumed that the covariance matrices to be estimated are block diagonal, with no restrictions on the (distinct) diagonal blocks. The final section 5 applies the method to a simple demonstration case and several large animal breeding problems.
2. LINEAR STOCHASTIC MODELS AND RESTRICTED LOGLIKELIHOOD
Many applications (including those to animal breeding) are based on the generalized linear stochastic model

    y = X β + Z u + η,   cov(u) = G,   cov(η) = D,                    (1)

with fixed effects β, random effects u and noise η. Here cov(u) denotes the covariance matrix of a random vector u with zero mean. Usually, G and D are block diagonal, with many identical blocks. By combining the two noise terms, the model is seen to be equivalent to the simple model y = Xβ + η', where η' is a random vector with zero mean and (mixed model) covariance matrix V = ZGZ^T + D. Usually, V is huge and no longer block diagonal, leading to hardly manageable normal equations involving the inverse of V. However, Henderson [24] showed that the normal equations are equivalent to the mixed model equations

    [ X^T D^{-1} X    X^T D^{-1} Z          ] [ β̂ ]   [ X^T D^{-1} y ]
    [ Z^T D^{-1} X    Z^T D^{-1} Z + G^{-1} ] [ û  ] = [ Z^T D^{-1} y ].      (2)

This formulation avoids the inverse of the mixed model covariance matrix V and is the basis of most modern methods for obtaining estimates of u and β in equation (1). Fellner [10] observed that Henderson's mixed model equations are the normal equations of an augmented model of the simple form

    A x = b + e,   cov(e) = C,                                        (3)

where

    A = [ X  Z ],   x = [ β ],   b = [ y ],   C = [ D  0 ].
        [ 0  I ]        [ u ]        [ 0 ]        [ 0  G ]
Thus, without loss in generality, we may base our algorithms on the simple model (3), with a covariance matrix C that is typically block diagonal. This automatically produces the formulas that previously had to be derived in a less transparent way by means of the W transformation; cf. [5, 11, 23, 50].
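To make Fellner's observation concrete, here is a minimal numerical sketch (toy dimensions and names of our own choosing, dense matrices for readability) checking that the normal equations of the augmented model (3) coincide with Henderson's mixed model equations (2):

```python
import numpy as np

rng = np.random.default_rng(2)
X, Z = rng.normal(size=(10, 2)), rng.normal(size=(10, 3))
y = rng.normal(size=10)
D, G = np.eye(10), 0.5 * np.eye(3)          # noise and random effect covariances

# Henderson's mixed model equations (2)
Di, Gi = np.linalg.inv(D), np.linalg.inv(G)
MME = np.block([[X.T @ Di @ X, X.T @ Di @ Z],
                [Z.T @ Di @ X, Z.T @ Di @ Z + Gi]])
rhs = np.concatenate([X.T @ Di @ y, Z.T @ Di @ y])

# Augmented model (3): A x = b + e with block diagonal C = diag(D, G)
A = np.block([[X, Z], [np.zeros((3, 2)), np.eye(3)]])
b = np.concatenate([y, np.zeros(3)])
C = np.block([[D, np.zeros((10, 3))], [np.zeros((3, 10)), G]])
Ci = np.linalg.inv(C)

print(np.allclose(A.T @ Ci @ A, MME))       # True: same coefficient matrix
print(np.allclose(A.T @ Ci @ b, rhs))       # True: same right-hand side
```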
The 'normal equations' for the model (3) have the form

    B x̂ = a,   where B = A^T C^{-1} A,  a = A^T C^{-1} b.            (4)

Here A^T denotes the transposed matrix of A. By solving the normal equations (4), we obtain the best linear unbiased estimate (BLUE) and, for the predictive variables, the best linear unbiased prediction (BLUP) for the vector x, and the noise e = Ax - b is estimated by the residual

    ê = A x̂ - b.                                                     (5)

If the covariance matrix C = C(w) contains unknown parameters w (which we shall call 'dispersion parameters'), these can be estimated by minimizing the 'restricted loglikelihood'

    f(w) = log det C + log det B + ê^T C^{-1} ê,                      (6)

quoted in the following as the 'REML objective function', as a function of the parameters w. (Note that all quantities on the right-hand side of equation (6) depend on C and hence on w.)
More precisely, equation (6) is the logarithm of the restricted likelihood, scaled by a factor of -2 and shifted by a constant depending only on the problem dimension. Under the assumption of Gaussian noise, the restricted likelihood can be derived from the ordinary likelihood restricted to a maximal subspace of independent error contrasts (cf. Harville [22]; our formula (6) is the special case of his formula when there are no random effects). Under the same assumption, another derivation as a limiting form of a parametrized maximum likelihood estimate was given by Laird [31].
When applied to the generalized linear stochastic model (1) in the augmented formulation discussed above, the REML objective function (6) takes the computationally most useful form given by Graser et al. [14].
The following proposition contains formulas for computing derivatives of the REML function. We write Ċ = ∂C/∂w_k for the derivative with respect to a parameter w_k occurring in the covariance matrix.

Proposition [22, 32, 42, 50]. Let

    P = C^{-1} - C^{-1} A B^{-1} A^T C^{-1},

where A and B are as previously defined. Then

    ∂f/∂w_k = tr(P Ċ) - (C^{-1} ê)^T Ċ (C^{-1} ê).

(Note that, since A is nonsquare, the matrix P is generally nonzero although it always satisfies PA = 0.)
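To illustrate the Proposition, the following dense sketch (a toy example of our own; a real implementation exploits sparsity as in section 4) evaluates the REML objective (6) and checks the gradient formula against a finite difference:

```python
import numpy as np

def reml(A, b, C):
    """REML objective f = log det C + log det B + e^T C^{-1} e, cf. (6)."""
    Ci = np.linalg.inv(C)
    B = A.T @ Ci @ A                          # normal equation matrix (4)
    xhat = np.linalg.solve(B, A.T @ Ci @ b)
    e = A @ xhat - b                          # residual (5)
    f = np.linalg.slogdet(C)[1] + np.linalg.slogdet(B)[1] + e @ Ci @ e
    return f, e, Ci, B

def reml_gradient(A, b, C, Cdot):
    """Proposition: df/dw = tr(P Cdot) - (Ci e)^T Cdot (Ci e)."""
    _, e, Ci, B = reml(A, b, C)
    P = Ci - Ci @ A @ np.linalg.solve(B, A.T @ Ci)   # note that P A = 0
    return np.trace(P @ Cdot) - (Ci @ e) @ Cdot @ (Ci @ e)

rng = np.random.default_rng(0)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
w, h = 1.3, 1e-6                 # one dispersion parameter: C(w) = w * I
g = reml_gradient(A, b, w * np.eye(8), np.eye(8))
fd = (reml(A, b, (w + h) * np.eye(8))[0]
      - reml(A, b, (w - h) * np.eye(8))[0]) / (2 * h)
print(g, fd)                     # the two values agree to rounding level
```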
3. FULL AND INCOMPLETE ELEMENT FORMULATION
For the practical modeling of linear stochastic systems, it is useful to split model (3) into blocks of uncorrelated model equations which we call 'element equations'. The element equations usually fall into several types, distinguished by their covariance matrices. The model equation for an element ν of type γ has the form

    A_ν x = b_ν + e_ν,   cov(e_ν) = C_γ.                             (13)

Here A_ν is the coefficient matrix of the block of equations for element number ν. Generally, A_ν is very sparse with few rows and many columns, most of them zero, since only a small subset of the variables occurs explicitly in the νth element. Each model equation has only one noise term; correlated noise must be put into one element. All elements of the same type are assumed to have statistically independent noise vectors, realizations of (not necessarily Gaussian) distributions with zero mean and the same covariance matrix. (In our implementation, there are no constraints on the parametrization of the C_γ, but it is not difficult to modify the formulas to handle more restricted cases.) Thus the various elements are assigned to the types according to the covariance matrices of their noise vectors.

3.1. Example: animal breeding applications
In covariance component estimation problems from animal breeding, the vector x splits into small vectors β_k of (in our present implementation constant) size ntrait, called 'effects'. The right-hand side b contains measured data vectors y_ν and zeros. Each index ν corresponds to some animal. The various types of elements are as follows.
Measurement elements: the measurement vectors y_ν ∈ ℝ^ntrait are explained in terms of a linear combination of effects β_i ∈ ℝ^ntrait,

    y_ν = Σ_l μ_νl β_ω(ν,l) + e_ν.

Here the ω(ν,l) form an nrec × neff index matrix and the μ_νl form an nrec × neff coefficient matrix; the index and coefficient entries of the measurements are concatenated so that a single matrix containing the floating point numbers results. If the set of traits splits into groups that are measured on different sets of animals, the measurement elements split accordingly into several types.
Pedigree elements: for some animals, identified by the index T of their additive genetic effect β_T, we may know the parents, with corresponding indices V (father) and M (mother). Their genetic dependence is modeled by an equation

    β_T = ½ (β_V + β_M) + e_ν.

The indices are stored in pedigree records which contain a column of animal indices T(ν) and two further columns for their parents (V(ν), M(ν)).
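As a small illustration (a hypothetical helper of our own naming; in VCE the same information is stored in the address and coefficient matrices described in section 5), the sparse row of a pedigree element can be generated as follows:

```python
def pedigree_element(animal, father=None, mother=None):
    """Addresses and coefficients of the pedigree equation
    beta_T - (beta_V + beta_M)/2 = e_nu; one row per trait in the
    actual package, a single generic row here for brevity."""
    idx, coef = [animal], [1.0]
    for parent in (father, mother):
        if parent is not None:          # one entry less per unknown parent
            idx.append(parent)
            coef.append(-0.5)
    return idx, coef

print(pedigree_element(7, father=1, mother=4))   # ([7, 1, 4], [1.0, -0.5, -0.5])
```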
Random effect elements: certain effects β_R(γ) (γ = 3, 4, ...) are considered as random effects by including trivial model equations

    β_R(γ) = 0 + e_ν.

As part of the model (13), these trivial elements automatically produce the traditional mixed model equations, as explained in section 2.

We now return to the general situation. For elements numbered by ν = 1, ..., N, the full matrix formulation of the model (13) is the model (3) with

    A = (A_1; ...; A_N),   b = (b_1; ...; b_N),   C = Diag(C_γ(1), ..., C_γ(N)),   (14)

where semicolons denote stacking of rows and γ(ν) denotes the type of element ν.

A practical algorithm must be able to account for the situation that some components of b_ν are missing. We allow for incomplete data vectors b by simply deleting from the full model the rows of A and b for which the data in b are missing. This is appropriate whenever the data are missing at random [43]; note that this assumption is also used in the missing data handling by the EM approach [6, 27].
Since dropping rows changes the affected element covariance matrices and their Cholesky factors in a nontrivial way, the derivation of the formulas for incomplete data must be performed carefully in order to obtain correct gradient information. We therefore formalize the incomplete element formulation by introducing projection matrices P_ν coding for the missing data pattern [31]. If we define P_ν as the (0,1) matrix with exactly one 1 per row (one row for each component present in b_ν) and at most one 1 per column (one column for each component of b_ν), then P_ν A_ν is the matrix obtained from A_ν by deleting the rows for which data are missing, and P_ν b_ν is the vector obtained from b_ν by deleting the rows for which data are missing. Multiplication by P_ν^T on the right of a matrix removes the columns corresponding to missing components. Conversely, multiplication by P_ν^T on the left or P_ν on the right restores missing rows or columns, respectively, by filling them with zeros.

This leads to the incomplete element equations

    P_ν A_ν x = P_ν b_ν + ẽ_ν,   cov(ẽ_ν) = C̃_ν = P_ν C_γ(ν) P_ν^T.        (15)

The incomplete element equations can be combined to full matrix form (3), with

    A = (P_1 A_1; ...; P_N A_N),   b = (P_1 b_1; ...; P_N b_N),             (16)

and the inverse covariance matrix takes the form

    C^{-1} = Diag(C̃_1^{-1}, ..., C̃_N^{-1}),                                (17)

where the element contributions are expressed through the matrices

    M_ν = P_ν^T C̃_ν^{-1} P_ν.

Note that C̃_ν, M_ν, and log det C̃_ν (a byproduct of the inversion via a Cholesky factorization, needed for the gradient calculation) depend only on the type γ(ν) and the missing data pattern P_ν, and can be computed in advance, before the calculation of the restricted loglikelihood begins.
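A minimal sketch of this precomputation (boolean indexing standing in for the projection matrices P_ν; names are our own):

```python
import numpy as np

def element_terms(C_gamma, mask):
    """Return M_nu = P^T (P C P^T)^{-1} P and log det(P C P^T) for a
    missing data pattern given as a boolean mask of observed components."""
    Ct = C_gamma[np.ix_(mask, mask)]          # P C P^T: observed rows/columns
    L = np.linalg.cholesky(Ct)                # Cholesky factor of the block
    logdet = 2.0 * np.log(np.diag(L)).sum()   # log det as a byproduct
    M = np.zeros_like(C_gamma)                # scatter the inverse back
    M[np.ix_(mask, mask)] = np.linalg.inv(Ct)
    return M, logdet

C1 = np.array([[2.0, 0.5],                    # a 2 x 2 residual covariance
               [0.5, 1.0]])
mask = np.array([True, False])                # second trait missing
M, ld = element_terms(C1, mask)
```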
4. THE REML FUNCTION AND ITS GRADIENT IN ELEMENT FORM
From the explicit representations (16) and (17), we obtain the following formulas for the coefficients of the normal equations:

    B = Σ_ν A_ν^T M_ν A_ν,   a = Σ_ν A_ν^T M_ν b_ν.

After assembling the contributions of all elements into these sums, the coefficient matrix is factored into a product of triangular matrices,

    B = R^T R   (R upper triangular),

using sparse matrix routines [8, 20]. Prior to the factorization, the matrix is reordered by the multiple minimum degree algorithm in order to reduce the amount of fill-in; this is done once, before the first function evaluation, together with a symbolic factorization to allocate storage.
Without loss of generality, and for the sake of simplicity in the presentation, we may assume that the variables are already in the correct ordering; our programs perform this ordering automatically, using the multiple minimum degree ordering 'genmmd' as used in 'Sparsepak' [4]. Note that R is the transposed Cholesky factor of B. (Alternatively, one can obtain R from a sparse QR factorization of A; see e.g. Matstoms [33].)
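The element-wise assembly of B and a can be sketched as follows (hypothetical toy data structures of our own; VCE itself works from the address and coefficient matrices, cf. section 5):

```python
import numpy as np
import scipy.sparse as sp

def assemble_normal_equations(elements, n):
    """elements: iterable of (idx, A_nu, M_nu, b_nu), where idx lists the
    global columns touched by the element (assumed free of duplicates)."""
    rows, cols, vals = [], [], []
    a = np.zeros(n)
    for idx, Anu, Mnu, bnu in elements:
        Bnu = Anu.T @ Mnu @ Anu               # dense block A_nu^T M_nu A_nu
        a[idx] += Anu.T @ (Mnu @ bnu)         # contribution A_nu^T M_nu b_nu
        for p, gp in enumerate(idx):          # scatter the block as triplets
            for q, gq in enumerate(idx):
                rows.append(gp); cols.append(gq); vals.append(Bnu[p, q])
    B = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()
    return B, a                               # duplicates are summed by tocsc()
```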
To take care of dependent (or nearly dependent) linear equations in the model formulation, we replace in the factorization small pivots R_ii^2 < εB_ii by 1. (The choice ε = (macheps)^{2/3}, where macheps is the machine accuracy, proved to be suitable. The exponent is less than 1 to allow for some accumulation of roundoff errors, but still guarantees 2/3 of the maximal accuracy.)
To justify this replacement, note that in the case of consistent equations, an exact linear dependence results in a zero diagonal entry in the corresponding factorization step. In the presence of rounding errors (or in case of near dependence) we obtain entries of order εB_ii in place of the diagonal zero. (This even holds when B_ii is small but nonzero, since the usual bounds on the rounding errors scale naturally when the matrix is scaled symmetrically, and we may choose the scaling such that nonzero diagonal entries receive the value one. Zero diagonal elements in a positive semidefinite matrix occur for zero rows only, and remain zero in the elimination process.)
If we add B_ii to R_ii when R_ii < εB_ii, and set R_ii = 1 when B_ii = 0, the near dependence is correctly resolved in the sense that the extreme sensitivity or arbitrariness in the solution is removed by forcing a small entry into the ith entry of the solution vector, thus avoiding the introduction of large components in null space directions. (It is useful to issue diagnostic warnings giving the column indices i where such near dependence occurred.)
The determinant of B is available as a byproduct of the factorization. The above modifications to cope with near linear dependence are equivalent to adding prior information on the distribution of the parameters with those indices where pivots changed. Hence, provided that the set of indices where pivots are modified does not change with the iteration, they produce a correct behavior for the restricted loglikelihood. If this set of indices changes, the problem is ill-posed and would have to be treated differently; in practice, we have never seen a failure of the algorithm because of the possible discontinuity in the objective function caused by our procedure for handling (near) dependence.
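An illustrative dense sketch of the guarded factorization (our own code, not the paper's; the exact pivot test in VCE may differ in detail):

```python
import numpy as np

def guarded_cholesky(B, eps=np.finfo(float).eps ** (2.0 / 3.0)):
    """Factor B = R^T R, forcing a unit pivot where (near) dependence
    makes the pivot candidate vanish relative to eps * B_ii."""
    n = B.shape[0]
    R = np.zeros_like(B)
    modified = []                              # worth a diagnostic warning
    for i in range(n):
        d = B[i, i] - R[:i, i] @ R[:i, i]      # pivot candidate R_ii^2
        if B[i, i] == 0.0 or d <= eps * abs(B[i, i]):
            R[i, i] = 1.0                      # harmless unit pivot; row left 0
            modified.append(i)
        else:
            R[i, i] = np.sqrt(d)
            R[i, i + 1:] = (B[i, i + 1:] - R[:i, i] @ R[:i, i + 1:]) / R[i, i]
    return R, modified
```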
Once we have the factorization, we can solve the normal equations R^T R x = a for the vector x cheaply by solving the two triangular systems

    R^T y = a,   R x̂ = y.

(In the case of an orthogonal factorization one has instead to solve R x̂ = y, where y = Q^T b.) From the best estimate x̂ for the vector x, we may calculate the residual as ê = A x̂ - b, with the element residuals ê_ν = A_ν x̂ - b_ν. Then we obtain the objective function as

    f(w) = Σ_ν log det C̃_ν + log det B + Σ_ν ê_ν^T M_ν ê_ν.

Although the formula for the gradient involves the dense matrix B^{-1}, the gradient calculation can be performed using only the components of B^{-1} within the sparsity pattern of R^T + R. This part of B^{-1} is called the 'sparse inverse' of B and can be computed cheaply; cf. Appendix 1. The use of the sparse inverse for the calculation of the gradient is discussed in Appendix 2.
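In SciPy notation the two triangular solves read as follows (toy R of our own; VCE uses its own sparse solver):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

R = sp.csr_matrix(np.array([[2.0, 1.0, 0.0],
                            [0.0, 1.5, 0.5],
                            [0.0, 0.0, 1.0]]))   # B = R^T R
a = np.array([1.0, 2.0, 3.0])
y = spsolve_triangular(R.T.tocsr(), a, lower=True)   # R^T y = a
xhat = spsolve_triangular(R, y, lower=False)         # R xhat = y
```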
The resulting algorithm for the calculation of a REML function value and its gradient is given in table I, in a form that makes good use of dense matrix algebra in the case of larger covariance matrix blocks C_γ. The symbol ⊕ denotes adding a dense subvector (or submatrix) to the corresponding entries of a large vector (or matrix). In the calculation of the symmetric matrices B', W, M' and K', it suffices to calculate the upper triangle. Symbolic factorization and matrix reordering are not present in table I since these are performed only once, before the first function evaluation.

In large-scale applications, the bulk of the work is in the computation of the Cholesky factorization and the sparse inverse. Using the sparse inverse, the work for a combined function and gradient calculation is about three times the work for a function evaluation alone (where the sparse inverse is not needed). In particular, when the number p of estimated covariance components is large, the analytic gradient takes only a small fraction (about 2/p) of the time needed for finite difference approximations. Note also that for a combined function and gradient evaluation, only two sweeps through the data are needed, an important asset when the amount of data is too large to be kept in main memory.

5. ANIMAL BREEDING APPLICATIONS
In this section we give a small numerical example to demonstrate the setup of the various matrices, and give less detailed results on two large problems. Many other animal breeding problems have been solved, with similar advantages for the new algorithm as in the examples given below [1-3, 19, 38, 49].
5.1. Small numerical example

Table II gives the data used for a numerical example. There are in all eight animals, which are listed with their parent codes in the first block under 'pedigree'. The first five of them have measurements, i.e. dependent variables listed under 'dep var'. Each animal has two traits measured, except for animal 2, for which the second measurement is missing. Structural information for independent variables is listed under 'indep var'. The first column in this block denotes a continuous independent variable, such as weight, for which a regression is to be fitted. The following columns are some fixed effect, such as sex, a random component, such as herd, and the animal identification. Not all effects were fitted for both traits. In fact, weight was only fitted for the first trait, as shown by the model matrix in table III.

The input data are translated into a series of matrices given in table IV. To improve numerical stability, dependent variables are scaled by their standard deviation and mean, while the continuous independent variable is shifted by its mean.

Since there is only one random effect (apart from the correlated animal effect), the full element formulation (13) has three types of model equations, each with an independent covariance structure C_γ.
Measurement elements (type γ = 1): the dependent variables give rise to type γ = 1, as listed in the second column in table IV. The second entry is special in that it denotes the residual covariance matrix for this record with a missing observation. To take care of this, a new mtype is created for each pattern of missing values (with mtype = type if no value is missing) [20]; i.e. the different values of mtype correspond to the different matrices C̃_ν. However, each is still based on C_1 as given in table V, which lists all types in this example.
Pedigree elements (type γ = 2): the next nine rows in table IV are generated from the pedigree information. With both parents known, three entries are generated in both the address and coefficient matrices. With only one parent known, two addresses and coefficients are needed, while only one entry is required if no parent information is available. For all entries the type is γ = 2, with the covariance matrix C_2.
Random effect elements (type γ = 3): the last four rows in table IV are the entries due to random effects, which comprise three herd levels in this example. They have type γ = 3 with the covariance matrix C_3. All covariance matrices are 2 × 2, so that p = 3 + 3 + 3 = 9 dispersion parameters need to be estimated.
The addresses in the following columns in table IV are derived directly from the level codes in the data (table II), allocating one equation for each trait within each effect. For covariables only one equation is created, leading to the address of 0 for all five measurements.
The coefficients corresponding to the above addresses are stored in another matrix as given in table IV. The entries are 1 for class effects, and the (mean-shifted) continuous variable in the case of a regression.
The address matrices and coefficient matrices in table IV form a sparse representation of the matrix A of equation (3) and can thus be used directly to set up the normal equations. Note that only one pass through the model equations is required to handle data, random effects and pedigree information. Also, we would like to point out that this algorithm does not require a separate treatment of the numerator relationship matrix. Indeed, the historic problem of obtaining its inverse is completely avoided with this approach.
As an example of how to set up the normal equations, we look at line 12 of table IV (because it does not generate as many entries as the first five lines). For the animal labelled T in table IV, the variables associated with the two traits have indices T + 1 and T + 2. The contributions generated from line 12 are added to the corresponding entries of B and a.
Starting values for all C_γ for the scaled data were chosen as 3 for all variances and 0.0001 for all covariances, amounting to a point in the middle of the parameter space. With C_γ specified as above, its inverse is obtained directly. Optimization was performed with a BFGS algorithm as implemented by Gay [12].
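A hedged sketch of this optimization setup (SciPy's BFGS standing in for Gay's Algorithm 611 [12], and a one-parameter toy model C = l² I standing in for the Cholesky-parametrized covariance blocks):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 4))
b = A @ rng.normal(size=4) + 0.5 * rng.normal(size=20)

def f_and_g(w):
    l = w[0]                        # Cholesky-style parameter: C = (l I)(l I)^T
    C = l * l * np.eye(20)
    Ci = np.linalg.inv(C)
    B = A.T @ Ci @ A
    x = np.linalg.solve(B, A.T @ Ci @ b)
    e = A @ x - b
    f = np.linalg.slogdet(C)[1] + np.linalg.slogdet(B)[1] + e @ Ci @ e
    P = Ci - Ci @ A @ np.linalg.solve(B, A.T @ Ci)
    g = (np.trace(P) - (Ci @ e) @ (Ci @ e)) * 2.0 * l   # chain rule, dC/dl = 2l I
    return f, np.array([g])

res = minimize(f_and_g, x0=np.array([1.0]), jac=True, method="BFGS")
print(res.x[0] ** 2)                # estimated residual variance
```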
For the first function evaluation we obtain the gradient given in table VI, with a function value of 17.0053530. Convergence was reached after 51 iterations, with the solutions given in table VII at a loglikelihood of 15.47599750.

5.2. A large problem
A large problem from the area of pig breeding has been used to test an implementation of the above algorithm in the VCE package [17]. The data set comprised 26 756 measurement records with six traits. Table IX gives the number of levels for each effect, leading to 233 796 normal equations. The columns headed by 'trait' represent the model matrix (cf. table III), mapping the effects on the traits. As can be seen, the statistical model is different for the various traits.
Because traits 1 through 4 and traits 5 and 6 are measured on different animals, no residual covariances between these two groups can be estimated, resulting in two types 1a and 1b, with 4 × 4 and 2 × 2 covariance matrices C_1a and C_1b. Together with the 6 × 6 covariance matrices C_2 and C_3 for pedigree effect 9 and random effect 8, respectively, a total of 55 covariance components results.
We compared the finite difference implementation of VCE [17] with an analytic gradient implementation based on the techniques of the present paper. An unconstrained minimization algorithm written by Schnabel et al. [44], which approximates the first derivatives by finite differences, was used to estimate all 55 components simultaneously. The run performed 37 021 function evaluations at 111.6 s each on a Hewlett Packard 755 model, amounting to a total CPU time of 47.8 days. To our knowledge, it was the first estimate of more than 50 covariance components simultaneously for such a large data set with a completely general model. Factorization was performed by a block sparse Cholesky algorithm due to Ng and Peyton [41].
Using analytic gradients, convergence was reached after 185 iterations taking 13 min each; the less efficient factorization from Misztal and Perez-Enciso [39] was used here because of the availability of their sparse inverse code. An even slightly better solution was reached, and only 41 h of CPU time were used, amounting to a measured speed-up factor of nearly 28. However, this speed-up underestimates the superiority of analytical gradients, because the factorization used in Misztal and Perez-Enciso's code is less efficient than Ng and Peyton's block sparse Cholesky factorization used for approximating the gradients by finite differences. Therefore, the following comparison will be based on CPU time measurements made on Misztal and Perez-Enciso's factorization code.
For the above data set the CPU usage of the current implementation - which has not yet been tuned for speed (so the sparse inverse takes three to four times the time for the numerical factorization) - is given in table X. As can be seen from this table, computing one approximated gradient by finite differencing takes around 202.6 × 55 = 11 143 s, while one analytical gradient costs only around four times the set-up and solving of the normal equations, i.e. 812 s. Thus, the expected speed-up would be around 14. The 37 021 function evaluations required in the run with approximated gradients (which include some linear searches) would have taken 86.8 days with the Misztal and Perez-Enciso code. Thus, the resultant superiority of our new algorithm is nearly 51 for the model under consideration. This is much larger than the expected speed-up of 14, mainly because, with approximated gradients, 673 optimization steps were performed as compared to the 185 with analytical gradients. Such a high number of iterations with approximated gradients could be observed in many runs with higher numbers of dispersion variables, and can be attributed to the limited accuracy of finite difference gradients. In some cases the optimization process even aborted when using approximated gradients, whereas analytical gradients yielded correct solutions.

5.3. Further evidence
Table XI presents data on a number of different runs that have been performed with our new algorithm. The statistical models used in the data sets vary substantially and cover a large range of problems in animal breeding. The new algorithm showed the same behaviour on a plant breeding data set (beans), which has a quite different structure as compared to the animal data sets. The data sets (details can be obtained from the second author) cover a whole range of problem sizes, both in terms of linear and covariance components. Accordingly, the number of nonzero elements varies substantially, from a few ten thousands up to many millions. Clearly, the number of iterations increases with the number of dispersion variables, with a maximum well below 200. Some of the runs estimated covariance matrices with very high correlations, well above 0.9. Although this is close to the border of the parameter space, it did not seem to slow down convergence.

For the above data sets the ratio of the cost of obtaining the gradient (after, and relative to, the factorization) was between 1.51 and 3.69, substantiating our initial claim that the analytical gradient can be obtained at a small multiple of the CPU time needed to calculate the function value alone. (For the large animal breeding problem described in table X, this ratio was 2.96.) So far, we have not experienced any ratios above the value of 4. From this we can conclude that, with increasing numbers of dispersion variables, our algorithm is inherently superior to approximating gradients by finite differences.

In conclusion, the new version of VCE not only computes analytical gradients much faster than the finite difference approximations (with the superiority increasing with the number of covariance components), but also reduces the number of iterations by a factor of around three, thereby expanding the scope of REML covariance component estimation in animal breeding models considerably. No previous code was able to solve problems of the size that can be handled with this implementation.
ACKNOWLEDGEMENTS

Support by the H. Wilhelm Schaumann Foundation is gratefully acknowledged.
REFERENCES

[1] Brade W., Groeneveld E., Bestimmung genetischer Populationsparameter für die Einsatzleistung von Milchkühen, Arch. Anim. Breeding 2 (1995) 149-154.
[2] Brade W., Groeneveld E., Einfluß des Produktionsniveaus auf genetische Populationsparameter der Milchleistung sowie auf Zuchtwertschätzergebnisse, Arch. Anim. Breeding 38 (1995) 289-298.
[3] Brade W., Groeneveld E., Bedeutung der speziellen Kombinationseignung in der Milchrinderzüchtung, Züchtungskunde 68 (1996) 12-19.
[4] Chu E., George A., Liu J., Ng E., SPARSEPAK: Waterloo sparse matrix package user's guide for SPARSEPAK-A, Technical Report CS-84-36, Department of Computer Science, University of Waterloo, Ontario, Canada, 1984.
[5] Corbeil R., Searle S., Restricted maximum likelihood (REML) estimation of variance components in the mixed model, Technometrics 18 (1976) 31-38.
[6] Dempster A., Laird N., Rubin D., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. B 39 (1977) 1-38.
[7] Ducos A., Bidanel J., Ducrocq V., Boichard D., Groeneveld E., Multivariate restricted maximum likelihood estimation of genetic parameters for growth, carcass and meat quality traits in French Large White and French Landrace pigs, Genet. Sel. Evol. 25 (1993) 475-493.
[8] Duff I., Erisman A., Reid J., Direct Methods for Sparse Matrices, Oxford Univ. Press, Oxford, 1986.
[9] Erisman A., Tinney W., On computing certain elements of the inverse of a sparse matrix, Comm. ACM 18 (1975) 177-179.
[11] Fraley C., Burns P., Large-scale estimation of variance and covariance components, SIAM J. Sci. Comput. 16 (1995) 192-209.
[12] Gay D., Algorithm 611 - subroutines for unconstrained minimization using a model/trust-region approach, ACM Trans. Math. Software 9 (1983) 503-524.
[13] Gill J., Biases in balanced experiments with uncontrolled random factors, J. Anim. Breed. Genet. (1991) 69-79.
[14] Graser H., Smith S., Tier B., A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood, J. Anim. Sci. 64 (1987) 1362-1370.
[15] Groeneveld E., Simultaneous REML estimation of 60 covariance components in an animal model with missing values using a Downhill Simplex algorithm, in: 42nd Annual Meeting of the European Association for Animal Production, Berlin, 1991, vol. 1, pp. 108-109.
[16] Groeneveld E., Performance of direct sparse matrix solvers in derivative free REML covariance component estimation, J. Anim. Sci. 70 (1992) 145.
[17] Groeneveld E., REML VCE - a multivariate multimodel restricted maximum likelihood (co)variance component estimation package, in: Proceedings of an EC Symposium on Application of Mixed Linear Models in the Prediction of Genetic Merit in Pigs, Mariensee, 1994.
[18] Groeneveld E., A reparameterization to improve numerical optimization in multivariate REML (co)variance component estimation, Genet. Sel. Evol. 26 (1994) 537-545.
[19] Groeneveld E., Brade W., Rechentechnische Aspekte der multivariaten REML Kovarianzkomponentenschätzung, dargestellt an einem Anwendungsbeispiel aus der Rinderzüchtung, Arch. Anim. Breeding 39 (1996) 81-87.
[20] Groeneveld E., Kovac M., A generalized computing procedure for setting up and solving mixed linear models, J. Dairy Sci. 73 (1990) 513-531.
[21] Groeneveld E., Csato L., Farkas J., Radnoczi L., Joint genetic evaluation of field and station test in the Hungarian Large White and Landrace populations, Arch. Anim. Breeding 39 (1996) 513-531.
[22] Harville D., Maximum likelihood approaches to variance component estimation and to related problems, J. Am. Statist. Assoc. 72 (1977) 320-340.
[23] Hemmerle W., Hartley H., Computing maximum likelihood estimates for the mixed A.O.V. model using the W transformation, Technometrics 15 (1973) 819-831.
[24] Henderson C., Estimation of genetic parameters, Ann. Math. Stat. 21 (1950) 706.
[25] Henderson C., Applications of Linear Models in Animal Breeding, University of Guelph, 1984.
[26] Henderson C., Estimation of variances and covariances under multiple trait models, J. Dairy Sci. 67 (1984) 1581-1589.
[27] Jennrich R., Schluchter M., Unbalanced repeated-measures models with structured covariance matrices, Biometrics 42 (1986) 805-820.
[28] Jensen J., Madsen P., A User's Guide to DMU, National Institute of Animal Science, Research Center Foulum, Box 39, 8830 Tjele, Denmark, 1993.
[29] Kovac M., Derivative free methods in covariance component estimation, Ph.D. thesis, University of Illinois, Urbana-Champaign.
[30] Laird N., Computation of variance components using the EM algorithm, J. Statist. Comput. Simul. 14 (1982) 295-303.
[31] Laird N., Lange N., Stram D., Maximum likelihood computations with repeated measures: application of the EM algorithm, J. Am. Statist. Assoc. 82 (1987) 97-105.
[33] Matstoms P., Sparse QR factorization in MATLAB, ACM Trans. Math. Software 20 (1994) 136-159.
[34] Meyer K., DFREML - a set of programs to estimate variance components under an individual animal model, J. Dairy Sci. 71 (suppl. 2) (1988) 33-34.
[35] Meyer K., Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm, Genet. Sel. Evol. 21 (1989) 317-340.
[36] Meyer K., Estimating variances and covariances for multivariate animal models by restricted maximum likelihood, Genet. Sel. Evol. 23 (1991) 67-83.
[37] Meyer K., Derivative-intense restricted maximum likelihood estimation of covariance components for animal models, in: 5th World Congress on Genetics Applied to Livestock Production, University of Guelph, 7-12 August 1994, vol. 18, 1994, pp. 365-369.
[38] Mielenz N., Groeneveld E., Müller J., Spilke J., Simultaneous estimation of covariances with REML and Henderson 3 in a selected chicken population, Br. Poult. Sci. 35 (1994).
[39] Misztal I., Perez-Enciso M., Sparse matrix inversion for restricted maximum likelihood estimation of variance components by expectation-maximization, J. Dairy Sci. (1993) 1479-1483.
[40] Nelder J., Mead R., A simplex method for function minimization, Comput. J. 7 (1965) 308-313.
[41] Ng E., Peyton B., Block sparse Cholesky algorithms on advanced uniprocessor computers, SIAM J. Sci. Comput. 14 (1993) 1034-1056.
[42] Patterson H., Thompson R., Recovery of inter-block information when block sizes are unequal, Biometrika 58 (1971) 545-554.
[43] Rubin D., Inference and missing data, Biometrika 63 (1976) 581-592.
[44] Schnabel R., Koontz J., Weiss B., A modular system of algorithms for unconstrained minimization, Technical Report CU-CS-240-82, Comp. Sci. Dept., University of Colorado, Boulder, 1982.
[45] Spilke J., Groeneveld E., Comparison of four multivariate REML (co)variance component estimation packages, in: 5th World Congress on Genetics Applied to Livestock Production, University of Guelph, 7-12 August 1994, vol. 22, 1994, pp. 11-14.
[46] Spilke J., Groeneveld E., Mielenz N., A Monte-Carlo study of (co)variance component estimation (REML) for traits with different design matrices, Arch. Anim. Breed. 39 (1996) 645-652.
[47] Tholen E., Untersuchungen von Ursachen und Auswirkungen heterogener Varianzen der Indexmerkmale in der Deutschen Schweineherdbuchzucht, Schriftenreihe Landbauforschung Völkenrode, Sonderheft 111 (1990).
[48] Thompson R., Wray N., Crump R., Calculation of prediction error variances using sparse matrix methods, J. Anim. Breed. Genet. 111 (1994) 102-109.
[49] Tixier-Boichard M., Boichard D., Groeneveld E., Bordas A., Restricted maximum likelihood estimates of genetic parameters of adult male and female Rhode Island Red chickens divergently selected for residual feed consumption, Poultry Sci. 74 (1995) 1245-1252.

APPENDIX 1: Computing the sparse inverse
A cheap way to compute the sparse inverse is based on the relation

    R B̄ = R^{-T}                                                     (A1)

for the inverse B̄ = B^{-1} (a consequence of B = R^T R). By comparing coefficients in the upper triangle of this equation, noting that (R^{-1})_ii = (R_ii)^{-1}, we find that

    R_ii B̄_ik + Σ_{j>i} R_ij B̄_jk = δ_ik / R_ii   for i ≤ k,

where δ_ik denotes the Kronecker symbol; hence

    B̄_ik = ( δ_ik / R_ii − Σ_{j>i} R_ij B̄_jk ) / R_ii.              (A2)

To compute B̄_ik from this formula, we need to know the B̄_jk for all j > i with R_ij ≠ 0. Since the factorization process produces a sparsity structure with the property

    R_ij ≠ 0 and R_ik ≠ 0 with i < j ≤ k   ⟹   R_jk ≠ 0

(ignoring accidental zeros from cancellation, which are treated as explicit zeros), one can compute the components of the inverse B̄ within the sparsity pattern of R^T + R by equation (A2) without calculating any of its entries outside this sparsity pattern.

If equation (A2) is used in the ordering i = n, n − 1, ..., 1, the only additional space needed is that for a copy of the R_ij ≠ 0 (j > i), which must be saved before we compute the B̄_ik (R_ik ≠ 0, k > i) and overwrite them over the R_ik. (A similar analysis is performed for the Takahashi inverse by Erisman and Tinney [9], based on an LDL^T factorization.) Thus the number of additional storage locations needed is only the maximal number of nonzeros in a row of R. The cost is a small multiple of the cost for factoring B, excluding the symbolic factorization; the proof of this by Misztal and Perez-Enciso [39] for the sparse inverse of an LDL^T factorization applies almost without change.
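A dense demonstration of recurrence (A2) (small toy matrix of our own; a sparse code loops only over the nonzeros of R and overwrites them in place, as described above):

```python
import numpy as np

def inverse_by_A2(R):
    """Compute Bbar = B^{-1} (B = R^T R) via (A2), in the order i = n, ..., 1."""
    n = R.shape[0]
    Z = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for k in range(n - 1, i - 1, -1):      # upper triangle, k >= i
            s = R[i, i + 1:] @ Z[i + 1:, k]    # sum over j > i with R_ij != 0
            Z[i, k] = ((1.0 if i == k else 0.0) / R[i, i] - s) / R[i, i]
            Z[k, i] = Z[i, k]                  # symmetry of the inverse
    return Z

B = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
R = np.linalg.cholesky(B).T                    # upper triangular, B = R^T R
print(np.allclose(inverse_by_A2(R), np.linalg.inv(B)))   # True for dense R
```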
APPENDIX 2: Derivation of the algorithm in table I

For the derivative with respect to a variable that occurs in C_γ only, equation (15) implies that Ċ is block diagonal, with nonzero blocks only for the elements ν of type γ. (The computation of the derivative Ċ_γ is addressed below.) Using the notation [...]_ν for the νth element contribution, the Proposition then yields a formula for ∂f/∂w_k as a sum of element contributions; collecting terms, the sum can be evaluated with the symmetric matrices appearing in table I.
Up to this point, the dependence of the covariance matrix C_γ on the parameters was arbitrary. For an implementation, one needs to decide on the independent parameters in which to express the covariance matrices. We made the following choice in our implementation, assuming that there are no constraints on the parametrization of the C_γ; other choices can be handled similarly, with a similar cost resulting for the gradient. Our parameters are, for each type γ, the nonzero entries of the Cholesky factor L_γ of C_γ, defined by the equation

    C_γ = L_γ L_γ^T,   L_γ lower triangular,

together with the conditions

    (L_γ)_ii > 0,

since this automatically guarantees positive definiteness. (In the limiting case where a block of the true covariance matrix is semidefinite only, this will be revealed in the minimization procedure by converging to a singular L_γ, while each computed L_γ is still nonsingular.)
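A minimal sketch of this parametrization (function names are our own): the parameter vector w holds the lower triangle of L_γ row by row, and both C_γ and its derivative with respect to a single parameter follow directly:

```python
import numpy as np

def chol_to_cov(w, n):
    """Map the n(n+1)/2 parameters to C = L L^T; the diagonal of L is
    kept positive by the optimizer (not enforced here)."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = w
    return L @ L.T

def cov_derivative(w, n, k):
    """dC/dw_k = E_k L^T + L E_k^T, where E_k marks the k-th triangle slot."""
    L = np.zeros((n, n)); L[np.tril_indices(n)] = w
    E = np.zeros((n, n)); E[np.tril_indices(n)] = np.eye(len(w))[k]
    return E @ L.T + L @ E.T
```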
We now consider derivatives with respect to the parameter, where γ is one of the