A reparameterization to improve numerical optimization in multivariate REML (co)variance component estimation


HAL Id: hal-00894050

https://hal.archives-ouvertes.fr/hal-00894050

Submitted on 1 Jan 1994



Original article

E Groeneveld

Institute of Animal Husbandry and Animal Behaviour, Federal Agricultural Research Center, Höltystr 10, 31535 Neustadt, Germany

(Received 28 February 1994; accepted 15 June 1994)

Summary - Multivariate restricted maximum likelihood (REML) (co)variance component estimation using numerical optimization on the basis of Downhill-Simplex (DS) or quasi-Newton (QN) procedures suffers from the problem of undefined 'covariance matrices' as are produced by the optimizers. So far, this problem has been dealt with by assigning 'bad' function values. For this procedure to work, it is implied that the information this 'bad' function value conveys is sufficient to avoid going in the same direction in the following optimization step. To a limited degree DS can cope with this situation. On the other hand QN usually breaks down if this situation occurs too frequently. This contribution analyzes the problem and proposes a reparameterization of the covariance matrices to solve it. As a result, faster converging QN optimizers can be used, as they no longer suffer from lack of robustness. Four real data sets were analyzed using a multivariate model estimating between 17 and 30 (co)variance components simultaneously. Optimizing on the Cholesky factor instead of on the (co)variance components themselves reduced the computing time by a factor of 2.5 to more than 250, when comparing the robust modified DS optimizer operating on the original covariance matrices to a QN optimizer using reparameterized covariance matrices.

multivariate REML / optimization / quasi-Newton / Downhill-Simplex / reparameterization

Résumé - A reparameterization to improve numerical optimization in multivariate REML estimation of variance-covariance components. Restricted maximum likelihood (REML) estimation of variance-covariance components using the Downhill-Simplex (DS) or quasi-Newton (QN) numerical optimization procedures runs into the difficulty that undefined covariance matrices are produced. Until now, this difficulty has been handled by assigning an arbitrarily bad likelihood to such matrices, so as to avoid returning in that [...] no longer converges when this situation recurs too often. This note proposes a reparameterization to solve the problem. It thus becomes possible to use the QN procedure, whose convergence is fast and whose robustness is then ensured. Four data sets were analyzed to estimate between 17 and 30 (co)variance components simultaneously. Optimizing on the Cholesky factor instead of on the components themselves reduces the computing time by a factor of between 2.5 and more than 500, when the QN procedure with reparameterized covariance matrices is compared with the modified DS procedure applied to the original covariance matrices.

multivariate REML / optimization / quasi-Newton / Downhill-Simplex / reparameterization

THE PROBLEM

In restricted maximum likelihood (REML) (Patterson and Thompson, 1971), maximization of the likelihood is done using either the EM algorithm (Dempster et al, 1977) or procedures that do not require explicit derivatives. A problem specific to the latter class of optimizers is addressed in this paper. Graser et al (1987) proposed a sampling technique that spanned the complete parameter space for a single trait analysis. Meyer (1989) used a Downhill-Simplex (DS) and quasi-Newton (QN) technique; Kovac (1992) modified the DS procedure and also expanded Powell's method of conjugate gradients. All these authors had to deal with problems arising from parameters proposed by optimizers that lie outside the parameter space. Despite this problem, a host of REML estimates have been reported with a varying number of simultaneous traits (Groeneveld, 1991; Meyer, 1991; Tixier-Boichard et al, 1992; Ducos et al, 1993; Spilke et al, 1993; Mielenz et al, 1994).

Consider a mixed linear model with one random effect and no missing values as specified in equations [1-5]. With a residual and an additive genetic effect the log likelihood in equation [6] must be maximized. The computational procedure requires setting up and solving the mixed-model equations (MME).

where:

y - vector of observations

X - incidence matrix for the fixed effects

Z - incidence matrix for the animal effect

$\beta$ - vector of unknown parameters for fixed effects

u - vector of unknown parameters for the animal effect

e - vector of residuals

A - relationship matrix of order number of animals and their known ancestors

$R_0$ - residual (co)variance matrix among traits

$G_0$ - covariance matrix for additive genetic effects among traits

$\otimes$ - Kronecker product
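The numbered equations are not reproduced in this text. As a hedged sketch only, assuming the model is the usual multivariate animal model with records ordered by trait (an assumption, not a reproduction of the author's equations [1-5]), the symbols defined above would typically combine as follows, with $R = R_0 \otimes I$ and $G = G_0 \otimes A$ introduced here as shorthand:

```latex
\begin{aligned}
y &= X\beta + Zu + e, \\
\operatorname{var}(u) &= G = G_0 \otimes A, \qquad
\operatorname{var}(e) = R = R_0 \otimes I, \qquad
\operatorname{cov}(u, e') = 0, \\[4pt]
\begin{pmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + G^{-1}
\end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}
&=
\begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}.
\end{aligned}
```

In this form $R^{-1} = R_0^{-1} \otimes I$ and $G^{-1} = G_0^{-1} \otimes A^{-1}$, so the inverses of the small trait-by-trait matrices enter the coefficient matrix directly, which is why both must be valid covariance matrices in every round.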

This requires a set of covariance matrices for both the residual and the additive genetic components, $R_0$ and $G_0$. Their inverses are used in setting up the MME. After setting up the MME the system of equations may be solved by Cholesky factorization and backward substitution.

where

LV - value proportional to the logarithm of the likelihood function

$b^{\circ}$ - solution vector of the MME

$C^{*}$ - inverse of the coefficient matrix of the MME

W - $(X \mid Z)$

$n_a$ - number of animals

n - number of observations

In DS, a complete vertex is computed before the optimization begins. The DS procedure used here follows Kovac (1992), and is, thus, very different from the original DS as proposed by Nelder and Mead (1965). Initially, it performs frequent restarts, terminating the iteration at increasing accuracy. This procedure alleviated the well-known tendency of DS to get stuck at suboptimal points.

In QN, gradients are required. They may either be supplied in their analytical form or approximated using finite differences (Schnabel et al, 1982).

REML is an iterative procedure and so the $R_0$ and $G_0$ matrices have to be valid for each round when the MME are set up. Thus, initially, valid covariance matrices have to be provided to the algorithm. For a 2-trait model with 1 additive genetic component the residual and additive genetic covariance matrices $R_0$ and $G_0$ of dimension 2 have to be estimated, amounting to an optimization in a 6-dimensional parameter space.

However, in the following iteration, new sets of (co)variances are provided by the optimizers. For both DS and QN, the covariance matrices are an unstructured vector of parameters, in this case a vector of 6 values. The constraints of covariance matrices, ie that eigenvalues have to be positive (equation [7]), are therefore unknown to the optimizers. Given this background it is not surprising that the outcome of an optimization step, ie a new set of (co)variances, is not guaranteed to meet the requirements of covariance matrices. In practical terms, this means that the determinant of the coefficient matrix generated on the basis of these covariance matrices will become less than zero, thus aborting the whole process, because the log of a negative number cannot be taken.
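A minimal sketch of how an optimizer that treats the (co)variances as unrelated numbers can step outside the parameter space. The starting values and the step are hypothetical and chosen only for illustration; the eigenvalues of a 2 x 2 symmetric matrix are computed in closed form.

```python
import math

def eigenvalues_2x2(v1, c12, v2):
    """Eigenvalues of the symmetric matrix [[v1, c12], [c12, v2]]."""
    mean = (v1 + v2) / 2.0
    radius = math.hypot((v1 - v2) / 2.0, c12)
    return mean - radius, mean + radius

def is_valid_covariance(v1, c12, v2):
    """True if both eigenvalues are strictly positive."""
    smallest, _ = eigenvalues_2x2(v1, c12, v2)
    return smallest > 0.0

# Hypothetical current point: two variances of 1.0 and a covariance of 0.9,
# ie a high correlation, close to the edge of the parameter space.
current = (1.0, 0.9, 1.0)

# Hypothetical step from an optimizer that sees only a vector of 3 numbers.
step = (-0.1, 0.15, -0.1)
proposed = tuple(p + s for p, s in zip(current, step))

print(is_valid_covariance(*current))   # the starting matrix is valid
print(is_valid_covariance(*proposed))  # the proposed matrix is not
```

Each proposed value looks harmless on its own (positive variances, a moderate covariance), yet the implied correlation exceeds 1, so one eigenvalue is negative.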

[...] Groeneveld, 1994). Furthermore, when the true correlation between the traits is high, the 'covariance matrices' proposed by the optimizers are more likely to lie outside the parameter space, and the true covariance matrices to be located close to the edge of the parameter space.

An obvious (at least in the context of DS) solution to the treatment of undefined covariance matrices is to assign a 'bad' likelihood value, should negative eigenvalues occur. This will tell the optimizer (DS) to avoid this direction in subsequent optimization steps.

While this procedure may work reasonably well with sampling-based optimization algorithms, it produces major problems with QN, which requires the function to be continuous and differentiable. If the condition occurs during the process of approximating the gradients by finite differencing, assigning a 'bad' likelihood will result in a nonsensical gradient. When this gradient is used during the following optimization step, likewise nonsensical directions are chosen. In short, assigning 'bad' likelihood values when undefined covariance matrices occur will often result in aborting the QN optimization step.
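The effect on a finite-difference gradient can be illustrated with a toy function. Everything in this sketch is an assumption made for illustration: the quadratic stands in for a function to be minimized, the penalty value is arbitrary, and the single parameter is the covariance of a 2 x 2 matrix with unit variances, which is valid only while its absolute value is below 1.

```python
BAD_VALUE = 1.0e10  # hypothetical 'bad' function value for undefined points

def toy_objective(c12):
    """Toy stand-in for the function being minimized.

    The matrix [[1, c12], [c12, 1]] has determinant 1 - c12**2, so points
    with a non-positive determinant are treated as undefined.
    """
    if 1.0 - c12 * c12 <= 0.0:
        return BAD_VALUE
    return (c12 - 0.98) ** 2

def forward_difference(func, x, h=1.0e-3):
    """One-sided finite-difference approximation of the derivative."""
    return (func(x + h) - func(x)) / h

# Well inside the parameter space the approximation is close to the
# analytical derivative 2 * (0.90 - 0.98) = -0.16.
inside = forward_difference(toy_objective, 0.90)

# Near the edge, x + h falls outside the parameter space, the penalty is
# returned, and the 'gradient' becomes enormous and meaningless.
near_edge = forward_difference(toy_objective, 0.9995)

print(inside)
print(near_edge)
```

A QN step built from the second value would point away from the optimum at 0.98 with an absurd magnitude, which is the kind of directional misinformation described above.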

We can thus observe that the optimizers that have superlinear convergence properties (and are thus much faster than sampling-based procedures like DS (Dennis and Torczon, 1991)) fail increasingly as the dimensionality of the problem increases. Thus, the problem of undefined parameters arising during optimization leads to the paradoxical situation that an efficient class of optimizers can only be used with confidence on small problems with 1 or 2 traits, where computing time is not important and efficiency of optimization not an issue, whereas inefficient sampling-based optimizers, which are at best only linearly converging, must be used on larger problems.

A SOLUTION

Part of the problem can be solved by performing a constrained optimization. This is relatively easy for the variances in which the constraint is only that positive numbers be chosen. However, no technique seems to be available to impose a set of constraints such that no negative eigenvalues $\lambda$ occur for a subset of the dimensionality of the optimization space as given in equation [7]. Instead, we propose to perform the optimization on the Cholesky factor of the covariance matrices. Let:

Thus, instead of optimizing on $R_0$, its Cholesky factor $C_r$ is used, applying the same operation to all covariance matrices in the model. Operationally, this implies the following steps:

1) User supplies initial covariance matrices.

2) The Cholesky factorization is performed on all covariance matrices.

3) [...]

4) The function SMME is called by the optimizer. This function sets up and solves MME and computes the likelihood value, passing the Cholesky factor as parameters.

5) SMME computes the original $R_0$ and $G_0$ from the factors, sets up and solves the MME and computes the likelihood value.

6) Control refers back to the optimizer (Step 4) to have it decide on the next step based on the last LV and the current set of factors.
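The bookkeeping behind these steps can be sketched as follows. This is an illustration under stated assumptions, not the VCE code: the helper names, the starting matrix and the proposed change are hypothetical, and the setting up and solving of the MME inside SMME is left out.

```python
import math

def cholesky(matrix):
    """Lower-triangular L with matrix = L L' (input must be positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low

def factor_to_vector(low):
    """Stack the lower triangle row by row into the optimizer's parameter vector."""
    return [low[i][j] for i in range(len(low)) for j in range(i + 1)]

def vector_to_covariance(params, n):
    """Rebuild L from the parameter vector and return the matrix L L'."""
    values = iter(params)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            low[i][j] = next(values)
    return [[sum(low[i][k] * low[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Step 1: hypothetical initial covariance matrix supplied by the user.
r0 = [[4.0, 1.2], [1.2, 1.0]]

# Step 2: factorize and hand the elements of the factor to the optimizer.
params = factor_to_vector(cholesky(r0))

# Later rounds: the optimizer changes the parameters freely (hypothetical values).
proposed = [p + d for p, d in zip(params, [-3.0, 2.5, -1.5])]

# Step 5: the function rebuilds the covariance matrix before setting up the MME.
rebuilt = vector_to_covariance(proposed, 2)
determinant = rebuilt[0][0] * rebuilt[1][1] - rebuilt[0][1] * rebuilt[1][0]
print(rebuilt, determinant)
```

Even though the proposed factor has negative diagonal elements, the rebuilt matrix has positive variances and a positive determinant. Only a diagonal element of exactly zero would make the rebuilt matrix singular.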

This process results in a matrix that always has the properties of a covariance matrix, irrespective of the values that the optimizers may come up with. This is because:

As a result, undefined 'covariance' matrices cannot occur.
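The supporting equation is not reproduced in this text. A standard argument, which may differ in form from the author's, runs as follows, assuming $R_0$ is rebuilt as $C_r C_r'$ from a real lower-triangular $C_r$ and $x$ is any real vector of matching length:

```latex
x' R_0 x \;=\; x' C_r C_r' x \;=\; (C_r' x)'(C_r' x) \;=\; \lVert C_r' x \rVert^2 \;\ge\; 0 .
```

Hence no eigenvalue of $R_0$ can be negative, whatever values the optimizer assigns to the elements of $C_r$. The inequality is strict for every $x \neq 0$ whenever all diagonal elements of $C_r$ are nonzero, because $C_r$ is then nonsingular.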

A special case arises when certain covariances are not estimable. This may happen with residual covariances when measurements on different traits are mutually exclusive, a situation frequently occurring in joint analyses of data from 2 test environments. During optimization, the non-defined component is skipped, thereby reducing the dimensionality of the optimization. The current implementation of the reparameterization computes the Cholesky factor on the basis of the complete covariance matrix with zeros inserted for the undefined parameters. Although the values of the Cholesky factor depend on the zero inserted for undefined covariances, the same optimum has always been reached for optimization on the reparameterized and on the original scale.

RESULTS

Four multivariate runs are given to assess the effect of optimizing on the Cholesky factors. The timings listed refer to a Hewlett Packard 7100 computer system with the 99 MHz processor.

Run 1 was an analysis of a selection experiment in chickens with 5 traits, 2 fixed effects, year and barn, and the animal component. As no traits were missing a canonical transformation was performed, which reduced the number of numerical operations dramatically. Thirty components must be estimated simultaneously. The data set was kindly supplied by C Hagger (Swiss Federal Institute of Technology).

Run 2 was an analysis of 3 meat quality traits on around 2 000 pigs. Not all records were complete. There were 6 class effects and 1 covariable in the model, which was identical for all traits. Random components were common litter and animal, resulting in 18 components to be estimated (Dietl et al, 1993).

Run 3 was an analysis of 4 048 station test records from swine comprising the 4 traits daily gain, feed conversion efficiency, valuable cuts, and a meat quality parameter. There were 2 covariables, 4 fixed class effects, 1 random litter component and the random correlated animal effect. The covariable weight at the end of test [...] the field or in test stations, but not in both environments, the corresponding residual covariance component was not estimable. This results in 17 variances and (co)variances to be estimated.

All 4 models included the full relationship among animals. A summary description of the runs is given in table I.

Table II shows results from DS and QN using this procedure. The number of function evaluations declines substantially for both.

The general picture shows a much smaller number of function evaluations for the QN optimizer compared with DS. This is to be expected as QN optimizers have a superlinear rate of convergence, ie the number of iterations decreases for a fixed increase in accuracy as convergence is approached. However, QN optimizers aborted in 3 out of the 4 models when optimization was done on the original (co)variance matrices, attesting to the problems of undefined parameters outlined above. With optimization on the components of the (co)variance matrices, only the modified DS succeeded in locating the optimum (DS column in table II).
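The difference between linear and superlinear convergence can be made concrete with two idealized error sequences. This is a sketch with assumed update rules, not a model of the DS or QN runs: halving the error stands for linear convergence, and squaring it stands for one particular superlinear (quadratic) case.

```python
def iterations_between_levels(update, start_error, levels):
    """Count the iterations needed to pass each successive accuracy level."""
    counts = []
    error = start_error
    for level in levels:
        n = 0
        while error > level:
            error = update(error)
            n += 1
        counts.append(n)
    return counts

# Each level is a fixed increase in accuracy: two more decimal digits.
levels = [1e-2, 1e-4, 1e-6, 1e-8]

# Linear convergence (assumed rule): the error is halved in every iteration.
linear = iterations_between_levels(lambda e: 0.5 * e, 0.5, levels)

# Superlinear convergence (assumed rule): the error is squared in every iteration.
superlinear = iterations_between_levels(lambda e: e * e, 0.5, levels)

print(linear)       # roughly constant cost per accuracy level
print(superlinear)  # cost per accuracy level shrinks as convergence is approached
```

With the linear rule every further pair of digits costs about the same number of iterations, whereas with the superlinear rule the last levels are reached almost for free.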

The costs, however, were large, particularly for run 1 (at 3.7 s per function evaluation). This converged at the number of rounds indicated with an accuracy of $10^{-6}$ on the distance between the worst and best parameter set in the vertex, but had still not quite reached the optimum. Interestingly, apart from the 8 557 illegal points that were produced by the DS optimizer, a loss of rank of the coefficient system occurred 23 120 times. This condition is encountered during factorization when a zero pivot is detected. In this case, factorization is aborted and, again, [...] are partly due to badly conditioned covariance matrices, that just about pass the test for positive eigenvalues, but still do not render the coefficient matrix positive definite. It is thus an effect of limited accuracy on digital computers for these 2 tests. Obviously, this phenomenon also produced a large amount of directional misinformation, wasting a substantial amount of CPU time.

Runs 1-3 had to cope with a large number of undefined parameters, nearly 1 undefined for the 3 or 4 that were within the parameter space, resulting in a substantial amount of conflicting directional information. The situation was better in run 4, where DS only left the parameter space 13 times. QN aborted in the first 3 runs after a varying number of function evaluations because of a discontinuous function surface introduced by the 'bad' function values given to undefined points. Only run 4 was completed successfully, despite the occurrence of 96 illegal points.

With optimization on the Cholesky factor of the (co)variance matrices the picture changes drastically. The extent to which the DS optimizer benefited was dependent on the number of illegal points prior to reparameterization. Accordingly, run 1 finished in less than a 20th of the time converging to the best point, while runs 2 and 3 finished twice as fast. Only run 4 did not benefit, which was to be expected. In fact, the reparameterization resulted in more function evaluations. Whether this is a chance result or an indication of a more general pattern cannot be conclusively decided at this point. However, experience from a large number of other runs has shown a substantial variability in the number of function evaluations till convergence for seemingly identical models in terms of number of parameters. It is therefore assumed that the observed slowdown is more likely a chance result than a general phenomenon.

QN found the solution in all runs. Depending on the number of illegal points encountered before, the speed-up was substantial. This was computed as the ratio of the number of function evaluations of QN with Cholesky factor versus DS on the original scale. If QN and DS can only be used on the basis of the original covariance matrices, only DS will reliably give results. Thus, this is considered as the reference point. Run 1 was particularly impressive: with DS operating on the original covariance matrices, optimization was basically impracticable, as computing time, at around 3 weeks or more of CPU time, was prohibitively high (and the best point was still not quite reached). QN, on the other hand, solved the problem in less than 3 h. The other 3 runs showed a speed-up factor of between 2.5 and 15.7.

CONCLUSIONS

Multivariate REML estimates for general statistical models suffer from high computational demands and the rather low dimensionality in terms of number of traits that they could handle. In most cases, only bivariate analyses could be done with general models producing suboptimal covariance matrices of higher order (Spilke and Groeneveld, 1994). The indicated improvement in speed by optimizing on the Cholesky factor of the covariance matrices will have a 2-fold effect: firstly, it will [...] much more efficient QN algorithms, which will considerably extend the scope of multivariate REML (co)variance component estimation. Most importantly, it will allow users to increase the dimensionality of the models that can be handled, thus helping close the gap between the number of traits used in genetic evaluation and (co)variance component estimation.

This analysis was done with the VCE program, a multivariate, multimodel REML (co)variance component estimation package (Groeneveld, 1994), which is available for research purposes on the anonymous ftp server under the Internet number 192.103.38.1.

ACKNOWLEDGMENT

The valuable suggestions of the reviewers are gratefully acknowledged. One reviewer referred to a paper by Lindstrom and Bates (1988) who also suggested reparameterization of covariance matrices in mixed models for repeated-measures data. Their conclusion that "reparameterization is key to ensuring consistent convergence of the Newton-Raphson algorithm" for this class of models agrees with our findings.

REFERENCES

Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1-38

Dennis JE, Torczon V (1991) Direct search methods on parallel machines. SIAM J Sci Comput 1, 448

Dietl G, Groeneveld E, Fiedler I (1993) Genetic parameters of muscle structure traits in pigs. In: 44th Annual Meeting of the European Association for Animal Production, Aarhus, Denmark

Ducos A, Bidanel J, Ducrocq V, Boichard D, Groeneveld E (1993) Multivariate restricted maximum likelihood estimation of genetic parameters for growth, carcass and meat quality traits in French Large White and French Landrace Pigs. Genet Sel Evol 25, 475

Graser HU, Smith SP, Tier B (1987) A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J Anim Sci 64, 1362

Groeneveld E (1991) Simultaneous REML estimation of 60 covariance components in an animal model with missing values using a Downhill-Simplex algorithm. In: 42nd Annual Meeting of the European Association for Animal Production, Berlin, Germany

Groeneveld E (1994) REML VCE - a multivariate multimodel restricted maximum likelihood (co)variance component estimation package. In: Proceedings of an EC Symposium on Application of Mixed Linear Models in the Prediction of Genetic Merit in Pigs (E Groeneveld, ed) (in press)

Kovac M (1992) Derivative-free methods in covariance component estimation. Ph D thesis, University of Illinois at Urbana-Champaign, USA

Lindstrom MJ, Bates DM (1988) Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J Amer Stat Assoc 83, 1014

Meyer K (1989) Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet Sel Evol 21, 317

Mielenz N, Groeneveld E, Muller J, Spilke J (1994) Simultaneous estimation of covariances with REML and Henderson 3 in a selected chicken population. Br Poult Sci (in print)

Nelder JA, Mead R (1965) A simplex method for function minimization. Computer J 7, 308

Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545

Schnabel R, Koontz J, Weiss B (1982) A Modular System of Algorithms for Unconstrained Minimization. Technical Report CU-CS-240-82, Comp Sci Dept, Univ Colorado, Boulder, CO, USA

Spilke J, Groeneveld E (1994) Comparison of four multivariate REML (co)variance component estimation packages. In: 5th World Congress on Genetics Applied to Livestock Production, Guelph, vol 22, 11-14

Spilke J, Groeneveld E, Mielenz N (1993) Monte-Carlo simulation in variance-covariance-component estimation in mixed linear multiple trait models. Arch Anim Breed 6, 679

Tixier-Boichard M, Boichard D, Bordas A, Groeneveld E (1992) Genetic parameters of residual feed intake in adult males and females from a layer Rhode Island Red