HAL Id: hal-00893952
https://hal.archives-ouvertes.fr/hal-00893952
Submitted on 1 Jan 1992
Solving animal model equations through an approximate incomplete Cholesky decomposition
V Ducrocq
To cite this version:
V Ducrocq. Solving animal model equations through an approximate incomplete Cholesky decomposition. Genetics Selection Evolution, BioMed Central, 1992, 24 (3), pp.193-209. hal-00893952
Original article
Solving animal model equations through an approximate incomplete Cholesky decomposition
V Ducrocq
Institut National de la Recherche Agronomique, Station de Génétique Quantitative et Appliquée, F-78352 Jouy-en-Josas Cedex, France
(Received 5 September 1991; accepted 3 March 1992)
Summary - A general strategy is described for the design of more efficient algorithms to solve the large linear systems Bs = r arising in (individual) animal model evaluations. This strategy, like Gauss-Seidel iteration, belongs to the family of "splitting methods" based on the decomposition B = B* + (B - B*), but, in contrast to other methods, it tries to take maximum advantage of the known sparsity structure of the mixed model coefficient matrix: B* is chosen to be an approximate incomplete Cholesky factor of B. The resulting procedure requires the solution of 2 triangular systems at each iteration and 2 readings of the data and pedigree file. This approach was applied to an animal model evaluation on 15 type traits and milking ease score from the French Holstein Association with 955 288 animals and 4 fixed effects, including group effect for animals with unknown parents. Its convergence was compared with a standard iterative procedure.
genetic evaluation / animal model / computing algorithm / type trait
Résumé - Solving animal model equations through an approximate incomplete Cholesky decomposition. A general strategy is described for obtaining more efficient algorithms to solve the large linear systems Bs = r that characterize "animal model" evaluations. This strategy, like Gauss-Seidel iteration, belongs to the family of "splitting methods" based on the decomposition B = B* + (B - B*), but, unlike other methods, it tries to exploit as far as possible the (known) sparse structure of the coefficient matrix of the mixed model equations: B* is taken to be an approximation of the incomplete Cholesky decomposition of B. The resulting procedure requires the solution of 2 triangular systems at each iteration and 2 readings of the data and pedigree file. This approach was applied to an "animal model" evaluation of 15 type traits and a milking ease score provided by the Unité pour la Promotion de la Race Prim'Holstein, involving 955 288 animals and 4 fixed effects, including a group effect for animals with unknown parents. Its speed of convergence was compared with a standard iterative approach.
genetic evaluation / animal model / computing algorithm / type trait
INTRODUCTION
In most developed countries and for all major domestic species, joint genetic evaluations of up to a few million animals are routinely computed using Best Linear Unbiased Prediction (BLUP). When an individual animal model is used, this supposes the solution of a linear system whose size is larger than the total number of animals to evaluate. Such a task, repeated a few times a year and for several traits of economic importance, can be a real challenge, even on the most advanced computers.
To reduce this computational burden, algebraic manipulations of the equations have been proposed with 2 main aims: 1), a decrease of the system size by absorption of some effects (eg, the permanent environment effects), by use of an equivalent model (eg, the reduced animal model of Quaas and Pollak, 1980) or by use of particular transformations (eg, the canonical transformation, which reduces multitrait evaluations to sets of single trait analyses (Thompson, 1977; Arnason, 1982, 1986)); 2), an increase in the sparsity of the coefficient matrix (eg, Quaas' transformation, which makes possible the estimation of group effects for unknown parents only, with no more difficulties than if they were parents (Quaas and Pollak, 1981; Quaas, 1988), or the QP transformation which, when traits are sequentially available, makes the coefficient matrix in a multiple trait analysis much sparser (Pollak, 1984)). Unfortunately, such tools are not always applicable, either because they are restricted to special data structures or because they are not always computationally advantageous.
Slow convergence is not a real problem for moderate-size research applications, for which general purpose programs are available (eg, Groeneveld et al, 1990). But it can make routine evaluations prohibitive in the case of large-scale applications.
In any case, with or without algebraic manipulation, the linear system is virtually always too large to be solved directly and an iterative solution has to be performed. The algorithms chosen have been in most cases very simple and initially designed for the solution of general problems, without any particular attention to the type of problems animal breeders are dealing with. The most frequently used ones are the Gauss-Seidel, Successive Overrelaxation and Jacobi iterations (Golub and Van Loan, 1983). Unfortunately, for animal models, these can be extremely slow to converge (requiring up to several hundreds of iterations), especially when groups of unknown parents (Quaas, 1988) are added, or when more than one fixed effect is considered (Ducrocq et al, 1990).
An important breakthrough in the search for an efficient solution of animal models was the discovery of the possibility to "iterate on data", a strategy proposed by Schaeffer and Kennedy (1986) which avoids the actual computation and storage of the whole coefficient matrix. This can be considered as one of the first uses of the known nonzero structure of the mixed model equations in designing more efficient algorithms. Indeed, a careful look at this structure gave birth to new approaches for a fast solution of some parts of these equations, in a block iterative context: Poivey (1986) showed that by considering in the inverse $A^{-1}$ of the relationship matrix only the diagonal and the terms relating an animal to its parent of a given sex, and by correcting accordingly the right-hand side for the other terms of $A^{-1}$ using solutions from the previous iteration, the resulting system has a very simple and sparse Cholesky decomposition and can be solved directly. Likewise, Ducrocq et al (1990) proposed for the solution of the additive genetic value part of the mixed model equations the use of 2 decompositions of the type described by Poivey (1986), considering first the relationship between an animal and its dam and second between an animal and its sire. They also proposed to absorb the equations for additive genetic values into the equations for fixed effects after correction of the right-hand side for all off-diagonal elements of $A^{-1}$ ("pseudo-absorption"). Convergence was improved, but not as much as might be desired for huge routine applications. The main drawback to such an approach is the rather tedious programming.
This paper presents a more general procedure for the design of new algorithms taking maximum advantage of the known sparsity structure of the mixed model equations. An application to a large animal model evaluation is described and its performance is compared with a standard iterative procedure.
MATERIAL AND METHODS
Principles for a new algorithm
Let:
$$Bs = r \qquad [1]$$
be the linear system to be solved. If B is very large, system [1] can be solved directly only if $B^{-1}$ or C, the Cholesky factor of B ($B = CC'$, C lower triangular), is sparse and easy to obtain. If this is not the case, consider B*, a matrix "close to" B and whose inverse or Cholesky factor is sparse and easily computed, and write [1] as:
$$B^* s = r - (B - B^*) s \qquad [2]$$
Then, the following functional iterative procedure can be implemented. At iteration (k + 1), solve:
$$B^* s^{(k+1)} = r - (B - B^*) s^{(k)} \qquad [3]$$
Expression [3] is very general and is the base of a family of iterative algorithms known as Splitting Methods (Coleman, 1984). If B* is simply a diagonal matrix whose diagonal elements are those of B, [3] is the Jacobi iteration. If B* is the lower triangular part of B, including the diagonal, [3] is the Gauss-Seidel iteration (Golub and Van Loan, 1983).
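As a small illustration of expression [3], the sketch below runs the splitting iteration on a dense toy matrix for the two classical choices of B* just mentioned. It is not the author's program: the function names (`forward_solve`, `split_matrix`, `splitting_solve`) and the list-of-rows matrix storage are assumptions made for the example, and real applications would never store B explicitly.

```python
def forward_solve(lower, rhs):
    """Solve lower @ x = rhs for a lower-triangular matrix (list of rows)."""
    x = []
    for i, row in enumerate(lower):
        acc = rhs[i] - sum(row[j] * x[j] for j in range(i))
        x.append(acc / row[i])
    return x


def split_matrix(B, method):
    """Return B* for the Jacobi or the Gauss-Seidel splitting of B."""
    n = len(B)
    if method == "jacobi":
        # B* keeps only the diagonal of B.
        return [[B[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    if method == "gauss_seidel":
        # B* keeps the lower triangle of B, diagonal included.
        return [[B[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    raise ValueError("unknown method: " + method)


def splitting_solve(B, r, method, n_iter=50):
    """Iterate B* s(k+1) = r - (B - B*) s(k), starting from s(0) = 0."""
    n = len(B)
    b_star = split_matrix(B, method)
    s = [0.0] * n
    for _ in range(n_iter):
        rhs = [
            r[i] - sum((B[i][j] - b_star[i][j]) * s[j] for j in range(n))
            for i in range(n)
        ]
        # Both choices of B* are lower triangular, so one forward solve suffices.
        s = forward_solve(b_star, rhs)
    return s
```

Both splittings share the same loop; only the definition of B* changes, which is the point of the splitting formulation.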
If the starting value for s is taken to be $s^{(0)} = 0$, expression [3] leads to:
$$B^* \Delta^{(k+1)} = r^{(k)} \qquad [4]$$
with $\Delta^{(k+1)} = s^{(k+1)} - s^{(k)}$ and $r^{(k)} = r - Bs^{(k)}$ (so that $r^{(0)} = r$), ie, the right-hand side in [4] is updated at each iteration. An even more general iterative algorithm is obtained by mimicking the method of successive overrelaxation (Golub and Van Loan, 1983):
$$B^* (s^{(k+1)} - s^{(k)}) = \omega (r - Bs^{(k)}) \qquad [5]$$
where $\omega$ is a relaxation parameter, which corresponds to the splitting of B:
$$B = \frac{1}{\omega} B^* + \left(B - \frac{1}{\omega} B^*\right)$$
The next paragraph will illustrate how B* can be chosen in a particular situation.
Consider the following animal model:
$$y = Xb + Z_o a_o + e \qquad [6]$$
where: y is a vector of observations; b is a vector of fixed effects; $a_o$ is an $n_a$-vector of additive genetic values for all animals with and without records; e is a vector of normally distributed residuals with E(e) = 0; X and $Z_o$ are incidence matrices.
Assume $E(a_o) = Qg$ where g is an $n_g$-vector of group effects, defined only for the animals with unknown parents. Q is a matrix relating animals in $a_o$ with groups (Westell, 1984; Robinson, 1986; Quaas, 1988).
The mixed model equations used to compute Best Linear Unbiased Estimates of b and g and BLUPs of $a_o$ can be written (Quaas, 1988) as a system [7] in b, $a_o$ and g. Let $a = [a_o' \; g']'$ and, accordingly, $Z = [Z_o \; 0]$. Then [7] is:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \alpha A^* \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix} \qquad [8]$$
with $\alpha$ the ratio of residual to additive genetic variance.
If model [6] includes only one fixed effect (which, for clarity, will be called a herd-year effect), X'X is diagonal with hth diagonal element equal to the number of observations in herd-year h. There is at most one nonzero element per row j of Z'X (or column of X'Z). This nonzero element is in column h and is always equal to 1 when animal j has its record in herd-year h. Z'Z is diagonal with diagonal element equal to 1 for rows corresponding to animals with one record and 0 for animals without record and groups. Finally, if equations are ordered in such a way that progeny precede parents, and such that parents precede groups in a, A* is of the form $A^* = L\Delta^{-1}L'$ (Quaas, 1988) where L is a $(n_a + n_g) \times n_a$ matrix with 3 nonzero elements per column. If $j_s$ and $j_d$ represent the indices of the sire (or the sire's group) and the dam (or the dam's group) of animal j, column j has a 1 in row j (= the jth diagonal element of L) and a -0.5 in rows $j_s$ and $j_d$. $\Delta^{-1}$ is a $(n_a \times n_a)$ diagonal matrix with jth element equal to $\delta_j = 4/(m_j + 2)$ where $m_j$ is the number of unknown parents of j ($m_j$ = 0, 1 or 2).
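The product $L\Delta^{-1}L'$ can be accumulated animal by animal without ever forming L. The sketch below does this for a toy pedigree and, as a check, also builds the relationship matrix A by the tabular method. It makes several simplifying assumptions that are not in the text: groups are left out (an unknown parent simply contributes nothing), parents are not inbred, animals are numbered with parents before progeny for the tabular method, and all function names are hypothetical.

```python
def inverse_relationship(pedigree):
    """Accumulate L * Delta^-1 * L' from pedigree[j] = (sire, dam).

    Each parent is an animal index, or None when unknown.
    """
    n = len(pedigree)
    a_inv = [[0.0] * n for _ in range(n)]
    for j, (sire, dam) in enumerate(pedigree):
        m_unknown = (sire is None) + (dam is None)
        delta = 4.0 / (m_unknown + 2)  # 2, 4/3 or 1
        # Column j of L: 1 in row j, -0.5 in the rows of the known parents.
        column = [(j, 1.0)] + [(p, -0.5) for p in (sire, dam) if p is not None]
        for row_a, w_a in column:
            for row_b, w_b in column:
                a_inv[row_a][row_b] += delta * w_a * w_b
    return a_inv


def relationship_matrix(pedigree):
    """Tabular method for A; requires parents to be listed before progeny."""
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for j in range(i):
            value = 0.0
            if sire is not None:
                value += 0.5 * a[j][sire]
            if dam is not None:
                value += 0.5 * a[j][dam]
            a[i][j] = a[j][i] = value
        both_known = sire is not None and dam is not None
        a[i][i] = 1.0 + (0.5 * a[sire][dam] if both_known else 0.0)
    return a
```

The accumulated sum does not depend on the order in which animals are visited, so the "progeny precede parents" ordering of the text matters for the factorization, not for building $A^*$ itself.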
Given this rather simple structure for B, the coefficient matrix in [8], the following choice of B* in [5] is suggested: take $B^* = T^*T^{*\prime}$, where T* is the incomplete Cholesky factor of B, ie, the matrix obtained by setting $t_{ij}$ to zero in the Cholesky factorization of B each time the corresponding element $b_{ij}$ in B is zero (Golub and Van Loan, 1983, p 376). Equivalently, $B^* = TDT'$ where $T = \{t_{ij}\}$ is lower triangular with unit diagonal elements and D is diagonal with positive diagonal elements $d_j$, and T and D are computed using the algorithm sketched in figure 1. The TDT' factorization has an important advantage over the standard Cholesky factorization: it does not require computation of square roots.
A few remarks need to be made at this stage. First, it is known that in the general case, the incomplete Cholesky factorization of a positive definite matrix is not always possible (negative numbers appear in D, ie, the computation of diagonal elements in T* requires square roots of negative numbers). It will be shown that this is never the case here.
Second, the coefficient matrix in [8] can be rewritten as:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \alpha A^* \end{bmatrix} = \begin{bmatrix} I & 0 \\ Z'X(X'X)^{-1} & I \end{bmatrix} \begin{bmatrix} X'X & 0 \\ 0 & QPQ' \end{bmatrix} \begin{bmatrix} I & (X'X)^{-1}X'Z \\ 0 & I \end{bmatrix} \qquad [9]$$
and it clearly appears that column h corresponding to herd-year effect h has $n_h + 1$ nonzero elements: a 1 on the diagonal, and $1/n_h$ on each of the $n_h$ rows corresponding to animals with records in herd-year h. Hence, the incomplete TDT' factorization can be applied to the lower right part of the product in [9], ie, on $QPQ' = (Z'Z + \alpha A^*) - Z'X(X'X)^{-1}X'Z$, which is also the lower right part of [8] after absorption of the fixed effect equations.
Third, a strict application of the algorithm in figure 1 would lead to nonzero elements relating mates, as in A*. Given the particular structure of $A^* = L\Delta^{-1}L'$, we would like to have these elements being 0 too, such that the lower right part of T has the same nonzero structure as L. However, L is a $(n_a + n_g) \times n_a$ matrix whereas the lower right part of T is supposed to be a $(n_a + n_g) \times (n_a + n_g)$ square matrix. We will assume (or choose) that the lower right part of T corresponding to groups is dense. Consequently, TDT' is only an approximation of the true incomplete Cholesky decomposition.
Algorithm
The computation of T and D does not require the coefficient matrix B to be explicitly set up. For animal j, $b_{jj}$ in B is equal to:
$$b_{jj} = z_j + \alpha \left(\delta_j + \frac{1}{4} \sum_{p} \delta_p\right) \qquad [10]$$
where the sum is over the progeny p of j, with $z_j = 1 - 1/n_h$ if j has its record in herd-year h and $z_j = 0$ otherwise.
Given the structure imposed on T, the only elements that are nonzero in column j are $t_{aj}$, $a = j_s$ or $a = j_d$, where $j_s$ and $j_d$ are the indices of the sire (or sire group) and the dam (or dam group) of j. Henderson's rules (Henderson, 1976) of construction of A* imply:
$$b_{aj} = -\frac{\alpha \delta_j}{2} \qquad [11]$$
Another consequence of the chosen structure for T is that the product $t_{im}t_{jm}$ in figure 1 is always 0 except when m is the herd-year effect where i and j have their records or m is a progeny of i and j. Since $t_{ij}$ is computed only for $i = a = j_s$ and $i = a = j_d$, $t_{jm}t_{im}$ is nonzero only: 1), when j and its parent have their record in the same herd-year; and 2), when j is mated to its own sire or dam. Both events are sufficiently rare to be ignored (as B* is not exact, anyway) and then:
$$t_{aj} = -\frac{\alpha \delta_j}{2 d_j} \qquad [12]$$
The fact that $j_s$ or $j_d$ may or may not correspond to a group is irrelevant here.
It is also essential to notice that $t_{aj}$ need not be stored, as it is easily obtained from $d_j$. Replacing $t_{jm}$ by [12] in figure 1 and using [10], we get:
$$d_j = z_j + \alpha \delta_j + \sum_{p} \frac{\alpha \delta_p}{4} \left(1 - \frac{\alpha \delta_p}{d_p}\right) \qquad [13]$$
For columns corresponding to groups, we have, from the structure of A*:
[14]
and $t_{im}t_{jm}$ in figure 1 is different from 0 each time m is a progeny of groups i and j. Therefore, just before undertaking the dense factorization of G, where G is the current part corresponding to groups, we have:
[15]
(minor adaptations for terms corresponding to groups have to be made when the unknown sire and the unknown dam are in the same group).
Now, by noting that:
[16]
and that $d_j = z_j + \alpha \delta_j > \alpha \delta_j$ for animals without progeny (all assumed to have a record), it follows by recurrence that $\left(1 - \frac{\alpha \delta_p}{d_p}\right) > 0$ for all p. Then $c_p > 0$. Therefore, from [13] and [15], $d_j > 0$ for all j and also $g_{jj} > 0$: the incomplete factorization is always possible.
Equations [12] and [13] lead to the practical algorithm given in figure 2.
Iterative solution
From [5], [8] and [9], it appears that the general iterative algorithm involves at each iteration the following steps:
where $[r_b' \; r_a']'$ is the updated right-hand side.
Update $r_b$ and $r_a$ as:
Steps [17] to [20] can be condensed in such a way that only 2 readings of the data file are necessary at each iteration. Indeed, algebraic manipulation of these equations leads to the following requirements:
The general sketch of the resulting algorithms is given in Appendix 1.
Dealing with several fixed effects
In many instances, the routine animal model evaluations involve more than one fixed effect. To adapt the algorithms to this frequent situation, one can distinguish between, on one hand, a vector of fixed effects b with many levels (like the herd-year-(season) effects or the contemporary group effects in many applications) and, on the other hand, a vector f of other "environmental effects" with fewer levels. Model [6] becomes:
$$y = Xb + Ff + Z_o a_o + e \qquad [26]$$
where X and F are the incidence matrices corresponding to effects in b and f respectively. The resulting system of mixed model equations can be written:
$$\begin{bmatrix} F'F & F'X & F'Z \\ X'F & X'X & X'Z \\ Z'F & Z'X & Z'Z + \alpha A^* \end{bmatrix} \begin{bmatrix} f \\ b \\ a \end{bmatrix} = \begin{bmatrix} F'y \\ X'y \\ Z'y \end{bmatrix} \qquad [27]$$
A block iterative procedure can be implemented involving at each iteration first the solution of:
$$F'F f = F'y - F'Xb - F'Za \qquad [28]$$
using Gauss-Seidel iteration and then, the solution of:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \alpha A^* \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'(y - Ff) \\ Z'(y - Ff) \end{bmatrix} \qquad [29]$$
using the algorithm described in this paper.
A LARGE-SCALE APPLICATION
Description
A BLUP evaluation based on an animal model was implemented to estimate cows' and bulls' breeding values for 15 linear type traits and milking ease score in the French Holstein breed. Records were collected by the French Holstein Association between 1986 and 1991 on 462 162 first and second lactation cows. The model considered in the analyses included an "age at calving (10 classes) x year (5) x region (8)" fixed effect, a "stage of lactation (15 classes) x year x region" fixed effect, a "herd x round x classifier" (36 420 classes) fixed effect and a random additive genetic value effect. Tracing back the pedigree of recorded animals, a total of 955 288 animals were evaluated. Sixty-six groups were defined according to year of birth, sex and country of origin of the animals with unknown parents. Here, b in [26] will refer to "herd x round x classifier" effects and f to other fixed effects.
The block iterative procedure described in Dealing with several fixed effects was implemented, using a relaxation parameter $\omega$ = 0.9 in all analyses. To compare this procedure with a standard iterative algorithm, the mixed model equations were also solved using a program along the lines of Misztal and Gianola (1987), where the equations for fixed effects were solved using Gauss-Seidel iterations and the equations for additive genetic values were solved via second-order Jacobi iteration (see Misztal and Gianola (1987) for details). Group solutions were adjusted to average 0 at the end of each iteration, as proposed by Wiggans et al (1988). Indeed, this constraint had very little effect on convergence rate, as the average group effects solutions tended to a value very close to 0 anyway. Several relaxation parameters were used for the second-order Jacobi step. The same convergence criteria were computed in all cases and intermediate solutions were compared to "quasi-final" results.
RESULTS
Figures 3 and 4 illustrate for one of the traits - "rump width" ($\sigma_p$ = 1.4, $h^2$ = 0.25) - the evolution of 2 convergence criteria: the norm of the change in solution between 2 consecutive iterations divided by the norm of the current solution vector (both considering elements in a only) and the maximum absolute change between 2 iterations.
Rump width was considered as a trait representative of the average convergence rate of the procedure based on the incomplete Cholesky decomposition (hereafter referred to as ICD) for the 16 analyses: the value of the standardized norm of the change between 2 iterations obtained for rump width after 40 iterations was reached after 25 iterations for one trait, 33 to 40 iterations for 10 traits and between 41 and 46 iterations for 5 other traits. For rump width, 200 iterations with ICD and 300 iterations with the standard procedure (GS-J) were carried out and compared to intermediate solutions. The results are summarized in table I.
Figure 5 shows the distribution of the changes by class of variation between 2 iterations for ICD.
Figures 3-5 and table I clearly show that convergence was much faster with ICD. Whatever the value of the relaxation parameter used, the evolution of the convergence criteria in GS-J tends to be very slow after a rather fast decline during the first iterations. This phenomenon was mentioned by Misztal et al (1987) and seems to worsen because of the size of the data set and the existence of 3 fixed effects in the model. The fact that ICD does not exhibit this undesirable characteristic may prove its robustness to not so well conditioned problems.
For practical purposes, 3 exact figures may be considered satisfactory for a proper ranking of all animals. Starting from 0 and with no acceleration procedure implemented (in contrast with eg, Ducrocq et al, 1990), this requirement for all animals was reached for ICD after about 40 to 45 iterations. Even faster convergence may be achieved when starting from solutions of a previous evaluation or by optimizing the value of the relaxation parameter used. Figures 3 and 4 and table I show that the same convergence requirements are reached in about 150 iterations with GS-J. But the cost of each iteration is not the same: in the case of ICD, 2 copies of the data (+ pedigree) file, sorted in opposite directions, are read at each iteration, whereas GS-J requires one reading of the data file per fixed or random effect included in the model (ie, a total of 4 per iteration here) and one reading of the pedigree file. CPU time per iteration with optimized I/O operations was 12 s on an IBM 3090-17T computer for ICD and 24 s for GS-J: in both cases, the limiting factor for computation speed remains the I/O operations.
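Combining the iteration counts and the CPU times per iteration quoted above gives a rough total cost for the 2 programs as they were actually run. The short calculation below only multiplies figures taken from the text; the variable names are illustrative.

```python
ICD_SECONDS_PER_ITERATION = 12
GSJ_SECONDS_PER_ITERATION = 24

# About 40 to 45 iterations for ICD, about 150 for GS-J (figures from the text).
icd_total_seconds = [n * ICD_SECONDS_PER_ITERATION for n in (40, 45)]
gsj_total_seconds = 150 * GSJ_SECONDS_PER_ITERATION

# Ratio of total CPU time, GS-J over ICD, for the two ICD iteration counts.
speedups = [gsj_total_seconds / t for t in icd_total_seconds]

print(icd_total_seconds, gsj_total_seconds, speedups)
```

With these inputs the ICD run costs 480 to 540 s against 3 600 s for GS-J, a ratio of roughly 6.7 to 7.5 for the programs as implemented.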
Note however that a better "standard" strategy could have been designed: for example, it is possible to treat the first 2 fixed effects together in a block-iterative way as in [28] and solve for the herd-round-classifier (HRC) effect at the same time as the additive genetic value effect by reading a data file sorted by HRC, as proposed in Wiggans et al (1988). This reduces to 2 the number of copies of the data file read at each iteration. The fact that the first 2 fixed effects are not defined within HRC effects or vice-versa prevents any further reduction, as in the case described by Wiggans et al (1988). In conclusion, ICD appears to be at least 3 to 4 times faster than a standard procedure in the particular situation studied here.
For ICD, the main computing requirements in terms of core storage can be divided into 2 independent parts: 1), for the computation of $d_j$ (fig 2): a vector of length p, where p is the size of the coefficient matrix in [8]; 2), for the iterative solution (Appendix 1): the nonzero blocks of matrices F'F and F'X (otherwise, 2 supplementary readings of the data file are necessary, in [28] and [29]) and 3 vectors of length p (the right-hand side, and the current and previous vectors of solutions). This can be further reduced by noting that the previous vector of solutions for a is required only to compute convergence criteria and that current solutions for a need not be stored for nonparent animals.
DISCUSSION AND CONCLUSIONS
It has been shown that in the context defined here the approximate incomplete Cholesky factor of the coefficient matrix always exists. However, this does not guarantee that B* will be very close to B. In fact, when a large number of generations are considered, the terms from the true Cholesky factor of B which are ignored in the incomplete factorization may become large and one may wonder whether the proposed algorithm converges at all. Indeed, in the application presented here, where pedigrees were traced up to 12 generations back in some cases, the algorithm diverged when no relaxation parameter was used: convergence seemed faster at first but about 300 estimates of genetic value (0.03%) did not stabilize and led to divergence first, of groups of unknown parents and then, of all animals' genetic values. In contrast, for a research run with a smaller data set and tracing pedigrees back for only 4 generations, the same standardized convergence criteria as in the example described here were obtained after nearly 3 times fewer iterations. This illustrates that the discrepancy between B and B* increases with the number of generations considered. A modification of the algorithm in order to make B* closer to B while keeping a similar sparsity structure should be investigated. For example, mate contributions to the inverse of the relationship matrix could be no longer ignored. However, this may not be necessary as it is shown in Appendix 2 that there always exists a relaxation parameter $\omega$ such that convergence is guaranteed. Quite unfortunately, as for successive over-relaxation and second order Jacobi procedures, the optimal $\omega$ is data dependent.
The model presented in [6] and which led to the form of the incomplete Cholesky factor described in figure 2 is rather restrictive. In particular, it assumes the existence of only one fixed effect. The section Dealing with several fixed effects showed a way to treat models with more than one fixed effect. For more complex situations, a simple procedure would be to apply the incomplete Cholesky factorization only to the block $(Z'Z + \alpha A^*)$. The remaining part of the system can be solved using a more standard procedure. But then, convergence is likely to be somewhat slowed down. A trade-off is to increase the amount of information available to estimate fixed effects at each iteration by "pseudo-absorption" of genetic value equations, as described in Ducrocq et al (1990).
Whatever the alternative chosen, it seems quite apparent that faster and more efficient algorithms from a computational viewpoint need to take larger advantage of the known sparsity structure of the mixed model equations than simpler, general-purpose algorithms. Of course, the final practical choice between competing algorithms should take into account other considerations such as programming costs and computer availability. Then, rather specialized algorithms such as the one proposed here may be attractive only for huge routine evaluations based on animal models.
APPENDIX 1
Practical algorithm sketch for the solution of the system B*s = r
In the following section, indices related to fixed and random effects (eg, b and a in $r_b$ and $r_a$) will be dropped for clarity. The indices j, $j_s$, $j_d$ and h will refer to animal j, its sire (or sire's group), its dam (or dam's group) and the herd-year in which it was recorded. $n = \{n_h\}$ will be the vector of number of records in each herd-year. Define $s = [b' \; a']'$. Index g will refer to all groups of unknown parents.
Then, the whole algorithm for the solution of B*s = r, including steps [21] to [25], can be detailed as follows:
1) Initialize r, n, s to zero
2) Read the data file
   For each j with record $y_j$: add $(\omega y_j)$ to $r_h$ and $r_j$; add 1 to $n_h$
3) Compute $v_h = r_h / n_h$ for h = 1, ...
   $r_h = r_h - \omega n_h s_h$
4) Initialize $v_j$ as $v_j = r_j$ for j = 1, ...
   Read the data file with progeny preceding parents
   For each j:
6) Read the data file with parents preceding progeny
   For each j:
8) Go to (3) until convergence.
Note that at each iteration, the computation of $s_h$, h = 1, ... is completed only after reaching the end of the second data file. Therefore, the update of r in [25] - which requires $s_h$ - would make necessary a third reading of the data file. This can be avoided by transferring this part of the updating ($-\omega Z'Xb$ in [25]) to the beginning of the next iteration. This explains the term $(v_j - \omega s_h)$ in (4).
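The 2 readings of the data file correspond to the 2 triangular systems that must be solved when B* = TDT'. On a small dense example, and leaving aside the iteration-on-data bookkeeping, the solve can be sketched as below. The function name `solve_tdt` and the dense storage of T are assumptions for illustration; the actual algorithm never stores T.

```python
def solve_tdt(T, d, r):
    """Solve (T D T') s = r with T unit lower triangular and D = diag(d)."""
    n = len(r)
    # Forward substitution: T u = r (first pass).
    u = [0.0] * n
    for i in range(n):
        u[i] = r[i] - sum(T[i][m] * u[m] for m in range(i))
    # Diagonal scaling: D v = u.
    v = [u[i] / d[i] for i in range(n)]
    # Backward substitution: T' s = v (second pass, in the opposite order).
    s = [0.0] * n
    for i in reversed(range(n)):
        s[i] = v[i] - sum(T[m][i] * s[m] for m in range(i + 1, n))
    return s
```

The forward pass visits the equations in one order and the backward pass in the reverse order, which is consistent with reading 2 copies of the file sorted in opposite directions.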
APPENDIX 2
Proof that the splitting method based on an incomplete Cholesky decomposition of the coefficient matrix always converges
The following section shows that there always exists a value of the relaxation parameter $\omega$ such that the general algorithm proposed in [5] converges for all starting values. For splitting methods, a necessary and sufficient condition for this is - with the notation used in [3] and [5] - that $\rho(I - \omega B^{*-1}B) < 1$ where $\rho(M)$ is the spectral radius of M, ie, the largest absolute value of the eigenvalues of M (Golub and Van Loan, 1983; Coleman, 1984). Indeed, if $\lambda_i$ is an eigenvalue of $B^{*-1}B$, then $\lambda_i$ is also an eigenvalue of $T^{*-1}BT^{*\prime -1}$ where $B^* = T^*T^{*\prime}$. Obviously, this matrix is nonnegative definite, so all its eigenvalues are nonnegative. Let $\lambda_1$ be the largest one. The eigenvalues of the matrix $I - \omega B^{*-1}B$ are all of the form $(1 - \omega \lambda_i)$ and if one chooses $0 < \omega < 2/\lambda_1$ then $\rho(I - \omega B^{*-1}B) \le 1$. Equality holds when B is singular, which is indeed the case when groups of unknown parents and another fixed effect are considered simultaneously. A constraint on solutions for groups (or the other fixed effects) can be added to avoid this problem. But it was found that convergence is faster when no constraint is included. As for Gauss-Seidel iteration, it seems that there is a built-in constraint in the iterative system preventing problems from occurring.
ACKNOWLEDGMENTS
The author is grateful to A Tabary, IBM France, for his excellent advice on optimizing the program code. The data and partial financial support for this study were provided by the Union pour la Promotion de la Race Prim'Holstein, Saint-Sylvain d'Anjou, France.
REFERENCES
Arnason T (1982) Prediction of breeding values for multiple traits in small nonrandom mating (horse) populations. Acta Agric Scand 32, 171-176
Arnason T (1986) Practical aspects in the estimation of breeding values for multi-trait selection. World Rev Anim Prod 22, 75-84
Coleman TE (1984) Large Sparse Numerical Optimization. Lecture Notes in Computer Science, N° 165. Springer-Verlag, New York, 105 p
Ducrocq V, Boichard D, Bonaiti B, Barbat A, Briend M (1990) A pseudo-absorption strategy for solving animal model equations for large data files. J Dairy Sci 73, 1945-1955
Golub GH, Van Loan CF (1983) Matrix Computations. Johns Hopkins Univ Press, Baltimore, MD, 476 p
Groeneveld E, Kovac M, Wang T (1990) PEST, a general purpose BLUP package for multivariate prediction and estimation. Proc 4th World Congr Genet Appl Livest Prod, Edinburgh, NE, July 23-27, 13, 488-491
Henderson CR (1976) A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32, 69-83
Misztal I, Gianola D (1987) Indirect solution of mixed model equations. J Dairy Sci 70, 716-723
Misztal I, Gianola D, Schaeffer LR (1987) Extrapolation and convergence criteria with Jacobi and Gauss-Seidel iteration in animal models. J Dairy Sci 70, 2577-2584
Poivey JP (1986) Méthode simplifiée de calcul des valeurs génétiques des femelles tenant compte de toutes les parentés. Genet Sel Evol 18, 321-331
Pollak EJ (1984) Transformation for Systematically Missing Data to Facilitate Multiple Trait Evaluation. Mimeograph, Cornell University
Quaas RL (1988) Additive genetic model with groups and relationships. J Dairy Sci 71, 1338-1345
Quaas RL, Pollak EJ (1980) Mixed model methodology for farm and ranch beef cattle testing programs. J Anim Sci 51, 1277-1287
Quaas RL, Pollak EJ (1981) Modified equations for sire models with groups. J Dairy Sci 64, 1868-1872
Robinson GK (1986) Group effects and computing strategies for models for estimating breeding values. J Dairy Sci 69, 3106-3111
Schaeffer LR, Kennedy BW (1986) Computing solutions to mixed model equations. Proc 3rd World Congr Genet Appl Livest Prod, Univ Nebraska, Lincoln, NE, July 16-22, 12, 382-393
Thompson R (1977) Estimation of quantitative genetic parameters. Proc Int Quant Genet, Ames, Iowa, 639-657
Westell RA (1984) Simultaneous genetic evaluation of sires and cows for a large population of dairy cattle. PhD diss, Cornell Univ, Ithaca, NY; Univ Microfilms Inc, Ann Arbor, MI
Wiggans GR, Misztal I, Van Vleck LD (1988) Animal model evaluation of Ayrshire milk