HAL Id: hal-00893959
https://hal.archives-ouvertes.fr/hal-00893959
Submitted on 1 Jan 1992
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Computing inbreeding coefficients in large populations
The Meuwissen, Z Luo
To cite this version:
Original
article
Computing
inbreeding
coefficients
in
large populations
THE Meuwissen
Z Luo
Institute
of
AnimalPhysiology
and GeneticsResearch,
Edinburgh
ResearchStation,
Roslin EH259PS,
UK(Received
21 October 1991;accepted
15April
1992)
Summary - An
algorithm
forcomputing inbreeding
coefficients inlarge populations
ispresented.
It isespecially
useful inlarge populations
because of the small size of thememory
required,
which is linear withpopulation
size, and itsspeed,
if the number ofgenerations
involved is not toolarge,
ie notlarger
than about 12. The method iscompared
with 2 other methods for
computational speed
and memory requirement. Thepresented
algorithm
is suited for situations where theinbreeding
coefficients for a few new animals are to becomputed given
that their ancestor’sinbreeding
coefficients were calculatedpreviously.
inbreeding coefficient
/
algorithmRésumé - Le calcul des coefficients de consanguinité dans de grandes populations.
On
présente
unalgorithme
de calcul descoefficients
deconsanguinité
pour degrandes
populations.
Il estparticulièrement adapté
à cespopulations
à cause dufaible
volume demémoire d’ordinateur requis
(il
dépend
linéairement de la taille de lapopulation)
et de la rapidité de calculquand
le nombre degénérations impliquées
n’est pas trop élevé(c’est-à-dire
inférieur
à environ12).
La méthode estcomparée
à 2 autres méthodes du point devue de la vitesse de calcul et de la mémoire
requise.
Leprésent algorithme
estégalement
adapté
aux situations de mise àjour
où lescoefficients
deconsanguinité correspondant
à un
faible
nombre d’animaux nouveaux doit être calculé en tirant parti descoefficients
calculés pour les anciens animaux.
coefficient de consanguinité
/
algorithme*
On leave from the Schoonoord Research Institute for Animal
Production,
PO Box501,
INTRODUCTION
Many
countries areimplementing
or haveimplemented
evaluation methods forna-tional herds based upon animal models. If these account for
inbreeding,
calculation ofinbreeding
coefficients inlarge populations
isrequired.
Henderson(1976)
andQuaas
(1976)
implicitly
present
methods for the calculation ofinbreeding
coeffi-cients,
whenthey
proposealgorithms
for the calculation of the inverse of thead-ditive
relationship
matrix. Henderson’s methodrequires
storage
of alarge
matrix.Quaas
avoids this: memoryrequirement
is linear with N andcomputation
time isproportional
toN
2
,
where N is the size of the data set. InQuaas’
methodcom-puter
time becomes alimiting
factor withincreasing
N. With thisalgorithm,
nouse is made of known
inbreeding
coefficients,
and hence all calculations have to berepeated
whenever coefficients arerequired
for a new batch of animals. Golden et al(1991)
present
analgorithm
based onQuaas’
methodusing
sparseprogramming
techniques.
Tier
(1990)
presents
a fast method for the calculation ofinbreeding
coefficients.Using
sparseprogramming
techniques,
hisalgorithm
first determines which ele-ments in the additivegenetic relationship
matrix A arerequired
and then calculatesthem. The memory
required
is a smallproportion
ofNZ:
about0.9%.
Forlarge
pop-ulations,
thephysical
memory of thecomputer
is stilllikely
to be too small and useof disk memory is
required, slowing
thisalgorithm considerably.
Knowninbreeding
coefficients are not recalculated.The aim of this paper is to
present
a method for the calculation ofinbreeding
coefficients in
large populations,
which isfast, requires
memoryproportional
to N and does notrecompute
knowninbreeding
coefficients. The methodpresented
iscompared
to Golden et al’s(1991)
implementation
ofQuaas’
(1976)
method and Tier’s(1990)
method forspeed
and memoryrequired.
METHOD
The method is based on the
decomposition
of the additivegenetic relationship
matrix
A,
as describedby
Henderson(1976):
A =LDL’,
where L is a lowertriangular
matrixcontaining
the fraction of the genes that animals derive fromtheir
ancestors,
and D is adiagonal
matrixcontaining
the withinfamily
additivegenetic
variances of animals. The animals are ordered such thatparents
precede
offspring.
From thedecomposition,
it follows that(Quaas, 197G):
where
A
ii
is the ithdiagonal
element ofA,
whichequals
theinbreeding
coefficient of animal iplus
1.Quaas
(1976)
computes
the elements of Lquickly
and one column at a timeby
tliefollowing
recursivealgorithm:
For
j=
1 to 1V(all
columns ofL)
For
i= j
+
1 to N(all
elements below thediagonal
element)
Lij =
(L!
+L
dd
)/2
when bothparents
si andd
i
of i areknown,
or=
L
kd
/2
whenonly
oneparent
k
i
of i isknown,
or= 0 when both
parents
areunknown,
for i <j.
The elements of D are calculated as:
D!!
= 1 when bothparents
areunknown,
or= 0.75 -
Fkj /4
whenonly
oneparent
k!
of j
isknown,
or= 0.5 -
(Fs!
+F
dJ/
4
when bothparents
Sj andd
j
of j
areknown,
whereF
j
denotes theinbreeding
coefficient of animalj.
Aftercomputing
the elements of thejtli
column ofL, they
aresquared
andmultiplied by
D!!.
Theresulting
vector is added to aworking
vector. When thisprocedure
is followed for every column ofL,
theworking
vector contains theA
jj
-values.
Thisalgorithm requires
N(N+ 1)/2
operations.
Golden et al(1991)
made a list of theoffspring
of all the animals i.Because element
L
ij
is non-zeroonly
if i is a descendantof j,
theoperations
withineach column are
performed only
for the descendantsof j.
Because the elements ofthe columns of L are not stored
they
have to be recalculated when a new batch ofanimals becomes available.
In the
present
algorithm,
the elements of L arecomputed
rowby
row, whichovercomes the
problem
ofrecalculating
elements of L when a new batch of animalshas to be evaluated. A row i of L
gives
the fraction of the genes that animal i derives from its ancestors.Hence,
L
isi
=Lid!
=0.5,
where si andd
i
are the sire and dam ofi,
respectively.
Each row of L can be builtby proceeding
up thepedigree adding
half the &dquo;contribution&dquo; of the current animal to each of its
parents.
The fraction of the genes derived from an ancestor is:where
P
j
is the set of identification numbers of the progenyof j
andANC
i
is the set of identification numbers of the ancestors ofi,
including
animal i itself. The latteridentity
in[2]
is becauseLi!
=0,
if k is not an ancestor of i or notequal
to i. This leads to thefollowing algorithm
forcomputing L,
rowby
row. As each row isdetermined,
the contributions to the elements ofA
ii
are accumulated. Rowi of L and
A;z,
are set to 0 beforestarting.
Thealgorithm keeps
track of a list ofancestors
ANC
I
,
whose contribution toA
ii
hasyet
to be included. If the sire or dam is unknown si = 0 ord
i
=0,
respectively.
delete j
fromANC,
end wliileThe
youngest
animal j
inANC
i
isevaluated,
because all of its progeny in thepedigree
of animal i must have beenevaluated, hence,
itsL
ij
value is known. The kernel of a Fortran program of thealgorithm
isgiven in
theAppendix.
The list of ancestors isrepresented by
a link list. In thecomputer
program time is savedby:
i)
checking
whether one of theparents
isunknown, giving
aninbreeding
coefficient of zero(otherwise,
thealgorithm
would trace thepedigree
of the otherparent) ;
andii)
if thepedigree
is sorted such that full sibs have successive identificationnumbers,
ieare evaluated
successively,
only
the first full sib is evaluated and all full sibs obtain(and have)
the sameinbreeding
coefficient as the evaluated full sib.In order to compare the
algorithms,
Golden et al’s(1991)
and Tier’s(1990)
algorithms
wereprogrammed
in Fortran. Golden et alpresented
theiralgorithm
inC,
whichthey
considered faster.However,
C was not available to the authors. Thesorting
of the list of descendants in the Golden et alalgorithm
wasperformed by
an IMSL routine here
(I1VISL, 1984).
SIMULATION
In order to compare the
algorithms,
the simulation program of Meuwissen and Vander Werf
(1991)
was used togenerate
apedigree
data set. The program simulatedan open
dairy
cattle nucleus scheme for10,
20 or 40 years, withpopulation
sizes of9 267,
15 582 and 28171animals, respectively
(see
tableI).
Selection was for animalmodel BLUP
breeding
values. Nucleus dams and commercial damsproduced
8 and1
offspring, respectively.
The number of nucleus and commercial sires selected was4 and
10,
respectively. Also,
abreeding
program was simulated for 20 years withselection of
only
one commercialsire,
representing
a situation withlarge
half sib families.Inbreeding
coefficients were calculated on a micro-VAX 3800 in batch mode. The maximumphysicsl
memory allocated to the batch queue was about 20 Mb. Becausethis exceeded the memory
required by
any of thealgorithms,
no disk memory wasused. If disk memory were
used,
thecomputational speed
woulddepend
on the sizeThe number of
generations
in the data-set after10,
20 and 40 years werecounted and a
weighted
average per individual wascalculated,
whereweighting
wasaccording
to the contribution of thepath.
Forinstance,
ifonly
the sire of thesire and the sire of an animal were
known,
the average number ofgenerations
was2.1/4
+1 ! 1/4
+0 -1/2
= 0.75. The animal with the maximum average number ofgenerations
and the average number ofgenerations
of the animals born in the lastyear evaluated were estimated.
RESULTS
The results are
presented
in table II.Although
thealgorithm
of Golden et al(1991)
and thatpresented
hererequire
the same number of additions ofL !D!!
terms, the latter consumed less
computer
time. This is because thealgorithm
of Golden et al:i)
makes a list ofoffspring
of eachanimal,
beforestarting evaluations;
ii)
traces descendants instead ofancestors,
which is more difficult because thenumber of progeny of each animal is
unlimited,
whereas the number ofparents
is limited
to 2;
iii)
sorts the list of descendants for everyanimal,
which istime-consuming ;
andiv)
tracing
ofdescendants, sorting them,
and addition ofL? -
D
jj
arenot
simultaneously performed.
The Golden et alalgorithm
requires
more memory,because the list of
offspring
needed and thetracing
of descendantsrequires
memory.However,
the memoryrequired
is still linear with the number of animals N. Tier’s(1990)
algorithm
was faster than thealgorithm presented
here,
when40 years are
evaluated,
ie evaluation of 12.3generations
of animals. When manygenerations
areevaluated,
many animals have common ancestors, whosepedigree
is re-evaluated many times in the
present
algorithm.
Tier’salgorithm
avoids redundant calculationby tracing pedigrees
once. The memoryrequired
is 15.9 lVlbfor 28171 animals. The
presented algorithm required
0.7 NIb is this situation. When about 6 or fewergenerations
areevaluated,
thepresent
algorithm
is fasterthan
Tier’s,
because Tier’salgorithm
first determines which elements of A arerequired
beforecalculating inbreeding
coefficients. Ifonly
the animals born in year40 are
evaluated,
thealgorithm
presented
here and that of Tier(1990)
have about the samespeed
(table II).
In Tier’salgorithm,
therequired
elements of A have to be determined forrelatively
few animals. The timerequired by
thealgorithm
of Goldenet al
(1991)
equals
thatrequired
if all animals areevaluated,
because thealgorithm
re-calculates allinbreeding
coefficients. The last alternative in table II shows that the presence oflarge
half sib families isadvantageous
to Tier’salgorithm.
This isbecause Tier’s
algorithm
traces thepedigree
of the sireonly
once.DISCUSSION AND CONCLUSIONS
The
presented
algorithm
forcomputing
diagonal
elements of A is a sparseimple-mentation of an
algorithm presented by
Mrode andThompson
(1989)
andQuaas
(1989)
for themultiplication
of A with a matrix.Here,
this matrix is theidentity
matrix and
only diagonal
elements are calculated.The
algorithm
combineshigh
computational speed
with low memoryrequire-ment,
if the number ofgenerations
evaluated is notlarger
than,
say 12(table II).
In order to
keep
the calculations within thephysical
memory of the computer,none of the
populations
evaluated in table II werereally large.
If the data setwere much
larger,
Tier’s(1990)
algorithm especially
wouldrequire
disk
memorywhich would decrease its
speed
considerably. However,
the presence oflarge
half sibfamilies favours the use of Tier’s
algorithm
(table II).
If the number ofgenerations
involved
exceeds,
say12,
thealgorithm presented
here becomes slowcompared
toTier’’s,
because common ancestors are traced many times. In order to overcomethis
problem
we maycompute F
i
=1/2AS!d! _
L
L!,kLd:kDkk
and store thek
sire’s row of L
(L
Sik
,
k =1, ... ,
).
s
i
However,
calculating
the dam’s row of L costsapproximately
the same amount ofcomputer
time ascalculating
the individual’srow of
L,
as in!1].
] .
In
practice,
ofteninbreeding
coefficients of many animals in thepopulation
have been calculated.Only
those of a few new born animals are unknown. Thepresented
APPENDIX
Integer
arrays(N
is the numberof
animals in thepopulation):
PED(1:2, l:l!T)
PED(1, i)
= sire identification no of animal iPED(2, i)
= dam identification no of animal iPOINT(1:N)
POINT(i)=
the next oldest ancestor in the link list= 0 if i is the last ancestor
Real
Arrays:
F(0: N)
F(i)
=inbreeding
coefficients of animal i.F(O) =
-1gives
appropriate
withinfamily
variances for unknownparents
by
the formula:.5-.25*(F (PED(1, i)+PED(2,i)).
F(i)
is
initially
setequal
to-1,
which is more accurate thancalculating
F + 1 and thensubtracting
1.L(1:N)
L(j)
= elementij
of matrixL,
if animal i is evaluatedD(1:N)
D(i)
= withinfamily
variance of animal iACKNOWLEDGMENTS
We are indebted to
Thompson
and referees forhelpful suggestions
and comments. The IB!IilkMarketing
Board ofEngland
andWales,
the Meat and Livestock Commission and theMinistry
ofAgriculture,
Fisheries and Food aregratefully acknowledged
for theirsponsorship.
REFERENCES
Golden
BL,
BrinksJS,
Bourdon RM(1991)
Aperformance programmed
method forcomputing
inbreeding
coefficients fromlarge
data sets for use in mixed-modelanalyses.
J AnimSci,
69,
3564-3573Henderson CR
(197G)
Asimple
method forcomputing
the inverse of a numeratorrelationship
matrix used in theprediction
ofbreeding
values. Biometrics32,
69-83IMSL
(1984)
International Mathematical and Statistical Libraries.Houston, TX,
ed 9.2Meuwissen
THE,
van der Werf JHJ(1991)
Effects ofheterogeneous
within-herdvariances on the
efficiency
ordairy
cattlebreeding
schemes.42th
Ann Meet EurAssoc Anim
Prod, Berlin,
Germany
Mrode