HAL Id: hal-00893856
https://hal.archives-ouvertes.fr/hal-00893856
Submitted on 1 Jan 1990
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Computing inbreeding coefficients quickly
B Tier
To cite this version:
Original
article
Computing inbreeding
coefficients
quickly
B Tier
University
of New
England, Animal Genetics and Breeding Unit *Arnaidade,
NSW, Australia(Received
13 June 1988; accepted 12 September1990)
Summary - An algorithm for computing
inbreeding
coefficients(F)
for all animals oflarge
populations
is described. The technique relies on the subset of the numerator relationship matrix(A)
required to compute its diagonal, which contains F. A simpleexample illustrates that this subset is a very small part of A.
inbreeding coefficients / fast algorithm
Résumé - Calcul rapide des coefficients de consanguinité. On présente un algorithme
pour calculer rapidement les
coefficients
deconsanguinité
de tous les anirrcaux, dans unegrande
population. La méthodefait
seulement intervenir le sous-ensemble de la matrice des coefficients de parenté qui est nécessaire au calcul de sa diagonale, qui contient lescoefficients
deconsanguinité.
Un exemple simple illustre que ce sous-ensemble n’estqu’une
petite partie de la matrice des
coefficients
de parenté. L’algorithme estprécisé
sous laforme
d’un
codage symbolique.
coefficients de consanguinité / calcul rapide
INTRODUCTION
Wright’s
(1922)
coefficient ofinbreeding
(F)
describes theprobability
that 2 alleles at any locus are identicalby
descent. Inbredoffspring
result frommating
twoanimals which have one
(or more)
common ancestors. Breeders of livestockperceive
inbreeding
asbeing
deleterious andconsequently
try to avoidmating
close relatives.The
relationship
betweenprospective
mates can becomputed
to determine thedegree
ofinbreeding
of theresulting offspring
from such amating.
For inbred
populations,
inbreeding
coefficients arerequired
to compute A-1directly using
Henderson’s(1975)
rules. Henderson(1976)
showed therelationship
between theinbreeding
coefficients of an animal’s parents and the &dquo;contributions&dquo;*
that each animal’s
pedigree
makes to !4 !. For the ith animal withparents j
andk,
thefollowing
contributions are added to A-1:where
d
ii
=
1.0 -0.25(a!!
+a
kk
).
Two
algorithms
arecommonly
used to computeinbreeding
coefficients.Origi-nally they
were calculatedusing
thepath
coefficient method(Wright, 1922).
Whilethis method is easy to use for
computing
F for animals which have few ancestorsand are
only slightly inbred,
it is a verycomplex
method for animals with many common ancestors. Asimpler approach
is to generate the numeratorrelationship
matrix
(A)
using
the tabular method as attributed to Lushby
Emik and Terrill(1949,
citedby
Hudson etal,
1982).
Inbreeding
coefficients can becomputed
fromthe
diagonal
elements of A : Fi = aii - 1. This paper describes an efficientimple-mentation of the tabular method.
COMPUTING INBREEDING COEFFICIENTS
USING THE TABULAR METHOD
To compute A
by
the tabular method animals in thepopulation
must be numbered from 1 to N so that parentsprecede
their progeny. Thepedigrees
of all animals must be stored in memory. A zero in thepedigree
denotes an unknown parent. Theupper
triangle
of A can becomputed
on a rowby
row basis in consecutive orderworking
from first to last.Diagonal
elements arecomputed using
the formula:The remainder of the row
(to
theright
of thediagonal)
can becomputed by applying
the formula:
where p and q are the parents of the
jth
animal and i <j.
If either parent isunknown
(p
or q =0)
thena oi
= 0. When a row is
complete
thecorresponding
column in the lower
triangle
can becompleted by
symmetry. Because A issymmetric
it is
only
necessary to store the upper(or lower)
triangle.
Table II illustrates A for thesample population
shown in table I.The
storage
required
to compute A isproportional
to the square of the numbersof animals and so limits the size of the
population
for which A can becomputed.
Using
thetechnique
of Hudson et al(1982)
memory isonly required
to store the non-zero elements of A and it can becomputed
forlarger populations.
Wheninbreeding
coefficients
only
arerequired
then it is not necessary to compute Acompletely,
norTHE RECURSIVE PEDIGREE METHOD
This is an
adaptation
of the tabular method anddepends
uponcomputing
the subset of Arequired
to compute itsdiagonal.
As each animal’s
pedigree
isread,
Equation
1 is used toidentify
the elementwhich describes the
relationship
between its parents. If the animal’sinbreeding
coefficient isalready
known and no new ancestral information is available it isstored. If not, then the element
(apq)
isplaced
in the subset(flagged)
to becomputed.
Equation
2 can now beapplied
toidentify
those elementsrequired
to computeelements
already
flagged.
Thisrequires
searching
the matrix(upper triangle)
from bottom to top and fromright
to left. As each identified element is found thetwo elements
required
toapply
equation
2 can also beflagged.
Because these twoelements lie to the left of the
flagged
element - animals are numbered so that parentsprecede
their progeny -searching
A in this manner results in all the elementsrequired
to compute thediagonal
of Abeing
identified.The
flagged
subset can now becomputed
by applying
equations
1 and 2starting
with the first row, and
proceeding sequentially
to the last.Computation
of each rowof A can be considered in 3 parts.
Firstly,
elements on the left of thediagonal
havealready
beencomputed -
they
were on theright
of thediagonal
in earlier rows.Secondly,
theoff-diagonal
element(a
jk
)
required
to compute thediagonal
element(a
ii
1.
Lastly,
theflagged
elements on theright
of thediagonal
can becomputed using
equation
2.Table III illustrates the recursive
pedigree
method for thesample population.
Firstly,
offdiagonal
elements from(1)
are identified andflagged.
Because animals1,
2 and 3 have at least 1 unknown parentthey
are not inbred and 1 is stored in thecorresponding diagonal
element. Thediagonal
elements of animals4, 5,
6 and 7are
represented by
the lettersP, Q,
R and Srespectively.
Theoffdiagonal
elementsrequired
to compute them arerepresented
by
the same letter in lowercase. Otherelements identified
by
t, u, v and w arerequired
to compute p, q, r and s.As the rows are
processed,
elements that must becomputed
before the currentelement can be
computed
are identified andflagged.
a
S6
(s)
is the firstflagged
element to be found. Itrequires
thata
35
(t)
anda
45
(u)
are known. Table IVillustrates the identification and
flagging
of new elements asthey
were foundusing
the search
procedure
described above.COMPUTATIONAL CONSIDERATIONS
To take
advantage
of thesaving
in space offeredby
this method it is necessary toElements in a linked list are not stored in memory in any
particular
order but arelinked
together
in some sequenceby pointers.
There is apointer
to the location in memory of the first element in each row. Each element in the list has an associatedpointer
to the location in memory of the next element in the sequence. Thepointer
associated with the last element in each row
(list)
is 0. To find any element in such a list it is necessary to &dquo;follow&dquo; thepointers
until therequired
element is found. Asnew elements are stored in a list the
pointers
areadjusted
to maintain theintegrity
of the sequence(for
a detailedexplanation
of linked lists seeKnuth,
1968).
To
expedite
thesearching phase,
the elements in each row should be linked inreverse column order
(from
highest
tolowest,
fig la).
After the subset has been identified thepointers
can beadjusted
so that the rows are linked in column order(fig lb).
For thisapplication
it is desirable to have apointer
(called
RECENT in theAppendix)
to the mostrecently
added element as well as apointer
(ROW)
to the first element in each list. As
only
the uppertriangle
isstored,
elements onthe left of the
diagonal
appear in the column above thediagonal.
These elementsmust be added to the lists
holding
the rows above thediagonal.
Because thesenew elements will
generally
beadjacent
to theprevious
addition to that rowusing
this
pointer
(RECENT)
can avoidrepeated searching
through
the lists.Similarly,
repeated searching
for the same new elements(arising
from animalshaving
thesame
parent)
within a row andsearching
the row to theright
of the current elementshould also be avoided.
Diagonal
elements can be stored in a separate vector(DIAG).
This can beused to
point
to thephysical
location in the linked list of the elementrequired
to compute
equation
1 until thediagonal
element is determined. Because the theoretical maximum for adiagonal
element is2.0,
the first 2 elements in the linkedlist must be reserved. Thus a value greater than 2 in the
diagonal
vector is apointer,
one less than 3 is adiagonal
element. As eachdiagonal
element is determined it canbe stored in this vector.
Subsequently,
inbreeding
coefficients can be derived fromthis vector.
An efficient method for
finding
elements on the left of thediagonal
for any row isrequired.
One way ofdoing
this is toadjust
thepointers
so that after the elements inon a column basis in reverse row order
(fig Ic).
Thisrequires
a vector(COLUMN)
which
points
to the mostrecently
added element of each column.SIMULATION STUDY
For this
example
apopulation
with thefollowing
characteristics was simulated.Starting
with a basepopulation
of 20 sires and 500dams,
40 years of progenywere
generated.
The adultpopulation
was held constant at 20 sires and 500 dams.Sires were mated
randomly
to thedams,
all of which had 1 calf. After each yearthe oldest half of the sires were
replaced
and the oldest quarter of the dams wereculled.
Yearlings
(offspring
from theprevious
&dquo;year&dquo;)
wererandomly
selected toreplace
the culled animals. As aresult,
some animals had up to 20generations
ofancestors.
Three sets of
inbreeding
coefficients werecomputed
for theanimals,
viz- after each
group of 10 years
assuming
that noinbreeding
coefficients had beencalculated on this
population
before;
- for each decade
assuming
thatinbreeding
coefficients had been calculated in theprevious
year ieonly inbreeding
coefficients for the latest group of calves were- for
parents
only
when noinbreeding
coefficients were known. A detailedalgo-rithm used to compute
inbreeding
is shown in theAppendix.
Thesecomputations
were carried out on a GOULD NP1 computer and the results from this are shown
in table V.
For the
population
of 20 520animals,
the subset of A included 516 435 elementsout of a total of 421070 400
(0.123%).
Thepopulation
size whichrequired
com-putation
of thelargest
proportion
of A(0.146%)
was10 520. When
inbreeding
coefficients were known for all but the most recent group of
calves,
or wereonly
required
for parents, asignificantly
smallerproportion
of A wasrequired.
DISCUSSION
The results in table V illustrate how very small a subset of A is
required
tocompute its
diagonal
for the simulatedpopulation.
Although
the subset is small as aproportion
ofA,
more than 6Mbytes
of memory wererequired
to store thelargest
subset
(516 435 elements).
Table Vb illustrates thecomputing
resourcesrequired
when
inbreeding
coefficients are known for all but the most recent group of calves.As this
technique
can make use ofpreviously computed inbreeding coefficients,
problems
that are toolarge
for a computer can be divided into a series of smallerproblems.
To avoidrepeated
computation
of the same elements ofA,
subsets shouldnot be chosen on a
chronological
basis but rather on a related group, herd orfamily
basis. The
technique
could bereadily adapted
to compute any subset of A that isof interest.
ACKNOWLEDGMENTS
The financial support of the Australian Meat and Livestock
Corporation
isgrate-fully
acknowledged,
as arehelpful
comments and encouragement .fromcolleagues
and referees.REFERENCES
Henderson CR
(1975)
Rapid
method forcomputing
the inverse of arelationship
matrix. J
Dairy
Sci58,
1727-1730Henderson CR
(1976)
Asimple
method forcomputing
the inverse of a numeratorrelationship
matrix used inprediction
ofbreeding
values. Biometrics32,
69-83Hudson
GFS, Quaas
RL,
Van Vleck LD(1982)
Computer algorithm
for the recursivemethod of
calculating
large
numeratorrelationship
matrices. JDairy
Sci65,
2018-2022 .
Knuth DE
(1968)
The artof Computer Programming.
Tlol 1. FundamentalAlgo-rithms. Addison
Wesley, Reading,
Massachussets. 634 pAPPENDIX
Algorithm
for the recursivepedigree
method. Table VI illustrates the state of thestorage at various stages of the
algorithm.
The
following
variables and subroutine arerequired
for thisalgorithm:
Integer
scalarsN is the number of animals. LLSIZE is a
pointer
to the last element in the linked listat any time
((LLSIZE+1)
isempty).
LARGE is the size of the vectors used to store the lists. LASTEL holds the address of the last elementpassed
to the subroutine.Integer
vectorsSIRE(0:N)
andDAM(O:N)
store parents(SIR.E(0)=DAM(0)=0).
During
thesearch-ing phase,
COLUMN(O:N)
is used tokeep
a record of theflagged
elements in a row;during
thecomputation
phase
it storespointers
to the first element in the columns of elements above thediagonal.
ROW(1:N)
storespointers
to the first elementin each row on the
right
of thediagonal.
RECENT(1:N)
storespointers
to thelocation of the elements
preceding
the mostrecently
added element in each row.NEXT(1:LARGE)
storespointers
to the next element in each row.JAY(1:LARGE)
stores the column
subscripts
of the elements.Real vectors
WORK(1:N)
is used asworkspace
forcomputing
a row.DIAG(1:N)
storespointers
to theapq’s
ofequation
[1]
initially,
andsubsequently
thediagonal
elements whencomputed.
AIJ(1:LARGE)
stores therequired
elements of A. SubroutineRESERVE reserves space in the linked lists for the elements in the subset while
maintaining
theintegrity
of the lists. A check to ensure that there is sufficient space(SUBROUTINE
RESERVE(INI,INJ)
If
(INI=0)
or(INJ*0)
RETURN If(INI=INJ)RETURN
I=MIN(INI,INJ);J=MAX(INI,INJ);
L=RECENT
(I);M=NEXT(L);
IF(JAY(M)<J)
THENL--O;M=R
OW
(
I
)
Endif If M=0 thenLLSIZE=LLSIZE+ l;JA Y(LLSIZE)=J;ROW(I)=LLSIZE;LAS1EL=LLSIZE
Else
While
(M>0)
and(JAY(M)>J)
doRECENT(I)=L;L=M;M=NEXT(M);Endwhile
IfJAY(M)#J
thenLLSIZE=LLSIZE+1;JAY(LLSIZE)=J;LASTEL=LLSIZE
IF L=0 ThenNEXT(LLSIZE)=
ROW
(
I
);
RO
W(I)=LLSIZE
ElseifM!
thenNEXT(L)=LLSIZE
ElseNEXT(LLSIZE)=NEXT(L);NEXT(L)=LLSIZE
Endif Else LASTEL=M Endif EndifENDreserve)
Algorithm
Table VI illustrates the state of the vector
during
thealgorithm
for thesample
population.
At the conclusion of thefollowing
steps DIAG contains thediagonal
of A:1 Number animals 1 to N so that parents
precede
their progeny.3 Read and store the
pedigrees
in SIRE and DAM.(3a.
Read any knowninbreeding
coefficients,
add 1 and store in DIAG(DIAG(I)=1+F
I
)).
4 Reserve space in the lists for
parental relationships
when both parents are known.When either the sire or dam is unknown store 1.0 in DIAG.
(For
1=1 to N doIf D1AG(I)=0.0 then
If
(SIRE(I)=0)
or(DAM(I)=O)
thenDIAG(I)=I.0
Else CALLRESERVE(SIRE(I),DAM(I));DIAG(I)=LASTEL
Endif EndifEndfor)
5 Pass
through
each row fromright
to leftstarting
with the Nth row andmoving
upwards,
examine eachflagged
element and reserve space in the linked lists for any element that isrequired
before the current element can becomputed. Keep
a recordof all
flagged
elements in the row(in COLUMN).
Confinesearching
to the left of the current element.(For
I=N downto 1 do6
Adjust
pointers
so that the lists are linked in column order and reset COLUMN. COLUMN. ’COLUMN(0)=0
(For
I=1 to N doCOLUMN(I)=0
K=ROW(I);NEW=0
WIIILE K>0 doIOLD=NEXT(K); NEXT(K)=NEW;
NEW=K;
K=IOLD;
Endwhile
ROW(I)=NEW
Endfor)
7
Compute
each rowstarting
with the first.[For
I=1 to N doa. Transfer elements on the left of the
diagonal
into WORK(K=COLUMN(I)
While K#0 do
WORK(JAY(K))=AIJ(K);K=NEXT(K);Endwhile)
b.Compute
thediagonal
and store it in WORK.(If!IAG(I»2.0 )
DIAG(I)=I+AIJ(DIAG(l))/2;
WORK(I)=DIAG(I))
c.