Computing inbreeding coefficients quickly

(1)

HAL Id: hal-00893856

https://hal.archives-ouvertes.fr/hal-00893856

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing inbreeding coeﬀicients quickly

B Tier

To cite this version:

(2)

Original

article

Computing inbreeding

coefficients

_quickly

B Tier

University

of New

England, Animal Genetics and _BreedingUnit *

Arnaidade,

NSW, Australia

(Received

13 June 1988; accepted 12 _September

₁₉₉₀₎

Summary - An _algorithmfor _computing

_inbreeding

coefficients

(F)

for all animals of

large

populations

is described. The _techniquerelies on the subset of the numerator relationship matrix

_(A)

required to compute its _diagonal,which contains F. A _simple

example illustrates that this subset is a _verysmall _partof A.

inbreeding coefficients _/fast algorithm

Résumé - Calcul rapide des coefficients de consanguinité. On _présenteun _algorithme

pour calculer rapidement les

coefficients

de

consanguinité

de tous les anirrcaux, dans une

grande

population. La méthode

_fait

seulement intervenir le sous-ensemble de la matrice des _coefficientsde _{parenté qui}est nécessaire au calcul de sa diagonale, qui contient les

coefficients

de

_{consanguinité.}

Un _{exemple simple}illustre que ce sous-ensemble n’est

_qu’une

petite partie de la matrice des

_coefficients

de _{parenté. L’algorithme}est

_précisé

sous la

forme

d’un

_{codage symbolique.}

coefficients de _{consanguinité}_{/ calcul}_rapide

INTRODUCTION

Wright’s

(1922)

coefficient of

_inbreeding

_(F)

describes the

_probability

that 2 alleles at any locus are identical

_by

descent. Inbred

_offspring

result from

_mating

two

animals which have one

_{(or more)}

common ancestors. Breeders of livestock

_perceive

inbreeding

as

being

deleterious and

consequently

_tryto avoid

_mating

close relatives.

The

_relationship

between

_prospective

mates can be

computed

to determine the

degree

of

_inbreeding

of the

_{resulting offspring}

from such a

_mating.

For inbred

_populations,

_inbreeding

coefficients are

_required

to _{compute A-}1

directly using

Henderson’s

₍₁₉₇₅₎

rules. Henderson

₍₁₉₇₆₎

showed the

_relationship

between the

_inbreeding

coefficients of an animal’s _parentsand the &dquo;contributions&dquo;

*

(3)

that each animal’s

_pedigree

makes to !4 !. For the ith animal with

_{parents j}

and

k,

the

_following

contributions are added to A-1:

where

_d

_ii

₌

1.0 -

_0.25(a!!

+

_a

_kk

_).

Two

_algorithms

are

_commonly

used to _compute

_inbreeding

coefficients.

Origi-nally they

were calculated

_using

the

path

coefficient method

_{(Wright, 1922).}

While

this method is easy to use for

_computing

F for animals which have few ancestors

and are

only slightly inbred,

it is a _very

_complex

method for animals with _many common ancestors. A

_{simpler approach}

is to _generatethe numerator

_relationship

matrix

_(A)

_using

the tabular method as attributed to Lush

_by

Emik and Terrill

(1949,

cited

_by

Hudson et

_al,

_1982).

_Inbreeding

coefficients can be

computed

from

the

_diagonal

elements of A : Fi = aii - 1. This paper describes an efficient

imple-mentation of the tabular method.

COMPUTING INBREEDING COEFFICIENTS

USING THE TABULAR METHOD

To _computeA

_by

the tabular method animals in the

_population

must be numbered from 1 to N so that parents

precede

their progeny. The

pedigrees

of all animals must be stored in _memory.A zero in the

pedigree

denotes an unknown parent. The

upper

triangle

of A can be

computed

on a row

_by

row basis in consecutive order

working

from first to last.

_Diagonal

elements are

computed using

the formula:

The remainder of the row

(to

the

right

of the

diagonal)

can be

computed by applying

the formula:

where p and q are the _parentsof the

jth

animal and i <

_j.

If either _parentis

unknown

_(p

or _q=

0)

then

a oi

= 0. When a row is

complete

the

_{corresponding}

column in the lower

_triangle

can be

_{completed by}

_symmetry.Because A is

_symmetric

it is

_only

_necessaryto store the _upper

_{(or lower)}

_triangle.

Table II illustrates A for the

_{sample population}

shown in table I.

The

_storage

_required

to _computeA is

_proportional

to _{the square}of the numbers

of animals and so limits the size of the

_population

for which A can be

_computed.

Using

the

technique

of Hudson et al

₍₁₉₈₂₎

memory is

only required

to store the non-zero elements of A and it can be

_computed

for

_{larger populations.}

When

_inbreeding

coefficients

_only

are

_required

then it is not necessary to _computeA

_completely,

nor

(4)

THE RECURSIVE PEDIGREE METHOD

This is an

_adaptation

of the tabular method and

_depends

upon

computing

the subset of A

_required

to _computeits

_diagonal.

As each animal’s

_pedigree

is

read,

Equation

1 is used to

_identify

the element

which describes the

_relationship

between its _parents.If the animal’s

_inbreeding

coefficient is

_already

known and no new ancestral information is available it is

stored. If not, then the element

_(apq)

is

_placed

in the subset

_(flagged)

to be

computed.

Equation

2 can now be

applied

to

_identify

those elements

_required

to _compute

elements

_already

_flagged.

This

_requires

_searching

the matrix

_{(upper triangle)}

from bottom to _topand from

_right

to left. As each identified element is found the

two elements

_required

to

_apply

_equation

2 can also be

_flagged.

Because these two

elements lie to the left of the

_flagged

element - animals are numbered so that _parents

precede

their _{progeny -}

_searching

A in this manner results in all the elements

required

to compute the

_diagonal

of A

_being

identified.

The

_flagged

subset can now be

computed

by applying

equations

1 and 2

_starting

with the first _row,and

_{proceeding sequentially}

to the last.

_Computation

of each row

of A can be considered in 3 _parts.

_Firstly,

elements on the left of the

_diagonal

have

already

been

_{computed -}

_they

were on the

_right

of the

_diagonal

in earlier rows.

Secondly,

the

_off-diagonal

element

_(a

_jk

₎

_required

to _computethe

_diagonal

element

(a

ii

(5)

1.

_Lastly,

the

_flagged

elements on the

_right

of the

_diagonal

can be

_{computed using}

equation

2.

Table III illustrates the recursive

_pedigree

method for the

_{sample population.}

Firstly,

off

_diagonal

elements from

₍₁₎

are identified and

flagged.

Because animals

1,

2 and 3 have at least 1 unknown _parent

_they

are not inbred and 1 is stored in the

_{corresponding diagonal}

element. The

_diagonal

elements of animals

_{4, 5,}

6 and 7

are

_{represented by}

the letters

_{P, Q,}

R and S

_{respectively.}

The

_offdiagonal

elements

required

to _{compute them}are

_represented

_by

the same letter in lowercase. Other

elements identified

_by

_t,_u,v and w are

_required

to compute p, q, r and s.

As the rows are

_processed,

elements that must be

computed

before the current

element can be

_computed

are identified and

_flagged.

_a

_S6

_(s)

is the first

_flagged

element to be found. It

_requires

that

_a

₃₅

_(t)

and

_a

₄₅

_(u)

are known. Table IV

illustrates the identification and

_flagging

of new elements as

_they

were found

_using

the search

_procedure

described above.

COMPUTATIONAL CONSIDERATIONS

To take

_advantage

of the

_saving

in _spaceoffered

_by

_{this method it is necessary}to

(6)

Elements in a linked list are not _{stored in memory in any}

_particular

order but are

linked

_together

in some _sequence

_{by pointers.}

There is a

_pointer

to the location in memory of the first element in each row. Each element in the list has an associated

pointer

to the location in _memoryof the next element in the sequence. The

pointer

associated with the last element in each row

(list)

is 0. To find _anyelement in such a list it is necessary to &dquo;follow&dquo; the

_pointers

until the

required

element is found. As

new elements are stored in a list the

_pointers

are

_adjusted

to maintain the

_integrity

of the sequence

(for

a detailed

explanation

of linked lists see

Knuth,

1968).

To

_expedite

the

_{searching phase,}

the elements in each row should be linked in

reverse column order

(from

_highest

to

_lowest,

_{fig la).}

After the subset has been identified the

_pointers

can be

_adjusted

so that the rows are linked in column order

(fig lb).

For this

_application

it is desirable to have a

_pointer

_(called

RECENT in the

_Appendix)

to the most

_recently

added element as well as a

_pointer

(ROW)

to the first element in each list. As

_only

the upper

triangle

is

_stored,

elements on

the left of the

_diagonal

appear in the column above the

diagonal.

These elements

must be added to the lists

_holding

the rows above the

_diagonal.

Because these

new elements will

_generally

be

_adjacent

to the

_previous

addition to that row

_using

this

_pointer

_(RECENT)

can avoid

_{repeated searching}

_through

the lists.

_Similarly,

repeated searching

for the same new elements

_(arising

from animals

_having

the

same

_parent)

within a row and

searching

the row to the

_right

of the current element

should also be avoided.

Diagonal

elements can be stored in a _separatevector

_(DIAG).

This can be

used to

_point

to the

_physical

location in the linked list of the element

_required

to _compute

_equation

1 until the

_diagonal

element is determined. Because the theoretical maximum for a

diagonal

element is

_2.0,

the first 2 elements in the linked

list must be reserved. Thus a value _greaterthan 2 in the

_diagonal

vector is a

_pointer,

one less than 3 is a

diagonal

element. As each

_diagonal

element is determined it can

be stored in this vector.

_{Subsequently,}

_inbreeding

coefficients can be derived from

this vector.

An efficient method for

_finding

elements on the left of the

diagonal

for any row is

required.

One _wayof

_doing

this is to

_adjust

the

_pointers

so that after the elements in

(7)

on a column basis in reverse row order

(fig Ic).

This

_requires

a vector

_(COLUMN)

which

_points

to the most

_recently

added element of each column.

SIMULATION STUDY

For this

_example

a

population

with the

following

characteristics was simulated.

Starting

with a base

_population

of 20 sires and 500

_dams,

40 years of progeny

were

generated.

The adult

population

was held constant at 20 sires and 500 dams.

Sires were mated

randomly

to the

dams,

all of which had 1 calf. After each year

the oldest half of the sires were

replaced

and the oldest _quarterof the dams were

culled.

_Yearlings

_(offspring

from the

_previous

_{&dquo;year&dquo;)}

were

_randomly

selected to

replace

the culled animals. As a

_result,

some animals had _{up to}20

_generations

of

ancestors.

Three sets of

_inbreeding

coefficients were

_computed

for the

_animals,

viz

- _{after each}

group of 10 years

assuming

that no

inbreeding

coefficients had been

calculated on this

_population

before;

- for each decade

assuming

that

_inbreeding

coefficients had been calculated in the

_previous

_yearie

_{only inbreeding}

coefficients for the latest _groupof calves were

(8)

(9)

- for

parents

only

when no

_inbreeding

coefficients were known. A detailed

algo-rithm used to compute

inbreeding

is shown in the

_Appendix.

These

_computations

were carried out on a GOULD NP1 computer and the results from this are shown

in table V.

For the

_population

of 20 520

_animals,

the subset of A included 516 435 elements

out of a total of 421070 400

(0.123%).

The

_population

size which

_required

com-putation

of the

_largest

_proportion

of A

_(0.146%)

was

_{10 520. When}

_inbreeding

coefficients were known for all but the most recent _groupof

_calves,

or were

_only

required

for _parents,a

_{significantly}

smaller

_proportion

of A was

_required.

DISCUSSION

The results in table V illustrate how _verysmall a subset of A is

_required

to

compute its

_diagonal

for the simulated

_population.

_Although

the subset is small as a

_proportion

of

A,

more than 6

_Mbytes

of _memorywere

_required

to store the

_largest

subset

_{(516 435 elements).}

Table Vb illustrates the

_computing

resources

_required

when

_inbreeding

coefficients are known for all but the most recent _{group of calves.}

As this

_technique

can make use of

_{previously computed inbreeding coefficients,}

problems

that are too

_large

for a _computercan be divided into a series of smaller

problems.

To avoid

_repeated

_computation

of the same elements of

A,

subsets should

not be chosen on a

_{chronological}

basis but rather on a related group, herd or

_family

basis. The

_technique

could be

_{readily adapted}

to compute any subset of A that is

of interest.

ACKNOWLEDGMENTS

The financial _supportof the Australian Meat and Livestock

_Corporation

is

grate-fully

acknowledged,

as are

_helpful

comments and _{encouragement}.from

_colleagues

and referees.

REFERENCES

Henderson CR

₍₁₉₇₅₎

_Rapid

method for

_computing

the inverse of a

relationship

matrix. J

_Dairy

Sci

_58,

1727-1730

Henderson CR

₍₁₉₇₆₎

A

_simple

method for

_computing

the inverse of a numerator

relationship

matrix used in

_prediction

of

_breeding

values. Biometrics

_32,

69-83

Hudson

_{GFS, Quaas}

_RL,

Van Vleck LD

₍₁₉₈₂₎

_{Computer algorithm}

for the recursive

method of

_calculating

_large

numerator

_relationship

matrices. J

_Dairy

Sci

_65,

2018-2022 .

Knuth DE

₍₁₉₆₈₎

The art

_{of Computer Programming.}

Tlol 1. Fundamental

Algo-rithms. Addison

_{Wesley, Reading,}

Massachussets. 634 p

(10)

APPENDIX

Algorithm

for the recursive

_pedigree

method. Table VI illustrates the state of the

storage at various _stagesof the

_algorithm.

The

_following

variables and subroutine are

required

for this

algorithm:

Integer

scalars

N is the number of animals. LLSIZE is a

_pointer

to the last element in the linked list

at _anytime

_((LLSIZE+1)

is

_empty).

LARGE is the size of the vectors used to store the lists. LASTEL holds the address of the last element

_passed

to the subroutine.

Integer

vectors

SIRE(0:N)

and

_DAM(O:N)

store _parents

_{(SIR.E(0)=DAM(0)=0).}

_During

the

search-ing phase,

COLUMN(O:N)

is used to

_keep

a record of the

flagged

elements in a row;

during

the

computation

phase

it stores

pointers

to the first element in the columns of elements above the

_diagonal.

_ROW(1:N)

stores

_pointers

to the first element

in each row on the

_right

of the

_diagonal.

RECENT(1:N)

stores

_pointers

to the

location of the elements

_preceding

the most

_recently

added element in each row.

NEXT(1:LARGE)

stores

_pointers

to the next element in each row.

_JAY(1:LARGE)

stores the column

_subscripts

of the elements.

Real vectors

WORK(1:N)

is used as

_workspace

for

_computing

a row.

DIAG(1:N)

stores

_pointers

to the

_apq’s

of

_equation

_[1]

_initially,

and

_subsequently

the

_diagonal

elements when

computed.

AIJ(1:LARGE)

stores the

_required

elements of A. Subroutine

RESERVE reserves space in the linked lists for the elements in the subset while

maintaining

the

_integrity

of the lists. A check to ensure that there is sufficient _space

(11)

(SUBROUTINE

RESERVE(INI,INJ)

If

_(INI=0)

or

(INJ*0)

RETURN If

(INI=INJ)RETURN

I=MIN(INI,INJ);J=MAX(INI,INJ);

L=RECENT

(I);M=NEXT(L);

IF

_(JAY(M)<J)

THEN

L--O;M=R

OW

(

I

)

Endif If M=0 then

LLSIZE=LLSIZE+ l;JA Y(LLSIZE)=J;ROW(I)=LLSIZE;LAS1EL=LLSIZE

Else

While

(M>0)

and

_(JAY(M)>J)

do

RECENT(I)=L;L=M;M=NEXT(M);Endwhile

If

_JAY(M)#J

then

LLSIZE=LLSIZE+1;JAY(LLSIZE)=J;LASTEL=LLSIZE

IF L=0 Then

NEXT(LLSIZE)=

ROW

(

I

);

RO

W(I)=LLSIZE

Elseif

M!

then

NEXT(L)=LLSIZE

Else

NEXT(LLSIZE)=NEXT(L);NEXT(L)=LLSIZE

Endif Else LASTEL=M Endif Endif

ENDreserve)

Algorithm

Table VI illustrates the state of the vector

_during

the

_algorithm

for the

_sample

population.

At the conclusion of the

following

steps DIAG contains the

_diagonal

of A:

1 Number animals 1 to N so that _parents

precede

their progeny.

(12)

3 Read and store the

_pedigrees

in SIRE and DAM.

_(3a.

Read any known

inbreeding

coefficients,

add 1 and store in DIAG

_(DIAG(I)=1+F

_I

_)).

4 Reserve space in the lists for

parental relationships

when both parents are known.

When either the sire or dam is unknown store 1.0 in DIAG.

(For

1=1 to N do

If D1AG(I)=0.0 then

If

_(SIRE(I)=0)

or

_(DAM(I)=O)

then

DIAG(I)=I.0

Else CALL

_{RESERVE(SIRE(I),DAM(I));DIAG(I)=LASTEL}

Endif Endif

Endfor)

5 Pass

_through

each row from

_right

to left

_starting

with the Nth row and

_moving

upwards,

examine each

_flagged

element and reserve _spacein the linked lists for _any element that is

_required

before the current element can be

computed. Keep

a record

of all

_flagged

elements in the row

_{(in COLUMN).}

Confine

_searching

to the left of the current element.

(For

I=N downto 1 do

(13)

6

_Adjust

_pointers

so that the lists are linked in column order and reset COLUMN. COLUMN. ’

COLUMN(0)=0

(For

I=1 to N do

COLUMN(I)=0

K=ROW(I);NEW=0

WIIILE K>0 do

IOLD=NEXT(K); NEXT(K)=NEW;

NEW=K;

K=IOLD;

Endwhile

ROW(I)=NEW

Endfor)

7

_Compute

each row

_starting

with the first.

[For

I=1 to N do

a. Transfer elements on the left of the

_diagonal

into WORK

(K=COLUMN(I)

While K#0 do

WORK(JAY(K))=AIJ(K);K=NEXT(K);Endwhile)

b.

_Compute

the

_diagonal

and store it in WORK.

(If!IAG(I»2.0 )

DIAG(I)=I+AIJ(DIAG(l))/2;

WORK(I)=DIAG(I))

c.

_Compute

the elements on the

_right

of the

_diagonal

and store them in WORK and AIJ.

_Adjust

the

_pointers

so that elements on the

_right

of the

_diagonal

in the row are linked into columns.

Computing inbreeding coefficients quickly

HAL Id: hal-00893856

https://hal.archives-ouvertes.fr/hal-00893856

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing inbreeding coeﬀicients quickly

B Tier

To cite this version:

Original

article

Computing inbreeding

coefficients

quickly

of New

Arnaidade,

(Received

1990)

inbreeding

(F)

large

populations

(A)

coefficients

consanguinité

grande

fait

coefficients

consanguinité.

qu’une

coefficients

précisé

forme

codage symbolique.

Wright’s

(1922)

inbreeding

(F)

probability

by

offspring

mating

(or more)

perceive

inbreeding

being

consequently

mating

relationship

prospective

computed

degree

inbreeding

resulting offspring

mating.

populations,

inbreeding

required

directly using

(1975)

(1976)

relationship

inbreeding

pedigree

parents j

k,

following

d

ii

=

_quickly

₁₉₉₀₎

_inbreeding

_(A)

_fait

_{consanguinité.}

_qu’une

_coefficients

_précisé

_{codage symbolique.}

_inbreeding

_(F)

_probability

_by

_offspring

_mating

_{(or more)}

_perceive

_mating

_relationship

_prospective

_inbreeding

_{resulting offspring}

_mating.

_populations,

_inbreeding

_required

₍₁₉₇₅₎

₍₁₉₇₆₎

_relationship

_inbreeding

_pedigree

_{parents j}

_following

_d

_ii

₌

_0.25(a!!

_a

_kk

_).

_algorithms

_commonly

_inbreeding

_using

_{(Wright, 1922).}

_computing

_complex

_{simpler approach}

_relationship

_(A)

_using

_by

_by

_al,

_1982).

_Inbreeding

_diagonal

_by

_population

_by

_Diagonal

_j.

_(p

_{corresponding}

_triangle

_{completed by}

_symmetric

_only

_{(or lower)}

_triangle.

_{sample population}

_storage

_required

_proportional

_population

_computed.

₍₁₉₈₂₎

_computed

_{larger populations.}