Computing inbreeding coefficients in large populations

(1)

HAL Id: hal-00893959

https://hal.archives-ouvertes.fr/hal-00893959

Submitted on 1 Jan 1992

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing inbreeding coeﬀicients in large populations

The Meuwissen, Z Luo

To cite this version:

(2)

Original

article

Computing

inbreeding

coefficients

in

_{large populations}

THE Meuwissen

Z Luo

Institute

_of

Animal

_Physiology

and Genetics

_Research,

Edinburgh

Research

Station,

Roslin EH25

9PS,

UK

(Received

21 October 1991;

accepted

15

_April

₁₉₉₂₎

Summary - An

_algorithm

for

computing inbreeding

coefficients in

_{large populations}

is

presented.

It is

_especially

useful in

_{large populations}

because of the small size of the

memory

required,

which is linear with

_population

size, and its

_speed,

if the number of

generations

involved is not too

_large,

ie not

_larger

than about 12. The method is

_compared

with 2 other methods for

_{computational speed}

and memory requirement. The

_presented

algorithm

is suited for situations where the

_inbreeding

coefficients for a few new animals are to be

computed given

that their ancestor’s

inbreeding

coefficients were calculated

previously.

inbreeding coefficient

_/

_algorithm

Résumé - Le calcul des coefficients de consanguinité dans de grandes populations.

On

_présente

un

_algorithme

de calcul des

_coefficients

de

_{consanguinité}

pour de

grandes

populations.

Il est

particulièrement adapté

à ces

populations

à cause du

faible

volume de

mémoire d’ordinateur _requis

_(il

_dépend

linéairement de la taille de la

_population)

et de la rapidité de calcul

_quand

le nombre de

_{générations impliquées}

n’est pas trop élevé

(c’est-à-dire

inférieur

à environ

_12).

La méthode est

_comparée

à 2 autres méthodes du point de

vue de la vitesse de calcul et de la mémoire

_requise.

Le

_{présent algorithme}

est

_également

adapté

aux situations de mise à

_jour

où les

_coefficients

de

_{consanguinité correspondant}

à un

_faible

nombre d’animaux nouveaux doit être calculé en tirant _partides

coefficients

calculés pour les anciens animaux.

coefficient de _{consanguinité}

_/

_algorithme

*

On leave from the Schoonoord Research Institute for Animal

_Production,

PO Box

_501,

(3)

INTRODUCTION

Many

countries are

_implementing

or have

_implemented

evaluation methods for

na-tional herds based upon animal models. If these account for

_inbreeding,

calculation of

_inbreeding

coefficients in

_{large populations}

is

_required.

Henderson

₍₁₉₇₆₎

and

Quaas

(1976)

implicitly

present

methods for the calculation of

_inbreeding

coeffi-cients,

when

they

propose

algorithms

for the calculation of the inverse of the

ad-ditive

_relationship

matrix. Henderson’s method

_requires

_storage

of a

large

matrix.

Quaas

avoids this: memory

requirement

is linear with N and

computation

time is

proportional

to

_N

₂

_,

where N is the size of the data set. In

Quaas’

method

com-puter

time becomes a

limiting

factor with

_increasing

N. With this

algorithm,

no

use is made of known

_inbreeding

coefficients,

and hence all calculations have to be

repeated

whenever coefficients are

_required

for a new batch of animals. Golden et al

₍₁₉₉₁₎

_present

an

algorithm

based on

Quaas’

method

_using

_sparse

_programming

techniques.

Tier

₍₁₉₉₀₎

presents

a fast method for the calculation of

_inbreeding

coefficients.

Using

sparse

programming

techniques,

his

_algorithm

first determines which ele-ments in the additive

_{genetic relationship}

matrix A are

required

and then calculates

them. The _memory

_required

is a small

_proportion

of

NZ:

about

0.9%.

For

_large

pop-ulations,

the

_physical

memory of the

computer

is still

_likely

to be too small and use

of disk _{memory is}

_{required, slowing}

this

_{algorithm considerably.}

Known

_inbreeding

coefficients are not recalculated.

The aim of this _{paper is}to

_present

a method for the calculation of

_inbreeding

coefficients in

_{large populations,}

which is

_{fast, requires}

_memory

_proportional

to N and does not

_recompute

known

_inbreeding

coefficients. The method

_presented

is

compared

to Golden et al’s

₍₁₉₉₁₎

_{implementation}

of

_Quaas’

₍₁₉₇₆₎

method and Tier’s

₍₁₉₉₀₎

method for

_speed

and _memory

_required.

METHOD

The method is based on the

_{decomposition}

of the additive

_{genetic relationship}

matrix

_A,

as described

_by

Henderson

_(1976):

A =

LDL’,

where L is _alower

triangular

matrix

_containing

the fraction of the genes that animals derive from

their

_ancestors,

and D is a

_diagonal

matrix

_containing

the within

_family

additive

genetic

variances of animals. The animals are ordered such that

_parents

precede

offspring.

From the

_{decomposition,}

it follows that

_{(Quaas, 197G):}

where

_A

_ii

is the ith

_diagonal

element of

_A,

which

_equals

the

_inbreeding

coefficient of animal i

_plus

1.

_Quaas

₍₁₉₇₆₎

_computes

the elements of L

_quickly

and one column at a time

_by

tlie

_following

recursive

_algorithm:

For

_j=

1 to 1V

_(all

columns of

_L)

(4)

For

i= j

+

1 to N

_(all

elements below the

_diagonal

_element)

Lij =

(L!

+

_L

_dd

_)/2

when both

parents

si and

d

i

of i are

_known,

or

=

L

kd

/2

when

_only

one

_parent

k

i

of i is

known,

or

= 0 when both

_parents

are

_unknown,

for i <

_j.

The elements of D are calculated as:

D!!

= 1 when both

_parents

are

_unknown,

or

= 0.75 -

Fkj /4

when

_only

one

_parent

k!

of j

is

known,

or

= 0.5 -

(Fs!

+

_F

_dJ/

₄

when both

_parents

_Sj and

_d

_j

of j

are

known,

where

F

j

denotes the

_inbreeding

coefficient of animal

_j.

After

_computing

the elements of the

_jtli

column of

_{L, they}

are

squared

and

multiplied by

_D!!.

The

resulting

vector is added to a

_working

vector. When this

_procedure

is followed for _everycolumn of

L,

the

working

vector contains the

_A

_jj

_-values.

This

algorithm requires

N(N+ 1)/2

operations.

Golden et al

₍₁₉₉₁₎

made a list of the

_offspring

of all the animals i.

Because element

_L

_ij

is non-zero

_only

if i is a descendant

_{of j,}

the

_operations

within

each column are

performed only

for the descendants

of j.

Because the elements of

the columns of L are not stored

_they

have to be recalculated when a new batch of

animals becomes available.

In the

present

algorithm,

the elements of L are

computed

row

by

row, which

overcomes the

problem

of

recalculating

elements of L when a new batch of animals

has to be evaluated. A row i of L

_gives

the fraction of the _genesthat animal i derives from its ancestors.

_Hence,

_L

_isi

=

Lid!

=

0.5,

where _s_i and

d

i

are the sire and dam of

i,

respectively.

Each row of L can be built

_{by proceeding}

up the

pedigree adding

half the &dquo;contribution&dquo; of the current animal to each of its

_parents.

The fraction of the _genesderived from an ancestor is:

where

_P

_j

is the set of identification numbers of the progeny

of j

and

ANC

i

is the set of identification numbers of the ancestors of

_i,

_including

animal i itself. The latter

_identity

in

_[2]

is because

_Li!

=

0,

if k is not an ancestor of i or not

_equal

to i. This leads to the

_{following algorithm}

for

_{computing L,}

row

_by

row. As each row is

_determined,

the contributions to the elements of

A

ii

are accumulated. Row

i of L and

A;z,

are set to 0 before

_starting.

The

_{algorithm keeps}

track of a list of

ancestors

_ANC

_I

_,

whose contribution to

_A

_ii

has

_yet

to be included. If the sire or dam is unknown si = 0 _or

d

i

=

_0,

respectively.

(5)

delete j

from

_ANC,

end wliile

The

_youngest

_{animal j}

in

_ANC

_i

is

_evaluated,

because all of its _progenyin the

pedigree

of animal i must have been

evaluated, hence,

its

_L

_ij

value is known. The kernel of a Fortran _programof the

_algorithm

is

given in

the

_Appendix.

The list of ancestors is

_{represented by}

a link list. In the

_computer

_programtime is saved

_by:

_i)

checking

whether one of the

_parents

is

unknown, giving

an

inbreeding

coefficient of zero

(otherwise,

the

_algorithm

would trace the

_pedigree

of the other

_{parent) ;}

and

_ii)

if the

_pedigree

is sorted such that full sibs have successive identification

numbers,

ie

are evaluated

_{successively,}

_only

the first full sib is evaluated and all full sibs obtain

(and have)

the same

_inbreeding

coefficient as the evaluated full sib.

In order to _comparethe

_algorithms,

Golden et al’s

₍₁₉₉₁₎

and Tier’s

₍₁₉₉₀₎

algorithms

were

_programmed

in Fortran. Golden et al

_presented

their

_algorithm

in

C,

which

_they

considered faster.

However,

C was not available to the authors. The

sorting

of the list of descendants in the Golden et al

_algorithm

was

performed by

an IMSL routine here

(I1VISL, 1984).

SIMULATION

In order to _comparethe

_algorithms,

the simulation program of Meuwissen and Van

der Werf

₍₁₉₉₁₎

was used to

_generate

a

_pedigree

data set. The _programsimulated

an _open

_dairy

cattle nucleus scheme for

10,

20 or 40 _years,with

population

sizes of

9 267,

15 582 and 28171

_{animals, respectively}

_(see

table

_I).

Selection was for animal

model BLUP

_breeding

values. Nucleus dams and commercial dams

_produced

8 and

1

_{offspring, respectively.}

The number of nucleus and commercial sires selected was

4 and

10,

respectively. Also,

a

_breeding

_programwas simulated for 20 _yearswith

selection of

_only

one commercial

_sire,

_representing

a situation with

_large

half sib families.

Inbreeding

coefficients were calculated on a micro-VAX 3800 in batch mode. The maximum

_physicsl

memory allocated to the batch _queuewas about 20 Mb. Because

this exceeded the _memory

_{required by}

_anyof the

_algorithms,

no disk _memorywas

used. If disk _memorywere

used,

the

_{computational speed}

would

_depend

on the size

(6)

The number of

_generations

in the data-set after

10,

20 and 40 _yearswere

counted and a

_weighted

_{average per}individual was

calculated,

where

_weighting

was

according

to the contribution of the

_path.

For

_instance,

if

_only

the sire of the

sire and the sire of an animal were

known,

the _averagenumber of

_generations

was

2.1/4

+

_{1 ! 1/4}

+

_{0 -1/2}

= 0.75. The animal with the maximum average number of

generations

and the _averagenumber of

_generations

of the animals born in the last

year evaluated were estimated.

RESULTS

The results are

presented

in table II.

Although

the

algorithm

of Golden et al

(1991)

and that

_presented

here

_require

the same number of additions of

L !D!!

terms, the latter consumed less

_computer

time. This is because the

_algorithm

of Golden et al:

_i)

makes a list of

_offspring

of each

animal,

before

_{starting evaluations;}

ii)

traces descendants instead of

ancestors,

which is more difficult because the

number of progeny of each animal is

unlimited,

whereas the number of

parents

is limited

to 2;

_iii)

sorts the list of descendants for _every

_animal,

which is

time-consuming ;

and

_iv)

_tracing

of

_{descendants, sorting them,}

and addition of

_{L? -}

_D

_jj

are

not

_{simultaneously performed.}

The Golden et al

_algorithm

_requires

more _memory,

because the list of

_offspring

needed and the

_tracing

of descendants

_requires

memory.

However,

the _memory

_required

is still linear with the number of animals N. Tier’s

₍₁₉₉₀₎

_algorithm

was faster than the

_{algorithm presented}

_here,

when

40 _yearsare

evaluated,

ie evaluation of 12.3

_generations

of animals. When _many

generations

are

evaluated,

_manyanimals have common _{ancestors, whose}

pedigree

is re-evaluated many times in the

present

algorithm.

Tier’s

_algorithm

avoids redundant calculation

by tracing pedigrees

once. The _memory

required

is 15.9 lVlb

for 28171 animals. The

_{presented algorithm required}

0.7 NIb is this situation. When about 6 or fewer

_generations

are

evaluated,

the

_present

algorithm

is faster

than

_Tier’s,

because Tier’s

_algorithm

first determines which elements of A are

required

before

_{calculating inbreeding}

coefficients. If

_only

the animals born in year

40 are

_evaluated,

the

_algorithm

_presented

here and that of Tier

₍₁₉₉₀₎

have about the same

_speed

(table II).

In Tier’s

_algorithm,

the

_required

elements of A have to be determined for

_relatively

few animals. The time

_{required by}

the

algorithm

of Golden

et al

₍₁₉₉₁₎

_equals

that

_required

if all animals are

_evaluated,

because the

_algorithm

re-calculates all

inbreeding

coefficients. The last alternative in table II shows that the presence of

large

half sib families is

advantageous

to Tier’s

_algorithm.

This is

because Tier’s

_algorithm

traces the

_pedigree

of the sire

_only

once.

DISCUSSION AND CONCLUSIONS

The

presented

algorithm

for

_computing

_diagonal

elements of A is a _sparse

imple-mentation of an

algorithm presented by

Mrode and

Thompson

(1989)

and

Quaas

(1989)

for the

_{multiplication}

of A with a matrix.

Here,

this matrix is the

identity

matrix and

_{only diagonal}

elements are calculated.

The

_algorithm

combines

_high

_{computational speed}

with low _memory

require-ment,

if the number of

_generations

evaluated is not

_larger

_than,

_say12

_{(table II).}

(7)

In order to

_keep

the calculations within the

_physical

_memoryof the computer,

none of the

_populations

evaluated in table II were

_{really large.}

If the data set

were much

larger,

Tier’s

(1990)

algorithm especially

would

_require

disk

memory

which would decrease its

_speed

_{considerably. However,}

the presence of

large

half sib

families favours the use of Tier’s

algorithm

(table II).

If the number of

_generations

involved

exceeds,

say

12,

the

_{algorithm presented}

here becomes slow

compared

to

Tier’’s,

because common ancestors are traced _{many times.}In order to overcome

this

_problem

we _may

_{compute F}

_i

=

1/2AS!d! _

L

L!,kLd:kDkk

and store the

k

sire’s row of L

(L

Sik

,

k =

1, ... ,

).

s

i

However,

calculating

the dam’s row of L costs

approximately

the same amount of

computer

time as

calculating

the individual’s

row of

L,

as in

!1].

] .

In

_practice,

often

_inbreeding

coefficients of _manyanimals in the

_population

have been calculated.

_Only

those of a few new born animals are unknown. The

_presented

(8)

APPENDIX

Integer

arrays

(N

is the number

of

animals in the

population):

PED(1:2, l:l!T)

PED(1, i)

= sire identification _noof animal i

PED(2, i)

= dam identification no of animal i

POINT(1:N)

POINT

_(i)=

the next oldest ancestor in the link list

= 0 if i is the last ancestor

Real

_Arrays:

F(0: N)

F(i)

=

inbreeding

coefficients of animal i.

F(O) =

-1

gives

appropriate

within

family

variances for unknown

parents

by

the formula:

_{.5-.25*(F (PED(1, i)+PED(2,i)).}

_F(i)

is

_initially

set

_equal

to

_-1,

which is more accurate than

calculating

F + 1 and then

_subtracting

1.

L(1:N)

L(j)

= element

_ij

of matrix

_L,

if animal i is evaluated

D(1:N)

D(i)

= within

_family

variance of animal i

(9)

ACKNOWLEDGMENTS

We are indebted to

Thompson

and referees for

helpful suggestions

and comments. The IB!Iilk

_Marketing

Board of

_England

and

_Wales,

the Meat and Livestock Commission and the

_Ministry

of

_Agriculture,

Fisheries and Food are

_{gratefully acknowledged}

for their

sponsorship.

REFERENCES

Golden

_BL,

Brinks

_JS,

Bourdon RM

₍₁₉₉₁₎

A

_{performance programmed}

method for

_computing

_inbreeding

coefficients from

_large

data sets for use in mixed-model

analyses.

J Anim

_Sci,

_69,

3564-3573

Henderson CR

_(197G)

A

_simple

method for

_computing

the inverse of a numerator

relationship

matrix used in the

prediction

of

_breeding

values. Biometrics

_32,

69-83

IMSL

₍₁₉₈₄₎

International Mathematical and Statistical Libraries.

_{Houston, TX,}

ed 9.2

Meuwissen

_THE,

van der Werf JHJ

₍₁₉₉₁₎

Effects of

heterogeneous

within-herd

variances on the

efficiency

or

dairy

cattle

breeding

schemes.

42th

Ann Meet Eur

Assoc Anim

_{Prod, Berlin,}

_Germany

Mrode

_R,

_Thompson

R

₍₁₉₈₉₎

An alternative

_algorithm

for

_{incorporating}

the

(10)

Quaas

RL

₍₁₉₇₆₎

_Computing

the

_diagonal

elements and inverse of a

large

numerator

relationship

matrix. Biometrics

_32,

949-953

Quaas

RL

₍₁₉₈₉₎

Transformed mixed model

_equations:

a recursive

algorithm

to eliminate

A-

1 .

J

_Dairy

Sci

_72,

1937-1941

Computing inbreeding coefficients in large populations

HAL Id: hal-00893959

https://hal.archives-ouvertes.fr/hal-00893959

Submitted on 1 Jan 1992

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing inbreeding coeﬀicients in large populations

The Meuwissen, Z Luo

To cite this version:

Original

article

Computing

inbreeding

coefficients

in

large populations

THE Meuwissen

Z Luo

of

Physiology

Research,

Edinburgh

Station,

9PS,

(Received

accepted

April

1992)

algorithm

computing inbreeding

large populations

presented.

especially

large populations

required,

population

speed,

generations

large,

larger

compared

computational speed

presented

algorithm

inbreeding

computed given

inbreeding

previously.

/

présente

algorithme

coefficients

consanguinité

grandes

populations.

particulièrement adapté

populations

faible

(il

dépend

population)

quand

générations impliquées

inférieur

12).

comparée

requise.

présent algorithme

également

adapté

_{large populations}

_of

_Physiology

_Research,

_April

₁₉₉₂₎

_algorithm

_{large populations}

_especially

_{large populations}

_population

_speed,

_large,

_larger

_compared

_{computational speed}

_presented

_inbreeding

_/

_présente

_algorithme

_coefficients

_{consanguinité}

_(il

_dépend

_population)

_quand

_{générations impliquées}

_12).

_comparée

_requise.

_{présent algorithme}

_également

_jour

_coefficients

_{consanguinité correspondant}

_faible

_/

_Production,

_501,

_implementing

_implemented

_inbreeding,

_inbreeding

_{large populations}

_required.

₍₁₉₇₆₎

_inbreeding

_relationship

_requires

_storage

_N

₂

_,

_increasing

_inbreeding

_required

₍₁₉₉₁₎

_present

_using

_programming

₍₁₉₉₀₎

_inbreeding

_algorithm

_{genetic relationship}

_required

_proportion

_large

_physical

_likely