• Aucun résultat trouvé

Computing inbreeding coefficients in large populations

N/A
N/A
Protected

Academic year: 2021

Partager "Computing inbreeding coefficients in large populations"

Copied!
10
0
0

Texte intégral

(1)

HAL Id: hal-00893959

https://hal.archives-ouvertes.fr/hal-00893959

Submitted on 1 Jan 1992

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing inbreeding coefficients in large populations

The Meuwissen, Z Luo

To cite this version:

(2)

Original

article

Computing

inbreeding

coefficients

in

large populations

THE Meuwissen

Z Luo

Institute

of

Animal

Physiology

and Genetics

Research,

Edinburgh

Research

Station,

Roslin EH25

9PS,

UK

(Received

21 October 1991;

accepted

15

April

1992)

Summary - An

algorithm

for

computing inbreeding

coefficients in

large populations

is

presented.

It is

especially

useful in

large populations

because of the small size of the

memory

required,

which is linear with

population

size, and its

speed,

if the number of

generations

involved is not too

large,

ie not

larger

than about 12. The method is

compared

with 2 other methods for

computational speed

and memory requirement. The

presented

algorithm

is suited for situations where the

inbreeding

coefficients for a few new animals are to be

computed given

that their ancestor’s

inbreeding

coefficients were calculated

previously.

inbreeding coefficient

/

algorithm

Résumé - Le calcul des coefficients de consanguinité dans de grandes populations.

On

présente

un

algorithme

de calcul des

coefficients

de

consanguinité

pour de

grandes

populations.

Il est

particulièrement adapté

à ces

populations

à cause du

faible

volume de

mémoire d’ordinateur requis

(il

dépend

linéairement de la taille de la

population)

et de la rapidité de calcul

quand

le nombre de

générations impliquées

n’est pas trop élevé

(c’est-à-dire

inférieur

à environ

12).

La méthode est

comparée

à 2 autres méthodes du point de

vue de la vitesse de calcul et de la mémoire

requise.

Le

présent algorithme

est

également

adapté

aux situations de mise à

jour

où les

coefficients

de

consanguinité correspondant

à un

faible

nombre d’animaux nouveaux doit être calculé en tirant parti des

coefficients

calculés pour les anciens animaux.

coefficient de consanguinité

/

algorithme

*

On leave from the Schoonoord Research Institute for Animal

Production,

PO Box

501,

(3)

INTRODUCTION

Many

countries are

implementing

or have

implemented

evaluation methods for

na-tional herds based upon animal models. If these account for

inbreeding,

calculation of

inbreeding

coefficients in

large populations

is

required.

Henderson

(1976)

and

Quaas

(1976)

implicitly

present

methods for the calculation of

inbreeding

coeffi-cients,

when

they

propose

algorithms

for the calculation of the inverse of the

ad-ditive

relationship

matrix. Henderson’s method

requires

storage

of a

large

matrix.

Quaas

avoids this: memory

requirement

is linear with N and

computation

time is

proportional

to

N

2

,

where N is the size of the data set. In

Quaas’

method

com-puter

time becomes a

limiting

factor with

increasing

N. With this

algorithm,

no

use is made of known

inbreeding

coefficients,

and hence all calculations have to be

repeated

whenever coefficients are

required

for a new batch of animals. Golden et al

(1991)

present

an

algorithm

based on

Quaas’

method

using

sparse

programming

techniques.

Tier

(1990)

presents

a fast method for the calculation of

inbreeding

coefficients.

Using

sparse

programming

techniques,

his

algorithm

first determines which ele-ments in the additive

genetic relationship

matrix A are

required

and then calculates

them. The memory

required

is a small

proportion

of

NZ:

about

0.9%.

For

large

pop-ulations,

the

physical

memory of the

computer

is still

likely

to be too small and use

of disk memory is

required, slowing

this

algorithm considerably.

Known

inbreeding

coefficients are not recalculated.

The aim of this paper is to

present

a method for the calculation of

inbreeding

coefficients in

large populations,

which is

fast, requires

memory

proportional

to N and does not

recompute

known

inbreeding

coefficients. The method

presented

is

compared

to Golden et al’s

(1991)

implementation

of

Quaas’

(1976)

method and Tier’s

(1990)

method for

speed

and memory

required.

METHOD

The method is based on the

decomposition

of the additive

genetic relationship

matrix

A,

as described

by

Henderson

(1976):

A =

LDL’,

where L is a lower

triangular

matrix

containing

the fraction of the genes that animals derive from

their

ancestors,

and D is a

diagonal

matrix

containing

the within

family

additive

genetic

variances of animals. The animals are ordered such that

parents

precede

offspring.

From the

decomposition,

it follows that

(Quaas, 197G):

where

A

ii

is the ith

diagonal

element of

A,

which

equals

the

inbreeding

coefficient of animal i

plus

1.

Quaas

(1976)

computes

the elements of L

quickly

and one column at a time

by

tlie

following

recursive

algorithm:

For

j=

1 to 1V

(all

columns of

L)

(4)

For

i= j

+

1 to N

(all

elements below the

diagonal

element)

Lij =

(L!

+

L

dd

)/2

when both

parents

si and

d

i

of i are

known,

or

=

L

kd

/2

when

only

one

parent

k

i

of i is

known,

or

= 0 when both

parents

are

unknown,

for i <

j.

The elements of D are calculated as:

D!!

= 1 when both

parents

are

unknown,

or

= 0.75 -

Fkj /4

when

only

one

parent

k!

of j

is

known,

or

= 0.5 -

(Fs!

+

F

dJ/

4

when both

parents

Sj and

d

j

of j

are

known,

where

F

j

denotes the

inbreeding

coefficient of animal

j.

After

computing

the elements of the

jtli

column of

L, they

are

squared

and

multiplied by

D!!.

The

resulting

vector is added to a

working

vector. When this

procedure

is followed for every column of

L,

the

working

vector contains the

A

jj

-values.

This

algorithm requires

N(N+ 1)/2

operations.

Golden et al

(1991)

made a list of the

offspring

of all the animals i.

Because element

L

ij

is non-zero

only

if i is a descendant

of j,

the

operations

within

each column are

performed only

for the descendants

of j.

Because the elements of

the columns of L are not stored

they

have to be recalculated when a new batch of

animals becomes available.

In the

present

algorithm,

the elements of L are

computed

row

by

row, which

overcomes the

problem

of

recalculating

elements of L when a new batch of animals

has to be evaluated. A row i of L

gives

the fraction of the genes that animal i derives from its ancestors.

Hence,

L

isi

=

Lid!

=

0.5,

where si and

d

i

are the sire and dam of

i,

respectively.

Each row of L can be built

by proceeding

up the

pedigree adding

half the &dquo;contribution&dquo; of the current animal to each of its

parents.

The fraction of the genes derived from an ancestor is:

where

P

j

is the set of identification numbers of the progeny

of j

and

ANC

i

is the set of identification numbers of the ancestors of

i,

including

animal i itself. The latter

identity

in

[2]

is because

Li!

=

0,

if k is not an ancestor of i or not

equal

to i. This leads to the

following algorithm

for

computing L,

row

by

row. As each row is

determined,

the contributions to the elements of

A

ii

are accumulated. Row

i of L and

A;z,

are set to 0 before

starting.

The

algorithm keeps

track of a list of

ancestors

ANC

I

,

whose contribution to

A

ii

has

yet

to be included. If the sire or dam is unknown si = 0 or

d

i

=

0,

respectively.

(5)

delete j

from

ANC,

end wliile

The

youngest

animal j

in

ANC

i

is

evaluated,

because all of its progeny in the

pedigree

of animal i must have been

evaluated, hence,

its

L

ij

value is known. The kernel of a Fortran program of the

algorithm

is

given in

the

Appendix.

The list of ancestors is

represented by

a link list. In the

computer

program time is saved

by:

i)

checking

whether one of the

parents

is

unknown, giving

an

inbreeding

coefficient of zero

(otherwise,

the

algorithm

would trace the

pedigree

of the other

parent) ;

and

ii)

if the

pedigree

is sorted such that full sibs have successive identification

numbers,

ie

are evaluated

successively,

only

the first full sib is evaluated and all full sibs obtain

(and have)

the same

inbreeding

coefficient as the evaluated full sib.

In order to compare the

algorithms,

Golden et al’s

(1991)

and Tier’s

(1990)

algorithms

were

programmed

in Fortran. Golden et al

presented

their

algorithm

in

C,

which

they

considered faster.

However,

C was not available to the authors. The

sorting

of the list of descendants in the Golden et al

algorithm

was

performed by

an IMSL routine here

(I1VISL, 1984).

SIMULATION

In order to compare the

algorithms,

the simulation program of Meuwissen and Van

der Werf

(1991)

was used to

generate

a

pedigree

data set. The program simulated

an open

dairy

cattle nucleus scheme for

10,

20 or 40 years, with

population

sizes of

9 267,

15 582 and 28171

animals, respectively

(see

table

I).

Selection was for animal

model BLUP

breeding

values. Nucleus dams and commercial dams

produced

8 and

1

offspring, respectively.

The number of nucleus and commercial sires selected was

4 and

10,

respectively. Also,

a

breeding

program was simulated for 20 years with

selection of

only

one commercial

sire,

representing

a situation with

large

half sib families.

Inbreeding

coefficients were calculated on a micro-VAX 3800 in batch mode. The maximum

physicsl

memory allocated to the batch queue was about 20 Mb. Because

this exceeded the memory

required by

any of the

algorithms,

no disk memory was

used. If disk memory were

used,

the

computational speed

would

depend

on the size

(6)

The number of

generations

in the data-set after

10,

20 and 40 years were

counted and a

weighted

average per individual was

calculated,

where

weighting

was

according

to the contribution of the

path.

For

instance,

if

only

the sire of the

sire and the sire of an animal were

known,

the average number of

generations

was

2.1/4

+

1 ! 1/4

+

0 -1/2

= 0.75. The animal with the maximum average number of

generations

and the average number of

generations

of the animals born in the last

year evaluated were estimated.

RESULTS

The results are

presented

in table II.

Although

the

algorithm

of Golden et al

(1991)

and that

presented

here

require

the same number of additions of

L !D!!

terms, the latter consumed less

computer

time. This is because the

algorithm

of Golden et al:

i)

makes a list of

offspring

of each

animal,

before

starting evaluations;

ii)

traces descendants instead of

ancestors,

which is more difficult because the

number of progeny of each animal is

unlimited,

whereas the number of

parents

is limited

to 2;

iii)

sorts the list of descendants for every

animal,

which is

time-consuming ;

and

iv)

tracing

of

descendants, sorting them,

and addition of

L? -

D

jj

are

not

simultaneously performed.

The Golden et al

algorithm

requires

more memory,

because the list of

offspring

needed and the

tracing

of descendants

requires

memory.

However,

the memory

required

is still linear with the number of animals N. Tier’s

(1990)

algorithm

was faster than the

algorithm presented

here,

when

40 years are

evaluated,

ie evaluation of 12.3

generations

of animals. When many

generations

are

evaluated,

many animals have common ancestors, whose

pedigree

is re-evaluated many times in the

present

algorithm.

Tier’s

algorithm

avoids redundant calculation

by tracing pedigrees

once. The memory

required

is 15.9 lVlb

for 28171 animals. The

presented algorithm required

0.7 NIb is this situation. When about 6 or fewer

generations

are

evaluated,

the

present

algorithm

is faster

than

Tier’s,

because Tier’s

algorithm

first determines which elements of A are

required

before

calculating inbreeding

coefficients. If

only

the animals born in year

40 are

evaluated,

the

algorithm

presented

here and that of Tier

(1990)

have about the same

speed

(table II).

In Tier’s

algorithm,

the

required

elements of A have to be determined for

relatively

few animals. The time

required by

the

algorithm

of Golden

et al

(1991)

equals

that

required

if all animals are

evaluated,

because the

algorithm

re-calculates all

inbreeding

coefficients. The last alternative in table II shows that the presence of

large

half sib families is

advantageous

to Tier’s

algorithm.

This is

because Tier’s

algorithm

traces the

pedigree

of the sire

only

once.

DISCUSSION AND CONCLUSIONS

The

presented

algorithm

for

computing

diagonal

elements of A is a sparse

imple-mentation of an

algorithm presented by

Mrode and

Thompson

(1989)

and

Quaas

(1989)

for the

multiplication

of A with a matrix.

Here,

this matrix is the

identity

matrix and

only diagonal

elements are calculated.

The

algorithm

combines

high

computational speed

with low memory

require-ment,

if the number of

generations

evaluated is not

larger

than,

say 12

(table II).

(7)

In order to

keep

the calculations within the

physical

memory of the computer,

none of the

populations

evaluated in table II were

really large.

If the data set

were much

larger,

Tier’s

(1990)

algorithm especially

would

require

disk

memory

which would decrease its

speed

considerably. However,

the presence of

large

half sib

families favours the use of Tier’s

algorithm

(table II).

If the number of

generations

involved

exceeds,

say

12,

the

algorithm presented

here becomes slow

compared

to

Tier’’s,

because common ancestors are traced many times. In order to overcome

this

problem

we may

compute F

i

=

1/2AS!d! _

L

L!,kLd:kDkk

and store the

k

sire’s row of L

(L

Sik

,

k =

1, ... ,

).

s

i

However,

calculating

the dam’s row of L costs

approximately

the same amount of

computer

time as

calculating

the individual’s

row of

L,

as in

!1].

] .

In

practice,

often

inbreeding

coefficients of many animals in the

population

have been calculated.

Only

those of a few new born animals are unknown. The

presented

(8)

APPENDIX

Integer

arrays

(N

is the number

of

animals in the

population):

PED(1:2, l:l!T)

PED(1, i)

= sire identification no of animal i

PED(2, i)

= dam identification no of animal i

POINT(1:N)

POINT

(i)=

the next oldest ancestor in the link list

= 0 if i is the last ancestor

Real

Arrays:

F(0: N)

F(i)

=

inbreeding

coefficients of animal i.

F(O) =

-1

gives

appropriate

within

family

variances for unknown

parents

by

the formula:

.5-.25*(F (PED(1, i)+PED(2,i)).

F(i)

is

initially

set

equal

to

-1,

which is more accurate than

calculating

F + 1 and then

subtracting

1.

L(1:N)

L(j)

= element

ij

of matrix

L,

if animal i is evaluated

D(1:N)

D(i)

= within

family

variance of animal i

(9)

ACKNOWLEDGMENTS

We are indebted to

Thompson

and referees for

helpful suggestions

and comments. The IB!Iilk

Marketing

Board of

England

and

Wales,

the Meat and Livestock Commission and the

Ministry

of

Agriculture,

Fisheries and Food are

gratefully acknowledged

for their

sponsorship.

REFERENCES

Golden

BL,

Brinks

JS,

Bourdon RM

(1991)

A

performance programmed

method for

computing

inbreeding

coefficients from

large

data sets for use in mixed-model

analyses.

J Anim

Sci,

69,

3564-3573

Henderson CR

(197G)

A

simple

method for

computing

the inverse of a numerator

relationship

matrix used in the

prediction

of

breeding

values. Biometrics

32,

69-83

IMSL

(1984)

International Mathematical and Statistical Libraries.

Houston, TX,

ed 9.2

Meuwissen

THE,

van der Werf JHJ

(1991)

Effects of

heterogeneous

within-herd

variances on the

efficiency

or

dairy

cattle

breeding

schemes.

42th

Ann Meet Eur

Assoc Anim

Prod, Berlin,

Germany

Mrode

R,

Thompson

R

(1989)

An alternative

algorithm

for

incorporating

the

(10)

Quaas

RL

(1976)

Computing

the

diagonal

elements and inverse of a

large

numerator

relationship

matrix. Biometrics

32,

949-953

Quaas

RL

(1989)

Transformed mixed model

equations:

a recursive

algorithm

to eliminate

A-

1

.

J

Dairy

Sci

72,

1937-1941

Références

Documents relatifs

28 En aquest sentit, quant a Eduard Vallory, a través de l’anàlisi del cas, es pot considerar que ha dut a terme les funcions de boundary spanner, que segons Ball (2016:7) són:

The written data on how students completed the analytic scale assessments was collected as well (see Appendix 7). The last session of the teaching unit was the presentation

17 The results from the questionnaire (see Figure 3) show that 42.3% of the students who received help from their peers admitted that a word was translated into

“A banda de la predominant presència masculina durant tot el llibre, l’autor articula una subjectivitat femenina que tot ho abasta, que destrueix la línia de

Mucho más tarde, la gente de la ciudad descubrió que en el extranjero ya existía una palabra para designar aquel trabajo mucho antes de que le dieran un nombre, pero

La transición a la democracia que se intentó llevar en Egipto quedó frustrada en el momento que el brazo militar no quiso despojarse de sus privilegios para entregarlos al

L’objectiu principal és introduir l’art i el procés de creació artística de forma permanent dins l’escola, mitjançant la construcció d’un estudi artístic

L’ inconformisme i la necessitat d’un canvi en la nostra societat, em van dur a estudiar més a fons tot el tema de la meditació, actualment sóc estudiant d’últim curs