Evaluation des pères dans le cas de paternité incertaine

(1)

HAL Id: hal-02717943

https://hal.inrae.fr/hal-02717943

Submitted on 1 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Evaluation des pères dans le cas de paternité incertaine

Jean Louis Foulley, Daniel Gianola, D. Planchenault

To cite this version:

Jean Louis Foulley, Daniel Gianola, D. Planchenault. Evaluation des pères dans le cas de paternité

incertaine. Genetics Selection Evolution, BioMed Central, 1987, 19 (1), pp.83-102. �hal-02717943�

(2)

Sire evaluation with

uncertain

_paternity

J.L. FOULLEY D. GIANOLA D. PLANCHENAULT’

*

1.N.R.A., Station de

_Génétique

_{quantitative et}

_appliqu

_L

₆

_e

Centre de Recherches

Zootechniques,

F 78350

_{Jouy-en-Josas}

* ’’

Department of

Animal Sciences, University

of

Illinois, Urbana, Illinois 61801, U.S.A.

***

Insiimi

_d’Elevage

et de Médecine Vétérinaire des

_Pays

_Tropicaux,

10, rue Pierre-Curie, F 94704

_{Maisons-Alfort}

Cedex

Summary

A sire evaluation

procedure

is

proposed

for situations in which there is

uncertainty

with respect to the

assignment

of progeny to sires. The method

requires

the

specification

of the

prior

probabilities

P;j that progeny i is out of sire

_j.

Inferences about location _parameters_(«fixed >

environmental and _{group effects and}

_transmitting

abilities of

_sires)

are based on

_Bayesian

statistical

procedures.

Modal values of the

_posterior

distribution of these parameters are taken as

_point

estimators.

_Finding

this mode entails

solving

a nonlinear system of

_equations

and several

algo-rithms are

_suggested.

The

_methodology

is described for univariate evaluations obtained from

normal or

_binary

traits. Estimation of unknown variances is also addressed. A small numerical

example

is

_presented

to illustrate the

procedure.

Potential

_applications

to livestock

_breeding

are

discussed.

Key

words : Sire evaluation, uncertain

paternity, Bayesian

methods.

Résumé

Evaluation des

_pères

dans le cas de

_paternité

incertaine

Une méthode d’évaluation des

pères

est

proposée

en situation d’incertitude vis-à-vis de

l’assignation

des descendants à leurs

_pères.

La méthode

_requiert

la

spécification

des

_{probabilités}

a

priori

pij que le descendant i

provienne

du

_{père j.}

L’inférence des

_paramètres

de

_position

(effets

« _groupe» et de milieu, considérés comme fixes et valeurs

_génétiques

transmises des

_pères)

est

basée sur des

procédures

statistiques

bayésiennes.

Les valeurs modales de la distribution a

posteriori

de

_{ces paramètres}

ont été

prises

comme estimateurs

ponctuels.

La recherche du mode nécessite la résolution d’un

_{système d’équations}

non linéaire pour

_{lequel plusieurs algorithmes}

sont

proposés.

La

_{méthodologie}

est

_développée

dans le cadre _{univariate pour}des caractères normaux et

binaires. Le cas de variances inconnues est

_également

abordé. Un

_petit

_exemple

_numérique

est

présenté

à titre d’illustration. Enfin, les

applications possibles

aux

_espèces

_domestiques

sont

discutées.

(3)

I. Introduction

There are situations such as m

_{multiple-sire}

_matings

under

_pastoral

conditions where sire evaluation is

_complicated

because of

_uncertainty

with

_respect

to the

assign-ment _{of progeny}to sires.

_Using

information from red blood cell

types,

major

histocom-patibility

markers or

_precise

records on

_breeding

_period

and

_{gestation length,}

it is

possible

to

_specify

the

_{probabilities}

_(p

_;j

₎

that a

_{given offspring (i}

=

_1,

...,

n)

has been

sired

_by

different males

_(j

=

_1,

...,

m).

In the absence of such

information,

it is reasonable to state that individual males in a

_given

set, e.g., bulls

breeding

in the same

paddock,

are sires with

_{equal probability.}

This

_problem

was studied

_by

PmVEY & ELSEN

(1984)

within the framework of selection index and its restrictive

_assumptions.

The purpose of this paper is to

_present

a more

_general

and flexible

_methodology

able to

cope with several sources of variation

_including

unknown fixed effects and variance

components.

The

_procedure

is

_along

the lines of linear and nonlinear mixed model

methodology (H

ENDERSON

,

1973 ;

GIANOLA &

F

OULLEY

,

1983a,

b).

Continuous and discontinuous variation are _{examined in this paper}to _{illustrate the power and}

_generality

of the

_approach.

II.

_Normally

distributed data

A.

_Methodology

Consider the usual univariate linear model :

where y is a vector of

_{records, [3}

is an I x 1 vector of « fixed » effects

(e.g., genetic

groups, « nuisance » environmental

factors),

u is an m x 1 vector of random

_transmitting

abilities of

sires,

X and Z are instance

matrices,

and e is a vector of residuals. The matrices X and Z are known

_{(non-random),}

if the sires of the progeny with records in

y are identified. In other

words,

the above model holds

_{conditionally}

on X and Z. Let

_T

_i

_j

define the situation in which

_{male j}

is the true sire of _{progeny i. The} conditional _{distribution of the record y,}

_given

_Yi

_j

_,

the location

_parameters

_p

and u and

the residual variance

_U2

can be written as

where NIID stands for

normal,

independent

and

_identically

distributed ;

_z

_ij

is an m x 1

vector

_having

a 1 in

_{position j}

and 0’s elsewhere. Put

_{wi, =}

_[x,,

_zi

_j]

_,

0’ =

_{[(3’, u’]}

and define _laij =

w’,,O.

Inferences about 0 can be obtained

_conveniently

via

_Bayes

theorem,

and this has also been done in other

_genetic

evaluation

_{problems (RB}

_NNINGEN

_,

1971 ;

D

EMPFLE

,

1977 ;

L

EFORT

,

1980 ;

GIANOLA &

F

ERNANDO

,

1986).

The

_prior

distribution of 0 is «

naturally

» taken as the

_conjugate

of

_[1]

_(Cox

&

HII

NKLEY

,

1974)

so

(4)

where a’ =

_[8’,

_0]

and

It will be assumed from now on that

_{prior knowledge}

about

(3

is vague so as to

mimic the traditional mixed model

_analysis.

Hence,

the

_prior

distribution of 0 is

_strictly

proportional

to the

_marginal

_prior

distribution of u.

_However,

the notation of

_[2]

above is retained to

_present

a more

_{general expression}

for the

_posterior

distribution of the

vector 0. The matrix

_X

_u

= _A

U2

,

where A is the matrix of additive

_{relationships}

between

sires,

and

_u’

_is

the variance between

sires,

equal

to one

_quarter

of the additive

_genetic

variance.

Because the observations are

_{conditionally}

_independent,

the likelihood function can

be written as :

because I p

ij

= _1.The _meanof _thedistribution in

[3B]

is

i

where

_P

_{; =}

_[

_Pil’

.. ,

PiP

&dquo;&dquo;

p

i.]

is a 1 x m row vector

containing

the

probabilities

p,,

of

5£i;

(progeny

i out of sire

_j).

As shown in

_Appendix

A,

the variance of the distribution

[3B]

is

The

_posterior

distribution of 0

_(assuming

that the

_dispersion

parameters

are

known,

can be written

from,

[1],

[2], [3A]

and

_[3B]

as

which is not in the form of a normal distribution.

_Hence,

the mean of this distribution

cannot be a linear function of the data.

The selection rule which maximizes the

_{expected transmitting ability}

of a fixed number of selected sires is the mean of the

_posterior

distribution

_{[4] (G}

_OFFINET

&

E

LSEN

,

1984 ;

FERNANDO &

G

IANOLA

,

1986).

Because the

_expected

value of this distribution is difficult to obtain in closed

_form,

we calculate the modal value of 8 and

(5)

rule in the sense described

above ;

this is a reasonable

_{approximation}

as

_sample

size

increases

(Z

ELLNER

,

1971).

B.

_Computations

Finding

the maximum of

_[4]

with

_respect

to 0

_requires

_setting

to 0 the first derivatives of

_[4]

with

_respect

to this vector.

_Letting

_L(O)

be the

_{log-posterior density,}

we obtain :

and (!

(.)

is the standard normal

_density

function. Observe that

_q

_;j

is the

_posterior

probability

that progeny i is out of sire

_j,

and that this

_probability

is maximum when the residual y; -

w;!6

is null. This is so because in this instance the model under

_:

_£

_;j

would fit

_perfectly

to the data.

_Equating

_[5]

to 0

_gives

a nonlinear

system

of

_equations

on 0 so an iterative

_procedure

is

_required

to solve it.

Although

several

_algorithms

can be used for this purpose, the

_simple

form of

[5]

suggests

to

_implement

a functional iteration.

_Setting

[5]

to 0 and

_rearranging

_{yields :}

because

_prior

information about

_{(3 is}

_{vague and}

2q

;j

=

_{1 ; !}

₌

u’I(T’!

=

(4/h 2) _1,

_where

h

z

2 is

_{heritability.}

Note that the

coefficient

matrix and the

_right-hand

sides

_depend

on 0 as

q

i

,

is a function of

(3

and u ; this is clear from

[6].

Defining :

Q

=

{q

¡j

} :

an n x m matrix of

_{posterior probabilities,}

..

and :

D

c

=

_Diag

{Iq

¡J

:

an m x m

_diagonal

_matrix,

whose elements can be

_thought

of as the i

posterior expected

value of the number of progeny of sire

j,

the above

_system

can be written in terms of the iterative scheme :

where

_[k]

indicates the iterate number. In

_[8],

the matrices

Q

and

_D

_c

are evaluated at

(6)

One

_possible

way of

starting

iteration is to take

_q,j!

=

p

ij

for all values of i and

j.

Thus

₍

_y

= P =

{p;!},

and

₁₎

₁

_:’1

=

_A!

=

_Diag

flp

ijl

,

and these values can be viewed as the

« natural » ones to

_adopt

_prior

to the

data.

In

_{practice, uncertainty}

is

_only

with

respect

to a small subset of the sires that need

to _{be evaluated. The progeny}can _{be classified into 2 groups :}

_I&dquo;

_pertaining

to

individuals

_having

sires

_{unambiguously}

identified,

and

₁

₂

_{corresponding}

to _progenywith

parentage

under «

dispute

».

_Similarly,

sires can be allocated to _{2 groups :}

_J&dquo;

with all their progeny in set

_I&dquo;

and

J,,

with some _{progeny in}

_I,

and some _{progeny in}

₁

₂

_.

The data vector can be

_partitioned

into three

_mutually

exclusive and exhaustive

compo-nents :

because

the set

_{i

E

_i,

_{f1 j}

E

_J,}

is

_empty.

The vector of

_transmitting

abilities can be

partitioned

as

_[u&dquo;

u

2]

,

_{corresponding}

to sires in

_J,

and

_J,,

_{respectively,}

so.

Likewise

correspond

to the three

_partitions

in

_[9]

above. Further

with

_Z&dquo;

_{_}

_lp

_ij

= ₀_or

11, Z

12

=

fp

ij

= ₀_or

1}, Q22 = 10

<

_q,

<

_11,

_P

₂₂

=

₁₀

_<

p

ij

<

_11,

as _per

the

_partitions

in

_[9].

_Using

this

notation,

equations [8]

become :

where

_D!zz

is a

_diagonal

matrix with elements calculated as before but for the _progeny

and sires in the third

_partition

of

_[9].

_Again,

iteration can be started

_by

_replacing

the

«

_posterior

_{» Q}

and D matrices in

[ll],

_by

their «

_prior

»

_{counterparts,}

P and

A,

of

appropriate

order. The above

_equations

illustrate

_clearly

the modifications needed in the mixed model

_equations

to take into account uncertain

_paternity.

The

_portions

in the coefficient matrix and

_right-hand

sides

_pertaining

to records where

_paternity

is

(7)

unambi-guous

(y&dquo;

and

y

jz

)

are the usual ones. The incidence matrix

_Z22

that would arise if

paternity

of animals with records in Y22 were

_certain,

is

_{replaced by}

a matrix

Q

of

posterior probabilities.

These are

_updated

_during

the course of iteration to take into

account the contribution of the data.

Likewise,

Z!2Z22

is

_replaced

_by

the D

_matrix,

which is a function of the

_{posterior probabilities}

_q

_ij

_,

as

_already

indicated. Because

_Q

₂₂

is

usually

a small

matrix,

_[8]

or

_[11]

_{will converge}

_rapidly.

If functional iteration is slow to

converge,

algorithms

such as

_{Newton-Raphson}

can be

_{employed (Appendix B).}

III.

_Binary

data

A.

_Methodology

The data are now

_binary

_responsesso _y_i = ₀_or_1._{The model used here is based}on

the

_concept

of «

_liability

»

originally developed by

WRIGHT

(1934),

where it is assumed

that there is an

_underlying

normal variable rendered

binary

via an

_abrupt

threshold. Genetic evaluation

_procedures

based on threshold models have been discussed

_by

several authors

_(G

_IANOLA

&

F

OULLEY

,

1983a,b ;

FOULLEY et

_{aI. , 1983 ;}

FOULLEY &

GI

A

NOL

A

,

1984 ;

HARVILLE &

_M

_EE

_,

1984 ;

ILMOURG et

_C

_{lI. ,}

1985 ;

HBSCHELE et

1 1I. ,

1986).

The notation of the

_preceding

section is

retained,

with the

_{understanding}

that the

parameters

are now those of the

_underlying

distribution. The conditional distribution of

a

_binary

_{response is taken}as :

where

_<1>(.)

is the standardized normal cumulative distribution function. The

_parameter

IJ

-ij is the difference between the threshold and the mean of the statistical «

sub-population

» defined

by

indexes

i, j

J (GIANOLA

&

_F

_OULLEY

_,

1983a)

expressed

in units of

standard deviation.

_Assuming

the

_prior

distribution is as in

_[2]

and

_replacing

the normal

density

in

_[3B]

_by

_[12],

the

_{posterior density}

can be written as :

because the residual standard deviation is

_equal

to 1.

Finding

the 9 - mode of

_[13]

involves

_solving

a

_system

with a

_higher

order of

nonlinearity

than the one

_stemming

from

_[5]

so

_{Newton-Raphson}

is used here instead

(8)

Letting

the

_{Newton-Raphson equations}

can be written after

_algebra

as :

where the variance ratio À =0

_{11 <r.}

because the residual variance is

_{unity, .:1pl}

_k

_{l =}

_pl

_k

_l

-

P! ’!,

.:1u lk ! = ulkl -

U

[k

-1

1 ,

and

_l

_m

_,

_In

are vectors of ones of

_appropriate

order. One

_possible

_way

to start iteration would be to use

_equations

_[8]

with

_Q

_{replaced by}

P,

D,

replaced by

.:1c,

and y

replaced by

a vector of 0 and 1’s

_indicating

the absence or _{presence of the}

attribute in the progeny in

question.

The values of

₁₃

and u so obtained would be used

to calculate

_1T¡j

and

_r

_ij

in

_[16]

and

_[17]

to then

_{proceed iterating}

with

_[18]

above.

B.

_Analogy

with the normal case

Write ₇_,,in

_[16]

as

The

_expression

_q!

is

_{directly comparable}

to

_q

_ij

of

_[6]

for the normal case. Both can be

interpreted

as the

_{posterior probabilities}

_{that progeny i is}out of sire

_j,

and are similar

to formulae

_arising

in multivariate classification

_problems

_(L

_INDEMAN

et

_{al., 1980,}

_p.

196).

In the discrete case and

_given

_Y

_ij

_,

if _ui_j is

_large

_{progeny i would be}

_expected

to

respond

with

_{high probability}

in the first

_{and q*,}

will be

_larger

when the response is

actually

in the first rather than in the second

category.

The

_expression

for

v

jj

(with

a minus

sign)

is the « normal score » discussed

_by

GIANOLA & FOULLEY

(1983a,

p.

216 ;

1983b,

p.

143).

IV. Estimation of unknown variances

The

_point

estimators of location described above are the modes of

_posterior

(9)

in the situation of

_binary

_{responses. When these variances}are

unknown,

Box & TIAO

(1973)

and O’Hncnrr

₍₁₉₇₆₎

have

_given

_arguments

_indicating

that inferences could be made from the distribution

_f(Olul

=

8 j,

_u!

=

8[

),

where the variances are

_replaced

_by

the

modal values of the

_{marginal posterior}

distribution of the variances. In the absence of

prior

information about the

variances,

these modal values are those obtained from the method of restricted maximum likelihood

_(H

_ARVILLE

_,

1974,

1977).

This

_approach

was

employed by

GrnrroLn et al.

₍₁₉₈₆₎

in the context of

_{optimum prediction}

of

_breeding

values and these authors view the

_resulting

_predictors

as

_belonging

to the class of

empirical Bayes

estimators. The

_{general principles}

involved in

_finding

the modal values of the

_posterior

distribution of the variances are

_given

below.

F

OULLEY et al.

₍₁₉₈₆₎

and GIANOLA et al.

₍₁₉₈₆₎

showed that maximization of

_f(

_G.

₁

_,

u:.ly)

with

_respect

to the variances in the absence of

_prior

information about these

parameters

leads to the

_{equations :}

where

_E!

indicates

_expectation

with

_respect

to the distribution

_f(ul<T},

_Q

_{;, y).}

Further,

and now

_taking

_expectation

with

_respect

to

f(6!aj’,

Q

u, y),

we need to

_{satisfy :}

The derivation is based on the

_{decomposition}

of the

_posterior

distribution of all

unknowns,

f([3,

u,

_u,

₂

_,,

cr.21y),

into

It should be noted that the likelihood function does not

_depend

on

_{u 2,}

which is true

both in the normal and

_binary

cases.

_Also,

when flat

_priors

are taken for the

_variances,

f(I.T!)

and

_f(<7!)

do not _appearin the above

_{decomposition.}

Solving

[19}

and

_{[20] simultaneously}

for the unknown variances leads to an iterative

scheme

_involving

the

_{expressions :}

where

o k is iterate

_number,

9 C is the inverse of the coefficient matrix in

_{Newton-Raphson (Appendix B),}

or

of

_[18]

when observations are

_binary,

o

C!,,

is the submatrix of C

_{corresponding}

to the

u-effects,

o M is the coefficient matrix in

_[8]

or

[18]

without

_A-’X,

. W =

[X, Q]

It should be noted that in the

_binary

case the residual variance is not estimated because it is taken as

_equal

to one. The derivation of

_[22]

is

_given

in

_Appendix

C.

_Equation

(10)

[21],

however,

holds in both cases. The conditional

_expectations

are taken as if the

« true » values of the variance

_components

were those found in the

_previous

iteration. As

_pointed

out

_by

GIANOLA et al.

_(1986),

_[20]

and

_[21]

arise in the EM

_algorithm

(D

EMPSTER

et

_al.,

₁₉₇₇₎

when

_applied

to estimation

_by

restricted maximum

likelihood,

and the

_resulting

estimates are never

_negative.

V. Numerical

application

A small data set from a _progenytest of Blonde

_{d’Aquitaine}

sires carried out in France was used to illustrate the methods

_presented

_{in this paper. The data}set is the

same as the one utilized

_by

FOULLEY et al.

_(1983),

with some’

_{modifications,}

as

illustrated in table 1. There were 47

_calving

records

_including

information on

_region

of

origin

of the

_heifer,

_calving

season, sex and sire of

_calf,

and birth

_weight

(BW)

and

calving

ease

_(CE)

as _{response variables.}CE was recorded as an all-or-none trait with

«

easy » and « difficult »

_calvings

coded as 0 or

1,

respectively.

As shown in table

1,

paternity

was uncertain in the case of records

1,

2,

3 and 39. For the first three

records,

information on

_{breeding periods}

and

_{gestation lengths}

led to an

_assignment

to

natural service sires 7 and 8 of

_{probabilities}

_equal

to

1 and

3 ,

_{respectively.}

In the

4 4

case of record

39,

artificial insemination sires 1 and 2 were

_assigned

_{probabilities}

of

1 1

and 2 ,

respectively.

2 2

A. Model

Birth

_weight

was

_regarded

as

_following

a normal

distribution,

and CE was treated

as a binomial trait. Both traits were

_{analyzed using}

the model

where

_H

_;

is the effect of

_region

i of

_origin

of heifer

_(i

=

_1,

_2),

A

j

is the effect of the

_jth

season of

_calving

_(j

=

_1,

2), S,

is the effect of _sexof calf k

(k

= _{1 for males}_or₂_for

females), f,

is the

_{transmitting ability}

of the lth sire of heifer

₍₁

=

1,

.. ,

8),

and

e

ijkl

is a

residual with variance

_uj.

The vectors

_p

and u were

Prior

_knowledge

about

!3

was assumed to _{be vague.}

_Heritability

was .25 for both

traits,

and

₍

_T

_e2

was 5

_kg

for BW and 1 for

_CE,

the discrete trait. In

_forming

the

relationship

matrix

A,

it was assumed that the artificial insemination sires

_{(1 through 6)}

were

_unrelated,

and that the natural service sires 7 and 8 were non-inbred sons of 5

(11)

(12)

In this

_example,

the sets needed to define

_[9]

were :

_1,

=

11,

2, 3,

39} (with

I,

being

the

_complement),

_{J, =}

{2,

3, 4,

5}

and

_{J, =}

_{l,

6,

7,

8}.

Thus,

the matrix

_P,,

in

_[10]

was

For

BW,

the nonlinear

_system

_[11]

was solved

_using

3

algorithms :

functional

iteration,

and

_{Newton-Raphson}

and

_scoring

as described in

Appendix

B. For

CE,

computations

were carried out with

_{[18] ;}

_starting

values were calculated as

discussed

earlier.

Iteration

_stopped

when the square root of the average

squared

correction was less than

10-5

. Variance

_components

were estimated for both traits

using

the

_procedures

outlined in section III.

Sire evaluations

_ignoring

_uncertainty

on

_paternity

were also calculated so as to

further illustrate the

_procedures.

This was done

_{by assigning progenies}

1, 2,

3 to sire 1 and record 39 to sire 6.

B. Results

Results of the

_analysis

conducted for BW are

_presented

in table 2.

_Irrespective

of

the

_algorithm

_used,

the

_stopping

rule of 10-1 was satisfied in 4 iterations. The fact that

the

_algorithms

were

_equally

fast to _{converge is}

_undoubtedly

related to the limited

extent of

_{nonlinearity,}

as

_only

4 out of 39 records had

_ambiguous

_{parentage. Further,}

a

«

sharp

»

_assignment

of

probabilities

(—vs. —)

was

made in 3 out of the 4 records. In B4 4 4 /

this data set, from a

_practical

_point

of view iteration could have

_stopped

at the second round. Differences between the

_analyses

conducted

_{ignoring uncertainty}

and

_taking

it into account were minimal. Sire 1 was the most affected because 3 records

_assigned

to

him in the case of certain

_paternity

were

_assigned

to sires 7 or 8 when

_paternity

was

uncertain.

The

_analysis

of

_calving<

ease is shown in table 3. When

_paternity

was

certain,

5

iterates were

_required

to _{converge. On the other}

_hand,

13 iterations were

_required

when

_uncertainty

was taken into account. This is so because with a

_binary

trait there are 2 sources of

_nonlinearity

when

_paternity

is uncertain : one due to the fact that the model is

_nonlinear,

and the second due to the

_uncertainty

itself. The second source of

nonlinearity

was

_responsible

for the 8 additional iterations.

Estimates of variance

_components

in this

_example

were

<1!

= _{0 and}

<1;

= 22.81 for

BW,

and afl =

.096 for CE. The latter value

_gives

an estimate of

_heritability

of .35 in the

_underlying

scale. For

BW,

more than 400 iterates were needed for estimates of variance

_components

to _{converge, and 193 iterations}were

_required

for CE. It is well

known that the EM

_algorithm

is

_extremely

slow to _converge

_(T

_HOMPSON

_,

_1979),

especially

in small

_samples.

However,

alternative

_{parameterizations}

of the model or

numerical shortcuts

_(e.g.,

_S

_CHAEFFER

_,

_{1979 ;}

MISZTAL &

S

CHAEFFER

,

1986)

can be used

(13)

(14)

VI. Discussion

The

_impact

of the extent of misidentification on sire evaluation and on estimates of

genetic

parameters

was studied

_by

VAN VLECK

_(1970a,b)

and BoNmTt

_(1975).

These authors found that misidentification of sires biased downwards estimates of

_heritability

and of

_expected

_genetic

_{progress. Biases in evaluation of sires increased}as the fraction

of misidentified animals increased.

The

_approach

followed in the

_present

_study,

as in PomEY & ELSEN

(1984),

is to

directly

take into account in the

_{analysis uncertainty}

on the

_assignment

_{of progeny}to

sires so as to

_improve

_prediction

of

_breeding

values.

However,

PowEV & ELSEN

₍₁₉₈₄₎

studied the

_problem

in a selection index framework which

_{requires knowledge}

of means

and variances. The issue was adressed here in a more

_general

manner so as to

accommodate different

_types

of distribution

_(normal

or

_binomial),

and less restrictive

states of

_knowledge

vis-a-vis fixed effects and variance

_components.

Using

the

_{Bayesian paradigm}

as in GIANOLA et al.

_(1986),

leads to inferences based

on a

_posterior

distribution with the

_uncertainty

«

integrated

» or

_averaged

out. With the

p

and u

_components

of the mode of the

_posterior

distribution taken as

_point

estimators and

_predictors

of fixed and random

effects,

respectively,

a nonlinear

_system

_of equa-tions is obtained.

_Algorithms

for

_solving

these

_equations

are _{discussed in the paper.}

Variance

_components

were estimated from the

_{joint posterior}

distribution of the variances after

_taking

into account

_uncertainty

in the

_assignment

_{of progeny}to sires. The

_point

estimators chosen were the modal values of this

distribution ;

expressions

for

computing

the estimates

_iteratively

were

_presented.

Is it

_possible

to use

_directly

the mixed model

_equations

to obtain standard best linear unbiased

_predictors

when

_paternity

is uncertain ? The best linear unbiased

predictor

of u is

_{given by}

where V is the variance covariance matrix of the records

_(H

_ENDERSON

_,

₁

_973).

From results in

_Appendix

_A,

the

_diagonal

elements of V in the continuous case are

and the

_{off-diagonals}

are

where

_a

_jr

is the additive

_relationship

between

_{sires j}

and

_j’.

It follows that V is not in theformZGZ’ +

_{R needed to put V-’}

= R-1 -

_R-

₁

_Z(Z’R-

_I

_Z

₊

_{G-’)-’Z’R-’ so as to establish}

the

_equivalence

between the best linear unbiased

_predictor

above and the results

_given

by

the mixed model

_equations

_(H

_ENDERSON

_,

_1984).

It is not obvious how to treat the

problem

of uncertain

_paternity

_using

standard

_techniques.

The

_Bayesian

_solution pre-sented

_here,

on the other

_hand,

offers a clear answer. POIVEY & ELSEN

₍₁₉₈₄₎

discussed situations in which the methods

_presented

here could be

_applied.

These include :

_i)

females

_{exposed simultaneously}

or

_successively

in time to _{groups of}

_{males ;}

_{ii) joint}

use

of artificial and natural

_breeding

in

_sheep

flocks and cattle herds in

_conjunction

with

estrous

_{synchronization}

_{techniques ;}

and

_{iii) heterospermic}

_progeny

_testing

with ambi-guous

parentage.

A

_requirement

of the

_procedure

is the

_{specification}

of

_prior

(15)

probabili-ties

_p

_ij

which can be based on external information such as biochemical

_{po)ymorphysms}

or, more

_likely

under

extensive

_conditions,

records on

_breeding

dates and

_gestation

lengths.

In

_particular,

_{the methods described here may be}

_potentially

useful in situa-tions where natural service

sires

are used

_extensively,

_e.g.,

_{pastoral production}

_systems.

The

_computations

are

_feasible,

at least for univariate sire evaluations carried out under the

_assumption

of

_normality

and with

_genetic

_parameters

assumed known. Extensions to

the multivariate situation can be done without

_great

_conceptual

_difficulty.

Received

_February

27,

1986.

Accepted

June

_24,

1986.

Acknowledgements

This research was conducted while J.L. FOULLEY was a

_George

A. Miller

_Visiting

Scholar at

the

_University

of Illinois, on sabbatical leave from INRA. J.L. Fouc.r.EV wishes to thank the Direction des Productions animales and Direction des relations internationales, INRA, for their

support. D. GIANOLA

_acknowledges

_supportfrom the Illinois

_{Agriculture Experiment}

Station and of Grant No. US-805-84 from BARD-The United States-Israel Binational

_Agricultural

Research

and

_Development

Fund.

References

B

ONAITI B., 1975. Influence du taux d’erreur dans les filiations

_paternelles

sur la valeur de la

selection sur descendance chez les ovins. 7!’ JournEes de la Recherche Ovine et

_Caprine,

Paris, 2-4 decembre 1975, SPEOC-ITOVIC, 166-172.

Box G.E.P., TIAO G.C., 1983.

_{Bayesian inference}

in statistical

analysis.

588 pp.,

Addison-Wesley,

Reading,

Massachussets.

Cox D.R., HINKLEY D.V., 1974. Theoretical statisties. 511 _pp.,

_Chapman

and Hall, London. D

EMPFLE L., 1977. Relation entre BLUP et estimateurs

_bayesiens.

Ann. Genet. Sel. Anim., 9, 27-32.

D

EMPSTER A.P., LAIRD N.M., RUBIN D.B., 1977. Maximum likelihood from

_incomplete

data via

the EM

_algorithm.

J.

Roy.

Statis. Soc. , B, 39, 1-38.

F

ERNANDO R.L., GmNOCn D., 1986.

_{Optimal properties}

of the conditional mean as a selection

criterion. Theor.

_Appl.

Genet.

_(submitted).

F

OULLEY J.L., GIANOLA D., THOMPSON R., 1983. Prediction of

genetic

merit from data on

_binary

and

_quantitative

variates with an

_application

to

_{calving difficulty,}

birth

_weight

and

_pelvic

opening.

Genet. Sel. Evol., 15, 401-424.

F

OULLEY J.L., GIANOLA D., 1984. Estimation of

_genetic

merit from bivariate « _allor none »

responses. Genet. Sel. Evol., 16, 285-306.

F

OULLEYJ.L., 1M S., GIANOLA D., HÖSCHELE L, 1987.

Empirical Bayes

estimation of parameters for n

_{polygenic binary}

traits. Genet. Sel. Evol., 19

_{(in press).}

G

IANOLA D., FOULLEY J.L., 1983a. Sire evaluation for ordered

_categorical

data with a threshold model. Genet. Set. Evol., 15, 201-224.

GmNOLn D., FOULLEY J.L., 1983b. New

_techniques

of

_prediction

of

_breeding

value for disconti-nuous traits. Proc. 32nd Annual National Breeders Round-table, St. Louis, Missouri,

May

6, 1983, 28 _{pp., Mimeo.}

(16)

G

IANOLA D., FERNANDO R.L., 1986.

_Bayesian

methods in animal

_{breeding theory.}

J. Anim. Sci., 63, 217-244.

G

IANOLA _D.,FOULLEY J.L., FERNANDO R.L., 1986. Prediction of

_breeding

values when variances

are not known. G6n6l. Sel. Evol.

(submitted).

G

ILMOUR_A.R.,ANDERSONR.D., RAE A.L., 1985. The

analysis

of binomial data

_by

a

_generalized

linear mixed model. _{Biometrika, 72,}593-599. G

OFFINETB., ELSEN J.M., 1984. Critere

_optimal

de selection :

quelques

résultats

généraux.

Gen9t. Sel. Evol., 16, 307-318.

H

ARVILLE D.A., 1974.

_Bayesian

inference for variance _components

_{using only}

error contrasts.

Biometrika, 72, 593-599.

H

ARVILLE D.A., 1977. Maximum likelihood

_approaches

to variance _componentestimation and to

J. Am. Stat. Assoc., 72, 320-338.

H

ARVILLE D.A., MEE R.W., 1984. A mixed model

procedure

for

analyzing

ordered

categorical

data. Biometrics, 40, 393-408. H

ENDERSON C.R., 1973. Sire evaluation and

genetic

trends.

Proceedings of

the Animal

Breeding

and Genetics

_Symposium

in Honor

_of

Dr.

Jay

L. Lush. American

Society

of Animal Science

and American

Dairy

Science Association, 10-41,

Champaign,

Illinois. H

ENDERSON C.R., 1984.

Applications of

linear models in animal

breeding.

462 pp.,

University

of

Guelph,

Ontario.

H6sCHELE _Ina,FOULLEY J.L., COLLEAU J.J., GIANOLA D., 1986. Genetic evaluation for

multiple

binary

responses. Genet. Sel. Evol. , 18, 299-320.

L

EFORT G., 1980. Le modèle de base de la selection,

justification

et limites. In

_Legay

J.M. et al.

(ed.),

Biometrie et

_G6n

_g

_tique,

4, 1-14, Societe

Franqaise

de Biométrie. INRA,

D6partement

de Biométrie.

L

INDEMAN R.H., MERENDA P.F., GOLD R.Z., 1980. Introduction to bivariate and multivariate

analysis.

444 pp., Scott, Foresman & Co., Glenview, Illinois.

M

ISZTAL I., SCHAEFFER L.R., 1986. Nonlinear model for

describing

convergence of iterative methods of variance _componentestimation. J.

_Dairy

Sci., 69, 2209-2213.

O’H

AGAN A., 1976. On

_{posterior joint}

and

marginal

modes. Biometrika, 63, 329-333.

P

OIVEY J.P., ELSEN J.M., 1984. Estimation de la valeur

_génétique

des

_{reproducteurs}

dans le cas

d’incertitude sur les _{apparentements.}I.. Formulation des indices de selection. Génét. Sel.

Evol. , 16, 445-454. R6

NNINGEN K., 1971. Some

_properties

of the selection index derived

_by

« Henderson’s _mixed

model method ». Z. Tierz.

Ziichtungsbiol.,

_88,186-193.

S CHAEFF E

R L.R., 1979. Estimation of variance and covariance components for average

daily gain

and backfat thickness in swine. In : Searle S.R., Van Vleck L.D.

(ed.),

n Variance

compo-nents and animal

_breeding

». Proc.

_of

a

_Conference

in Honor

_of

C.R.

Henderson,

123-137. Cornell

_University,

Ithaca, New York.

T

HOMPSON R., 1979. Sire evaluation. Biometrics, 35, 339-353.

V AN

VLECtc L.D., 1970a. Misidentification in estimating the

paternal

sib correlation. J. _DairySci., 53, 1469-1475.

V

AN VLECK L.D., 1970b. Misidentification and sire evaluation. J.

_Dairy

_{Sci., 53, 1697-1702.}

WRIGHT

S., 1934. An

analysis

of

_variability

in number of

digits

in an inbred strain of

_{guinea pigs.}

Genetics, 19, 506-536.

Z

ELLNER A., 1971. An introduction to _Bayesian

_inference

in econometrics, 431 pp., John

Wiley

&

(17)

Appendix

A

Variance-covariance structure

_of

the data

_{(frequentist viewpoint)}

The

_starting

_point

is

_[3B].

As shown in the text,

omitting

the

_conditioning

on

₍

_T

_.

₂

for the sake of

_{simplicity :}

The variance of the distribution can be obtained

_by

_{writing :}

which follows from

_[1]

and

_[All.

Likewise

where the covariance is taken with

respect

to the

_joint

distribution of

_-Tii

and

_Y,.,.

The first term in the above

equations

is

_clearly

null because the observations are

conditio-nally independent. Assuming

P(:£

¡j

fl

_:£

_i’k

₎

₌

_Pi(Pik’

we

_get

We consider now the variance-covariance structure

unconditionally

on 0.

_Applying

the same

strategy,

one can write :

’

The second term is the variance of _j_n; taken with

_respect

to the distribution of 0.

Arguing

from a classical

_viewpoint

_(P

fixed,

u

_random)

we have :

From !A2]

because

_Z’

_j

A

_Z,

= ₁

_(if

_sires_are_not

_inbred),

_and

Yp

ij

z

ij

=

p,

Collecting [A5]

and

[A6]

into

_[A4]

_gives

(18)

Finally,

we consider the unconditional covariances between records y; and y;..

Writing :

we

observe

as before that the first term is null.

_Also,

from

_{[All :}

It should be observed that

_Var(y)

cannot be written as

_R

_Q

_e

+

_PAP’u!

because the

diagonal

elements of this last matrix

_expression

are not

_equal

to

_[A7]

except

in the trivial case P =

_Z,

_i.e.,

_when

_paternity

is certain.

Appendix

B I

Newton-Raphson

and

_scoring

_{algorithms for}

normal data The

_{Newton-Raphson algorithm}

consists in

_iterating

with :

where 0!&dquo;! =

O

lk

) -

8!’&dquo; and O

lk

)

is the solution _atiteration k. The first derivatives are

given

in

_[5]

and the second derivatives are :

From the definition of

_q;!

in

_[6],

we have :

Using

this in

_[B2]

above and

_{rearranging gives :}

(19)

Some

_{simplification}

in the calculations can be achieved

_{by replacing}

_r;!

_by

its

expectation

taken

_{conditionally}

on

0,

Q

?

and

_;

_£i

_j’

From

_[B4]

we obtain

_{directly :}

This used in

_[B5]

above

_yields

a «

scoring

»

_algorithm

for

_solving

_{the nonlinear}

system

of

_equations.

The

_system

_[B5]

can be written in matrix notation

_using

the matrices

R, E!,

and

_E,

of page 89 with

r,

j

as in

[B4]

instead of

[17].

Also,

define matrices

The _system

_[B5]

becomes then

The

_algorithm

is also described for the case where uncertain

_paternity

is

_only

with respect to a small

_proportion

of the sires evaluated.

Here,

we

_partition

R in the same

(20)

With the above

notation,

the

_system

in B6 can be written as :

The above

_equations

indicate the

_parts

of the

system

that need to be amended to

take uncertain

_paternity

into account. As

before,

the

_nonlinearity

stems from the contribution of the vector _Yn to information about the unknown

parameters.

In order

to start

iteration,

one may take RIOj =

_P,

E!

OI

=

A,,

E’&dquo;’

=

_A!,

and values of the _vectors

_(3,

u, and u, obtained

by

applying

linear mixed model

_methodology

upon the vectors Y and

Yi,-Appendix

C

Derivation

_of

the

_algorithm

used

_{for estimating}

_Qr

with normal data The estimator of

_a,

₂

needs to

_satisfy

_[20].

From

_[3A]

and

_[3B]

(21)

Using

this result in

_[Cl]

and then in

_[201

_{gives :}

where

_q

_ij

is as in

[6],

and where

_E,

indicates

_expectation

taken with

_respect

to the conditional distribution

_f(Olu,,

_{a!, y).}

The

_expectation

in

_[C2]

is difficult to obtain because

_q

_ij

is a function of 0. If _q_ij is

_regarded

as a constant a

_{rearrangement}

of

_[C2]

gives :

As done

_by

HARVILLE & MEE

₍₁₉₈₄₎

in the context of threshold

models,

we

_replace

E(¡L;,

=

w’, 6!j’,

a’,

y) by

the mode

6

calculated

_{using equations [8] (or}

the

_expressions

described in

Appendix

_B).

Thus, the formula above becomes :

Now,

Var(0(y) =

C

_u

_j

_,

where C is the inverse of the coefficient matrix in

_[B6]

or

[B7]

evaluated at

6. _Further,

at the

maximum,

[5]

must be null which is satisfied when

[7]

is satisfied.

_Multiplying

both sides of

_[7]

_{by 6}

_(and

_remembering

that a flat

_prior

is

used for

_!3)

_yields

With this in

mind,

we obtain the

_following

for the

_components

of

_[C3]

where M is the coefficient matrix in

_[8J

without A-’. ’.