HAL Id: hal-02717943
https://hal.inrae.fr/hal-02717943
Submitted on 1 Jun 2020
Evaluation des pères dans le cas de paternité incertaine
Jean Louis Foulley, Daniel Gianola, D. Planchenault
To cite this version:
Jean Louis Foulley, Daniel Gianola, D. Planchenault. Evaluation des pères dans le cas de paternité incertaine. Genetics Selection Evolution, BioMed Central, 1987, 19 (1), pp.83-102. hal-02717943
Sire evaluation with uncertain paternity
J.L. FOULLEY*, D. GIANOLA**, D. PLANCHENAULT***
* I.N.R.A., Station de Génétique quantitative et appliquée, Centre de Recherches Zootechniques, F 78350 Jouy-en-Josas
** Department of Animal Sciences, University of Illinois, Urbana, Illinois 61801, U.S.A.
*** Institut d'Elevage et de Médecine Vétérinaire des Pays Tropicaux, 10, rue Pierre-Curie, F 94704 Maisons-Alfort Cedex
Summary
A sire evaluation procedure is proposed for situations in which there is uncertainty with respect to the assignment of progeny to sires. The method requires the specification of the prior probabilities $p_{ij}$ that progeny i is out of sire j. Inferences about location parameters (« fixed » environmental and group effects and transmitting abilities of sires) are based on Bayesian statistical procedures. Modal values of the posterior distribution of these parameters are taken as point estimators. Finding this mode entails solving a nonlinear system of equations and several algorithms are suggested. The methodology is described for univariate evaluations obtained from normal or binary traits. Estimation of unknown variances is also addressed. A small numerical example is presented to illustrate the procedure. Potential applications to livestock breeding are discussed.
Key words: Sire evaluation, uncertain paternity, Bayesian methods.
Résumé
Evaluation des pères dans le cas de paternité incertaine
A method for evaluating sires is proposed for situations of uncertainty regarding the assignment of progeny to their sires. The method requires the specification of the prior probabilities $p_{ij}$ that progeny i comes from sire j. Inference about the location parameters (« group » and environmental effects, regarded as fixed, and transmitted genetic values of sires) is based on Bayesian statistical procedures. The modal values of the posterior distribution of these parameters were taken as point estimators. Finding the mode requires solving a nonlinear system of equations, for which several algorithms are proposed. The methodology is developed in the univariate setting for normal and binary traits. The case of unknown variances is also addressed. A small numerical example is presented as an illustration. Finally, possible applications to domestic species are discussed.
I. Introduction
There are situations such as multiple-sire matings under pastoral conditions where sire evaluation is complicated because of uncertainty with respect to the assignment of progeny to sires. Using information from red blood cell types, major histocompatibility markers or precise records on breeding period and gestation length, it is possible to specify the probabilities ($p_{ij}$) that a given offspring (i = 1, ..., n) has been sired by different males (j = 1, ..., m). In the absence of such information, it is reasonable to state that individual males in a given set, e.g., bulls breeding in the same paddock, are sires with equal probability.
This problem was studied by POIVEY & ELSEN (1984) within the framework of selection index and its restrictive assumptions. The purpose of this paper is to present a more general and flexible methodology able to cope with several sources of variation including unknown fixed effects and variance components. The procedure is along the lines of linear and nonlinear mixed model methodology (HENDERSON, 1973; GIANOLA & FOULLEY, 1983a, b). Continuous and discontinuous variation are examined in this paper to illustrate the power and generality of the approach.
II. Normally distributed data
A. Methodology
Consider the usual univariate linear model:
$$y = X\beta + Zu + e,$$
where y is a vector of records, $\beta$ is an I x 1 vector of « fixed » effects (e.g., genetic groups, « nuisance » environmental factors), u is an m x 1 vector of random transmitting abilities of sires, X and Z are incidence matrices, and e is a vector of residuals. The matrices X and Z are known (non-random) if the sires of the progeny with records in y are identified. In other words, the above model holds conditionally on X and Z. Let $\mathcal{T}_{ij}$ define the situation in which male j is the true sire of progeny i. The conditional distribution of the record $y_i$, given $\mathcal{T}_{ij}$, the location parameters $\beta$ and u and the residual variance $\sigma_e^2$, can be written as
$$y_i \mid \mathcal{T}_{ij}, \beta, u, \sigma_e^2 \sim \mathrm{NIID}\left(x_i'\beta + z_{ij}'u,\ \sigma_e^2\right), \qquad [1]$$
where NIID stands for normal, independent and identically distributed; $z_{ij}$ is an m x 1 vector having a 1 in position j and 0's elsewhere. Put $w_{ij}' = [x_i', z_{ij}']$, $\theta' = [\beta', u']$ and define $\mu_{ij} = w_{ij}'\theta$.
Inferences about $\theta$ can be obtained conveniently via Bayes theorem, and this has also been done in other genetic evaluation problems (RÖNNINGEN, 1971; DEMPFLE, 1977; LEFORT, 1980; GIANOLA & FERNANDO, 1986). The prior distribution of $\theta$ is « naturally » taken as the conjugate of [1] (COX & HINKLEY, 1974), so
$$\theta \mid \alpha, \Sigma \sim N(\alpha, \Sigma), \qquad [2]$$
where $\alpha' = [\delta', 0']$ and
$$\Sigma = \begin{bmatrix} \Sigma_\beta & 0 \\ 0 & \Sigma_u \end{bmatrix}.$$
It will be assumed from now on that prior knowledge about $\beta$ is vague so as to mimic the traditional mixed model analysis.
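Hence, the prior distribution of $\theta$ is strictly proportional to the marginal prior distribution of u. However, the notation of [2] above is retained to present a more general expression for the posterior distribution of the vector $\theta$. The matrix $\Sigma_u = A\sigma_u^2$, where A is the matrix of additive relationships between sires, and $\sigma_u^2$ is the variance between sires, equal to one quarter of the additive genetic variance.
Because the observations are conditionally independent, the likelihood function can be written as:
$$f(y \mid \theta, \sigma_e^2) = \prod_{i=1}^{n} f(y_i \mid \theta, \sigma_e^2), \qquad [3A]$$
with
$$f(y_i \mid \theta, \sigma_e^2) = \sum_{j=1}^{m} p_{ij}\, f(y_i \mid \mathcal{T}_{ij}, \theta, \sigma_e^2), \qquad [3B]$$
which is a properly normalized mixture of normal densities because $\sum_j p_{ij} = 1$. The mean of the distribution in [3B] is
$$E(y_i \mid \theta, \sigma_e^2) = \sum_j p_{ij}\mu_{ij} = x_i'\beta + p_i u,$$
where $p_i = [p_{i1}, \ldots, p_{ij}, \ldots, p_{im}]$ is a 1 x m row vector containing the probabilities $p_{ij}$ of $\mathcal{T}_{ij}$ (progeny i out of sire j). As shown in Appendix A, the variance of the distribution [3B] is
$$\mathrm{Var}(y_i \mid \theta, \sigma_e^2) = \sigma_e^2 + \sum_j p_{ij}u_j^2 - (p_i u)^2.$$
The posterior distribution of $\theta$ (assuming that the dispersion parameters are known) can be written from [1], [2], [3A] and [3B] as
$$f(\theta \mid y, \sigma_e^2, \alpha, \Sigma) \propto \prod_{i=1}^{n}\left[\sum_{j=1}^{m} p_{ij}\exp\left\{-\frac{(y_i - \mu_{ij})^2}{2\sigma_e^2}\right\}\right]\exp\left\{-\tfrac{1}{2}(\theta - \alpha)'\Sigma^{-1}(\theta - \alpha)\right\}, \qquad [4]$$
which is not in the form of a normal distribution. Hence, the mean of this distribution cannot be a linear function of the data.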
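The mean and variance of the mixture distribution [3B] can be checked numerically. The sketch below is only an illustration under stated assumptions: each component is normal with mean $\mu_{ij} = x_i'\beta + u_j$ and common standard deviation $\sigma_e$, so that the mean is $\sum_j p_{ij}\mu_{ij}$ and the variance is $\sigma_e^2 + \sum_j p_{ij}\mu_{ij}^2 - (\sum_j p_{ij}\mu_{ij})^2$. The function names and the numbers (a fixed part of 40, sire effects of -1 and 2, $\sigma_e = 5$) are hypothetical and do not come from the paper.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, p, mu, sd):
    """Mixture density as in [3B]: sum over candidate sires of p_ij * N(y; mu_ij, sd^2)."""
    return sum(pj * normal_pdf(y, mj, sd) for pj, mj in zip(p, mu))

def mixture_mean_var(p, mu, sd):
    """Closed-form mean and variance of the mixture."""
    mean = sum(pj * mj for pj, mj in zip(p, mu))
    var = sd ** 2 + sum(pj * mj ** 2 for pj, mj in zip(p, mu)) - mean ** 2
    return mean, var

def numeric_mean_var(p, mu, sd, lo=-20.0, hi=100.0, steps=12000):
    """Same two moments by a midpoint Riemann sum, as an independent check."""
    h = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for s in range(steps):
        y = lo + (s + 0.5) * h
        f = mixture_pdf(y, p, mu, sd) * h
        m0 += f
        m1 += y * f
        m2 += y * y * f
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2

# Hypothetical progeny: fixed part 40, two candidate sires with u = (-1, 2)
p = [0.25, 0.75]
mu = [40.0 - 1.0, 40.0 + 2.0]
print(mixture_mean_var(p, mu, sd=5.0))   # (41.25, 26.6875)
print(numeric_mean_var(p, mu, sd=5.0))
```

The variance exceeds $\sigma_e^2 = 25$ by the spread of the candidate sire effects, which is the extra term appearing in the variance of [3B].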
The selection rule which maximizes the expected transmitting ability of a fixed number of selected sires is the mean of the posterior distribution [4] (GOFFINET & ELSEN, 1984; FERNANDO & GIANOLA, 1986). Because the expected value of this distribution is difficult to obtain in closed form, we calculate the modal value of $\theta$ and use it as a selection rule in the sense described above; this is a reasonable approximation as sample size increases (ZELLNER, 1971).
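B. Computations
Finding the maximum of [4] with respect to $\theta$ requires setting to 0 the first derivatives of [4] with respect to this vector. Letting $L(\theta)$ be the log-posterior density, we obtain:
$$\frac{\partial L(\theta)}{\partial\theta} = \sigma_e^{-2}\sum_{i=1}^{n}\sum_{j=1}^{m} q_{ij}\, w_{ij}\,(y_i - w_{ij}'\theta) - \Sigma^{-1}(\theta - \alpha), \qquad [5]$$
where
$$q_{ij} = \frac{p_{ij}\,\phi\left[(y_i - w_{ij}'\theta)/\sigma_e\right]}{\sum_{k=1}^{m} p_{ik}\,\phi\left[(y_i - w_{ik}'\theta)/\sigma_e\right]}, \qquad [6]$$
and $\phi(.)$ is the standard normal density function. Observe that $q_{ij}$ is the posterior probability that progeny i is out of sire j, and that this probability is maximum when the residual $y_i - w_{ij}'\theta$ is null. This is so because in this instance the model under $\mathcal{T}_{ij}$ would fit perfectly to the data.
Equating [5] to 0 gives a nonlinear system of equations on $\theta$ so an iterative procedure is required to solve it. Although several algorithms can be used for this purpose, the simple form of [5] suggests implementing a functional iteration. Setting [5] to 0 and rearranging yields:
$$\begin{bmatrix} \sum_i x_i x_i' & \sum_i\sum_j q_{ij}\, x_i z_{ij}' \\ \sum_i\sum_j q_{ij}\, z_{ij} x_i' & \sum_i\sum_j q_{ij}\, z_{ij} z_{ij}' + \lambda A^{-1} \end{bmatrix}\begin{bmatrix} \beta \\ u \end{bmatrix} = \begin{bmatrix} \sum_i x_i y_i \\ \sum_i\sum_j q_{ij}\, z_{ij} y_i \end{bmatrix}, \qquad [7]$$
because prior information about $\beta$ is vague and $\sum_j q_{ij} = 1$; $\lambda = \sigma_e^2/\sigma_u^2 = (4/h^2) - 1$, where $h^2$ is heritability. Note that the coefficient matrix and the right-hand sides depend on $\theta$ as $q_{ij}$ is a function of $\beta$ and u; this is clear from [6].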
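Defining:
$Q = \{q_{ij}\}$: an n x m matrix of posterior probabilities,
and:
$D_c = \mathrm{Diag}\{\sum_i q_{ij}\}$: an m x m diagonal matrix, whose elements can be thought of as the posterior expected value of the number of progeny of sire j,
the above system can be written in terms of the iterative scheme:
$$\begin{bmatrix} X'X & X'Q^{[k-1]} \\ Q'^{[k-1]}X & D_c^{[k-1]} + \lambda A^{-1} \end{bmatrix}\begin{bmatrix} \beta^{[k]} \\ u^{[k]} \end{bmatrix} = \begin{bmatrix} X'y \\ Q'^{[k-1]}y \end{bmatrix}, \qquad [8]$$
where [k] indicates the iterate number. In [8], the matrices Q and $D_c$ are evaluated at $\theta^{[k-1]}$, the solution from the previous iterate. One possible way of starting iteration is to take $q_{ij}^{[0]} = p_{ij}$ for all values of i and j. Thus $Q^{[0]} = P = \{p_{ij}\}$, and $D_c^{[0]} = \Delta_c = \mathrm{Diag}\{\sum_i p_{ij}\}$, and these values can be viewed as the « natural » ones to adopt prior to the data.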
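The functional iteration described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: it alternates between computing the posterior probabilities of [6] and solving the linear system of [8], starting from Q = P. The small Gauss-Jordan solver, the function names, the toy records and the stopping rule (root mean squared correction below a tolerance, borrowed from the numerical application) are all assumptions made for the example.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def posterior_probs(y, X, P, beta, u, sigma_e):
    """q_ij as in [6]: p_ij times the normal kernel of the residual, normalised over j."""
    Q = []
    for yi, xi, pi in zip(y, X, P):
        fixed = sum(x * b for x, b in zip(xi, beta))
        w = [pij * math.exp(-0.5 * ((yi - fixed - uj) / sigma_e) ** 2)
             for pij, uj in zip(pi, u)]
        total = sum(w)
        Q.append([v / total for v in w])
    return Q

def functional_iteration(y, X, P, Ainv, lam, sigma_e, tol=1e-5, max_iter=200):
    """Iterate the scheme of [8] from Q = P; returns (beta, u, Q, rounds)."""
    n, f, m = len(y), len(X[0]), len(P[0])
    Q = [row[:] for row in P]
    theta = [0.0] * (f + m)
    for rounds in range(1, max_iter + 1):
        C = [[0.0] * (f + m) for _ in range(f + m)]
        rhs = [0.0] * (f + m)
        for i in range(n):
            for a in range(f):
                rhs[a] += X[i][a] * y[i]                  # X'y
                for b in range(f):
                    C[a][b] += X[i][a] * X[i][b]          # X'X
                for j in range(m):
                    C[a][f + j] += X[i][a] * Q[i][j]      # X'Q
                    C[f + j][a] += X[i][a] * Q[i][j]      # Q'X
            for j in range(m):
                C[f + j][f + j] += Q[i][j]                # D_c
                rhs[f + j] += Q[i][j] * y[i]              # Q'y
        for j in range(m):
            for k in range(m):
                C[f + j][f + k] += lam * Ainv[j][k]       # lambda * A^-1
        new = solve(C, rhs)
        rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, theta)) / (f + m))
        theta = new
        Q = posterior_probs(y, X, P, theta[:f], theta[f:], sigma_e)
        if rms < tol:
            break
    return theta[:f], theta[f:], Q, rounds

# Toy data (hypothetical): overall mean only, two unrelated sires, h2 = .25
y = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
X = [[1.0] for _ in y]
P = [[1, 0], [1, 0], [0.5, 0.5], [0, 1], [0, 1], [0.25, 0.75]]
Ainv = [[1.0, 0.0], [0.0, 1.0]]
lam = 4 / 0.25 - 1
beta, u, Q, rounds = functional_iteration(y, X, P, Ainv, lam, sigma_e=2.0)
print(beta, u, rounds)
```

Rows of P that contain only 0 and 1 keep their posterior probabilities unchanged, so with fully known paternity the scheme reduces to solving the ordinary mixed model equations once.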
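In practice, uncertainty is only with respect to a small subset of the sires that need to be evaluated. The progeny can be classified into 2 groups: $I_1$, pertaining to individuals having sires unambiguously identified, and $I_2$, corresponding to progeny with parentage under « dispute ». Similarly, sires can be allocated to 2 groups: $J_1$, with all their progeny in set $I_1$, and $J_2$, with some progeny in $I_1$ and some progeny in $I_2$. The data vector can be partitioned into three mutually exclusive and exhaustive components:
$$y' = [y_{11}',\ y_{12}',\ y_{22}'], \qquad y_{ab} = \{y_i : i \in I_a,\ j \in J_b\}, \qquad [9]$$
because the set $\{i \in I_2 \cap j \in J_1\}$ is empty. The vector of transmitting abilities can be partitioned as $u' = [u_1', u_2']$, corresponding to sires in $J_1$ and $J_2$, respectively, so $\theta' = [\beta', u_1', u_2']$. Likewise, $X_{11}$, $X_{12}$ and $X_{22}$ correspond to the three partitions in [9] above. Further
$$Z = \begin{bmatrix} Z_{11} & 0 \\ 0 & Z_{12} \\ 0 & Z_{22} \end{bmatrix}, \qquad Q = \begin{bmatrix} Z_{11} & 0 \\ 0 & Z_{12} \\ 0 & Q_{22} \end{bmatrix}, \qquad P = \begin{bmatrix} Z_{11} & 0 \\ 0 & Z_{12} \\ 0 & P_{22} \end{bmatrix}, \qquad [10]$$
with $Z_{11} = \{p_{ij} = 0 \text{ or } 1\}$, $Z_{12} = \{p_{ij} = 0 \text{ or } 1\}$, $Q_{22} = \{0 < q_{ij} < 1\}$, $P_{22} = \{0 < p_{ij} < 1\}$, as per the partitions in [9]. Using this notation, equations [8] become:
$$\begin{bmatrix} X'X & X_{11}'Z_{11} & X_{12}'Z_{12} + X_{22}'Q_{22} \\ Z_{11}'X_{11} & Z_{11}'Z_{11} + \lambda A^{11} & \lambda A^{12} \\ Z_{12}'X_{12} + Q_{22}'X_{22} & \lambda A^{21} & Z_{12}'Z_{12} + D_{c22} + \lambda A^{22} \end{bmatrix}\begin{bmatrix} \beta \\ u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} X'y \\ Z_{11}'y_{11} \\ Z_{12}'y_{12} + Q_{22}'y_{22} \end{bmatrix}, \qquad [11]$$
where the $A^{ab}$ are the blocks of $A^{-1}$ conformable with $u_1$ and $u_2$, and $D_{c22}$ is a diagonal matrix with elements calculated as before but for the progeny and sires in the third partition of [9].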
Again, iteration can be started by replacing the « posterior » Q and D matrices in [11] by their « prior » counterparts, P and $\Delta$, of appropriate order. The above equations illustrate clearly the modifications needed in the mixed model equations to take into account uncertain paternity. The portions in the coefficient matrix and right-hand sides pertaining to records where paternity is unambiguous ($y_{11}$ and $y_{12}$) are the usual ones. The incidence matrix $Z_{22}$ that would arise if paternity of animals with records in $y_{22}$ were certain is replaced by a matrix Q of posterior probabilities. These are updated during the course of iteration to take into account the contribution of the data. Likewise, $Z_{22}'Z_{22}$ is replaced by the D matrix, which is a function of the posterior probabilities $q_{ij}$, as already indicated. Because $Q_{22}$ is usually a small matrix, [8] or [11] will converge rapidly. If functional iteration is slow to converge, algorithms such as Newton-Raphson can be employed (Appendix B).
III. Binary data
A. Methodology
The data are now binary responses so $y_i$ = 0 or 1. The model used here is based on the concept of « liability » originally developed by WRIGHT (1934), where it is assumed that there is an underlying normal variable rendered binary via an abrupt threshold. Genetic evaluation procedures based on threshold models have been discussed by several authors (GIANOLA & FOULLEY, 1983a, b; FOULLEY et al., 1983; FOULLEY & GIANOLA, 1984; HARVILLE & MEE, 1984; GILMOUR et al., 1985; HÖSCHELE et al., 1986).
The notation of the preceding section is retained, with the understanding that the parameters are now those of the underlying distribution. The conditional distribution of a binary response is taken as in [12], where $\Phi(.)$ is the standardized normal cumulative distribution function. The parameter $\mu_{ij}$ is the difference between the threshold and the mean of the statistical « sub-population » defined by indexes i, j (GIANOLA & FOULLEY, 1983a) expressed in units of standard deviation.
Assuming the prior distribution is as in [2] and replacing the normal density in [3B] by [12], the posterior density can be written as in [13], because the residual standard deviation is equal to 1. Finding the $\theta$-mode of [13] involves solving a system with a higher order of nonlinearity than the one stemming from [5], so Newton-Raphson is used here instead. With the auxiliary quantities defined in [14] to [17], the Newton-Raphson equations can be written after algebra as in [18], where the variance ratio $\lambda = 1/\sigma_u^2$ because the residual variance is unity, $\Delta\beta^{[k]} = \beta^{[k]} - \beta^{[k-1]}$, $\Delta u^{[k]} = u^{[k]} - u^{[k-1]}$, and $1_m$, $1_n$ are vectors of ones of appropriate order. One possible way to start iteration would be to use equations [8] with Q replaced by P, $D_c$ replaced by $\Delta_c$, and y replaced by a vector of 0 and 1's indicating the absence or presence of the attribute in the progeny in question.
The values of $\beta$ and u so obtained would be used to calculate the terms in [16] and $r_{ij}$ in [17] to then proceed iterating with [18] above.
B. Analogy with the normal case
The term $v_{ij}$ in [16] can be written as a function of a quantity $q_{ij}^*$. The expression $q_{ij}^*$ is directly comparable to $q_{ij}$ of [6] for the normal case. Both can be interpreted as the posterior probabilities that progeny i is out of sire j, and are similar to formulae arising in multivariate classification problems (LINDEMAN et al., 1980, p. 196). In the discrete case and given $\mathcal{T}_{ij}$, if $\mu_{ij}$ is large progeny i would be expected to respond with high probability in the first category and $q_{ij}^*$ will be larger when the response is actually in the first rather than in the second category. The expression for $v_{ij}$ (with a minus sign) is the « normal score » discussed by GIANOLA & FOULLEY (1983a, p. 216; 1983b, p. 143).
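The analogy with the normal case can be illustrated with a short sketch. It rests on two assumptions that are not verifiable from the text as extracted: that the probability of a response in the first category, given sire j, is $\Phi(\mu_{ij})$, and that the binary posterior probability has the same structure as [6], with the normal density replaced by the probability of the observed category. The function names and the numerical values are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binary_posterior_probs(in_first_category, p, mu):
    """Posterior probabilities that each candidate sire is the true sire.

    Assumes Pr(first category | sire j) = Phi(mu_j); the weight of sire j is
    its prior probability times the probability of the observed category.
    """
    like = [std_normal_cdf(m) if in_first_category else 1.0 - std_normal_cdf(m)
            for m in mu]
    w = [pj * lj for pj, lj in zip(p, like)]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical progeny with two candidate sires
p = [0.25, 0.75]
mu = [1.0, -1.0]           # sire 1 makes a first-category response more likely
q_first = binary_posterior_probs(True, p, mu)
q_second = binary_posterior_probs(False, p, mu)
print(q_first, q_second)
```

With these values the posterior probability of sire 1 rises above its prior of 1/4 when the response falls in the first category and drops below it otherwise, which is the behaviour described for $q_{ij}^*$.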
IV. Estimation of unknown variances
The point estimators of location described above are the modes of posterior distributions that are conditional on known variances ($\sigma_u^2$ and $\sigma_e^2$ for normal data, $\sigma_u^2$ only in the situation of binary responses). When these variances are unknown, BOX & TIAO (1973) and O'HAGAN (1976) have given arguments indicating that inferences could be made from the distribution $f(\theta \mid y, \sigma_u^2 = \hat\sigma_u^2, \sigma_e^2 = \hat\sigma_e^2)$, where the variances are replaced by the modal values of the marginal posterior distribution of the variances. In the absence of prior information about the variances, these modal values are those obtained from the method of restricted maximum likelihood (HARVILLE, 1974, 1977). This approach was employed by GIANOLA et al. (1986) in the context of optimum prediction of breeding values and these authors view the resulting predictors as belonging to the class of empirical Bayes estimators. The general principles involved in finding the modal values of the posterior distribution of the variances are given below.
FOULLEY et al. (1986) and GIANOLA et al. (1986) showed that maximization of $f(\sigma_u^2, \sigma_e^2 \mid y)$ with respect to the variances in the absence of prior information about these parameters leads to the equations:
$$E_c\left[\frac{\partial \ln f(u \mid \sigma_u^2)}{\partial\sigma_u^2}\right] = 0, \qquad [19]$$
where $E_c$ indicates expectation with respect to the distribution $f(u \mid \sigma_u^2, \sigma_e^2, y)$. Further, and now taking expectation with respect to $f(\theta \mid \sigma_u^2, \sigma_e^2, y)$, we need to satisfy:
$$E_c\left[\frac{\partial \ln f(y \mid \theta, \sigma_e^2)}{\partial\sigma_e^2}\right] = 0. \qquad [20]$$
The derivation is based on the decomposition of the posterior distribution of all unknowns, $f(\beta, u, \sigma_u^2, \sigma_e^2 \mid y)$, into
$$f(\beta, u, \sigma_u^2, \sigma_e^2 \mid y) \propto f(y \mid \beta, u, \sigma_e^2)\, f(u \mid \sigma_u^2)\, f(\sigma_u^2)\, f(\sigma_e^2).$$
It should be noted that the likelihood function does not depend on $\sigma_u^2$, which is true both in the normal and binary cases. Also, when flat priors are taken for the variances, $f(\sigma_u^2)$ and $f(\sigma_e^2)$ do not appear in the above decomposition.
Solving [19] and [20] simultaneously for the unknown variances leads to an iterative scheme involving the expressions [21] and [22], where
• k is iterate number,
• C is the inverse of the coefficient matrix in Newton-Raphson (Appendix B), or of [18] when observations are binary,
• $C_{uu}$ is the submatrix of C corresponding to the u-effects,
• M is the coefficient matrix in [8] or [18] without $\lambda A^{-1}$,
• W = [X, Q].
It should be noted that in the binary case the residual variance is not estimated because it is taken as equal to one. The derivation of [22] is given in Appendix C. Equation [21], however, holds in both cases. The conditional expectations are taken as if the « true » values of the variance components were those found in the previous iteration. As pointed out by GIANOLA et al. (1986), [20] and [21] arise in the EM algorithm (DEMPSTER et al., 1977) when applied to estimation by restricted maximum likelihood, and the resulting estimates are never negative.
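As an illustration of the kind of update involved, the sketch below codes the usual EM-type step for a sire variance: the quadratic form $u'A^{-1}u$ plus the trace of $A^{-1}C_{uu}$ scaled by the residual variance, divided by the number of sires. Whether this agrees term by term with [21] cannot be checked from the text as extracted, so it should be read as an assumption. The function name and the numbers are hypothetical; in the binary case the residual variance would be set to 1.

```python
def em_sire_variance(u, Ainv, C_uu, sigma_e2=1.0):
    """One EM-type update of the sire variance (assumed standard form).

    u        : current sire solutions (list of length m)
    Ainv     : inverse of the relationship matrix (m x m nested list)
    C_uu     : sire block of the inverse coefficient matrix (m x m nested list)
    sigma_e2 : current residual variance (1.0 on the underlying scale)
    """
    m = len(u)
    quad = sum(u[j] * Ainv[j][k] * u[k] for j in range(m) for k in range(m))
    trace = sum(Ainv[j][k] * C_uu[k][j] for j in range(m) for k in range(m))
    return (quad + trace * sigma_e2) / m

# Hypothetical values for two unrelated sires
u = [1.0, -1.0]
Ainv = [[1.0, 0.0], [0.0, 1.0]]
C_uu = [[0.1, 0.0], [0.0, 0.2]]
print(em_sire_variance(u, Ainv, C_uu, sigma_e2=2.0))
```

Both terms are non-negative whenever $A^{-1}$ and $C_{uu}$ are positive definite, which is why an update of this form cannot produce a negative variance estimate.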
V. Numerical application
A small data set from a progeny test of Blonde d'Aquitaine sires carried out in France was used to illustrate the methods presented in this paper. The data set is the same as the one utilized by FOULLEY et al. (1983), with some modifications, as illustrated in table 1. There were 47 calving records including information on region of origin of the heifer, calving season, sex and sire of calf, and birth weight (BW) and calving ease (CE) as response variables. CE was recorded as an all-or-none trait with « easy » and « difficult » calvings coded as 0 or 1, respectively. As shown in table 1, paternity was uncertain in the case of records 1, 2, 3 and 39. For the first three records, information on breeding periods and gestation lengths led to an assignment to natural service sires 7 and 8 of probabilities equal to 1/4 and 3/4, respectively. In the case of record 39, artificial insemination sires 1 and 2 were assigned probabilities of 1/2 and 1/2, respectively.
A. Model
Birth weight was regarded as following a normal distribution, and CE was treated as a binomial trait. Both traits were analyzed using the model
$$y_{ijkl} = H_i + A_j + S_k + s_l + e_{ijkl},$$
where $H_i$ is the effect of region i of origin of heifer (i = 1, 2), $A_j$ is the effect of the jth season of calving (j = 1, 2), $S_k$ is the effect of sex of calf k (k = 1 for males or 2 for females), $s_l$ is the transmitting ability of the lth sire (l = 1, ..., 8), and $e_{ijkl}$ is a residual with variance $\sigma_e^2$. The vectors $\beta$ and u were
$$\beta' = [H_1, H_2, A_1, A_2, S_1, S_2], \qquad u' = [s_1, \ldots, s_8].$$
Prior knowledge about $\beta$ was assumed to be vague. Heritability was .25 for both traits, and $\sigma_e$ was 5 kg for BW and 1 for CE, the discrete trait. In forming the relationship matrix A, it was assumed that the artificial insemination sires (1 through 6) were unrelated, and that the natural service sires 7 and 8 were non-inbred sons of sire 5.
In this example, the sets needed to define [9] were: $I_2$ = {1, 2, 3, 39} (with $I_1$ being the complement), $J_1$ = {2, 3, 4, 5} and $J_2$ = {1, 6, 7, 8}. Thus, the matrix $P_{22}$ in [10] had one row for each of records 1, 2, 3 and 39 and one column for each sire in $J_2$.
For BW, the nonlinear system [11] was solved using 3 algorithms: functional iteration, and Newton-Raphson and scoring as described in Appendix B. For CE, computations were carried out with [18]; starting values were calculated as discussed earlier. Iteration stopped when the square root of the average squared correction was less than $10^{-5}$. Variance components were estimated for both traits using the procedures outlined in section IV.
Sire evaluations ignoring uncertainty on paternity were also calculated so as to further illustrate the procedures. This was done by assigning progenies 1, 2, 3 to sire 1 and record 39 to sire 6.
B. Results
Results of the analysis conducted for BW are presented in table 2. Irrespective of the algorithm used, the stopping rule of $10^{-5}$ was satisfied in 4 iterations. The fact that the algorithms were equally fast to converge is undoubtedly related to the limited extent of nonlinearity, as only 4 out of 39 records had ambiguous parentage. Further, a « sharp » assignment of probabilities (1/4 vs. 3/4) was made in 3 out of the 4 records. In this data set, from a practical point of view iteration could have stopped at the second round. Differences between the analyses conducted ignoring uncertainty and taking it into account were minimal. Sire 1 was the most affected because 3 records assigned to him in the case of certain paternity were assigned to sires 7 or 8 when paternity was uncertain.
The analysis of calving ease is shown in table 3. When paternity was certain, 5 iterates were required to converge. On the other hand, 13 iterations were required when uncertainty was taken into account. This is so because with a binary trait there are 2 sources of nonlinearity when paternity is uncertain: one due to the fact that the model is nonlinear, and the second due to the uncertainty itself. The second source of nonlinearity was responsible for the 8 additional iterations.
Estimates of variance components in this example were $\hat\sigma_u^2 = 0$ and $\hat\sigma_e^2 = 22.81$ for BW, and $\hat\sigma_u^2 = .096$ for CE. The latter value gives an estimate of heritability of .35 in the underlying scale. For BW, more than 400 iterates were needed for estimates of variance components to converge, and 193 iterations were required for CE. It is well known that the EM algorithm is extremely slow to converge (THOMPSON, 1979), especially in small samples.
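The heritability quoted for CE can be recovered from the sire variance estimate. The sketch below assumes the sire-model relations used in the paper, namely that $\sigma_u^2$ is one quarter of the additive genetic variance and that the phenotypic variance is $\sigma_u^2 + \sigma_e^2$, so that $h^2 = 4\sigma_u^2/(\sigma_u^2 + \sigma_e^2)$ and $\lambda = \sigma_e^2/\sigma_u^2 = 4/h^2 - 1$. The function names are hypothetical.

```python
def heritability_from_sire_variance(sigma_u2, sigma_e2):
    """h2 = 4 * sigma_u2 / (sigma_u2 + sigma_e2) under a sire model."""
    return 4.0 * sigma_u2 / (sigma_u2 + sigma_e2)

def variance_ratio(h2):
    """lambda = sigma_e2 / sigma_u2 expressed through heritability."""
    return 4.0 / h2 - 1.0

# CE on the underlying scale: sire variance .096, residual variance 1
h2_ce = heritability_from_sire_variance(0.096, 1.0)
print(round(h2_ce, 2))        # 0.35
print(variance_ratio(0.25))   # 15.0, the ratio implied by the prior h2 of .25
```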
However, alternative parameterizations of the model or numerical shortcuts (e.g., SCHAEFFER, 1979; MISZTAL & SCHAEFFER, 1986) can be used.
VI. Discussion
The impact of the extent of misidentification on sire evaluation and on estimates of genetic parameters was studied by VAN VLECK (1970a, b) and BONAITI (1975). These authors found that misidentification of sires biased downwards estimates of heritability and of expected genetic progress. Biases in evaluation of sires increased as the fraction of misidentified animals increased.
The approach followed in the present study, as in POIVEY & ELSEN (1984), is to directly take into account in the analysis uncertainty on the assignment of progeny to sires so as to improve prediction of breeding values. However, POIVEY & ELSEN (1984) studied the problem in a selection index framework which requires knowledge of means and variances. The issue was addressed here in a more general manner so as to accommodate different types of distribution (normal or binomial), and less restrictive states of knowledge vis-a-vis fixed effects and variance components.
Using the Bayesian paradigm as in GIANOLA et al. (1986) leads to inferences based on a posterior distribution with the uncertainty « integrated » or averaged out. With the $\beta$ and u components of the mode of the posterior distribution taken as point estimators and predictors of fixed and random effects, respectively, a nonlinear system of equations is obtained. Algorithms for solving these equations are discussed in the paper. Variance components were estimated from the joint posterior distribution of the variances after taking into account uncertainty in the assignment of progeny to sires. The point estimators chosen were the modal values of this distribution; expressions for computing the estimates iteratively were presented.
Is it possible to use directly the mixed model equations to obtain standard best linear unbiased predictors when paternity is uncertain? The best linear unbiased predictor of u is given by
$$\hat u = \mathrm{Cov}(u, y')\, V^{-1}(y - X\hat\beta),$$
where $\hat\beta$ is the generalized least-squares estimator of $\beta$ and V is the variance covariance matrix of the records (HENDERSON, 1973). From results in Appendix A, the diagonal elements of V in the continuous case are (for non-inbred sires)
$$\mathrm{Var}(y_i) = \sigma_u^2 + \sigma_e^2$$
and the off-diagonals are
$$\mathrm{Cov}(y_i, y_{i'}) = \sigma_u^2 \sum_j \sum_{j'} p_{ij}\, p_{i'j'}\, a_{jj'},$$
where $a_{jj'}$ is the additive relationship between sires j and j'. It follows that V is not in the form $ZGZ' + R$ needed to put $V^{-1} = R^{-1} - R^{-1}Z(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}$ so as to establish the equivalence between the best linear unbiased predictor above and the results given by the mixed model equations (HENDERSON, 1984).
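The point about the form of V can be made concrete with a small numerical sketch. It assumes the structure stated in Appendix A: each diagonal element equals $\sigma_e^2 + \sigma_u^2\sum_j p_{ij}a_{jj}$ (that is, $\sigma_e^2 + \sigma_u^2$ for non-inbred sires) and each off-diagonal element equals $\sigma_u^2\, p_i A p_{i'}'$. It then compares V with $I\sigma_e^2 + PAP'\sigma_u^2$, the matrix that would arise if P could simply take the place of Z. The function names and the two-record, two-sire example are hypothetical.

```python
def record_covariance(P, A, sigma_u2, sigma_e2):
    """V implied by the Appendix A results for records with uncertain paternity."""
    n, m = len(P), len(A)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i == k:
                V[i][k] = sigma_e2 + sigma_u2 * sum(P[i][j] * A[j][j] for j in range(m))
            else:
                V[i][k] = sigma_u2 * sum(P[i][j] * A[j][l] * P[k][l]
                                         for j in range(m) for l in range(m))
    return V

def zgz_form(P, A, sigma_u2, sigma_e2):
    """I*sigma_e2 + P A P' * sigma_u2, i.e. the ZGZ' + R form with P in place of Z."""
    n, m = len(P), len(A)
    return [[(sigma_e2 if i == k else 0.0)
             + sigma_u2 * sum(P[i][j] * A[j][l] * P[k][l]
                              for j in range(m) for l in range(m))
             for k in range(n)] for i in range(n)]

# Record 1 has uncertain paternity (1/4 vs 3/4); record 2 is certainly out of sire 1
P = [[0.25, 0.75], [1.0, 0.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
V = record_covariance(P, A, sigma_u2=1.0, sigma_e2=4.0)
W = zgz_form(P, A, sigma_u2=1.0, sigma_e2=4.0)
print(V)   # [[5.0, 0.25], [0.25, 5.0]]
print(W)   # [[4.625, 0.25], [0.25, 5.0]]
```

The two matrices agree everywhere except on the diagonal element of the record with uncertain paternity, so the discrepancy disappears only when every row of P is a 0/1 indicator.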
It is not obvious how to treat the problem of uncertain paternity using standard techniques. The Bayesian solution presented here, on the other hand, offers a clear answer.
POIVEY & ELSEN (1984) discussed situations in which the methods presented here could be applied. These include: i) females exposed simultaneously or successively in time to groups of males; ii) joint use of artificial and natural breeding in sheep flocks and cattle herds in conjunction with estrous synchronization techniques; and iii) heterospermic progeny testing with ambiguous parentage. A requirement of the procedure is the specification of prior probabilities $p_{ij}$ which can be based on external information such as biochemical polymorphisms or, more likely under extensive conditions, records on breeding dates and gestation lengths. In particular, the methods described here may be potentially useful in situations where natural service sires are used extensively, e.g., pastoral production systems. The computations are feasible, at least for univariate sire evaluations carried out under the assumption of normality and with genetic parameters assumed known. Extensions to the multivariate situation can be done without great conceptual difficulty.
Received February 27, 1986. Accepted June 24, 1986.
Acknowledgements
This research was conducted while J.L. FOULLEY was a George A. Miller Visiting Scholar at the University of Illinois, on sabbatical leave from INRA. J.L. FOULLEY wishes to thank the Direction des Productions animales and Direction des relations internationales, INRA, for their support. D. GIANOLA acknowledges support from the Illinois Agriculture Experiment Station and of Grant No. US-805-84 from BARD-The United States-Israel Binational Agricultural Research and Development Fund.
References
BONAITI B., 1975. Influence du taux d'erreur dans les filiations paternelles sur la valeur de la sélection sur descendance chez les ovins. 7es Journées de la Recherche Ovine et Caprine, Paris, 2-4 décembre 1975, SPEOC-ITOVIC, 166-172.
BOX G.E.P., TIAO G.C., 1973. Bayesian inference in statistical analysis. 588 pp., Addison-Wesley, Reading, Massachusetts.
COX D.R., HINKLEY D.V., 1974. Theoretical statistics. 511 pp., Chapman and Hall, London.
DEMPFLE L., 1977. Relation entre BLUP et estimateurs bayésiens. Ann. Génét. Sél. Anim., 9, 27-32.
DEMPSTER A.P., LAIRD N.M., RUBIN D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statis. Soc., B, 39, 1-38.
FERNANDO R.L., GIANOLA D., 1986. Optimal properties of the conditional mean as a selection criterion. Theor. Appl. Genet. (submitted).
FOULLEY J.L., GIANOLA D., THOMPSON R., 1983. Prediction of genetic merit from data on binary and quantitative variates with an application to calving difficulty, birth weight and pelvic opening. Génét. Sél. Evol., 15, 401-424.
FOULLEY J.L., GIANOLA D., 1984. Estimation of genetic merit from bivariate « all or none » responses. Génét. Sél. Evol., 16, 285-306.
FOULLEY J.L., IM S., GIANOLA D., HÖSCHELE I., 1987. Empirical Bayes estimation of parameters for n polygenic binary traits. Génét. Sél. Evol., 19 (in press).
GIANOLA D., FOULLEY J.L., 1983a. Sire evaluation for ordered categorical data with a threshold model. Génét. Sél. Evol., 15, 201-224.
GIANOLA D., FOULLEY J.L., 1983b. New techniques of prediction of breeding value for discontinuous traits. Proc. 32nd Annual National Breeders Round-table, St. Louis, Missouri, May 6, 1983, 28 pp., Mimeo.
GIANOLA D., FERNANDO R.L., 1986. Bayesian methods in animal breeding theory. J. Anim. Sci., 63, 217-244.
GIANOLA D., FOULLEY J.L., FERNANDO R.L., 1986. Prediction of breeding values when variances are not known. Génét. Sél. Evol. (submitted).
GILMOUR A.R., ANDERSON R.D., RAE A.L., 1985. The analysis of binomial data by a generalized linear mixed model. Biometrika, 72, 593-599.
GOFFINET B., ELSEN J.M., 1984. Critère optimal de sélection : quelques résultats généraux. Génét. Sél. Evol., 16, 307-318.
HARVILLE D.A., 1974. Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383-385.
HARVILLE D.A., 1977. Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc., 72, 320-338.
HARVILLE D.A., MEE R.W., 1984. A mixed model procedure for analyzing ordered categorical data. Biometrics, 40, 393-408.
HENDERSON C.R., 1973. Sire evaluation and genetic trends. Proceedings of the Animal Breeding and Genetics Symposium in Honor of Dr. Jay L. Lush. American Society of Animal Science and American Dairy Science Association, 10-41, Champaign, Illinois.
HENDERSON C.R., 1984. Applications of linear models in animal breeding. 462 pp., University of Guelph, Guelph, Ontario.
HÖSCHELE Ina, FOULLEY J.L., COLLEAU J.J., GIANOLA D., 1986. Genetic evaluation for multiple binary responses. Génét. Sél. Evol., 18, 299-320.
LEFORT G., 1980. Le modèle de base de la sélection, justification et limites. In Legay J.M. et al. (ed.), Biométrie et Génétique, 4, 1-14, Société Française de Biométrie. INRA, Département de Biométrie.
LINDEMAN R.H., MERENDA P.F., GOLD R.Z., 1980. Introduction to bivariate and multivariate analysis. 444 pp., Scott, Foresman & Co., Glenview, Illinois.
MISZTAL I., SCHAEFFER L.R., 1986. Nonlinear model for describing convergence of iterative methods of variance component estimation. J. Dairy Sci., 69, 2209-2213.
O'HAGAN A., 1976. On posterior joint and marginal modes. Biometrika, 63, 329-333.
POIVEY J.P., ELSEN J.M., 1984. Estimation de la valeur génétique des reproducteurs dans le cas d'incertitude sur les apparentements. I. Formulation des indices de sélection. Génét. Sél. Evol., 16, 445-454.
RÖNNINGEN K., 1971. Some properties of the selection index derived by « Henderson's mixed model method ». Z. Tierz. Züchtungsbiol., 88, 186-193.
SCHAEFFER L.R., 1979. Estimation of variance and covariance components for average daily gain and backfat thickness in swine. In: Searle S.R., Van Vleck L.D. (ed.), « Variance components and animal breeding ». Proc. of a Conference in Honor of C.R. Henderson, 123-137. Cornell University, Ithaca, New York.
THOMPSON R., 1979. Sire evaluation. Biometrics, 35, 339-353.
VAN VLECK L.D., 1970a. Misidentification in estimating the paternal sib correlation. J. Dairy Sci., 53, 1469-1475.
VAN VLECK L.D., 1970b. Misidentification and sire evaluation. J. Dairy Sci., 53, 1697-1702.
WRIGHT S., 1934. An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics, 19, 506-536.
ZELLNER A., 1971. An introduction to Bayesian inference in econometrics. 431 pp., John Wiley &
Appendix A
Variance-covariance structure of the data (frequentist viewpoint)
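The starting point is [3B]. As shown in the text, omitting the conditioning on $\sigma_e^2$ for the sake of simplicity:
$$E(y_i \mid \theta) = \sum_j p_{ij}\mu_{ij} = \mu_i. \qquad [A1]$$
The variance of the distribution can be obtained by writing:
$$\mathrm{Var}(y_i \mid \theta) = E\left[\mathrm{Var}(y_i \mid \mathcal{T}_{ij}, \theta)\right] + \mathrm{Var}\left[E(y_i \mid \mathcal{T}_{ij}, \theta)\right] = \sigma_e^2 + \sum_j p_{ij}\mu_{ij}^2 - \mu_i^2, \qquad [A2]$$
which follows from [1] and [A1]. Likewise
$$\mathrm{Cov}(y_i, y_{i'} \mid \theta) = E\left[\mathrm{Cov}(y_i, y_{i'} \mid \mathcal{T}_{ij}, \mathcal{T}_{i'k}, \theta)\right] + \mathrm{Cov}\left[E(y_i \mid \mathcal{T}_{ij}, \theta),\ E(y_{i'} \mid \mathcal{T}_{i'k}, \theta)\right],$$
where the covariance is taken with respect to the joint distribution of $\mathcal{T}_{ij}$ and $\mathcal{T}_{i'k}$. The first term in the above equations is clearly null because the observations are conditionally independent. Assuming $P(\mathcal{T}_{ij} \cap \mathcal{T}_{i'k}) = p_{ij}\,p_{i'k}$, we get
$$\mathrm{Cov}(y_i, y_{i'} \mid \theta) = 0. \qquad [A3]$$
We consider now the variance-covariance structure unconditionally on $\theta$. Applying the same strategy, one can write:
$$\mathrm{Var}(y_i) = E_\theta\left[\mathrm{Var}(y_i \mid \theta)\right] + \mathrm{Var}_\theta\left[E(y_i \mid \theta)\right]. \qquad [A4]$$
The second term is the variance of $\mu_i$ taken with respect to the distribution of $\theta$. Arguing from a classical viewpoint ($\beta$ fixed, u random) we have:
$$\mathrm{Var}(\mu_i) = p_i A p_i'\,\sigma_u^2. \qquad [A5]$$
From [A2]
$$E_\theta\left[\mathrm{Var}(y_i \mid \theta)\right] = \sigma_e^2 + \sigma_u^2\left(1 - p_i A p_i'\right), \qquad [A6]$$
because $z_{ij}'Az_{ij} = 1$ (if sires are not inbred), and $\sum_j p_{ij}z_{ij} = p_i'$. Collecting [A5] and [A6] into [A4] gives
$$\mathrm{Var}(y_i) = \sigma_u^2 + \sigma_e^2. \qquad [A7]$$
Finally, we consider the unconditional covariances between records $y_i$ and $y_{i'}$. Writing:
$$\mathrm{Cov}(y_i, y_{i'}) = E_\theta\left[\mathrm{Cov}(y_i, y_{i'} \mid \theta)\right] + \mathrm{Cov}_\theta\left[E(y_i \mid \theta),\ E(y_{i'} \mid \theta)\right],$$
we observe as before that the first term is null. Also, from [A1]:
$$\mathrm{Cov}(y_i, y_{i'}) = p_i A p_{i'}'\,\sigma_u^2.$$
It should be observed that Var(y) cannot be written as $I\sigma_e^2 + PAP'\sigma_u^2$ because the diagonal elements of this last matrix expression are not equal to [A7] except in the trivial case P = Z, i.e., when paternity is certain.
Appendix B
Newton-Raphson and scoring algorithms for normal data
The Newton-Raphson algorithm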
consists in iterating with:
$$-\left[\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\right]_{\theta = \theta^{[k-1]}}\Delta^{[k]} = \left[\frac{\partial L(\theta)}{\partial\theta}\right]_{\theta = \theta^{[k-1]}}, \qquad [B1]$$
where $\Delta^{[k]} = \theta^{[k]} - \theta^{[k-1]}$ and $\theta^{[k]}$ is the solution at iteration k. The first derivatives are given in [5] and the second derivatives are those of [B2]. From the definition of $q_{ij}$ in [6], the derivatives of $q_{ij}$ are obtained ([B3]). Using this in [B2] above and rearranging gives [B4] and [B5].
Some simplification in the calculations can be achieved by replacing $r_{ij}$ by its expectation taken conditionally on $\theta$, $\sigma_e^2$ and $\mathcal{T}_{ij}$, which is obtained directly from [B4]. This used in [B5] above yields a « scoring » algorithm for solving the nonlinear system of equations.
The system [B5] can be written in matrix notation using the matrix R and the two E matrices of page 89, with $r_{ij}$ as in [B4] instead of [17], together with some additional matrices. The system [B5] becomes then [B6].
The algorithm is also described for the case where uncertain paternity is only with respect to a small proportion of the sires evaluated. Here, we partition R in the same way. With the above notation, the system in [B6] can be written as [B7]. The above equations indicate the parts of the system that need to be amended to take uncertain paternity into account. As before, the nonlinearity stems from the contribution of the vector $y_{22}$ to information about the unknown parameters. In order to start iteration, one may take $R^{[0]} = P$, the two E matrices equal to their « prior » ($\Delta$) counterparts, and values of the vectors $\beta$, $u_1$ and $u_2$ obtained by applying linear mixed model methodology upon the vectors $y_{11}$ and $y_{12}$.
Appendix C
Derivation of the algorithm used for estimating $\sigma_e^2$ with normal data
The estimator of $\sigma_e^2$ needs to satisfy [20].
From [3A] and [3B], an intermediate result is derived; using this result in [C1] and then in [20] gives [C2], where $q_{ij}$ is as in [6], and where $E_c$ indicates expectation taken with respect to the conditional distribution $f(\theta \mid \sigma_u^2, \sigma_e^2, y)$. The expectation in [C2] is difficult to obtain because $q_{ij}$ is a function of $\theta$. If $q_{ij}$ is regarded as a constant, a rearrangement of [C2] gives [C3].
As done by HARVILLE & MEE (1984) in the context of threshold models, we replace $E(\mu_{ij} = w_{ij}'\theta \mid \sigma_u^2, \sigma_e^2, y)$ by the mode $\hat\theta$ calculated using equations [8] (or the expressions described in Appendix B), which modifies the formula above accordingly. Now, $\mathrm{Var}(\theta \mid y) = C\sigma_e^2$, where C is the inverse of the coefficient matrix in [B6] or [B7] evaluated at $\hat\theta$. Further, at the maximum, [5] must be null which is satisfied when [7] is satisfied. Multiplying both sides of [7] by $\hat\theta'$ (and remembering that a flat prior is used for $\beta$) yields a further identity. With this in mind, we obtain the following for the components of [C3]
where M is the coefficient matrix in