HAL Id: hal-00894106
https://hal.archives-ouvertes.fr/hal-00894106
Submitted on 1 Jan 1995
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Inequality of maximum a posteriori estimators with
equivalent sire and animal models for threshold traits
M Mayer
To cite this version:
Original
article
Inequality
of
maximum
a
posteriori
estimators
with
equivalent
sire
and animal models
for
threshold traits
M
Mayer
University of Nairobi, Department of
AnimalProduction,
PO Box 29053,Nairobi, Kenya
(Received
25May 1994; accepted
17May
1995)
Summary - It is demonstrated that the use of
equivalent
sire and animal models todescribe the
underlying
variable of a threshold trait leads to different maximum aposteriori
estimators. In thesimple example
data sets examined in thisstudy,
the use of an animalmodel shrank the maximum a posteriori estimators towards zero
compared
with theestimators under a sire model. The differences between the 2 estimators were
particularly
noticeable when the
heritability
on theunderlying
scale washigh
andthey
increased withincreasing heritability. Moreover,
it was shown that even theranking
ofbreeding
animals may be different whenapplying
the 2 different estimators asranking
criteria.threshold trait
/
genetic evaluation/
maximum a posteriori estimationRésumé - Inégalité entre des estimateurs du maximum a posteriori avec des modèles équivalents père et animal pour des caractères à seuil. Il est démontré que l’utilisation
des modèles
équivalents père
etanimal,
pour décrire la variablesous-jacente
d’un caractère àseuil,
conduit àdifférents
estimateurs du maximum aposteriori.
Dans lesexemples
simples analysés
dans cetteétude,
en comparant l’utilisation de ces 2 modèles on constateque les estimateurs du ma!imum a
posteriori
tendent à serapprocher
de 0lorsqu’on
utilise le modèle animal. Les
différences
entre les 2 estimateurs sontparticuLièrement
notables
quand
l’héritabilité sur l’échellesous-jacente
estélevée,
et elles augmentent avecl’héritabilité. De
plus,
il est aussi montré que même le classement desreproducteurs
peutêtre
différent
selon l’estimateur utilisé comme critère de classement.caractère à seuil
/
évaluation génétique/
estimation du maximum a posterioriINTRODUCTION
New
procedures
based on the thresholdconcept
haverecently
beendeveloped
methods introduced
by
Gianola andFoulley
(1983),
Stiratelli et al(1984)
and Harville and Mee(1984)
areessentially
the same and were derivedmainly
in aBayesian
framework. With thesemethods,
locationparameters
in theunderlying
scale are estimatedby
maximizing
ajoint posterior density.
The estimators aretherefore
designated
as maximum aposteriori
estimators. Studies on theproperties
of maximum a
posteriori
estimators andcomparisons
with the estimators andpredictors
of linear model methods havepreviously
been based on sire modelsto describe the variable on the
underlying
scale(eg,
Meijering
andGianola,
1985;
H6schele,
1988;
Weller etal, 1988;
Renand etal, 1990; Mayer,
1991).
Forcontinuously
distributed traits the use of an animal model is the state of the art for thegenetic
evaluation ofbreeding
animals. Alogical
step
would be to use an animal model as well to describe theunderlying
variable ingenetic
evaluation withthreshold traits. In the
present
paper someproperties
of maximum aposteriori
estimators are studied in the context of
applying
an animal model to describe theunderlying
variable of threshold traits. METHODSExample
data setConsider the
following
simple
data structure(2
sires,
each with 1progeny)
for adichotomous character with the 2 realizations
M
o
(0)
andM, (1):
Progeny
1 exhibits the realizationM
o
and progeny 2 the realizationM
l
.
Assumethat we want to estimate the
breeding
values of the sires for the trait M. Maximum aposteriori
estimationTo estimate the
breeding
values of thesires,
the thresholdconcept
is used and wefollow the
Bayesian
approach
along
the lines introducedby
Gianola andFoulley
(1983),
Stiratelli et al(1984),
and Harville and Mee(1984).
Let the data bearranged
as a 2 x 2contingency table,
where the 2 rowsrepresent,
asusual,
thesire
effects,
and the 2 columnsrepresent
the realizationsM
o
andM
i
.
The elements of thecontingency
table in the ith row andjth
column aredesignated
as nij. TheThe rth
experimental
unit in row i is characterizedby
anunderlying
continuously
distributed
variable,
which is rendered discretethrough
a fixed threshold.Let us assume that the variance
components
in thedescription
of theunderlying
variable are
known,
at least toproportionality. Using
a ’flat’prior
for the threshold(t),
so as to mimic the traditional mixed modelanalysis,
andassuming
that the sire effects on theunderlying
scale(u
l
, u
2
)
apriori
arenormally, identically
andindependently
distributed with mean 0 and varianceQ
u,
thejoint posterior density
has the form
(
C1
=constant):
Using
aprobit-transformation
and apurely
additivegenetic
model for theunderlying
variable(4l:
standardized normal cumulativedensity
function):
The residual standard deviation
(
Qe
)
is taken as the unit of measurement on theunderlying
scale,
so that Qe = 1.The maximum a
posteriori
estimators are the values which maximize thejoint
posterior
density
g(/3).
This isequivalent
to the maximation of thelog-posterior
density,
which can be done moreeasily.
The first derivatives of thelog-posterior
density
withrespect
to t and ui are not linear functions of theseparameters
andmust be solved
iteratively
using
numericaltechniques
such asNewton-Raphson
orFisher’s
scoring
procedure.
TheNewton-Raphson
algorithm
leads to thefollowing
iterative scheme:where
(0:
standardized normalprobability density
function):
The
joint posterior density
g((3)
issymmetric
in the sense thatg(t
=z i , U1 = X2,U2 =
.r3!!,!,!)
=
9
(t
= -X1 , U1 = -X3,U2 =-!2!!,!,!).
From thisexpression
it followsdirectly
that theposterior
density
has its maximum wheret = 0 and U1 + U2 = 0.
Making
use of thisresult,
the iterativeequation
[2]
reduces to:The estimated
breeding
values of the siresequal
2u !k)
at convergence. Forcompatibility
reasons with the maximum aposteriori
estimators in thefollowing
section,
which are based on an animal model to describe theunderlying
variable and where the environmental standard deviation(
QE
)
is taken as the unit ofmeasurement,
the estimator in[3]
may also beexpressed
in units of QEby
multiplying
u!k)
by
[(1 -
4
c
)/(1 -
2
/
1
)]-
C
or(1 -
3<!)’!,
respectively,
where cis the intraclass correlation
Q
u/(1+Qu).
Moreover,
it is convenient to express the maximum aposteriori
estimators in terms of thephenotypic
standard deviation ap =(
U2
+!e)1!2
by
multiplying
the estimator from[3]
by
[1/(l -
c
)]-
l
/
2
or(
a2
+
1)-1!2,
respectively.
Use of an
equivalent
animal modelUp
to now we have dealt with a sire model to describe theunderlying
variablefor the threshold trait. Let the rows of the
contingency
table nowrepresent
theprogenies
in which the observations are made and we use anequivalent
animalmodel to describe the basis variable and to estimate the
breeding
values of the sires. For that purpose letv’
=[!1!2 !3 !4],
where vl and v2represent
the additivegenetic
values on theunderlying
scale of theprogenies
1 and2,
and v3 and v4represent
the additivegenetic
effects of their sires 1 and 2.Under these conditions v is a
priori normally
distributed with mean 0 and variance-covariance matrixAcr 2,
where A is the numeratorrelationship
matrix anda2
the
additivegenetic
variance on theunderlying
scale. For theexample
dataset the
relationship
matrix has thefollowing
form:The likelihood function
invariably
follows aproduct
binomial distribution. Sothe
joint posterior density
is now:with
and the environmental variance
(!E)
as the unit of measurement on theunderlying
scale.
For the
joint
posterior
density
[5]
theNewton-Raphson
algorithm
leads after awhere:
It can be
easily
seen that forvi
k
)
and v2!!
respectively,
thisequation
system
isformally
very similar to the iterativeequation
(2!.
Further,
it shows the result thatV
i - Vi
/2
and
V4
2
/2,
ie the estimatedbreeding
values of the siresequal
half of the estimated
genetic
effects of theirprogenies.
To express the maximum aposteriori
estimators in units of thephenotypic
standard deviationthey
have tobe
multiplied,
as in the case of the sire modelby
[1/(1 -
c)!-1!2
or(ua
+
1
)-
1
/
2
,
respectively.
More
complex
data structureA more
complex
datastructure,
viz thehypothetical
data setalready
usedby
Gianola and
Foulley
(1983)
to illustrate the maximum aposteriori
estimationprocedure,
wasemployed
to demonstrate the differences between the estimators based on either a sire or an animal model. The data consist ofcalving
ease scoresfrom 28 male and female calves born in 2
herd-years
from heifers and cows mated to4 sires.
Calving
ease was scored as a response in 1 of 3 orderedcategories
(normal
birth, slight difficulty,
extremedifficulty).
The data arearranged
into acontingency
table in table I.
Two different
heritability
values on theunderlying
scale(h
2
= 0.20 andh
2
=0.60)
werepostulated.
For the maximum aposteriori
estimation of the locationparame-ters of the
underlying
variable,
thecomputer
programpackage
TMMCAT(Mayer,
1991,
1994a)
was used.Different sire
rankings
even the
ranking
of the sires may be differentdepending
upon the modeltype
chosen. To examine this
question,
data for 3 sires weresystematically generated
and
analyzed.
The number of observations nij of sire i(i
=1,2,3)
inresponse
category j
(j
=1, 2)
was varied from 0 to 8. Asimple
randomone-way model
(sire
model,
animalmodel)
for theunderlying
variable and aheritability
value of 0.60was
postulated. Again
for theanalyses,
thecomputer
program TMMCAT was used.RESULTS
Example
data setFigure
1 shows therelationship
between the estimatedbreeding
value(EBV)
of sire1 and
heritability
for each of the models. The uppergraph
represents
the estimatestaking
the environmental standard deviation as the unit ofmeasurement,
while inthe lower
graph
the estimates areexpressed
in units of thephenotypic
standard deviation.It is obvious that the EBV is different whether a sire model or an
equivalent
animal model is used to describe the
underlying
variable. The difference betweenthe EBVs is noticeable at a
heritability
of about 0.60 and increases withincreasing
the EBV based on a sire model shows an almost linear
relationship
with theheritability
even if withincreasing heritability
theslope slightly
decreases. With ahigher
intraclass correlation on theunderlying
scale as with the animalmodel,
thisexhibits a maximum at a
heritability
value of about 0.81 and decreasesrapidly
thereafter.
Hypothetical
data setby
Gianola andFoulley
Table II shows the maximum a
posteriori
estimates of theparameters
of thehypothetical
data set in table I for the 2 different modeltypes
(sire
model andanimal
model).
The estimates areexpressed
in units of thephenotypic
standard deviation. The estimable linear functionh
+H
i
+ A
h
+S
7
&dquo;,
represents
the baseline.Further,
in table II the estimates of the contrastst
2
- t
l
(threshold
2 - threshold1),
H2 - H
i
(herd-year
2 - herd-year
1),
A!-Ah
(cow - heifer calving), 8f -8m (female
- male
calf)
are shown.Moreover,
table II contains the EBVs of thesires,
which in the case of the siremodel,
for reasons ofcomparison,
were calculated as twice the estimated sire effects.If !j
+!i
represents
the estimator of aparticular
estimable combination i of theexplanatory
effects,
then theprobability
of a response in thejth
category
(pz!)
canbe
estimated as
p
il
=<1>[(t1
+!i)/Q),
i2
p
=!!(t2
+.Ài)/u] -
<1>[(t
1
+!i)/!!
andA
3
= 1 -!P[(F2
+!t)/o’];
where 0’ is the error standard deviation(sire model)
or theenvironmental standard deviation
(animal model).
For the
heritability
value of 0.20 the estimators from the sire model and the animal model arequite
similar. The absolute estimates from the animal model are a littlesmaller,
with theexception
of the estimated distance between the thresholds. Thegreatest
relative difference is shownby
the contrast between cow effect andheifer effect. When a
heritability
of 0.60 isapplied,
the differences between themodel are
smaller, only
the estimated distance between the thresholds isgreater
and thegreatest
relative difference is found for the contrast between cow effect andheifer effect.
Different sire
rankings
Table III shows 3 data structures for which different sire
rankings
under the differentmodel
types
(sire
model and animalmodel)
were identified.The estimates based on the animal model are
again
smaller than the estimatesfrom the sire model. The different
rankings
under the 2 modeltypes
areinteresting.
Under the sire model the EBVs of the sires 3 are
clearly
greater
than the valuesof the sires
2,
whereas under an animal model it is the other way around and theEBVs of the sires 3 are
clearly
smaller than the EBVs of the sires 2. Other datastructures with different sire
rankings
were identified and table III showsonly
afew
examples.
Itmight
beexpected
that withhigher heritability
values and morecomplex
datadesigns
thistopsy-turvy
situation would be even more marked.DISCUSSION
Studies on the
properties
of maximum aposteriori
estimators for threshold traits have been based on sire models to describe theunderlying
variable. Interest hasmainly
focused oncomparisons
of maximum aposteriori
estimators with theestimators and
predictors
of linear model methods. Asregards
theEBVs,
or moreexactly
sireevaluation,
thecomparisons
have been based onempirical
product-momentcorrelations,
sirerankings,
and the realizedgenetic
response obtained fromtruncation selection.
Meijering
and Gianola(1985)
showed that with a balancedrandom one-way classification and
binary
responses, theapplication
ofquasi-best
linear unbiased
prediction
(QBLUP)
and maximum aposteriori
estimation(MAPE)
in the mean true
breeding
values of sires with eitherQBLUP
or MAPE asselection rules were not
significant.
In simulation studies where the data weregenerated by
a one-way sire model with variable progeny groupsize,
there werenoticeable differences in
efficiency
betweenQBLUP
and MAPEonly
when the incidence washigh
(>
90%,
binary
response)
and withhigh
(> 0.20)
heritability
(Meijering
andGianola, 1985; H6schele,
1988).
Whenapplying
mixed models for datasimulation,
MAPE wasgenerally
found to besuperior
incomparisons
withQBLUP.
Thissuperiority depends
on several factors: differences inincidence,
intraclasscorrelation,
unbalancedness of thelayout,
and differences in the fixedeffects and the
proportion
of selected candidates(Meijering
andGianola,
1985;
H6schele, 1988;
Mayer,
1991).
Therelationship
between the relativesuperiority
ofMAPE and the
proportion
of selected candidates was not found to be of asimple
kind
(Mayer, 1991).
Somewhat in contrast to thesefindings
in the simulation studieswere the results of
investigations
whereproduct-moment
correlations betweenMAPE and
QBLUP
estimators were calculated(Jensen,
1986;
Djemali
etal, 1987;
H6schele,
1988;
Weller etal, 1988;
Renand etal,
1990).
The correlation coefficientsthroughout
were veryhigh.
This is even moreastonishing
becausetheoretically
thereis no linear
relationship
between MAPE andQBLUP,
and so even with identical sireranking
the correlation coefficient is smaller than one.Generally,
a sire model can be considered as aspecial
case of an animalmodel,
making
very restrictivesupositions.
Forcontinuously
distributedtraits,
the use of an animal model iscurrently
the state of the art forgenetic
evaluation ofbreeding
animals.Basically,
all thepractical
and theoretical reasons in favour of the use ofan animal model instead of a sire model for traits
showing
a continuous distributionare also valid for threshold traits.
However,
in thepresent
paper it wasclearly
shown that with thresholdtraits,
themaximum a
posteriori
estimators of theparameters
of theunderlying
variable or theestimators of the
breeding
value are different whether a sire model or anequivalent
animal model is used to describe the
underlying
variable. Thisis,
of course, anunfavourable behaviour of the estimation
method,
but aslong
as theranking
ofbreeding
animals is notaffected,
it would not be soproblematic
from abreeding
point
of view. As mentionedabove, QBLUP
and MAPEyield
exactly
the sameranking
of sires for a one-way random model withbinary
responses and constant progeny group sizealthough
there is no strict linearrelationship
betweenQBLUP
and MAPE.
However,
as was shown in thisstudy,
with maximum aposteriori
estimation,
even theranking
ofbreeding
animals maydepend
on the modeltype
chosen.The result that the use of
equivalent
sire and animal models for theunderlying
variable and the use of the same estimation method lead to different estimators may be counterintuitive. Under the animal model the maximum a
posteriori
estimators are obtainedby computing
the mode of thejoint posterior
distribution of the’environmental’ effects and the
breeding
values of all the animals. Under the siremodel the mode of the
marginal posterior
density
for the sires iscomputed.
In theanalyses
of the
simple example
data set in thisstudy,
the’marginal’
posterior
density
forthe
siresequals
thejoint posterior
density,
afterintegrating
out theJJ
g(t,
Vl, V2, V3,!4 !!
8v
2’
h
1
)8v
2
It is well known that theposterior
densities inthis
study
areonly
asymptotically
normal andgenerally
do not havetypical
formsof distribution
(see,
forexample,
Foulley
etal,
1990).
Therefore the mode of themarginal posterior density
function is different from the mode of thejoint posterior
distribution. The situation is different in the linear model case or moreprecisely
when the likelihood follows a normal distribution. As shown
by
several authors(eg,
Gianola etal,
1990)
then,
under theassumption
that the variancecomponents
are
known,
thejoint posterior
density
is multivariatenormal,
ie modes and means are identical andmarginal
means can be deriveddirectly
from the vector ofjoint
means.
If the maximum a
posteriori
estimatesdepend
on the modeltype
used to describe theunderlying
variable,
as shown in this paper, there may arise thequestion
as towhich is the better estimator. In animal
breeding,
the best selectionrule,
in thesense of
maximizing
theexpected
value of the mean of thebreeding
values of afixed number of individuals selected from a fixed number of
candidates,
is themean of the
marginal posterior
distribution of thebreeding
values(Goffinet,
1983;
Fernando and
Gianola,
1986).
Only
because its calculation isusually thought
to be unfeasible for threshold traits(Foulley
etal, 1990;
Knuiman andLaird,
1990),
arejoint posterior
modes or maximum aposteriori
estimators considered andregarded
as an
approximation
to theposterior
mean. For convenience andsimplicity,
let us assume that in theanalysis
of thesimple
example
data set in thisstudy,
the value of the threshold is known to be t = 0. Themarginal
posterior
mean can then becalculated,
notonly by
numerical means, but alsoanalytically.
Under the conditions that thebreeding
values and the environmental effects arenormally, identically
and
independently
distributed with mean 0 and variancesQ
a
and uf respectively,
and
mutually
independent,
it follows from well-known statisticalproperties
of the normal distribution that themarginal
expected
valueE[alM
o]
is:Using
theadditive-genetic
transmissionmodel,
where p is a progeny of sire sand dam
d,
and am is a random variable with mean 0 and varianceu’12
describing
the Mendelian
sampling:
the
marginal breeding
value of sire 1(
U1
),
conditional to his progenyexhibiting
therealization
Mo(n2!)
is:In table IV this
marginal
mean estimator iscompared
with the maximum aposteriori
estimators under the 2 different modeltypes
(sire
model and animalmodel).
In the
simple
example
data set of thisstudy
the maximum aposteriori
estimator under the sire model is much closer to themarginal
mean estimator. This isvery poor when
compared
with themarginal
mean estimator. In a morethorough
study
ofmarginal
mean estimationby Mayer
(1994b),
similar results were foundin the
analysis
of thecalving
ease data setpresented
in table I.Thus,
a normaldensity approximation
may not bejustified
forposterior
distributions when nij values are small as under an animal model. FurtherMayer
(1994b)
showed thatthe standard deviation of the
marginal
posterior
density
wasclearly
greater
thanthe
approximate
standard deviation of the maximum aposteriori
estimates derivedfrom the observed or estimated Fisher information matrix.
ACKNOWLEDGMENTS
The author is
grateful
to the anonymous reviewers for theirhelpful suggestions
andcomments on the
manuscript.
REFERENCES
Djemali M, Berger PJ,
Freeman AE(1987)
Orderedcategorical
sire evaluation fordystocia
in Holsteins. J
Dairy
Sci70,
2374-2384Fernando
RL,
Gianola D(1986)
Optimal
properties of the conditional mean as a selectioncriterion. Theor
Appl
Genet72,
822-825Foulley JL,
GianolaD,
Im S(1990)
Genetic evaluation for discretepolygenic
traits in animalbreeding.
In: Advances in Statistical Methodsfor
GeneticImprovement of
Livestock(D
Gianola,
KHammond,
eds),
Springer-Verlag, Berlin,
361-409Gianola D,
Foulley
JL(1983)
Sire evaluation for orderedcategorical
data with a thresholdmodel. Genet Sel
Evol
15, 201-224Gianola
D,
ImS,
Macedo FW(1990)
A framework forprediction
ofbreeding
value. In: Advances in Statistical Methodsfor
GeneticImprovement of
Livestock(D
Gianola,
KHammond,
eds),
Springer-Verlag, Berlin,
210-238Goffinet B
(1983)
Selection on selected records. Genet Sel Evol15,
91-97Harville
DA,
Mee RW(1984)
A mixed modelprocedure
foranalyzing
orderedcategorical
data. Biometrics40,
393-408Jensen J
(1986)
Sire evaluation for type traits with linear and non-linearprocedures.
Livest Prod Sci 15, 165-171Knuiman
MW,
Laird NM(1990)
Parameter estimation in variance component models forbinary
response data. In: Advances in Statistical Methodsfor
GeneticImprovement of
Livestock(D
Gianola,
KHammond,
eds),
Springer-Verlag, Berlin,
177-189Mayer
M(1991)
Schatzung
von Zuchtwerten and Parametern beikategorischen
Schwellen-merkmalen in der Tierzucht. Habilitationthesis,
Hannover School ofVeterinary
Sciences,
HannoverMayer
M(1994a)
TMMCAT - a computer programusing
non-linear mixed model threshold concepts toanalyse
orderedcategorical
data. In: Proc 5th WorldCongr
GenetA
PP
L
Livest Prod22,
49-50Mayer
M(1994b)
Numericalintegration techniques
for point and interval estimation ofbreeding
values for threshold traits. In:l!5th
AnnualMeeting
of
theEuropean
Association
for
Animal Production.Edinburgh,
5-8September 1994
Meijering A,
Gianola D(1985)
Liner versus nonlinear methods of sire evaluation forcategorical
traits: a simulationstudy.
Genet Sel Evol17,
115-132Renand
G,
JanssLLG,
Gaillard J(1990)
Sire evaluation for direct effects ondystocia by
linear and threshold models. In: Proc
4rd
WorldCongr
GenetAppl
Livest Prod 465-467 StiratelliR,
LairdN,
Ware JH(1984)
Random effects models for serial observations withbinary
response. Biometrics 40, 967-971Weller JI, Misztal