HAL Id: hal-00894033
https://hal.archives-ouvertes.fr/hal-00894033
Submitted on 1 Jan 1994
Assessment of a Poisson animal model for embryo yield
in a simulated multiple ovulation-embryo transfer
scheme
Rj Tempelman, D Gianola
Original article

RJ Tempelman 1, D Gianola 2

1 University of Wisconsin-Madison, Department of Dairy Science;
2 University of Wisconsin-Madison, Department of Meat and Animal Science, 1675 Observatory Drive, Madison, WI 53706, USA

(Received 16 March 1993; accepted 10 January 1994)
Summary - Estimation and prediction techniques for Poisson and linear animal models were compared in a simulation study where observations were modelled as embryo yields having a Poisson residual distribution. In a one-way model (fixed mean plus random animal effect) with genetic variance ($\sigma_u^2$) equal to 0.056 or 0.125 on a log linear scale, Poisson marginal maximum likelihood (MML) gave estimates of $\sigma_u^2$ with smaller empirical bias and mean squared error (MSE) than restricted maximum likelihood (REML) analyses of raw and log-transformed data. Likewise, estimates of residual variance (the average Poisson parameter) were poorer when the estimation was by REML. These results were anticipated as there is no appropriate variance decomposition independent of location parameters in the linear model. Predictions of random effects obtained from the mode of the joint posterior distribution of fixed and random effects under the Poisson mixed model tended to have smaller empirical bias and MSE than best linear unbiased prediction (BLUP). Although the latter method does not take into account nonlinearity and does not make use of the assumption that the residual distribution was Poisson, predictions were essentially unbiased. After log transformation of the records, however, BLUP led to unsatisfactory predictions. When embryo yields of zero were ignored in the analysis, Poisson animal models accounting for truncation outperformed REML and BLUP. A mixed-model simulation involving one fixed factor (15 levels) and 2 random factors for 4 sets of variance components was also carried out; in this study, REML was not included in view of the highly heterogeneous nature of variances generated on the observed scale. Poisson MML estimates of variance components were seemingly unbiased, suggesting that statistical information in the sample about the variances was adequate. Best linear unbiased estimation (BLUE) of fixed effects had greater empirical bias and MSE than the Poisson estimates from the joint posterior distribution, with differences between the 2 analyses increasing with the genetic variance and with the true values of the fixed effects. Although differences in prediction of random effects between BLUP and Poisson joint modes were small, they were often significant and in favor of those obtained with the Poisson mixed model. In conclusion, if the residual distribution is Poisson, and if the relationship between the Poisson parameter and the fixed and random effects is log linear, REML and BLUE may lead to poor inferences, whereas the BLUP of breeding values is remarkably robust to the departure from linearity in terms of average bias and MSE.

Poisson distribution / embryo yield / generalized linear mixed model / variance component estimation / counts
INTRODUCTION
Reproductive technology is important in the genetic improvement of dairy cattle. For example, multiple ovulation and embryo transfer (MOET) schemes may aid in accelerating the rate of genetic progress attained with artificial insemination and progeny testing of bulls in the past 30 years (Nicholas and Smith, 1983). An important bottleneck of MOET technology, however, is the high variability in quantity and quality of embryos collected from superovulated donor dams (Lohuis et al, 1990; Liboriussen and Christensen, 1990; Hahn, 1992; Hasler, 1992). Keller and Teepker (1990) simulated the effect of variability in number of embryos following superovulation on the effectiveness of nucleus breeding schemes and concluded that increases of up to 40% in embryo recovery failure rate (percentage of cows producing no transferable embryos) could more than halve female-realized selection differentials, the effect being greatest for small nucleus units. Similar results were found by Ruane (1991). In these studies, it was assumed that residual variation in embryo yields was normal, and that yield in subsequent superovulatory flushes was independent of that in a previous flush, ie absence of genetic or permanent environmental variation for embryo yield.
Optimizing embryo yields could be important for other reasons as well. For instance, with greater yields, the gap in genetic gain between closed and open nucleus breeding schemes could be narrowed (Meuwissen, 1991). Furthermore, because of possible antagonisms between production and reproduction, it may be necessary to use some selection intensity to maintain reproductive performance (Freeman, 1986). Also, if yield promotants, such as bovine somatotropin, are adopted, the relative economic importance of production and reproduction, with respect to genetic selection, will probably shift towards reproduction. Finally, if cytoplasmic or nonadditive genetic effects turn out to be important, it would be desirable to increase embryo yields by selection, so as to produce the appropriate family structures (Van Raden et al, 1992) needed to fully exploit these effects.

Lohuis et al (1990) found a zero heritability of embryo yield in dairy cattle. Using restricted maximum likelihood (REML), Hahn (1992) estimated heritabilities of 6 and 4% for number of ova/embryos recovered and number of transferable embryos recovered, respectively, in Holsteins; corresponding repeatabilities were 23 and 15%. Natural twinning ability may be closely related to superovulatory response in dairy cattle, as cow families with high twinning rates tend to have a high ovarian sensitivity to gonadotropins, such as PMSG and FSH (Morris and Day, 1986). Heritabilities of twinning rate in Israeli Holsteins were found to be 2%, using REML, and 10% employing a threshold model (Ron et al, 1990).
Best linear unbiased prediction (BLUP) of breeding values, best linear unbiased estimation (BLUE) of fixed effects, and REML estimation of genetic parameters are widely used in animal breeding research. However, these methods are most appropriate when the data are normally distributed. The distribution of embryo yields is not normal, and it is unlikely that it can be rendered normal by a transformation, particularly when mean yields are low and embryo recovery failure rates are high. Analysis of discrete data with linear models, such as those employed in BLUE or REML, often results in spurious interactions which biologically do not exist.

It seems sensible to consider nonlinear forms of analysis for embryo yield. These may be computationally more intensive than BLUP and REML, but can offer more flexibility. The study of Ron et al (1990) suggests that nonlinear models for twinning ability may have the potential of capturing genetic variance for reproduction that would not be usable by selection otherwise. For example, threshold models have been suggested for genetic analysis of categorical traits, such as calving ease (Gianola and Foulley, 1983; Harville and Mee, 1984).
In these models, gene substitutions are viewed as occurring in an underlying normal scale. However, the relationship between the outward variate (which is scored categorically, eg, 'easy' versus 'difficult' calving) and the underlying variable is nonlinear and mediated by fixed thresholds. Selection for categorical traits using predictions of breeding values obtained with nonlinear threshold models was shown by simulation to give up to 12% greater genetic gain in a single cycle of selection than that obtained with linear predictors (Meijering and Gianola, 1985). Because genetic gain is cumulative, this increase may be substantial. The use of better models could also improve (eg, through a smaller mean squared error (MSE)) estimates of differences in embryo yield between treatments.

In the context of embryo yield, an alternative to the threshold model is an analysis based on the Poisson distribution. This is considered to be more suitable for the analysis of variates where the outcome is a count that may take values between zero and infinity. A Poisson mixed-effects model has been developed by Foulley et al (1987). From this model, it is possible to obtain estimates of genetic parameters and predictors of breeding values.

The objective of this study was to compare the standard mixed linear model with the Poisson technique of Foulley et al (1987), via simulation, for the analysis of embryo yield in dairy cattle. Emphasis was on sampling performance of estimators of variance components (REML versus marginal maximum likelihood, MML, for the Poisson model), of estimators of fixed effects and of predictors of breeding values (BLUE and BLUP evaluated at average REML estimates of variance, versus Poisson posterior modes evaluated at the true values of variance).
AN OVERVIEW OF THE POISSON MIXED MODEL
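Under Poisson sampling, the probability of observing a certain embryo yield response ($y_i$) on female $i$ as a function of the vector of parameters $\boldsymbol\theta$ can be written as:

$$p(y_i \mid \boldsymbol\theta) = \frac{\lambda_i^{y_i}\exp(-\lambda_i)}{y_i!} \qquad [1]$$

with

$$E(y_i \mid \boldsymbol\theta) = \mathrm{Var}(y_i \mid \boldsymbol\theta) = \lambda_i \qquad [2]$$

The Poisson mixed model introduced by Foulley et al (1987) makes use of the link function of generalized linear models (McCullagh and Nelder, 1989). This function allows the modelling of the Poisson parameter $\lambda_i$ for female $i$ in terms of $\boldsymbol\theta$. This parameterization differs from that presented in Foulley et al (1987), who modelled Poisson parameters for individual offspring of each female, allowing for the extensions in Foulley and Im (1993) and Perez-Enciso et al (1993). In the Poisson model, the link is the logarithmic function, such that

$$\eta_i = \ln \lambda_i = \mathbf{w}_i'\boldsymbol\theta \qquad [3]$$

Above, $\boldsymbol\theta' = [\boldsymbol\beta', \mathbf{u}']$, $\boldsymbol\beta$ is a $p \times 1$ vector of fixed effects, $\mathbf{u}$ is a $q \times 1$ vector of breeding values, and $\mathbf{w}_i' = [\mathbf{x}_i'\ \mathbf{z}_i']$ is an incidence row vector relating $\boldsymbol\theta$ to $\eta_i$. Let $\mathbf{X} = \{\mathbf{x}_i'\}$ and $\mathbf{Z} = \{\mathbf{z}_i'\}$ be incidence matrices of order $n \times p$ and $n \times q$, respectively, such that:

$$\boldsymbol\eta = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} \qquad [4]$$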
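The log link described above can be illustrated numerically. The following is a minimal sketch in Python, assuming NumPy is available; the function name `poisson_loglik` and the toy data are hypothetical and serve only to show how the linear predictor $\boldsymbol\eta = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u}$ maps to Poisson parameters and to the conditional log-likelihood of the counts.

```python
import numpy as np
from math import lgamma, log

def poisson_loglik(y, X, Z, beta, u):
    """Conditional log-likelihood of counts y under ln(lambda) = X beta + Z u."""
    eta = X @ beta + Z @ u                              # linear predictor (log scale)
    lam = np.exp(eta)                                   # Poisson parameters (observed scale)
    log_fact = np.array([lgamma(v + 1.0) for v in y])   # ln(y_i!)
    return float(np.sum(y * eta - lam - log_fact))

# Hypothetical toy data: 3 females, one record each, overall mean only.
y = np.array([0, 2, 5])
X = np.ones((3, 1))              # incidence matrix for the fixed mean
Z = np.eye(3)                    # each record belongs to a different animal
beta = np.array([log(2.0)])      # location parameter on the log scale
u = np.zeros(3)                  # breeding values set to zero for illustration
ll = poisson_loglik(y, X, Z, beta, u)
```

With all breeding values at zero, every female has a Poisson parameter of 2, so the log-likelihood reduces to a sum of ordinary Poisson log-probabilities.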
In a Bayesian context, Foulley et al (1987) employ the prior densities

$$p(\boldsymbol\beta) \propto \text{constant} \qquad [5]$$

and

$$\mathbf{u} \mid \mathbf{A}, \sigma_u^2 \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2) \qquad [6]$$

where $\mathbf{A}$ is the matrix of additive relationships between animals and $\sigma_u^2$ is the additive genetic variance. Given $\sigma_u^2$, Foulley et al (1987) calculate the mode of the joint posterior distribution of $\boldsymbol\beta$ and $\mathbf{u}$ with the iterative algorithm of equations [7] and [8], where $t$ denotes iterate number and $\mathbf{y}$ is the vector of observations. Note that the last term in [8] can be regarded as a vector of standardized (with respect to the conditional Poisson variance) residuals.

Marginal maximum likelihood (MML), a generalization of REML, has been suggested for estimating variance components in nonlinear models (Foulley et al, 1987; Höschele et al, 1987). An expectation-maximization (EM) type iterative algorithm [9] is used, where $T = \mathrm{trace}(\mathbf{A}^{-1}\mathbf{C}^{uu})$, $\mathbf{C}^{uu}$ being the part of the inverse of the coefficient matrix in [7] pertaining to $\mathbf{u}$, and $\hat{\mathbf{u}}$ is the u-component solution to [7] upon convergence for a given $\sigma_u^2$ value. In [9], $k$ pertains to the iteration number, and iterations continue until the difference between successive iterates of [9], separated by nested iterates of [7], becomes arbitrarily small. The above implementation of MML is not exact, and arises from an approximation (Foulley et al, 1990).
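The two nested iterations described in this section can be sketched as follows. This is a hedged illustration in Python with NumPy, not the authors' program: it assumes a flat prior on $\boldsymbol\beta$, $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$, a Newton-Raphson step written in working-variate form (weights $\lambda_i$, working variate $\eta_i + (y_i - \lambda_i)/\lambda_i$), and an EM-type update of the form $(\hat{\mathbf{u}}'\mathbf{A}^{-1}\hat{\mathbf{u}} + T)/q$. The function names, starting values and tolerances are hypothetical.

```python
import numpy as np

def poisson_joint_mode(y, X, Z, A_inv, sigma2_u, n_iter=100, tol=1e-10):
    """Joint posterior mode of (beta, u) for ln(lambda) = X beta + Z u.

    Assumes a flat prior on beta and u ~ N(0, A * sigma2_u).
    Returns beta, u and the coefficient matrix evaluated at the mode.
    """
    p = X.shape[1]
    W = np.hstack([X, Z])                        # rows are w_i' = [x_i' z_i']
    prior = np.zeros((W.shape[1], W.shape[1]))
    prior[p:, p:] = A_inv / sigma2_u             # A^{-1} / sigma_u^2 block for u
    theta = np.zeros(W.shape[1])
    # Crude starting value for the fixed effects (an arbitrary choice).
    theta[:p] = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        lam = np.exp(W @ theta)
        coef = W.T @ (lam[:, None] * W) + prior
        # Working variate: last term is the residual scaled by the Poisson variance.
        y_star = W @ theta + (y - lam) / lam
        new = np.linalg.solve(coef, W.T @ (lam * y_star))
        converged = np.max(np.abs(new - theta)) < tol
        theta = new
        if converged:
            break
    lam = np.exp(W @ theta)
    coef = W.T @ (lam[:, None] * W) + prior      # re-evaluated at the final solution
    return theta[:p], theta[p:], coef

def mml_em(y, X, Z, A_inv, sigma2_start=0.1, n_rounds=200, tol=1e-8):
    """EM-type iteration for sigma_u^2, with nested joint-mode iterations."""
    p, q = X.shape[1], Z.shape[1]
    s2 = sigma2_start
    for _ in range(n_rounds):
        beta, u, coef = poisson_joint_mode(y, X, Z, A_inv, s2)
        C_uu = np.linalg.inv(coef)[p:, p:]       # u-block of the inverse
        new = (u @ A_inv @ u + np.trace(A_inv @ C_uu)) / q
        if abs(new - s2) < tol:
            s2 = new
            break
        s2 = new
    return s2, beta, u
```

A dense inverse is used here only for brevity; for realistic pedigree sizes a sparse solver, as mentioned later in the text, would be the natural choice.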
SIMULATION EXPERIMENTS
A one-way random effects model
Embryo yields in two MOET closed nucleus herd breeding schemes were simulated. Breeding values ($\mathbf{u}$) for embryo yields for $n_s$ and $n_d$ base population sires and dams, respectively, were drawn from the distribution $\mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2)$, where $\sigma_u^2$ had the values specified later. The dams were superovulated, and the numbers of embryos collected from each dam were independent drawings from Poisson distributions with parameters:

$$\boldsymbol\lambda_d = \exp[\mathbf{1}\mu + \mathbf{u}_d] \qquad [12]$$

where $\mathbf{1}$ is an $n_d \times 1$ vector of ones, $\mu$ is a location parameter and $\mathbf{u}_d$ is the vector of breeding values of the $n_d$ dams. In nucleus 1, $\mu = \ln(2)$, whereas in nucleus 2 $\mu = \ln(8)$. Note from [12] that for a given donor dam $d_i$,

$$\ln \lambda_{d_i} = \mu + u_{d_i} \qquad [13a]$$

so, in view of the assumptions,

$$E(\ln \lambda_{d_i}) = \mu \qquad [13b]$$

which implies that the location parameter can be interpreted as the mean of the natural log of the Poisson parameters in the population of donor dams. It should be noted, as in Foulley and Im (1993), that

$$E(\lambda_{d_i}) = \exp(\mu + \sigma_u^2/2) \qquad [13c]$$
The sex of the embryos collected from the donor dams was assigned at random. The probability of survival of a female embryo to age at first breeding was $\pi = 0.70$ in nucleus 1, and $\pi = 0.60$ in nucleus 2. This is because research has suggested that embryo quality and yield from a single flush tend to be negatively associated (Hahn, 1992). Thus the expected number of female embryos surviving to age at first breeding produced by a given donor dam $d_i$ is, for $i = 1, 2, \ldots, n_d$,

$$(\pi/2)\,\lambda_{d_i}$$

and, on average,

$$(\pi/2)\exp(\mu + \sigma_u^2/2)$$

The genetic merit for embryo yield for the $i$th female offspring, $u_{o_i}$, was generated by randomly selecting and mating sires and dams from the base population, and using the relationship:

$$u_{o_i} = \tfrac{1}{2}u_{s_i} + \tfrac{1}{2}u_{d_i} + \sqrt{\sigma_u^2/2}\; z$$

where $u_{s_i}$ and $u_{d_i}$ are the breeding values of the sire and dam, respectively, of offspring $i$, and the third term is a Mendelian segregation residual; $z \sim N(0, 1)$.

As with the dams, the vector of true Poisson parameters for female offspring was $\boldsymbol\lambda_o = \exp[\mathbf{1}\mu + \mathbf{u}_o]$, where $\mathbf{u}_o$ represents the vector of daughters' genetic values. The unit vector $\mathbf{1}$ in this case would have dimension equal to the number of surviving female offspring. Embryo yields for daughters were sampled from a Poisson distribution with parameter equal to the $i$th element of $\boldsymbol\lambda_o$.
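The sampling steps described for one nucleus can be sketched as below. This is a minimal illustration in Python with NumPy under stated assumptions, not the original simulation program: the number of surviving daughters per dam is drawn by binomial thinning with probability $\pi/2$, a sire is drawn at random for each daughter, and the Mendelian residual has variance $\sigma_u^2/2$ (parents unrelated and non-inbred). The function name, the returned keys and the herd sizes used in the example are hypothetical.

```python
import numpy as np

def simulate_nucleus(mu, sigma2_u, pi_surv, n_sires, n_dams, rng):
    """Simulate one replicate of the one-way scheme (dams and their daughters)."""
    sd = np.sqrt(sigma2_u)
    u_s = rng.normal(0.0, sd, n_sires)            # base sires
    u_d = rng.normal(0.0, sd, n_dams)             # base dams
    y_d = rng.poisson(np.exp(mu + u_d))           # embryos collected per dam
    # Female (prob 1/2) and surviving to first breeding (prob pi): thinning at pi/2.
    n_daughters = rng.binomial(y_d, 0.5 * pi_surv)
    dam_idx = np.repeat(np.arange(n_dams), n_daughters)
    sire_idx = rng.integers(0, n_sires, dam_idx.size)   # assumed: random sire per daughter
    z = rng.standard_normal(dam_idx.size)
    u_o = 0.5 * u_s[sire_idx] + 0.5 * u_d[dam_idx] + np.sqrt(sigma2_u / 2.0) * z
    y_o = rng.poisson(np.exp(mu + u_o))           # daughters' embryo yields
    return {"u_sires": u_s, "u_dams": u_d, "y_dams": y_d,
            "u_daughters": u_o, "y_daughters": y_o,
            "sire_idx": sire_idx, "dam_idx": dam_idx}

# Example call with hypothetical herd sizes (8 sires, 128 dams) for nucleus 1.
sim = simulate_nucleus(np.log(2.0), 0.125, 0.70, 8, 128, np.random.default_rng(1))
```

Binomial thinning of a Poisson count gives an expected $(\pi/2)\lambda_{d_i}$ daughters per dam, which matches the expectation stated in the text.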
Four populations were simulated, and each was replicated 30 times: 1) nucleus 1 ($\mu = \ln 2$), $\sigma_u^2 = 0.056$; 2) nucleus 1 ($\mu = \ln 2$), $\sigma_u^2 = 0.125$; 3) nucleus 2 ($\mu = \ln 8$), $\sigma_u^2 = 0.056$; and 4) nucleus 2 ($\mu = \ln 8$), $\sigma_u^2 = 0.125$. Features of the 2 nucleus herds are in table I. The expected nucleus size is slightly greater than $n_s + n_d(1 + (\pi/2)\exp(\mu))$, ie about 218 cows in each of the 2 schemes, plus the corresponding number of sires. The values of $\sigma_u^2$ were arrived at as follows: Foulley et al (1987), using a first order approximation, introduced the parameter

$$\text{'}h^2\text{'} = \frac{\exp(\mu)\,\sigma_u^2}{1 + \exp(\mu)\,\sigma_u^2} \qquad [14]$$

which can be viewed as a 'pseudo-heritability'. Using this, the values of $\sigma_u^2$ in the 4 populations correspond to: 1) 'h²' = 0.10; 2) 'h²' = 0.20; 3) 'h²' = 0.31; and 4) 'h²' = 0.50.

In each of the 30 replicates of each population, variance components for embryo yield were estimated employing the following methods: 1) Poisson MML as in Foulley et al (1987); 2) REML as if the data were normal; 3) truncated Poisson MML excluding counts of zero, and using the formulae of Foulley et al (1987); 4) REML-0, ie REML applied to the data excluding counts of zero; and 5) REML-LOG, ie REML applied to the logarithms of the non-null responses while discarding the null responses. Empirical bias and mean squared error (MSE) of the estimates, calculated from the 30 replicates, were used for assessing performance of the variance component estimation procedure. Because the probability of observing a zero count in a Poisson distribution with a mean of 8 is very low, the truncated Poisson and REML-0 analyses were not carried out in nucleus 2.
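Two pieces of arithmetic in the preceding paragraphs can be checked directly. The sketch below is in Python and rests on assumptions: the 'pseudo-heritability' is taken to have the form $\lambda\sigma_u^2/(1 + \lambda\sigma_u^2)$ evaluated at $\lambda = \exp(\mu)$, a form chosen because it reproduces the four values quoted in the text, and the zero-truncated Poisson probability function is written in its usual textbook form. The function names are hypothetical.

```python
import math

def pseudo_h2(lam, sigma2_u):
    """Assumed first-order 'pseudo-heritability' on the observed scale."""
    return lam * sigma2_u / (1.0 + lam * sigma2_u)

def prob_zero(lam):
    """Probability of a zero count under a Poisson distribution with mean lam."""
    return math.exp(-lam)

def truncated_poisson_pmf(y, lam):
    """Zero-truncated Poisson probability of y (y = 1, 2, ...)."""
    if y < 1:
        return 0.0
    return lam ** y * math.exp(-lam) / (math.factorial(y) * (1.0 - math.exp(-lam)))

# Pseudo-heritabilities for the 4 populations (lam = exp(mu) = 2 or 8).
table = {(lam, s2): round(pseudo_h2(lam, s2), 2)
         for lam in (2.0, 8.0) for s2 in (0.056, 0.125)}

# Zero counts are common with a mean of 2 but rare with a mean of 8.
p0_nucleus1 = prob_zero(2.0)
p0_nucleus2 = prob_zero(8.0)
```

Under this assumed form the same genetic variance on the log scale corresponds to a much larger pseudo-heritability when the Poisson mean is 8 than when it is 2.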
Likewise, breeding values were predicted using the following methods: 1) the Poisson model as in [7] with the true $\sigma_u^2$, and taking as predictors $\hat{\boldsymbol\lambda} = \exp[\mathbf{1}\hat\mu + \hat{\mathbf{u}}]$, where $\hat{\mathbf{u}}$ is the vector of breeding values of sires, dams, and daughters; 2) BLUP ($\mathbf{1}\hat\mu^* + \hat{\mathbf{u}}^*$) in a linear model analysis where the variance components were the average of the 30 REML estimates obtained in the replications and the asterisk denotes direct estimation of location parameters on the observed scale; 3) a truncated Poisson analysis with the true $\sigma_u^2$ and predictors as in 1); 4) BLUP-0, as in 2) but excluding zero counts, and using the average of the 30 REML-0 estimates as true variances; and 5) BLUP-LOG, as in 2) after excluding zero counts and transforming the remaining records into logs. The average of the 30 REML-LOG estimates of variance components was used in this case. BLUP-LOG predictors of breeding values were expressed as $\exp[\mathbf{1}\hat\mu + \hat{\mathbf{u}}]$, where $\hat\mu$ and $\hat{\mathbf{u}}$ are solutions to the corresponding mixed linear model equations. Hence, all 5 types of predictions were comparable because breeding values are expressed on the observed scale. As given in [12], the vector of true Poisson parameters or breeding values for all individuals was deemed to be $\boldsymbol\lambda = \exp[\mathbf{1}\mu + \mathbf{u}]$. Average bias and MSE of prediction of breeding values of dams and daughters were computed within each data set and these statistics were averaged again over 30 further replicates. Rank correlations between different estimates of breeding values were not considered as they are often very large in spite of the fact that one model may fit the data substantially better than the other (Perez-Enciso et al, 1993).
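The two-stage averaging of prediction bias and MSE described above can be sketched as follows. This is a minimal Python illustration with NumPy; the function names are hypothetical, and the sign convention (prediction minus true value) is an assumption, since the text does not state it.

```python
import numpy as np

def bias_and_mse(pred, true):
    """Average bias and MSE of prediction within one data set (observed scale)."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(err.mean()), float((err ** 2).mean())

def average_over_replicates(pairs):
    """Average the within-data-set statistics over replicates.

    `pairs` is an iterable of (predictions, true values), one per replicate.
    Returns (mean bias, mean MSE).
    """
    stats = np.array([bias_and_mse(p, t) for p, t in pairs])
    return tuple(stats.mean(axis=0))

# Hypothetical example with two small replicates.
rep1 = ([2.1, 1.9, 2.4], [2.0, 2.0, 2.0])
rep2 = ([7.5, 8.5], [8.0, 8.0])
mean_bias, mean_mse = average_over_replicates([rep1, rep2])
```

Because every method's predictions are expressed as Poisson parameters on the observed scale, the same two functions can be applied to all 5 types of predictions.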
A mixed model with two random effects
The base population consisted of 64 unrelated sires and 512 unrelated dams, and the genetic model was as before. The probability of a daughter surviving to age at first breeding [...] Embryo yields on dams and daughters were generated by drawing random numbers from Poisson distributions with parameters:

$$\lambda_{ijk} = \exp(\mu + H_i + s_j + u_k) \qquad [15]$$

where $\mu$ is a fixed effect common to all observations, $\mathbf{H} = \{H_i\}$ is a $15 \times 1$ vector of fixed effects, $\mathbf{s} = \{s_j\} \sim N(\mathbf{0}, \mathbf{I}\sigma_s^2)$ is a $100 \times 1$ vector of unrelated 'service sire' effects, $\mathbf{u} = \{u_k\} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$ is a vector of breeding values independent of service sire effects, and $\sigma_s^2$ and $\sigma_u^2$ are the appropriate variance components. The values of $\mu + H_i$ were assigned such that:

$$\exp(\mu + H_i) = i; \qquad i = 1, 2, \ldots, 15 \qquad [16]$$

Thus, in the absence of random effects, the expected embryo yield ranged from 1 to 15. Each of the 15 values of $\mu + H_i$ had an equal chance of being assigned to any particular record.

Service sire has been deemed to be an important source of variation for embryo yield in superovulated dairy cows (Lohuis et al, 1990; Hasler, 1992). However, no sizable genetic variance has been detected when embryo yield is viewed as a trait of the donor cow (Lohuis et al, 1990; Hahn, 1992). This influenced the choice of the 4 different combinations of true values for the variance components considered. In all cases, the service sire component was twice as large as the genetic component. The sets of variance components chosen were: (A) $\sigma_u^2 = 0.0125$, $\sigma_s^2 = 0.0250$; (B) $\sigma_u^2 = 0.0250$, $\sigma_s^2 = 0.0500$; (C) $\sigma_u^2 = 0.0375$, $\sigma_s^2 = 0.0750$; and (D) $\sigma_u^2 = 0.0500$, $\sigma_s^2 = 0.1000$. Along the lines of [14], the genetic variances correspond to 'pseudoheritabilities' of 7.5-22%, and to relative contributions of service sires to variance of 15-44%; these calculations are based on the approximate average true fixed effect $\bar\lambda$ on the observed scale in the absence of overdispersion.
For each of the 4 sets of variance parameters, 30 replicates were generated to assess the sampling performance of Poisson MML in terms of empirical bias and square root MSE. Relative bias was empirical bias as a percentage of the true variance component. Coefficients of variation for REML and MML estimates of variance components were used to provide a direct comparison as they are expressed on different scales. REML estimates were also required in order to compare estimates of fixed effects and predictions of random effects obtained under a linear model analysis with those found under the Poisson model. MML and REML estimates were computed by Laplacian integration (Tempelman and Gianola, 1993) using a Fortran program that incorporated a sparse matrix solver, SMPAK (Eisenstat et al, 1982) and ITPACK subroutines (Kincaid et al, 1982) to set up the system of equations [7]. For REML, this corresponds to the derivative-free algorithm described by Graser et al (1987) with a computing strategy similar to that in Boldman and Van Vleck (1991).
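The summary statistics used to judge the variance component estimators can be computed as in the following Python sketch (NumPy assumed). The function name and the example estimates are hypothetical; the sample standard deviation with divisor $n - 1$ is an assumption, as the text does not specify the divisor, and the relative error follows the definition used in the results (square root of the empirical MSE as a percentage of the true value).

```python
import numpy as np

def summarize_estimates(estimates, true_value):
    """Relative bias, relative error and CV (all in %) of replicate estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value                       # empirical bias
    mse = np.mean((est - true_value) ** 2)               # empirical MSE
    return {
        "relative_bias_pct": 100.0 * bias / true_value,
        "relative_error_pct": 100.0 * np.sqrt(mse) / true_value,
        "cv_pct": 100.0 * est.std(ddof=1) / est.mean(),  # scale-free, so comparable across scales
    }

# Hypothetical replicate estimates of a genetic variance whose true value is 0.05.
summary = summarize_estimates([0.04, 0.05, 0.06], 0.05)
```

The coefficient of variation does not involve the true value, which is why it can be compared between REML and MML estimates even though they are expressed on different scales.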
[...] well defined) to compute estimates of fixed effects and predictions of random effects in the linear model analysis; for the Poisson model, the true values of the variance components were used. Empirical biases and MSEs of the estimates of fixed effects obtained with the linear and with the Poisson models were assessed from another 30 replicates within each set of variance components. One more replicate was then generated for each variance component set, from which the empirical average bias and MSE of prediction of service sire and animal random effects were evaluated.

In order to make comparisons on the same scale, the Poisson model predictands of the random effects were defined to be $b\exp(\mathbf{s})$ for service sires and $b\exp(\mathbf{u})$ for additive genetic effects, respectively, where $b$ is a 'baseline' parameter whose definition follows from [15]. The 'baseline' value can be interpreted as the expected value of the Poisson parameter of an observation made under the conditions of an 'average' level of the fixed effects and in the absence of random effects. The Poisson mixed-model predictions were constructed by replacing the unknown quantities in $b$, $\exp(\mathbf{s})$, and $\exp(\mathbf{u})$ by the appropriate solutions in [7].

In the linear mixed model, the predictors for service sire and genetic effects, respectively, were defined with unit vectors $\mathbf{1}$ of the same dimension as the respective vectors of random effects, the asterisk being used to denote direct estimation of location parameters on the observed scale.

Estimators for fixed effects were also expressed on the observed scale. The true values of the fixed effects were deemed to be $\lambda_i = \exp(\mu + H_i)$ for $i = 1, 2, \ldots, 15$ as in [16]. Estimators for fixed effects under the Poisson model were therefore taken to be $\exp(\hat\mu + \hat H_i)$ for $i = 1, 2, \ldots, 15$. As the linear mixed model estimates parameters [...]
RESULTS AND DISCUSSION
One-way model

Means and standard errors of estimates of the genetic variance ($\sigma_u^2$) for the five procedures are given in table II and MSEs of the estimates are given in table III. Clearly, estimates obtained with REML and REML-0 were extremely biased; this is so because the genetic components obtained are not on the appropriate scale of measurement (ie the canonical log scale). The problem was somewhat corrected by a logarithmic transformation of the records. For $E(\lambda_i) \approx 2$ and $\sigma_u^2 = 0.056$, the REML-LOG, Poisson and Poisson-truncated estimators were nearly unbiased (within the limits of Monte-Carlo variance), but the Monte-Carlo standard errors were much larger for REML-LOG. For $\sigma_u^2 = 0.125$, the Poisson estimates were biased downwards (P < 0.05) for both values of $E(\lambda_i)$, while those of REML-LOG were biased upwards and significantly so with $E(\lambda_i) \approx 8$. In a one-way sire threshold model, Höschele et al (1987) also found downward biases for the MML procedure. In spite of these small biases, however, the MSEs of the Poisson estimates (table III) were much lower than those of REML-LOG. The very large (relative to Poisson and REML-LOG) MSEs of the REML and REML-0 procedures illustrate the pitfalls incurred in carrying out a linear model analysis when the situation dictates a nonlinear analysis, or a transformation of the data.

A linear one-way random effects model, however, can be contrived in which case it can be shown that REML may actually estimate somewhat meaningful variance components on the observed scale. Presuming that multiple records on an individual are possible, the variance of $Y_{ij}$ (with subscripts denoting the $j$th record on the $i$th individual) can be classically represented as:

$$\mathrm{Var}(Y_{ij}) = E_u[\mathrm{Var}(Y_{ij} \mid u_i)] + \mathrm{Var}_u[E(Y_{ij} \mid u_i)]$$

such that from [13c] and results presented by Foulley and Im (1993):

$$\mathrm{Var}(Y_{ij}) = \exp(\mu + \sigma_u^2/2) + \exp(2\mu + \sigma_u^2)[\exp(\sigma_u^2) - 1]$$

The covariance between different records on the same individual (ie $\mathrm{cov}(Y_{ij}, Y_{ij'})$) can be used to represent the variance of the random effects:

$$\mathrm{cov}(Y_{ij}, Y_{ij'}) = E_u[\mathrm{cov}(Y_{ij}, Y_{ij'} \mid u_i)] + \mathrm{cov}_u[E(Y_{ij} \mid u_i), E(Y_{ij'} \mid u_i)]$$

Given independent Poisson sampling conditional on $u_i$, the first term of the above equation is null, and

$$\mathrm{cov}(Y_{ij}, Y_{ij'}) = \mathrm{Var}(\lambda_i) = \exp(2\mu + \sigma_u^2)[\exp(\sigma_u^2) - 1]$$

Thus a one-way random linear model that has the same first and second moments as $Y_{ij}$ is

$$Y_{ij} = \mu^* + u_i^* + e_{ij}^*$$

where $Y_{ij}$ is the $j$th record observed on the $i$th animal, $\mu^*$ is the overall mean, $u_i^*$ is the random effect of the $i$th animal and $e_{ij}^*$ is the residual associated with the $j$th record on the $i$th animal. Here $\mu^* = \exp(\mu + \sigma_u^2/2)$, $u_i^*$ has null mean and variance $\sigma_{u^*}^2 = \exp(2\mu)\exp(\sigma_u^2)[\exp(\sigma_u^2) - 1]$ and $e_{ij}^*$ has null mean and variance $\sigma_{e^*}^2 = \exp(\mu + \sigma_u^2/2)$. The empirical mean REML estimates reported in table II closely relate to the functionals for $\sigma_{u^*}^2$ in [22b], in spite of the violated independence [...] underestimated $E(\lambda_i)$, as expected theoretically.
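The observed-scale moments of the contrived linear model can be checked by simulation. The sketch below, in Python with NumPy, evaluates the three functionals given in the text and compares them with Monte Carlo estimates obtained from two records per animal; the function names and the Monte Carlo design (two conditionally independent Poisson records per animal) are assumptions made for illustration.

```python
import numpy as np

def observed_scale_moments(mu, sigma2_u):
    """Mean, animal variance and residual variance on the observed scale."""
    mean = np.exp(mu + sigma2_u / 2.0)
    var_u = np.exp(2.0 * mu) * np.exp(sigma2_u) * (np.exp(sigma2_u) - 1.0)
    var_e = mean                      # average Poisson parameter
    return mean, var_u, var_e

def monte_carlo_moments(mu, sigma2_u, n_animals, rng):
    """Estimate the same moments from two records per simulated animal."""
    u = rng.normal(0.0, np.sqrt(sigma2_u), n_animals)
    lam = np.exp(mu + u)
    y1 = rng.poisson(lam)             # first record
    y2 = rng.poisson(lam)             # second record, independent given u
    mean = 0.5 * (y1.mean() + y2.mean())
    cov = np.cov(y1, y2)[0, 1]        # between-record covariance = animal variance
    var_total = 0.5 * (y1.var(ddof=1) + y2.var(ddof=1))
    return mean, cov, var_total - cov

theory = observed_scale_moments(np.log(2.0), 0.125)
simulated = monte_carlo_moments(np.log(2.0), 0.125, 200_000, np.random.default_rng(0))
```

The between-record covariance recovers the animal variance on the observed scale, and the remainder of the total variance recovers the average Poisson parameter, which is the sense in which a linear model analysis of the raw counts can target these functionals.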
In the Poisson model, the residual variance is the Poisson parameter of the observation in question. Hence the residual variance in a linear model analysis would be comparable to $E(\lambda_i)$. The log-transformed REML estimates (REML-LOG) have no meaning here because the Poisson residual variance is generated on the observed scale, contrary to the genetic variance which arises on a logarithmic scale. Generally, the Poisson and REML methods gave seemingly unbiased estimates of the true average Poisson parameter. However, REML estimates of residual variance, or rather, of $E(\lambda_i)$, appeared to be biased upwards (P < 0.01) for the higher genetic variance and higher Poisson mean population (table IV). The MSEs of Poisson estimates of average residual variances were much smaller than those obtained with REML, especially in the populations with a higher mean. REML-0 was even worse than REML, both in terms of bias (table IV) and MSE (table V). This is due to truncation of the distribution (eg, Carriquiry et al, 1987) which is not taken into account in REML-0. The truncated Poisson analysis gave upwards biased estimates and had higher MSE than the standard Poisson method. However, truncated Poisson outperformed REML in an MSE sense in estimating the average residual variance, in spite of using less data (zero counts not included).
Empirical mean biases of predictions of breeding values for dams and daughters are shown in table VI for $\sigma_u^2 = 0.056$ and table VII for $\sigma_u^2 = 0.125$. Poisson-based methods and BLUP gave unbiased estimates of breeding values while BLUP-LOG and BLUP-0 performed poorly; BLUP-LOG had a downward bias and BLUP-0 had an upward bias. Predictions of breeding values for the truncated Poisson analysis were generally unbiased.

Empirical MSEs of predictions of breeding values are shown in tables VIII ($\sigma_u^2 = 0.056$) and IX ($\sigma_u^2 = 0.125$). Paired t-tests were used in assessing the performance of the comparisons Poisson versus BLUP (and BLUP-LOG) and Poisson truncated versus BLUP-0. BLUP-LOG and BLUP-0 procedures had the largest MSEs, probably due to their substantial empirical bias. For $\sigma_u^2 = 0.056$ [...] when $E(\lambda_i) \approx 2$, and a smaller (P < 0.05) MSE for predicting daughters' breeding values when $E(\lambda_i) \approx 8$. For $\sigma_u^2 = 0.125$ (table IX), Poisson and BLUP had similar MSEs when $E(\lambda_i) \approx 2$, but Poisson had smaller MSEs than BLUP (P < 0.05) for both dams and daughters when $E(\lambda_i) \approx 8$. These small differences between BLUP and Poisson are somewhat surprising in view of the different scales of prediction, and their practical significance is an open question. It was also surprising that the differences between BLUP and Poisson tended to be more significant with $E(\lambda_i) \approx 8$ than with $E(\lambda_i) \approx 2$, since it is known that the Poisson distribution approaches a normal distribution as $\lambda_i$ increases (Haight, 1967). However, this may be due to a higher power of the test when detecting a larger difference. Another explanation may be that a lower $E(\lambda_i)$ leads to a lower 'pseudoheritability' and, hence, a lower degree of association between phenotypes and breeding values. In this case, the linear and Poisson models may differ less when predicting breeding values because of a higher degree of shrinkage towards zero. When counts of zero were excluded, the Poisson-truncated method always had smaller MSEs of prediction of breeding values than the BLUP-0 method.
Mixed model
Because variance components estimated by MML and REML are on different scales, empirical coefficients of variation (CV) were used to provide a basis of comparison (fig 1). Clearly, REML estimates were more variable than their Poisson counterparts. No clear pattern with respect to increasing values of the variance components emerged, except that CVs for both MML service sire and genetic estimates seemed relatively more stable while CVs for REML service sire estimates steadily increased for values of $\sigma_u^2$ larger than 0.0250 (ie $\sigma_s^2 = 0.0500$). For count data with low means, variance components that are estimated under linear mixed-effects models, more general than the one-way random effects model, are heteroskedastic from one observation to the next, depending on both experimental design and location parameters (Foulley and Im, 1993).
Relative biases of the MML estimates are given in figure 2. Relative biases were less than 4% for all 4 sets of variance components, with no clear trend with respect to the true values of variance components. Using t-tests, these biases did not differ from zero. Unlike results obtained with threshold models (eg, Höschele et al, 1987; Simianer and Schaeffer, 1989), a small subclass size (equal to 1 in the Poisson animal model) did not lead to detectable bias of variance component estimates in the Poisson mixed model.

Relative errors (square roots of the empirical MSEs, expressed as a percentage of the size of the true variances) [...] were somewhat opposite for service sire and genetic variance component estimates. Relative error for the genetic component decreased as the true variance increased, whereas the error of the service sire estimates increased somewhat with the true value of the parameter. The relative errors of MML estimates were almost identical to their empirical CVs (fig 1) because of the small bias, as shown in figure 2.

Empirical biases of fixed-effect estimates obtained with linear and Poisson [...] components.
The 2 methods gave estimates that were biased upwards, but the bias was larger for BLUE in all 4 cases. This apparent paradox can be explained [...] and not $\exp(\mu + H_i)$. Empirical biases for the 2 methods, and their difference, increased with increasing values of fixed effects and with higher values of variance components. The upward biases of the Poisson estimates were generally stable across sets of variance components, being always less than 0.5. However, the bias of the BLUEs of the fixed effects increased substantially as variance increased. [...] BLUE estimates.
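
One way to see why a linear-model estimate can sit above $\exp(\mu + H_i)$, and why the gap widens with the variance components: if the random effects are normal on the log scale with total variance $\sigma^2$, the marginal mean of a record is $\exp(\mu + H_i + \sigma^2/2)$. The sketch below checks this by simulation. It is an illustration under stated assumptions (normal random effects, a single fixed-effect level, arbitrary values for `eta` and `sigma2`, a hand-written Poisson sampler) and does not reproduce the simulation design of the study.

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(1994)   # fixed seed so the sketch is reproducible
eta = 2.0                   # hypothetical fixed part, mu + H_i, on the log scale
sigma2 = 0.10               # hypothetical total random-effect variance
n_records = 100_000

records = []
for _ in range(n_records):
    u = rng.gauss(0.0, math.sqrt(sigma2))        # random effect on the log scale
    records.append(draw_poisson(math.exp(eta + u), rng))

observed_mean = sum(records) / n_records
conditional_mean = math.exp(eta)                 # exp(mu + H_i)
marginal_mean = math.exp(eta + sigma2 / 2.0)     # exp(mu + H_i + sigma2 / 2)
print(observed_mean, conditional_mean, marginal_mean)
```

The raw average of the simulated records tracks the marginal mean rather than $\exp(\mu + H_i)$, so an estimator built on observed-scale averages would be expected to overshoot the latter by an amount that grows with $\sigma^2$.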
Empirical MSEs of the fixed effects estimates are depicted in figures 8-11 for each of the 4 sets of variance components. As with empirical biases, MSEs of the linear model and Poisson estimates, and the differences between the MSEs of the 2 procedures, tended to increase with increasing values of the variance components and with the true values of the fixed effects. The MSEs of the Poisson estimates were again more stable across sets of variance components and, although tending to increase with the value of the fixed effects, were much smaller than those of BLUE estimates. For example (fig 11), when embryo yield was around 15, the MSE of BLUE was about 6 times larger than that of Poisson estimates.

Because the values of fixed effects were constructed such that:

$$\exp(\mu + H_{i+1}) - \exp(\mu + H_i) = 1; \quad i = 1, 2, \ldots, 14$$

it was of interest to examine the extent to which the 2 estimators would capture such difference. If the estimates are regressed against true values, the 'best' estimator should give a slope close to one. Such regressions were assessed by ordinary least-squares.
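
The regression check can be sketched as follows. The code builds 15 fixed-effect levels whose observed-scale values differ by exactly 1 between adjacent levels, then regresses two sets of estimates on the true values by ordinary least squares. The starting value of 1, the value of `mu` and the two 'estimators' are hypothetical and serve only to show how a slope below one signals that differences between levels are understated.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


mu = 0.0           # hypothetical overall mean on the log scale
first_level = 1.0  # hypothetical value of exp(mu + H_1)

# Choose H_1..H_15 so that exp(mu + H_{i+1}) - exp(mu + H_i) = 1 for i = 1..14.
H = [math.log(first_level + i) - mu for i in range(15)]
true_values = [math.exp(mu + h) for h in H]

# Two made-up estimators: one exact, one that understates differences by 10%.
exact = list(true_values)
shrunken = [0.5 + 0.9 * t for t in true_values]

slope_exact = ols_slope(true_values, exact)
slope_shrunken = ols_slope(true_values, shrunken)
```

A constant offset (here 0.5) leaves the slope untouched, so the slope isolates how well differences between levels are recovered, separately from any overall bias.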
Empirical biases and MSEs of the estimates of the regression effects are given in figures 12 and 13, respectively. Regression estimates obtained under both models were slightly biased downwards, particularly BLUE; the absolute bias increased somewhat as true variance components increased in value. The differences between the biases of the 2 estimators were found significant in all cases using a paired t-test (P < 0.05 for variance component set A, and P < 0.01 for variance component sets B, C, D). The differences in empirical MSEs were in the same direction, ie BLUE had larger MSEs.

Empirical average biases and empirical average MSEs of the predictions of service sire effects are given in figures 14 and 15, respectively. The statistics were slightly different (P < 0.05) from 0 only at $\sigma_s^2$ = 0.0500. Based on paired t-tests, empirical average MSEs were not different between the 2 methods.
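
For readers who want to reproduce this kind of comparison, a paired t statistic on per-replicate differences can be computed as in the sketch below. The MSE values are invented for the example, and the function name is hypothetical; only the standard paired t formula is assumed.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for H0: mean(a - b) = 0, with len(a) - 1 degrees of freedom."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-replicate MSEs for two methods (made-up numbers).
mse_blup = [1.10, 1.25, 1.05, 1.30, 1.20]
mse_poisson = [1.00, 1.05, 0.95, 1.00, 1.10]

t_value, df = paired_t_statistic(mse_blup, mse_poisson)
```

The statistic would then be referred to a Student t distribution with n - 1 degrees of freedom. Pairing is appropriate here because both methods are applied to the same simulated data sets, so replicate-to-replicate variation cancels in the differences.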
Statistics associated with prediction of genetic effects are depicted in figures 16 and 17, with results presented separately for base generation sires and dams, and for their female progeny surviving to age at first breeding. Poisson-predicted breeding values tended to be biased downwards, whereas BLUP predictions were biased [...] solutions were significantly biased (P < 0.01) at the 2 highest variance components. Dam and daughter BLUP solutions were significantly biased (P < 0.01 in all cases, except P < 0.05 for dams at $\sigma^2$ = 0.0125). Poisson dam solutions were biased (P < 0.01) only at $\sigma^2$ = 0.0250, while Poisson daughter solutions were biased at $\sigma^2$ = 0.0250 (P < 0.01) and $\sigma^2$ = 0.0375 (P < 0.05). Certainly, all predictions reflect uncertainty in the baseline estimates of fixed effects given in [17] and [20] for Poisson and BLUP models, respectively, and upward biases in BLUP predictions [...]
Differences in empirical average MSEs of prediction of breeding values between the 2 procedures are depicted in figure 17. Although the differences between the 2 methods were small, a paired t-test gave significant differences for daughters (P < 0.01 for $\sigma^2$ = 0.0125, 0.0375, and 0.0500; P < 0.05 for $\sigma^2$ = 0.0250) and for base generation dams, except at $\sigma^2$ = 0.0125 (P < 0.01 for $\sigma^2$ = 0.0250, 0.0500; P < 0.05 for $\sigma^2$ = 0.0375). Differences in empirical average MSE of predictions of [...]

CONCLUSIONS
This study evaluated the sampling performance of estimators of location and dispersion parameters in Poisson mixed models, and of predictors of breeding values in the context of simulated MOET schemes. Records on embryo yield were drawn from Poisson distributions for 4 populations characterized by appropriate parameter values. Analyses were carried out with the Poisson model and with a linear model. In general, the estimates of parameters and the predictions of random effects obtained with the Poisson model were better than their linear model counterparts. This is not surprising because, to begin with, in the Poisson analysis, the data were analyzed with the models employed to generate the records in the simulation. Further, if the distribution is indeed Poisson, there is no appropriate variance decomposition independent of location parameters in the linear model, which makes the variance component estimators employed with normal data somewhat inadequate for estimating dispersion parameters, particularly in the mixed effects model. A log-transformation improved (relative to untransformed REML) the mean squared error performance of REML estimates of genetic variance, but worsened the estimates of residual variance. Similarly, fixed effects were estimated by BLUE with a larger bias and mean squared error than when estimated by the Poisson model.
BLUP was found to be robust in predicting random effects, although Poisson joint modes were significantly better with respect to bias and MSE in many cases. This is remarkable, considering that the BLUP predictors were based on location-parameter-dependent estimates of variance. BLUE, however, showed bias with increasing values of fixed effects and variance components. Truncated-Poisson estimators and predictors always outperformed BLUP and REML when zero counts were excluded from the analysis; truncation of null counts may be common in field records on traits such as litter size (Perez-Enciso et al, 1993).
Estimates of variance components obtained by MML in Poisson mixed models did not exhibit the typical bias due to small subclass sizes encountered often in threshold models. Rather, the downward biases of MML found in the one-way models may [...] et al (1987) attributed the 'small subclass' bias in threshold models to inadequacy of a normal approximation invoked in MML. The same approximation is made when estimating variance with the Poisson model, but its consequences may be less critical here.

In conclusion, REML and BLUE do not perform well when the assumption of a Poisson distribution holds, even if all pertinent factors in the model are included in the analysis. BLUP, however, was found to be robust, although it had a slightly inferior sampling performance. This study did not address the situation when the data are counts, but the distribution is not Poisson. For this instance, estimators based on the assumption of normality may perform better than those that rely on the Poisson distribution, due to central limit theory. It should also be noted that if the operational Poisson model fails to consider all pertinent explanatory factors, the conditional variance may be substantially greater than the conditional mean of an observation; these 2 parameters are defined to be equal as in [2]. Possible diagnostics for investigating this source of overdispersion were discussed by Dean (1989).
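
As a rough first check for overdispersion, one can compare the Pearson chi-square of a fitted Poisson model with its residual degrees of freedom. The sketch below is a simple index of this kind, not the score tests discussed by Dean; the function name and the counts are made up for illustration.

```python
def pearson_dispersion(counts, fitted_means, n_parameters):
    """Pearson chi-square divided by the residual degrees of freedom.

    Values near 1 are consistent with a Poisson variance (variance = mean);
    values well above 1 point to overdispersion.
    """
    chi_square = sum((y - m) ** 2 / m for y, m in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_parameters)


# Made-up counts, each set fitted with a common mean (an intercept-only model).
even_counts = [2, 4, 6]
spread_counts = [0, 0, 12]
mean_even = sum(even_counts) / len(even_counts)
mean_spread = sum(spread_counts) / len(spread_counts)

ratio_even = pearson_dispersion(even_counts, [mean_even] * 3, 1)
ratio_spread = pearson_dispersion(spread_counts, [mean_spread] * 3, 1)
```

Both sets of counts have the same mean, but the second has a much larger ratio, the pattern that would be expected if an explanatory factor had been left out of the model.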
ACKNOWLEDGMENTS
The authors thank M Perez-Enciso and I Misztal for making available to them sparse matrix subroutines for computations involving relationship matrices. The authors also thank R Fernando and T Wang for helpful suggestions and comments. The work was partially funded by grants from 21st Century Genetics (Shawano, WI, USA), Sire Power Inc (Tunkhannock, PA, USA) and by the National Pork Producers Council (Des Moines, IA, USA).
REFERENCES
Boldman KG, Van Vleck LD (1991) Derivative-free restricted maximum likelihood estimation in animal models with a sparse matrix solver. J Dairy Sci 74, 4337-4343
Carriquiry AL, Gianola D, Fernando RL (1987) Mixed model analysis of a censored normal distribution with reference to animal breeding. Biometrics 43, 929-939
Dean CB (1992) Testing for overdispersion in Poisson and binomial regression models. J Am Stat Assoc 87, 451-457
Eisenstat SC, Gursky MC, Schultz MH, Sherman AH (1982) Yale sparse matrix package. I. The symmetric codes. Int J Numer Meth Eng 18, 1145-1151
Fernando RL, Gianola D (1986) Optimal properties of the conditional mean as a selection criterion. Theor Appl Genet 72, 822-825
Foulley JL, Gianola D, Im S (1987) Genetic evaluation of traits distributed as Poisson-binomial with reference to reproductive characters. Theor Appl Genet 73, 870-877