HAL Id: hal-00894228
https://hal.archives-ouvertes.fr/hal-00894228
Submitted on 1 Jan 1998
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Attenuating effects of preferential treatment with
Student-t mixed linear models: a simulation study
Ismo Strandén, Daniel Gianola
To cite this version:
Original
article
Attenuating
effects
of
preferential
treatment
with
Student-t
mixed
linear
models:
a
simulation
study
Ismo
Strandén
a
Daniel Gianola
aDepartment
of AnimalSciences, University
ofWisconsin,
Madison,
WI 53706, USAb
Animal Production
Research, Agricultural
Research Centre -MTT,
31600
Jokioinen,
Finland(Received
14May
1998;accepted
16 October1998)
Abstract - Preferential treatment of cows in four herds of a
multiple
ovulation andembryo
transfer scheme under selection was simulated. Prevalence and amount ofpreferential
treatmentdepended
on a function correlated with truebreeding
value. Three mixed effect linear models werecompared
in terms of theirability
to handlepreferential
treatment: the classical Gaussianmodel,
a model with multivariatet-distributed errors clustered
by herd,
and a model withindependent
t-distributederrors. In the models with t-distributed errors, both the scale parameters and the
degrees
of freedom were considered unknown. ABayesian analysis
was carried outfor all three models via the Gibbs
sampler,
and posterior means were used to inferabout genetic
variance, herd-year effects, breeding
values and realised response toselection. Performance over repeated
sampling
was assessed via Monte Carlo meansquared
error. In the absence ofpreferential
treatment, the three models had asimilar
performance.
Whenpreferential
treatment wasprevalent
and strong, theunivariate t-model was the
best; hence,
the Gaussianassumption
for the errors wasclearly inappropriate.
It appears that some robust linear models can handlepreferential
treatment of animals better than the standard mixed effect linear model with Gaussianassumptions.
©
Inra/Elsevier,
Parisdairy cattle
/ preferential
treatment/ simulation / Bayesian
statistics/ Student-t
distribution
/ Gibbs
sampling*
Correspondence
and reprintsE-mail: [email protected]
Résumé - Atténuation des effets de traitement préférentiel dans un modèle
linéaire mixte à distribution de Student
(t).
Étude
de simulation. On a simulé letraitement
préférentiel
de certaines vaches dans quatre troupeaux de sélection utilisantont
dépendu
d’une fonction corrélée à la valeurgénétique
vraie. On acomparé
trois modèles linéaires mixtes pour leuraptitude
àprendre
en compte le traitementpréférentiel :
le modèleclassique Gaussien,
un modèle avec des erreurs t-multivariatesgroupées
par troupeau et un modèle avec des erreurs t-distribuéesindépendantes.
Dans le modèle où les erreurs suivaient une distribution t, lesparamètres
d’échelle et lesdegrés
de liberté ont été considérés inconnus. Uneanalyse
bayésienne
aété effectuée pour les trois modèles à partir de
l’échantillonnage
de Gibbs et les moyennes a posteriori ont été utilisées pour en inférer ausujet
de la variancegénétique,
des effetstroupeau-année,
des valeursgénétiques
et desréponses
réalisées à la sélection. Laperformance
des modèles a été évaluée au travers des erreursquadratiques
moyennes. En l’absence de traitementpréférentiel,
les trois modèlesont eu une
performance
similaire.Quand
le traitementpréférentiel
a étéfréquent
etd’effet important, le modèle t-univariate a été le meilleur et le modèle Gaussien a été
clairement
inadapté.
Ilapparaît
que des modèles linéaires robustes peuventprendre
encompte les traitements
préférentiels
mieux que les modèles linéaires mixtes Gaussiensclassiques. @
Inra/Elsevier,
Parisbovins laitiers
/
traitement préférentiel/
simulation / statistique bayésienne /distribution de Student
1. INTRODUCTION
Preferential treatment is any
management
practice
that isapplied
non-randomly
to animals within acontemporary
group[9].
Forexample,
betterhousing
andfeeding,
hormonaltreatment,
longer milking
intervals on testday
and
feeding according
toproduction
are known to beapplied selectively
indairy production.
Preferential treatment occurs indairy cattle, presumably
toincrease the economic value of a cow or the
probability
that it will be chosen asa bull-dam. Several studies
(e.g. [17,
20!)
have found thatgenetic
evaluationsfor milk
yield
are inconsistent withexpectations
based ontheory.
This maybe due to
inadequate
statisticalassumptions
or failure to accountproperly
for selection orpreferential
treatment of cows.Preferential treatment is often
suspected
when noapparent
reasons exist forsuch
discrepancies.
Kuhn et al.[9]
simulated effects ofpreferential
treatmenton ’animal model’
genetic
evaluations. Meansquared
error ofprediction
ofbreeding
values increased as the extent ofpreferential
treatment increased. Kuhn and Freeman[10]
found that when the dam of a sire was treatedpreferentially,
more than 30daughters
with untreated records were needed tooffset the bias in
prediction
ofbreeding
value causedby
the dam’s information. Bias increased as theproportion
and number ofdaughters receiving
preferential
treatment increased. Bias decreased when all
daughters given preferential
treatment were in the same
herd;
this is so because the’herd-year’
effect in the modelcaptures part
of thepreferential
treatment administered in aparticular
herd-year.
In order to account for
preferential
treatment,
Harbers et al.[7]
included anenvironmental correlation between related females in a
genetic
evaluation modelfor a MOET
(multiple
ovulation andembryo
transfer)
scheme. Thisimproved
accuracy of cow evaluations when
preferential
treatment was mild.Weigel
et al.[29]
simulated differentstrategies
ofpreferential
treatment and found that itwithin a herd is treated
preferentially.
Burnside andMeyer
[3]
simulated effectsof bovine
somatotropin
(bST).
Sire evaluations were least accurate when bST administration wastargeted
to the bestproducing
cows.In the context of
prediction
(e.g. !8]),
a bias takesplace
when theexpected
values of thepredictand
and of thepredi!tor
differ. Evaluation of biasrequires
knowledge
of the true modelbut,
inpractice,
this is notavailable,
so ad hocassessments of bias have been
suggested.
Several studies[15,
16, 27,
28]
foundupward
’biases’ of cow’spedigree
indexes forprotein
or milkyield
in FinnishAyrshire.
It is unclear if thisdiscrepancy
is due tochance,
butpreferential
treatment of dams of cows may be a
culprit.
On the otherhand,
Powelland Norman
[19]
found thatpedigree
indexes understated the first estimatedbreeding
values ofdaughters
of proven sires mated to lowerproducing
dams. Little work has been undertaken on how to cope withpreferential
treatment inpractice,
at least from a statisticalpoint
of view. Kuhn and Freeman[11]
studied power transformations of records but this was, at
best, slightly
effectivein
reducing
bias due topreferential
treatment. An alternativeapproach
is toconsider an error distribution with thicker tails than the
normal,
to allow formore variation. A
commonly
used one is thet-distribution,
which issymmetric
and
leptokurtic.
It has been advocated because of itssimplicity
[12],
and becauseonly
oneparameter
(the
degrees
offreedom)
is needed to describe robustness. A suitable robust distribution may becapable
ofattenuating
theimpact
of outliers on dataanalysis. Many
authors haveemployed
statisticalmodels with t-distributed residuals
[4,
12,
13, 25,
31]
in linear and non-linearregression models,
withvarying
degrees
of success. Use of the t-distribution inthe context of mixed effects or hierarchical models is
relatively
recent!1,
2,
5,
6, 22-24, 26,
30].
Our
objective
was to assessfrequentist properties
ofBayesian point
estima-tors obtained from mixed linear models where residuals were assumed to be either Gaussian or t-distributed. Milk
production
records obtained in herds inwhich some
preferential
treatment waspractised
were simulated. Theanaly-sis focused on mean
squared
error of estimation ofgenetic variance,
herd-year
effects,
breeding
values andgenetic
response to selection.2. STRUCTURE OF THE SIMULATION
2.1.
Conceptual population
Milk
production
records in ahypothetical
’adult’ MOET nucleus scheme[18]
were simulated. The scheme extended thesimple
hierarchicalmating
structureof Stranden et al.
!21].
Our modification allowed bulls of theprevious generation
to mate current
generation
females. The nucleus consisted of 32 cows andeight
bulls in everygeneration.
In eachgeneration,
every nucleus cowproduced
(by
multiple
ovulation andembryo
transfer torecipients)
eight
offspring,
four females and four males. An animal could be selectedonly
once into the nucleusas a
parent
and unselected animals were culled. The females were selectedamong those
offspring
to the nucleus that hadcompleted
a first lactation. Maleswere selected within those that had been born in the
preceding generation.
InThus,
males within a full-sibfamily
had the same estimatedbreeding
value andthree such males were
randomly
discarded. Each selected male was mated tofour cows, chosen
randomly
from those that had been selected asreplacements.
. 8 1 32 1 .
Selection pressure in males and females
was
32
4
and2g
= -,
respectively,
32 4 128 4
per
generation.
With this scheme carried out for fourgenerations,
the data included 544 cows with records(32
in the baseplus
32 x 4 x 4 = 512 femaleprogeny)
and 32 sires withdaughters
inproduction,
i.e. a total of 576 animals.A
diagram
of the simulatedpopulation
is shown infigure
1.Base
generation
cows wereassigned
to four herds inequal numbers,
i.e.eight
cows per herd. Femaleoffspring
of a cow remained in the same herd as herdam,
whereas sires were used across herds.Breeding
values of baseanimals were drawn at random from
N(0,0.25)
distribution. Records of the base animals weregenerated
by adding
aherd-year
effect(independently,
normally
distributed)
to abreeding
value and to anindependently
drawnresidual from
N(0,0.75)
distribution. Records insubsequent generations
weresimulated
similarly,
except
that thebreeding
value of an individual was formedsegregation
residual,
wherea
2
is
the additivegenetic
variance andF
isthe average
inbreeding
coefficient of theparents.
The selection criterion inthe
breeding
scheme was BLUP ofbreeding
value with the true variancecomponents.
The statistical model includedthe herd-year
as a fixed effect and animal as a random effect(but
ignored
preferential
treatment,
as discussedlater)
using
allgenetic relationships
available up to the time of selection.2.2. Preferential treatment
In
practice,
preferential
treatment takesplace
in the course of a selectionprogramme so this is the way that the
present
simulationproceeded.
None of the basepopulation
cows were treatedpreferentially,
so there were 512 cowseligible
to receivepreferential
treatment. A scheme in which thepreferential
treatment
assigned depended
on the’perceived’
breeding
value of an animal(e.g.
based on agenetic
evaluation available before the animalproduces
therecord)
wasadopted.
The records weregenerated
aswhere yij is the record of
animal j made
inherd-year
i, h
i
is aherd-year
effect,
Uj is the
breeding
value of animalj,
and eij is anindependent
residual. Thepreferential
treatment02!
was a stochastic effecttaking
the values:where
!(.)
is the standard normal cumulative distributionfunction,
pmin is aconstant smaller than the
herd-year
effecth
i
,
and u/j =
!+(u!+v!)
/
afl
+w
is a ’value’ function such that Ui rvA!(0,er!),
- jvN(0,
Q
v),
Cov(u_,,f
j
)
=0,
so Wj rv
N(À, 1).
In thepreceding,
0’
;
is the variance ofbreeding
values andw
j2
is an
’uncertainty’
variance. The ratioO’! 2 describes
theuncertainty
the herd!u
manager has about the true
breeding
value of animalj.
Forexample,
if the breeder is very uncertain about thebreeding
value of theanimal,
this ratio of variances should behigh.
Three values of theuncertainty
wereconsidered,
0’2
11
=
——,
1,
100. The correlation between Wj and thebreeding
value Uj is/ 100
j2
!l/2C1
1 + ;! 2)-1/2
giving 0.995, 0.71
and 0.10 at values of theuncertainty equal
B
!/
to 100’ 1 and
100100,
respectively.
The
preferential
treatment scheme inequation
(2)
induces a correlationbe-tween related animals
Corr(u/j , Wjl)
= a JJ ’10’2
u +
2 Cov( 2 v
J’ VI) J
where
ajj, is thetween related animals
Corr(iu,,w,’) =
—&dquo;—!—!——&dquo; v
, where a,,’ is thea! + a!
additive
relationship
betweenanimals j and
j’.
If the Vj deviates areinde-pendent,
thenCorr(u/j ,
u
/j, )
=a!!!!!!(!u
+o!)’
Forexample, if j and
j’are
full-sibs and
Q
z
=0’
;,
the
breeding
value thehigher
the amountpreferential
treatment and the chance ofreceiving
it.The constant pmin in
equation
(2)
controls the range ofproduction
associatedwith
preferential
treatment. It was setequal
to-5
Qh
where ahis the standarddeviation of
herd-year
effects. These were drawn from a normal distributionwith mean zero and variance
a
2
and two different values of theherd-year
variance were considered:
afl
=au
and2
U
2
The constant A controls theproportion
of cows to bepreferentially
treated. Normal distributiontheory
canbe
used
to find a value of A such that a desiredproportion
of cows receivespreferential
treatment. Theproportion
ofpreferentially
treated cows increaseswith
.!,
becausePr(u/j
>0)
increasesconcomitantly.
Three differentprevalences
of
preferential
treatment were considered: 1 out of10,
1 out of32,
and 1 outof 64 cows. These
correspond
to A values of-1.2816,
-1.8627 and-2.1539,
respectively.
It was intended to
keep
theproportion
ofpreferentially
treated animalsroughly
constant fromgeneration
togeneration.
To do so, it must be noted that selection isexpected
to increase meanbreeding
value and to reducegenetic
variance over time. In order to account for these
effects,
the formula for w waschanged
to:where u is the mean
breeding
value of animals available forpreferential
treatment in the
generation
to whichanimal j belongs,
andSu
is
the additivegenetic
variance for individuals born in thatgeneration.
The
probability
distribution of the amount ofpreferential
treatment(Di!)
depends
on the values of Qh and A as shown in theAppendix.
The averageamount of
preferential
treatmentactually
applied
was assessed via asimula-tion of 1000
replicates
of the MOET scheme. Mean increase(mean
of0)
inproduction
due topreferential
treatment undervarying prevalence
ofprefer-ential treatment and amount of
herd-year
variance is in table I. Asintended,
production
increased withprevalence
ofpreferential
treatment,
and withor h* 2
Average
value ofpreferential
treatment was not affectedby
level ofuncertainty
j
2
2 .
This is not shown in table1,
but it wasexpected
because the distributionQ!
of
A
zj
does notdepend
on this ratio.Table I.
Average
increase in simulated lactationproduction
due topreferential
treatment as a function of
herd-year
variance(a
h
)
and ofprevalence
ofpreferential
treatment(values
inparenthesis
are Monte Carlo standard errors from 1 000replicates
2.3. Statistical models and
computations
Three linear statistical models were
compared,
both with and withoutpref-erential treatment
incorporated
in the simulation. Theobjective
was to assessthe relative
ability
of these models to handleperturbations
causedby
unknownpreferential
treatment. In all threemodels,
the linear structure for the records included an unknownherd-year
effect(treated
as fixedcomputationally),
the unknownbreeding
value of the cow and aresidual,
distributedaccording
to anappropriate
errordistribution,
as noted below. In the threemodels,
amultivari-ate normal distribution
N(0,
Aa!),
where A is a 576 x 576relationship matrix,
was used for the
genetic effects,
so there was no difference in thisrespect.
Thethree
models, differing only
in the error distribution were thefollowing.
1)
G: apurely
Gaussian model with errorsNiid(0,
Q
e ).
2)
t-l: errors wereindependently
andidentically
distributed asunivariate-t,
t
i
(0,
0’ e 2,
v,).
Here,
the variance of the distribution isU2V
,I(V,
e
-
2),
whereQ
e
isa scale
parameter
and v, are the unknowndegrees
of freedom.3)
t-H: within herdi (i
=1,
2, 3,
4),
the error vector eihad themultivariate-t distribution
t
ni
(0,
I
ni
U2
,
)
e
v
where ni is the number of records in herd i.Here,
Var(e
i
)
=I!o!fe/(fe - 2).
Although
the errors areuncorrelated,
they
are notindependent,
thisbeing
aproperty
of the multivariate t-distribution. Errorvectors in different herds were
mutually independent, however,
but with thesame
or and
veparameters.
We refer to this model as a ’herd-clustered’ one.The G model is the usual one; model t-1 discounts outliers yzj on a ’case’
by
’case’basis,
and model t-H discountsoutlying
vectors yz for the entire herd i. Because the ’value function’ w! used togenerate
preferential
treatmentdoes not
depend
on theherd,
there is noapparent
reasonwhy
model t-H shouldoutperform
model t-1. It should be noted that as Ve -j oo, the twot-distributions tend towards the Gaussian one.
A
Bayesian
structure wasadopted
for inference. Prior distributions were the same for all three models. Herd effects wereassigned
a uniformprior and,
asnoted,
a multivariate normal process was used as aprior
distribution for thebreeding
values. Thedispersion
components
Q
u
and Q
e
were
assigned
inde-pendent
scaled invertedchi-square
distributions with fourdegrees
of freedom and meanequal
to the true variancecomponent,
i.e. 0.75 for the residualvari-ance and 0.25 for the
genetic
variance. In thet-models,
theprior
fora
is
for the scale of the distribution and not for the residualvariance,
which isa e
2v
,l(v, -
2)
as noted before. In the two modelsinvolving
thet-distribution,
the residualdegrees
of freedomparameter
v, was considered unknown.Degrees
of freedom values allowed in the herd-clustered t-model were4, 10,
100 or 1000,
allequally likely,
apriori.
In the univariate t-distributionmodel,
the spaceof v
e
was4, 6, 8, 10,
12 or14,
allreceiving equal prior
probability.
These valueswere chosen
arbitrarily.
It ispossible
to use a continuousprior
for v,[23]
butthe discrete distribution
employed
here facilitatedimplementation.
A Gibbssampler
was used to carry out theBayesian computations employing
the full conditional distributions described in Strandén[22].
Tests made in several sim-ulations withvarying starting
values indicated that a burn-inperiod
of 7 000iterates with 70 000 Gibbs iterates thereafter
(all
samples
kept)
wasenough
About 60 min of CPU time were
required
toperform
70 000iterations,
for any of themodels,
in an HP9 000(3)
computer.
2.4.
Frequentist
comparison
Each
replicate
of the simulation consisted of a data setgenerated
as per thescheme in
figure
1 under theappropriate assumptions
ofpreferential
treatment.A
Bayesian analysis
of the data setaccording
to each of the three models wascarried out in each
replicate.
Meansquared
errors ofposterior
mean estimateswere
computed,
overreplicates,
for:a)
genetic
variance,
b)
herd-year effects,
and
c)
breeding
values. Meansquared
errors were alsocomputed
for threeclasses of
breeding
values:sires,
cows who had beenpreferentially
treated and cows withoutpreferential
treatment.d)
An additionalend-point
of interest was meansquared
error of estimated response toselection,
assessedby
predicting
breeding
valuesusing posterior
means from the three models contrasted. ’True’response was the mean difference in true
breeding
value(due
to selectionusing
BLUP)
between animals born in the lastgeneration
and those born in the firstgeneration.
Differences in meansquared
errors between models should reflectthe relative accuracy of estimation of
genetic
trend.A
’pilot
run’[14]
was conducted to assess the number ofreplicates
needed toattain
enough
precision
for aparameter
of interest. Theapproximate
numberof
replications required
to achieve an absoluteprecision
r for the confidenceinterval
given
apilot
run of nreplicates
was foundusing:
where
t
i
-
1
,
1
-
a/2
is the value of a t-distribution with i -1degrees
of freedom atthe
100(1 -
a)
percentile
(’confidence’).
Ourpilot study
consisted ofcarrying
n = 20
replicates
for each of the three models. The number ofreplications
re-quired
to achieve 0.05precision
with 95%
confidence for thegenetic
variancewas less than 60 for most cases.
Hence,
it was decided that all cases wouldbe
replicated
60 times. Absoluteprecision
was recalculated after60 replicates,
and a further 40
replicates
were made for the schemesinvolving
1/10
preva-lence of
preferential
treatment. One scheme92
2 =3, -—
2 = 100 )
required
an(
a2 a2
u u1)
100/
additional 40
replicates
to achieve therequired
precision.
Table II indicates the schemes and number ofreplicates
performed..
Because of its
heavy computing requirements,
theanalysis
wasperformed
using
a network of machines administeredby
Professor MironLivny
of theDepartment
ofComputer
Science, University
of Wisconsin at Madison. This cluster was accessedusing
the Condorsystem,
which allowsrunning jobs
simultaneously
at manycomputers
while the data and program reside in onecomputer.
Eachreplicate
of each model was a process to be executed in this network ofcomputers.
There were between 10 and 15computers
available at3. RESULTS AND DISCUSSION
3.1. Absence of
preferential
treatmentThe
objective
here was to examinepossible
losses inefficiency
due tousing
the two t-distribution models when there is no
preferential
treatment and theGaussian
assumption
holdsthroughout. Averages
and meansquared
errors ofestimates of additive
genetic
variance aregiven
in table III. Theposterior
meansof
afl
for
each of the three models werepractically unbiased,
inlight
of theMonte Carlo variation.
However,
the meansquared
error waslarger
for the twot-models than for the Gaussian one.
Hence,
if the Gaussianassumption
holds,
posterior
means of additivegenetic
variance for the t-models are less accuratethan those from the G-model. The increase in mean
squared
error over theGaussian model was about 5-6
%
for the t-Hmodel,
and 7-18%
for the t-1model.
Tables IV and
Vgive
theposterior
distributions of thedegrees
of freedom for the two t-models in the absence ofpreferential
treatment. Theanalysis
carriedout with the herd-clustered t-model
clearly
favoured a model with Gaussianerrors, as indicated
by
aposterior probability
of about 90%
for thedegrees
of freedom
being
larger
than 10.Also,
the univariate t-modelassigned
thehighest posterior probability,
about 40%,
to thelargest
value of thedegrees
of freedom(v
e
=14)
considered. Theposterior
distributions were notsharp,
this
being
a function of the low informational content the data have aboutv e
.
However,
bothanalyses
favoured thelarger
valuesof v
e
or,equivalently,
theGaussian
assumption
for the errors. Forexample,
in the herd-clusteredt-model,
the
posterior
odds ratio of v, = 1000 relative to v, = 4 was 17.7 and 29:7 for2 =
1 andah
2 =
3,
respectively.
In the univariatet-model,
the odds ratio(
of v, = 14 relative to ve = 4 was 384 and 404 for the two values of the ratio
between herd and additive
genetic
variances.Mean
squared
errors of estimates of locationparameters
were similar inall models
(table VI! ,
although
slightly
smaller for the G-model. Asexpected,
mean
squared
errors werelarger
forbreeding
values of cows(smallest
amount ofIn summary, in the absence of
preferential
treatment and with the Gaussianassumption
holding throughout,
the t-models were less accurate for estimationof
Q!,
but were ascompetitive
as the Gaussian model for estimation ofbreeding
values and of
genetic
trend.3.2.
Preferentially
treated data3.2.1. Additive
genetic
varianceMean
squared
error of estimates of additivegenetic
variance are infigure
2.Differences between models were clearest when
preferential
treatment was moreprevalent
(1/10)
and when theherd-year
variance washigh
(this
affectsthe distribution of
0).
Also,
differences between models werelargest
whenuncertainty
about truebreeding
values waslow,
so the value function is ahigh
correlate ofbreeding
value. There was little difference between the G andthe t-H
models,
but the univariate t-model had the bestperformance
whenprevalence
ofpreferential
treatment was medium(1 out
of 32cows)
orhigh
(1 out
of 10cows).
The univariate t-model was robust to variation in theuncertainty
parameter;
this was not the case for the G and the t-Hmodels,
whose
performance
washampered
under severe forms ofpreferential
treatment.3.2.2. Posterior distribution of the
degrees
of freedomPosterior
probabilities
of thedegrees
of freedom under the herd-clustered t-model were oftenhigher
for thelarger
values of ve, thussupporting
theGaussian
model,
especially
whenpreferential
treatment was uncommon, oruncertainty
washigh. Only
under medium(1/32)
orhigh
(1/10)
prevalence
of
preferential
treatment and ahigh herd-year
variance thelargest
values ofthis
depended
on the level ofuncertainty
and on the amount ofherd-year
variance. Forexample,
whenprevalence
ofpreferential
treatment was1/10
and with(
T2
=3a’,
low values of thedegrees
of freedom hadhigher
posterior
probabilities
whenuncertainty
waslow; however,
asuncertainty increased,
theposterior
distribution tended to favourlarger
values of thedegrees
of freedom. Posteriorprobabilities
of v, for the univariate t-model aregiven
infigure
3.Here,
posterior
distributions tended to be flat.Higher probabilities
wereas-signed
to thelargest
values of thedegrees
of freedomonly
whenpreferential
treatment was rare and the
herd-year
variance low. As in the herd-clusteredt-model,
high
uncertainty
often resulted inhigher probabilities
assigned
to theof freedom values also received
relatively
high
probabilities.
Whenpreferen-tial treatment was
prevalent
(1/10)
and theherd-year
variance waslarge,
theposterior
distribution wassharp,
with a modal value of ve = 4 at all levels ofuncertainty.
Thispoints
away from a Gaussian distribution of the residuals.With a small data set such as the one in this simulated MOET
scheme,
oneshould not
expect
theposterior
distribution of thedegrees
of freedomparam-eter to be
highly peaked. Nevertheless,
the univariate t-modelrecognised
thenon-Gaussian situation even when
prevalence
was rare(1/64),
provided
thatthe variance between herds was
relatively large.
This is because theexpected
value of the
preferential
treatment,E(Di!),
increased withor2 h,
as illustrated in3.2.3. Estimates of
herd-year effects, breeding
values andgenetic
response to selection
Average
of meansquared
error of estimates ofherd-year
effects was similar for the three modelsexcept
whenpreferential
treatment wasprevalent
orherd-year variance was
high,
but it wasalways
smallest for the univariate t-model(figure l!).
Whenpreferential
treatment was common(1/10),
the univariateThe average of mean
squared
error of estimates of allbreeding
values isshown in
figure
5. This criterion was about the same with all modelsexcept
when
preferential
treatment was common and theherd-year
variancehigh.
Here,
whenuncertainty
washigh,
there were no differences between themodels,
but at low levels of
uncertainty,
the univariate t-model wasmarkedly
superior.
The
picture
for meansquared
errors of estimates of sire and cowbreeding
values and for
genetic
response was similar to that for of allbreeding values,
so the
figures
are notpresented.
In all cases, differences between models wereclear, favouring
the univariate t-model whenpreferential
treatment was moreprevalent
(1/10).
The same was true forpreferentially
treated cows(figure 6),
but mean
squared
errors werelarger
than forbreeding
values of cows that wereperformance
than the Gaussian or herd-clustered models whenpreferential
treatment was rare or
mildly prevalent,
but it wassuperior
when such treatmentwas common. In
particular,
at the lowest level ofuncertainty
and at thehighest
herd-year variance,
the univariate t-model gavepredictions
ofbreeding
value ofpreferentially
treated cows that had a meansquared
error of about a third of that observed with the Gaussian model. In thissituation,
the herd-clustered.
4. CONCLUSIONS
In the absence of
preferential
treatment,
the t-models were asgood
as theGaussian model for
estimating
breeding
values and response to selection. Whenpreferential
treatment wasmildly prevalent
(1/32)
the modelsperformed
simi-larly.
However,
whenpreferential
treatment was common(1/10)
andespecially
when the
herd-year
variance waslarge
relative to the additivegenetic
variance,
the univariate t-model was
clearly
thebest,
at least in terms of meansquared
error. Under
preferential
treatment,
theposterior
distribution of thedegrees
of freedom in the univariate t-model
pointed
away from the correctness of the Gaussianassumption.
The univariate t-model wasquite
robust to variation inthe simulation
parameters,
but it is unknown whether this robustness holdsacross different forms of
preferential
treatment.This simulation could not differentiate
clearly
between the Gaussian and the herd-clusteredt-models,
although
the latter wasalways slightly
better underpreferential
treatment. A reason for the lack of difference between these twomodels may be the low number of herds in the simulation. With a few clusters
(herds)
the statistical information about thedegrees
of freedom islow,
so theposterior
distribution of thisparameter
cannot be estimatedaccurately.
In
conclusion,
it appears that the univariate t-model can attenuate adverse effects ofpreferential
treatment asapplied
here. It leads to better inferencesabout
breeding
values andgenetic
trends than those obtained with the Gaussianmodel,
especially
whenpreferential
treatment isprevalent,
at least under the conditions of thestudy. If,
on the otherhand, preferential
treatment isnon-existent,
or theassumption
of a Gaussian distribution of the residuals seemsto be
true,
there is little loss inefficiency
fromusing
a robustmodel,
such as the univariate t. It isencouraging
that asymmetric
errordistribution,
such as Student
t,
improved
upon the Gaussian one under asingle-tailed
form of
preferential
treatment as inequation
(2).
Thissuggests
that a robustasymmetric
distribution may do evenbetter,
butperhaps
at the expense ofconceptual
andcomputational simplicity.
ACKNOWLEDGEMENT
We wish to thank W.G.
Hill, University
ofEdinburgh,
for some useful comments.REFERENCES
[1]
AlbertJ.H., Chib S., Bayesian analysis
ofbinary
andpolychotomous
responsedata,
J. Am. Stat. Assoc. 88(1993)
669-679.[2]
Besag
J., GreenP.,
Higdon D., Mengersen K., Bayesian computation
and stochastic systems, Stat. Sci. 10(1995)
3-66.[3]
BurnsideE.B., Meyer
K., Potentialimpact
of bovine somatotropin ondairy
sire evaluation, J.Dairy
Sci. 71(1988)
2210-2219.[4]
Geweke J.,Bayesian
treatment of theindependent
Student-t linear model,J.
Appl.
Econometrics 8(1993)
S19-S40.[5]
Gianola D., Stranden I.,Foulley
J.L., Modelos lineales con distribuciones t:potencial
engenetica quantitativa, Actas,
5ta ConferenciaEspanola
de Biometria,[6]
Gianola D., Sorensen D., A mixed effects threshold model with a tdistri-bution,
47th AnnualMeeting
of theEuropean
Association for Animal Production,Lillehammer, Norway,
1996, 15 p.[7]
HarbersA.G.F.,
LohuisM.M.,
DekkersJ.C.M.,
Correction forpreferential
treatment of MOET families
by including
an environmental correlation ingenetic
evaluation,
in:Proceedings
of the 5th WorldCongress
on GeneticsApplied
toLivestock
Production, Guelph, Canada,
1994, vol. 17, pp. 11-14.[8]
HendersonC.R., Applications
of Linear Models in AnimalBreeding, University
ofGuelph, Guelph,
Ontario,Canada,
1984.[9]
KuhnM.T.,
Boettcher P.J., FreemanA.E.,
Potential biases inpredicted
transmitting
abilities of females frompreferential
treatment, J.Dairy
Sci. 77(1994)
2428-2437.[10]
Kuhn M.T., FreemanA.E.,
Biases inpredicted transmitting
abilities of sireswhen
daughters
receivepreferential
treatment, J.Dairy
Sci. 78(1995)
2067-2072.[11]
KuhnM.T.,
Freeman A.E., Power transformations forreducing
bias in ge-netic evaluation causedby preferential
treatment, J.Dairy
Sci.(Abstr.) (1996)
suppl.
1, 143.[12]
Lange K.L.,
LittleR.J.A., Taylor J.M.G.,
Robust statisticalmodeling using
the tdistribution,
J. Am. Stat. Assoc. 84(1989)
881-896.[13]
Lange K.,
SinsheimerJ.S.,
Normal/Independent
distributions and theirap-plications
in robustregression,
J.Comp. Graph.
Stat. 2(1993)
175-198.[14]
LawA.M.,
KeltonW.D.,
SimulationModeling
andAnalysis, McGraw-Hill,
New
York,
1982.[15]
LidauerM., Mdntysaari E.A.,
Detection of bias in animal modelpedigree
indices of
heifers, Agric.
Food Sci. Finland 5(1996)
387-397.[16]
Mdntysaari E.A., Sillanpaa
M., Bias inpedigree
indices ofdairy
bulls: Shouldthe management group effects be fixed and should we use smaller
heritability?,
44 th Annual
Meeting
of theEuropean
Association for AnimalProduction, Aarhus,
Denmark,
1993, AbstractsI,
pp. 236-237.[17]
Murphy P.A., Everett
R.W., Van VleckL.D., Comparison
of first lactationsand all lactations of dams to
predict
sons’ milkevaluation,
J.Dairy
Sci. 65(1982)
1999-2005.
[18]
NicholasF.W.,
SmithC.,
Increased rates ofgenetic change
indairy
cattleby
embryo
transfer andsplitting,
Anim. Prod. 36(1983)
341-353.[19]
Powell R.L., Norman H.D.,Accuracy
of cow indexesaccording
torepeata-bility,
evaluation, herdyield,
and registry status, J.Dairy
Sci. 71(1988)
2232-2240.[20]
Rothschild M.F.,Douglass L.W.,
Powell R.L., Prediction of son’s modifiedcontemporary
comparison
frompedigree information,
J.Dairy
Sci. 64(1981)
331-341.[21]
Stranden L, Mdki-TanilaA., Mdntysaari
E.A., Genetic progress and rate ofinbreeding
in a closed adult MOET nucleus under different matingstrategies
andheritabilities,
J. Anim. Breed. Genetics 108(1991)
401-411.[22]
StrandenL,
Robust mixed effects linear models with t distributions andapplication
todairy
cattlebreeding,
Ph.D. thesis,University
of Wisconsin, Madison,1996.
[23]
Stranden I., GianolaD., Gaussian
versus Student-t mixed effects linearmod-els for milk
yield
inAyrshire cattle,
48th AnnualMeeting
of theEuropean
Association for AnimalProduction, Vienna, 1997,
Abstracts 1, pp. 262-263.[24]
Stranden I, Gianola D., Inferences about variance components in theunivari-ate mixed linear t model
using Laplacian-t approximations,
in:Proceedings
of the 6th WorldCongress
on GeneticsApplied
to Livestock Production,Armidale, Australia,
1998, vol.
25, pp. 537-540.[25]
SutradharB.C.,
Ali M.M., Estimation of the parameters of aregression
model with a multivariate t error
variable,
Comm. Stat.Theory
Meth. 15(1986)
[26]
Tempelman R.J.,
Firat M.Z.,Beyond
the linear mixed model:perceived
versus real
benefits, Proceedings
of the 6th WorldCongress
on GeneticsApplied
to Livestock Production, Armidale,Australia,
1998, vol. 25, pp. 605-612.[27]
UimariP., Mantysaari E.A., Repeatability
and bias of estimatedbreeding
values for
dairy
bulls and bull dams calculated from animal model evaluations, Anim. Prod. 57(1993)
175-182.[28]
UimariP., Mantysaari E.A., Relationship
between bull dam herd character-istics and bias in estimatedbreeding
values of bull,Agric.
Sci. Finland 4(1995)
463-472.[29]
Weigel D.J.,
PearsonR.E.,
HoescheleI., Impact
of differentstrategies
andamounts of
preferential
treatment on various methods of bull-damselection,
J.Dairy
Sci. 77
(1994)
3163-3173.[30]
WestM.,
Outlier models and prior distributions inBayesian
linearregression,
J.
Roy.
Statist. Soc. B 46(1984)
431-439.[31]
ZellnerA., Bayesian
andnon-Bayesian analysis
of theregression
model withmultivariate Student-t error terms, J. Am. Stat. Assoc. 71
(1976)
400-405.APPENDIX: Distribution of the
preferential
treatment variable When w! ispositive
and verylarge,
A
ij
tendsto hi - pm,n,
so in this caseequation
(1)
becomes:When w! is
negative,
6.
ij
=0,
as indicated in(1)
so yij= h
2
+ uj + e!.Hence,
given h
i
,
the range inproduction
records due topreferential
treatmentis
expected
to behi -
pmin= hi
+Qh
5
=(
Z
i
+5)Œ
h
where zi NN(0, 1).
Unconditionally,
theexpected
range is then5o
h
.
For6.
ij
defined inequation
(2),
the averagepreferential
treatmentapplied, conditionally
onh
i
would bewhere
0
,
B
(.)
is normaldensity
with mean A and variance 1. For A =0,
OA
(-)
is the standard normal
density
0(.).
BecauseI
f
z
00
4)(w)o(w)dw
=24)’(z) I
and!(0)
=
1,
it follows that 2 soE(A
jj
)
=E(E(02!!hi)) _ -!Jmin·
Likewise,
E(!7jlhi)
=
24
(h -Pmin)
2
for A = 0.Thus,
!
With pm;&dquo; _