HAL Id: hal-00894252
https://hal.archives-ouvertes.fr/hal-00894252
Submitted on 1 Jan 1999
Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach
Ismo Strandén, Daniel Gianola
To cite this version:
Ismo Strandén, Daniel Gianola. Mixed effects linear models with t-distributions for quantitative
genetic analysis: a Bayesian approach. Genetics Selection Evolution, BioMed Central, 1999, 31 (1),
pp.25-42. ⟨hal-00894252⟩
Original article

Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach

Ismo Strandén, Daniel Gianola

a Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA
b Animal Production Research, Agricultural Research Centre - MTT, 31600 Jokioinen, Finland

(Received 21 July 1998; accepted 27 November 1998)
Abstract - A Bayesian approach for inferences about parameters of mixed effects linear models with t-distributions is presented, with emphasis on quantitative genetic applications. The implementation is via the Gibbs sampler. Data from a simulated multiple ovulation and embryo transfer scheme in dairy cattle breeding with non-random preferential treatment of some cows are used to illustrate the procedures. Extensions of the model are discussed. © Inra/Elsevier, Paris

mixed effects models / Bayesian inference / robust estimation / Gibbs sampling / Student's t-distribution
Résumé - Mixed linear models with Student's t-distributions in quantitative genetics: a Bayesian approach. A Bayesian approach to inference about the parameters of mixed linear models with Student's t-distributions is presented, with emphasis on applications in quantitative genetics. The implementation uses Gibbs sampling. Data from a simulated selection scheme using embryo transfer in dairy cattle, in the presence of preferential treatment of some cows, are used to illustrate the procedures. Extensions of the model are discussed. © Inra/Elsevier, Paris

mixed model / Bayesian inference / robust estimation / Gibbs sampling / Student's t-distribution
Correspondence and reprints
E-mail: ismo.stranden@mtt.fi

1. INTRODUCTION
Mixed effects linear models are used widely in animal and plant breeding and in evolutionary genetics [27]. Their application to animal breeding was pioneered by Henderson [17, 19-21], primarily from the point of view of making inferences about candidates for genetic selection by best linear unbiased prediction (BLUP). Because BLUP relies on knowledge of the dispersion structure, estimation of variance and covariance components is central in practical implementation [14, 18, 29, 32]. Typically, the dispersion structure is estimated using a likelihood-based method and, then, inferences proceed as if these estimates were the true values (e.g. [8]). Although normality is not required by BLUP, it is precisely when normality holds that it can be viewed as an approximation to the best predictor [4, 8, 12, 19].
More recently, Bayesian methods have been advocated for the analysis of quantitative genetic data with mixed linear models [8, 9, 34, 39, 40], and the Bayesian solutions suggested employ Gaussian sampling models as well as normal priors for the random effects.

It is of practical interest, therefore, to study statistical models that are less sensitive than Gaussian ones to departures from assumptions. For example, it is known in dairy cattle breeding that more valuable cows receive preferential treatment, and to the extent that such treatment cannot be accommodated in the model, this leads to bias in the prediction of breeding values [23, 24]. Another source of bias in inferences is an incorrect specification of the inheritance mechanism in the model. It is often postulated that the genotypic value for a quantitative trait is the result of the additive action of alleles at a practically infinite number of unlinked loci and, thus, normality results [4]. This assumption is refuted in an obvious manner when inbreeding depression is observed, or when unknown genes of major effect are segregating. However, in the absence of clearly contradictory evidence, normality is a practical assumption to make, as then the machinery of mixed effects linear models can be exploited.
An appealing alternative is to fit linear models with robust distributions for the errors and for the random effects. One such distribution is Student's t, both in its univariate and multivariate forms. Several authors [2, 7, 26, 37, 38, 41, 42] have studied linear and non-linear regression problems with Student's t-distributions, but there is a scarcity of literature on random effects models. West [41] described a one-way random effects layout with t-distributed errors and a heavy-tailed prior for the random effects. Assuming that the ratio between the residual variance and the variance of the random effects was known, he showed that this model could discount effects of outliers on inferences. Pinheiro et al. [30] described a robust version of the Gaussian mixed effects model of Laird and Ware [25] and used maximum likelihood. They hypothesized that the distribution of the residuals had the same degrees of freedom as that of the random effects, and, also, that random effects were independently distributed. The first assumption is unrealistic, as it is hard to accept why two different random processes (the distributions of random effects and of the residuals) should be governed by the same degrees of freedom parameter. The second assumption is not tenable in genetics because random genetic effects of relatives may be correlated.

In quantitative genetics the random effects or functions thereof are of central interest. For example, in animal breeding programs the objective is to increase a linear or non-linear merit function of genetic values which, ideally, takes into account the economics of production [16, 28, 33]. Here, it would seem natural to consider the conditional distribution of the random effects given the data, to draw inferences. There are two difficulties with this suggestion. First, it is not always possible to construct this conditional distribution. For example, if the random effects and the errors have independent t-distributions, the conditional distribution of interest is unknown. Second, this conditional distribution would not incorporate the uncertainty about the parameters, a well-known problem in animal breeding, which does not have a simple frequentist or likelihood-based solution (e.g. [10, 15]).
If, on the other hand, the parameters (the fixed effects and the variance components) are of primary interest, the method of maximum likelihood has some important drawbacks. Inferences are valid asymptotically only, under regularity conditions, and finite sample results for mixed effects models are not available, which is particularly true for a model with t-distributions. In addition, some genetic models impose constraints such that the parameter space depends on the parameters themselves, so it would be naive to apply a regular asymptotic theory. For example, with a paternal half-sib family structure [6], the variance between families is bounded between 0 and one-third of the variance within families. Moreover, maximum likelihood estimation in the multi-parameter case has the notorious deficiency of not accounting well for nuisance parameters [3, 8, 13].
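The one-third bound quoted for the paternal half-sib structure can be retraced with a short worked derivation. This is a sketch under standard textbook assumptions that are not spelled out in the text: purely additive inheritance, unrelated sires and dams, and no common environmental effects. The symbols $\sigma_s^2$ (between-family variance), $\sigma_w^2$ (within-family variance), $\sigma_a^2$ (additive genetic variance) and $\sigma_p^2$ (phenotypic variance) are introduced here for illustration only.

```latex
\begin{align*}
\sigma_s^2 &= \tfrac{1}{4}\sigma_a^2, \qquad
\sigma_w^2 = \sigma_p^2 - \sigma_s^2
  && \text{(paternal half-sib decomposition)} \\
\sigma_a^2 &\le \sigma_p^2
  && \text{(heritability cannot exceed 1)} \\
\Rightarrow\quad 4\sigma_s^2 &\le \sigma_s^2 + \sigma_w^2
  \quad\Longrightarrow\quad
  0 \le \sigma_s^2 \le \tfrac{1}{3}\sigma_w^2 .
\end{align*}
```

The admissible range of $\sigma_s^2$ therefore depends on $\sigma_w^2$, which is why regular asymptotic theory does not apply directly.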
A Bayesian approach for drawing inferences about fixed and random effects, and about variance components of mixed linear models with t-distributed random and residual terms is described here. Section 2 presents the probability model, emphasizing a structure suitable for analysis of quantitative genetic data. Section 3 gives a Markov chain Monte Carlo implementation. A Bayesian analysis of a simulated animal breeding data set is presented in section 4. Potential applications and suggestions for additional research are in the concluding section of the paper.
2. THE UNIVARIATE MIXED EFFECTS LINEAR MODEL
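2.1. Sampling model and likelihood function

Consider the univariate linear model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad (1)$$

where $\mathbf{y}$ is an n x 1 vector of observations; $\mathbf{X}$ is a known, full rank, incidence matrix of order n x p for 'fixed' effects; $\mathbf{b}$ is a p x 1 vector of unknown 'fixed' effects; $\mathbf{Z}$ is a known incidence matrix of order n x q for additive genetic effects; $\mathbf{u}$ is a q x 1 vector of unknown additive genetic effects (random) and $\mathbf{e}$ is an n x 1 vector of random residual effects. Although only a single set of random effects is considered, the model and subsequent results can be extended in a straightforward manner. It is assumed that $\mathbf{u}$ and $\mathbf{e}$ are distributed independently. Suppose the data vector can be partitioned according to 'clusters' induced by a common factor, such as herd or herd-year-season of calving in a cattle breeding context. The model can then be presented as:

$$\mathbf{y}_i = \mathbf{X}_i\mathbf{b} + \mathbf{Z}_i\mathbf{u} + \mathbf{e}_i, \qquad i = 1, 2, \ldots, m \qquad (2)$$

where m is the number of 'clusters' (e.g. herds). Here $\mathbf{y}_i$ is the data vector for cluster i (i = 1, 2, ..., m), $\mathbf{X}_i$ and $\mathbf{Z}_i$ are the corresponding incidence matrices and $\mathbf{e}_i$ is the residual vector pertaining to $\mathbf{y}_i$.

Observations in each cluster will be modeled using a multivariate t-distribution such that, given $\mathbf{b}$ and $\mathbf{u}$, data in the same herd are uncorrelated but not independent, whereas records in different clusters are (conditionally) independent. Let $\mathbf{y}_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, \nu_e \sim t_{n_i}(\mathbf{X}_i\mathbf{b} + \mathbf{Z}_i\mathbf{u}, \mathbf{I}_{n_i}\sigma_e^2, \nu_e)$, where $n_i$ is the number of observations in cluster i (i = 1, 2, ..., m), $\sigma_e^2$ is a scale parameter and $\nu_e$ is the degrees of freedom. If $n_i = 1$ for all i, the sampling model becomes univariate t. The conditional density of all observations, given the parameters, is

$$p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, \nu_e) = \prod_{i=1}^{m} \frac{\Gamma\left(\frac{\nu_e + n_i}{2}\right)}{\Gamma\left(\frac{\nu_e}{2}\right)\left(\nu_e \pi \sigma_e^2\right)^{n_i/2}} \left[1 + \frac{(\mathbf{y}_i - \mathbf{X}_i\mathbf{b} - \mathbf{Z}_i\mathbf{u})'(\mathbf{y}_i - \mathbf{X}_i\mathbf{b} - \mathbf{Z}_i\mathbf{u})}{\nu_e \sigma_e^2}\right]^{-(\nu_e + n_i)/2} \qquad (3)$$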
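Although the m distributions have the same $\nu_e$ and $\sigma_e^2$ parameters, these are not identical. In particular, note that $E(\mathbf{y}_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, \nu_e) = \mathbf{X}_i\mathbf{b} + \mathbf{Z}_i\mathbf{u}$, and $Var(\mathbf{y}_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, \nu_e) = \mathbf{I}_{n_i}\sigma_e^2\nu_e/(\nu_e - 2)$, i = 1, 2, ..., m, so the mean vector is peculiar to each cluster. Homoscedasticity is assumed, but this restriction can be lifted without difficulty. When each cluster contains a single observation, the error distribution is the independent t-model of Lange et al. [26]; then, the observations are conditionally independent. When all observations are put in a single cluster, the multivariate t-model of Zellner [42] results; in this case, the degrees of freedom cannot be estimated.

Each of the m terms in equation (3) can be obtained from the mixture of the normal distribution:

$$\mathbf{y}_i \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, s_{e_i}^2 \sim N\left(\mathbf{X}_i\mathbf{b} + \mathbf{Z}_i\mathbf{u}, \mathbf{I}_{n_i}\sigma_e^2 / s_{e_i}^2\right)$$

with the mixing process being:

$$s_{e_i}^2 \mid \nu_e \sim \chi_{\nu_e}^2 / \nu_e$$

where $\chi_{\nu_e}^2$ is a chi-squared random variable on $\nu_e$ degrees of freedom [26, 38, 41, 42].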
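The mixture representation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `sample_cluster_t` and the values used (10 degrees of freedom, scale 0.75, clusters of two records) are illustrative choices, not values from the paper.

```python
import numpy as np

def sample_cluster_t(n_clusters, n_i, scale2, df, rng):
    """Draw residual vectors e_i ~ t_{n_i}(0, I * scale2, df) via the normal/chi-square mixture."""
    s2 = rng.chisquare(df, size=n_clusters) / df      # one mixing variable per cluster
    z = rng.standard_normal((n_clusters, n_i))        # standard normal deviates
    return z * np.sqrt(scale2 / s2)[:, None]          # N(0, I * scale2 / s2) within each cluster

rng = np.random.default_rng(1)
df, scale2 = 10.0, 0.75
e = sample_cluster_t(200_000, 2, scale2, df, rng)

empirical_var = e[:, 0].var()
theoretical_var = scale2 * df / (df - 2.0)                 # scale * nu / (nu - 2)
corr_raw = np.corrcoef(e[:, 0], e[:, 1])[0, 1]             # close to zero: uncorrelated
corr_sq = np.corrcoef(e[:, 0] ** 2, e[:, 1] ** 2)[0, 1]    # positive: not independent
print(empirical_var, theoretical_var, corr_raw, corr_sq)
```

Records in the same cluster share one mixing variable, which is why they come out uncorrelated yet dependent: their squares are positively correlated.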
2.2. Bayesian structure

Formally, both $\mathbf{b}$ and $\mathbf{u}$ are location parameters of the conditional distribution in equation (3). The distinction between 'fixed' and 'random' is frequentist, but from a Bayesian perspective it corresponds to a situation where there is a differential amount of prior information on $\mathbf{b}$ and $\mathbf{u}$ [8, 13]. In particular, the Bayesian counterpart of a 'fixed' effect is obtained by assigning a flat prior to $\mathbf{b}$, so that the prior density of this vector would be:

$$p(\mathbf{b}) \propto \text{constant} \qquad (4)$$

in $R^p$. This distribution is improper, but lower and upper limits can be assigned to each of the elements of $\mathbf{b}$, as in Sorensen et al. [34], to make it proper.

The prior distribution of additive genetic values $\mathbf{u}$ will be taken to be a multivariate t-distribution, and independent of that of $\mathbf{b}$. From a quantitative genetics point of view this can be interpreted as an additive, multivariate normal model (as in [4]), but with a randomly varying additive genetic variance. Because the multivariate t-distribution has thicker tails than the normal, the proposed model is expected to be somewhat buffered against departures from the assumptions made in an additive genetic effects model, so 'genetic outliers' stemming from nonadditivity or from major genes become, perhaps, less influential in the overall analysis. Many properties of the multivariate normal distribution are preserved, e.g. any vector or scalar valued linear combination of additive genetic values has a multivariate t-distribution, the marginal distributions of all terms in $\mathbf{u}$ are t, and all conditional distributions are t as well. In particular, if the additive genetic values of parents and the segregation residual of an offspring are jointly distributed as multivariate t, the additive genetic value of the offspring has a univariate t-distribution with the same degrees of freedom. This implies that the coancestry properties of the usual Gaussian model are preserved. We then take as prior distribution:

$$\mathbf{u} \mid \mathbf{A}, \sigma_u^2, \nu_u \sim t_q(\mathbf{0}, \mathbf{A}\sigma_u^2, \nu_u)$$

with density

$$p(\mathbf{u} \mid \sigma_u^2, \nu_u) = \frac{\Gamma\left(\frac{\nu_u + q}{2}\right)}{\Gamma\left(\frac{\nu_u}{2}\right)\left(\nu_u \pi \sigma_u^2\right)^{q/2} |\mathbf{A}|^{1/2}} \left[1 + \frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{\nu_u \sigma_u^2}\right]^{-(\nu_u + q)/2} \qquad (5)$$
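Above, q is the number of individuals included in $\mathbf{u}$ (some of which may not have data), $\mathbf{A}$ is a known matrix of additive relationships, $\sigma_u^2$ is a scale parameter and $\nu_u$ is the degrees of freedom parameter. Hence, $Var(\mathbf{u} \mid \sigma_u^2, \nu_u) = \mathbf{A}\sigma_u^2\nu_u/(\nu_u - 2)$, which reduces to the variance-covariance matrix of additive genetic values of a Gaussian model when $\nu_u \to \infty$.

The scale parameters $\sigma_e^2$ and $\sigma_u^2$ are taken to have independent scaled inverted chi-square distributions, with densities:

$$p(\sigma_e^2 \mid \tau_e, T_e) \propto (\sigma_e^2)^{-(\tau_e/2 + 1)} \exp\left(-\frac{\tau_e T_e}{2\sigma_e^2}\right) \qquad (6)$$

$$p(\sigma_u^2 \mid \tau_u, T_u) \propto (\sigma_u^2)^{-(\tau_u/2 + 1)} \exp\left(-\frac{\tau_u T_u}{2\sigma_u^2}\right) \qquad (7)$$

respectively, for $\sigma_e^2 > 0$ and $\sigma_u^2 > 0$. Here, $\tau_e$ ($\tau_u$) is a strictly positive 'degree of belief' parameter, and $T_e$ ($T_u$) can be thought of as a prior value of the scale parameter. These distributions have finite means and variances whenever the $\tau$ parameters are larger than 2 and 4, respectively.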
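A draw from a scaled inverted chi-square distribution can be obtained by dividing the product of the degree of belief and the prior scale by a chi-square deviate. The sketch below assumes NumPy; the function name `sample_scaled_inv_chi2` and the values $\tau = 10$, $T = 0.375$ are hypothetical and chosen only so that the Monte Carlo mean is stable (a larger $\tau$ than any used in the paper).

```python
import numpy as np

def sample_scaled_inv_chi2(df, scale, size, rng):
    """Draw from a scaled inverted chi-square distribution as df * scale / chi2_df."""
    return df * scale / rng.chisquare(df, size=size)

rng = np.random.default_rng(7)
tau, T = 10.0, 0.375                       # illustrative 'degree of belief' and prior scale
draws = sample_scaled_inv_chi2(tau, T, 200_000, rng)

prior_mean = tau * T / (tau - 2.0)         # finite because tau > 2
print(draws.mean(), prior_mean)
```

With $\tau \le 2$ the mean does not exist and with $\tau \le 4$ the variance does not exist, which matches the conditions stated in the text.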
In animal breeding research, it is common practice to assign improper flat priors to the variance components of a Gaussian linear model [8, 9, 39]. If uniform priors are to be used, it is advisable to restrict the range of values they can take, to avoid impropriety (often difficult to recognize, see [22]). Here, one can take

$$p(\sigma_e^2) \propto 1, \qquad \sigma_{e,\min}^2 \le \sigma_e^2 \le \sigma_{e,\max}^2 \qquad (8)$$

$$p(\sigma_u^2) \propto 1, \qquad \sigma_{u,\min}^2 \le \sigma_u^2 \le \sigma_{u,\max}^2 \qquad (9)$$

Typically, the lower bounds are set to zero, whereas the upper bounds can be elicited from mechanistic considerations, or set up arbitrarily.
Prior distributions for the degrees of freedom can be discrete as in Albert and Chib [1] and Besag et al. [2], or continuous as in Geweke [7], with the joint prior density taken as $p(\nu_e, \nu_u) = p(\nu_e)p(\nu_u)$. In the discrete setting, let $f_j$, j = 1, 2, ..., $d_e$, and $\omega_k$, k = 1, 2, ..., $d_u$, be sets of states for the residual and genetic values degrees of freedom, respectively. The independent prior distributions are:

$$p(\nu_e) = \Pr(\nu_e = f_j), \qquad j = 1, 2, \ldots, d_e \qquad (10)$$

$$p(\nu_u) = \Pr(\nu_u = \omega_k), \qquad k = 1, 2, \ldots, d_u \qquad (11)$$

Because a multivariate t-distribution is assigned to the whole vector $\mathbf{u}$, there is no information contained in the data about $\nu_u$. Therefore, equation (11) is recovered in the posterior analysis. There are at least two possibilities here: 1) to assign arbitrary values to $\nu_u$ and examine how variation in these values affects inferences, or 2) to create clusters of genetic values by, e.g. half-sib or full-sib families, and then assume that clusters are mutually independent but with common degrees of freedom. Here the $\nu_u$ parameter would be estimable, but at the expense of ignoring genetic relationships other than those from half-sib or full-sib structures. Alternative (2) may be suitable for dairy cattle breeding (where most of the relationships are due to sires) or humans (where most families are nuclear). A third alternative would be to use (2), then find the mode of the posterior distribution of $\nu_u$, and then use (1) as if this mode were the true value. In the following derivation, we adopt option (1).
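The joint prior density of all unknowns is then:

$$p(\mathbf{b}, \mathbf{u}, \sigma_e^2, \sigma_u^2, \nu_e \mid \tau_e, \tau_u, T_e, T_u, \nu_u) \propto p(\mathbf{u} \mid \sigma_u^2, \nu_u)\, p(\sigma_e^2 \mid \tau_e, T_e)\, p(\sigma_u^2 \mid \tau_u, T_u)\, p(\nu_e) \qquad (12)$$

with obvious modifications if equations (8) and (9) are used instead of equations (6) and (7). The joint posterior density is found by combining likelihood equation (3) and appropriate priors in equations (4)-(11), to obtain:

$$p(\mathbf{b}, \mathbf{u}, \sigma_e^2, \sigma_u^2, \nu_e \mid \mathbf{y}, \tau_e, \tau_u, T_e, T_u, \nu_u) \propto p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}, \sigma_e^2, \nu_e)\, p(\mathbf{u} \mid \sigma_u^2, \nu_u)\, p(\sigma_e^2 \mid \tau_e, T_e)\, p(\sigma_u^2 \mid \tau_u, T_u)\, p(\nu_e) \qquad (13)$$

where $\mathbf{b} \in R^p$, $\mathbf{u} \in R^q$, $\sigma_e^2 > 0$, $\sigma_u^2 > 0$ and $\nu_e \in \{f_j, j = 1, 2, \ldots, d_e\}$ if a discrete prior is employed. The hyper-parameters are $\tau_e$, $\tau_u$, $T_u$, $T_e$ and $\nu_u$ because we assume this last one to be known. Hereafter, we suppress the dependency on the hyper-parameters in the notation.

3. THE GIBBS SAMPLING SCHEME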
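A Markov chain Monte Carlo method such as Gibbs sampling is facilitated using an augmented posterior distribution that results from mixture models. The t-distribution within each cluster in equation (3) is viewed as stemming from the mixture processes noted earlier. Likewise, the t-distribution in equation (5) can be arrived at by mixing the $\mathbf{u} \mid \mathbf{A}, \sigma_u^2, s_u^2 \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2/s_u^2)$ process with $s_u^2 \mid \nu_u \sim \chi_{\nu_u}^2/\nu_u$. The augmented joint posterior density is

$$p(\mathbf{b}, \mathbf{u}, \sigma_e^2, \sigma_u^2, \nu_e, \mathbf{s}_e^2, s_u^2 \mid \mathbf{y}) \propto (\sigma_e^2)^{-N/2} \prod_{i=1}^{m} \left[(s_{e_i}^2)^{n_i/2} \exp\left(-\frac{s_{e_i}^2\, \mathbf{e}_i'\mathbf{e}_i}{2\sigma_e^2}\right) p(s_{e_i}^2 \mid \nu_e)\right] \left(\frac{s_u^2}{\sigma_u^2}\right)^{q/2} \exp\left(-\frac{s_u^2\, \mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right) p(s_u^2 \mid \nu_u)\, p(\sigma_e^2)\, p(\sigma_u^2)\, p(\nu_e) \qquad (14)$$

where $\mathbf{e}_i = \mathbf{y}_i - \mathbf{X}_i\mathbf{b} - \mathbf{Z}_i\mathbf{u}$, $p(s^2 \mid \nu)$ is the density of $\chi_\nu^2/\nu$, $\mathbf{s}_e^2 = (s_{e_1}^2, \ldots, s_{e_m}^2)'$ and $N = \sum_{i=1}^{m} n_i$. Integration of equation (14) with respect to $\mathbf{s}_e^2$ and $s_u^2$ yields equation (13), so these posteriors are 'equivalent'.

There is a connection here with the heterogeneous variance models for animal breeding given, e.g. in Gianola et al. [11] and in San Cristobal et al. [31]. These authors partitioned breeding values and residuals into clusters as well, each cluster having a specific variance that varied at random according to a scaled inverted chi-square distribution with known parameters.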
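The full conditional distributions required to implement a Gibbs sampler are derived from equation (14). Results given in Wang et al. [40] are used. Denote $\mathbf{C} = \{c_{ij}\}$, i, j = 1, 2, ..., p + q, and $\mathbf{r} = \{r_i\}$, i = 1, 2, ..., p + q, to be the coefficient matrix and right-hand side of Henderson's mixed model equations, respectively, where p + q is the number of unknowns (fixed and random effects), given the dispersion components $\mathbf{s}_e^2$, $s_u^2$ and the scale parameters $\sigma_e^2$ and $\sigma_u^2$. The mixed model equations are:

$$\mathbf{C} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix} = \mathbf{r} \qquad (15)$$

where $\hat{\mathbf{b}}$ is the best linear unbiased estimator (BLUE) of $\mathbf{b}$, and $\hat{\mathbf{u}}$ is the best linear unbiased predictor (BLUP) of $\mathbf{u}$. Collect the fixed and random effects into $\mathbf{a}' = (\mathbf{b}', \mathbf{u}') = (a_1, a_2, \ldots, a_{p+q})$. Let $\mathbf{a}_{-i}' = (a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_{p+q})$. The conditional posterior distribution of each of the elements of $\mathbf{a}$ is normal (equation (16)), with mean

$$\hat{a}_i = c_{ii}^{-1}\left(r_i - \sum_{j=1, j \ne i}^{p+q} c_{ij} a_j\right), \qquad i = 1, 2, \ldots, p + q.$$

This extends to blocks of elements of $\mathbf{a}$ in a natural way. If $\mathbf{a}_i$ is a sub-vector of $\mathbf{a}$, the conditional distribution of $\mathbf{a}_i$ given everything else is multivariate normal with mean $\hat{\mathbf{a}}_i = \mathbf{C}_{ii}^{-1}\left(\mathbf{r}_i - \sum_{j \ne i} \mathbf{C}_{ij}\mathbf{a}_j\right)$ for appropriate definitions of $\mathbf{C}_{ij}$, $\mathbf{r}_i$ and $\mathbf{a}_j$ as matrices and vectors.

In what follows, ELSE denotes the data and all unknowns other than the one being sampled. The conditional posterior density of each of the $s_{e_i}^2$ is in the form of a gamma density

$$p(s_{e_i}^2 \mid \text{ELSE}) \propto (s_{e_i}^2)^{(\nu_e + n_i)/2 - 1} \exp\left[-\frac{s_{e_i}^2}{2}\left(\nu_e + \frac{\mathbf{e}_i'\mathbf{e}_i}{\sigma_e^2}\right)\right] \qquad (18)$$

where ELSE includes $\mathbf{s}_{e,-i}^2$, which is $\mathbf{s}_e^2$ without $s_{e_i}^2$, and $\mathbf{e}_i = \mathbf{y}_i - \mathbf{X}_i\mathbf{b} - \mathbf{Z}_i\mathbf{u}$. Equivalently, $s_{e_i}^2\left(\nu_e + \mathbf{e}_i'\mathbf{e}_i/\sigma_e^2\right) \sim \chi_{\nu_e + n_i}^2$.

Similarly, the conditional posterior density of $s_u^2$ also has the gamma density form

$$p(s_u^2 \mid \text{ELSE}) \propto (s_u^2)^{(\nu_u + q)/2 - 1} \exp\left[-\frac{s_u^2}{2}\left(\nu_u + \frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{\sigma_u^2}\right)\right] \qquad (19)$$

The conditional posterior distribution of $\sigma_e^2$ is a scaled inverted chi-square distribution with form

$$p(\sigma_e^2 \mid \text{ELSE}) \propto (\sigma_e^2)^{-\left(\frac{N + \tau_e}{2} + 1\right)} \exp\left[-\frac{\sum_{i=1}^{m} s_{e_i}^2\, \mathbf{e}_i'\mathbf{e}_i + \tau_e T_e}{2\sigma_e^2}\right]$$

If a bounded uniform distribution is used as prior for $\sigma_e^2$, its conditional posterior is the truncated distribution:

$$p(\sigma_e^2 \mid \text{ELSE}) \propto (\sigma_e^2)^{-N/2} \exp\left[-\frac{\sum_{i=1}^{m} s_{e_i}^2\, \mathbf{e}_i'\mathbf{e}_i}{2\sigma_e^2}\right], \qquad \sigma_{e,\min}^2 \le \sigma_e^2 \le \sigma_{e,\max}^2$$

The conditional posterior density of $\sigma_u^2$ is:

$$p(\sigma_u^2 \mid \text{ELSE}) \propto (\sigma_u^2)^{-\left(\frac{q + \tau_u}{2} + 1\right)} \exp\left[-\frac{s_u^2\, \mathbf{u}'\mathbf{A}^{-1}\mathbf{u} + \tau_u T_u}{2\sigma_u^2}\right]$$

When a bounded uniform distribution is used as a prior for the genetic variance, we have

$$p(\sigma_u^2 \mid \text{ELSE}) \propto (\sigma_u^2)^{-q/2} \exp\left[-\frac{s_u^2\, \mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right], \qquad \sigma_{u,\min}^2 \le \sigma_u^2 \le \sigma_{u,\max}^2$$

The conditional posterior distribution of the degrees of freedom parameter $\nu_e$ depends on whether it is handled as discrete or continuous. If the discrete prior distribution (10) is adopted, one has

$$\Pr(\nu_e = f_j \mid \text{ELSE}) = C_e \Pr(\nu_e = f_j) \prod_{i=1}^{m} \frac{(f_j/2)^{f_j/2}}{\Gamma(f_j/2)} (s_{e_i}^2)^{f_j/2 - 1} \exp\left(-\frac{f_j s_{e_i}^2}{2}\right) \qquad (26)$$

where $C_e$ is the normalizing constant and $\nu_e \in \{f_j, j = 1, 2, \ldots, d_e\}$. If, on the other hand, $\nu_e$ is assigned a continuous distribution with density $p(\nu_e)$, e.g. an exponential one [7], the conditional posterior density can be written up to proportionality only, and its kernel is as in equation (26) (except $C_e$), with $p(\nu_e)$ in place of the discrete prior probabilities. Here, a rejection envelope or a Metropolis-Hastings algorithm can be constructed to draw samples from the posterior distribution of $\nu_e$.

The Gibbs sampler iterates through: 1) p + q univariate distributions as in equation (16) (or a smaller number of multivariate normal distributions when implemented in a blocked form, to speed up mixing) for the 'fixed' and random effects; 2) m gamma distributions as in equation (18) for the $s_{e_i}^2$ parameters (if a univariate t-sampling model is adopted, m = N, the total number of observations); 3) a gamma distribution as in equation (19) for $s_u^2$; 4) a scaled inverted chi-square distribution as in equation (20) or (21) for $\sigma_u^2$; 5) a scaled inverted chi-square distribution as in equation (24) or (25) for $\sigma_e^2$; and 6) a discrete distribution as in equation (26) for the degrees of freedom parameter (or implementing the corresponding step if $\nu_e$ is taken as continuous).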
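The cycle above can be sketched for a deliberately simplified case. The code below is an illustration under stated assumptions, not the authors' implementation: residuals follow independent univariate t-distributions (m = N), genetic effects are Gaussian with $\mathbf{A} = \mathbf{I}$ (so there is no $s_u^2$ step), the mixed model equations are written in variance-ratio form (so the conditional variance of $a_i$ is $\sigma_e^2/c_{ii}$), both scale parameters share hypothetical hyper-parameters `tau` and `T`, and the degrees of freedom states have equal prior probability. All function names and the simulated data are hypothetical.

```python
import math
import numpy as np

def log_mixing_density(s2, nu):
    """Joint log density of independent s2_i ~ chi2_nu / nu, i.e. Gamma(nu/2, rate nu/2)."""
    h = nu / 2.0
    return float(np.sum(h * math.log(h) - math.lgamma(h) + (h - 1.0) * np.log(s2) - h * s2))

def gibbs_univariate_t(y, X, Z, states, n_iter, rng, tau=4.0, T=0.5):
    """Toy Gibbs sampler: univariate t residuals (m = N), Gaussian u with A = I."""
    N, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, Z])
    a = np.zeros(p + q)
    s2 = np.ones(N)
    sigma2_e, sigma2_u, nu_e = 1.0, 1.0, float(states[0])
    idx_u = np.arange(p, p + q)
    out = {"b": [], "sigma2_e": [], "sigma2_u": [], "nu_e": []}
    for _ in range(n_iter):
        # Location effects: weighted mixed model equations, one element at a time.
        C = W.T @ (W * s2[:, None])
        C[idx_u, idx_u] += sigma2_e / sigma2_u
        r = W.T @ (s2 * y)
        for i in range(p + q):
            mean_i = (r[i] - C[i] @ a + C[i, i] * a[i]) / C[i, i]
            a[i] = rng.normal(mean_i, math.sqrt(sigma2_e / C[i, i]))
        e = y - W @ a
        # Mixing variables: gamma with shape (nu + 1)/2 and rate (nu + e^2/sigma2_e)/2.
        s2 = rng.gamma((nu_e + 1.0) / 2.0, 2.0 / (nu_e + e ** 2 / sigma2_e))
        # Scale parameters: scaled inverted chi-square draws.
        u = a[p:]
        sigma2_u = (u @ u + tau * T) / rng.chisquare(q + tau)
        sigma2_e = (np.sum(s2 * e ** 2) + tau * T) / rng.chisquare(N + tau)
        # Degrees of freedom: discrete distribution over the allowed states.
        logw = np.array([log_mixing_density(s2, f) for f in states])
        w = np.exp(logw - logw.max())
        nu_e = float(rng.choice(states, p=w / w.sum()))
        out["b"].append(a[:p].copy())
        out["sigma2_e"].append(sigma2_e)
        out["sigma2_u"].append(sigma2_u)
        out["nu_e"].append(nu_e)
    return {k: np.array(v) for k, v in out.items()}

# Hypothetical data: 20 groups of 20 records, overall mean 2, t residuals on 4 df.
rng = np.random.default_rng(3)
q, n_per = 20, 20
N = q * n_per
X = np.ones((N, 1))
Z = np.kron(np.eye(q), np.ones((n_per, 1)))
y = 2.0 + Z @ rng.normal(0.0, 0.5, size=q) + rng.standard_t(4, size=N) * math.sqrt(0.75)
states = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
draws = gibbs_univariate_t(y, X, Z, states, n_iter=1500, rng=rng)
post_mean_b = draws["b"][500:].mean()
print(post_mean_b, draws["sigma2_e"][500:].mean(), np.bincount(draws["nu_e"].astype(int)))
```

Single-site updating is used for brevity; a blocked update of the location effects, as mentioned in the text, would be the natural choice for larger problems.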
A possible variation of the model is when the prior for the genetic values is the Gaussian distribution $\mathbf{u} \sim N_q(\mathbf{0}, \mathbf{A}\sigma_u^2)$, instead of the multivariate t-genetic distribution (5). Here, there will not be a variable $s_u^2$ in the model, so the Gibbs sampler does not visit equation (19). However, the conditional posterior distributions (20) and (21) remain in the same form, but with $s_u^2$ set equal to 1.

4. AN ANIMAL BREEDING APPLICATION

4.1. Simulation of the data
Preferential treatment of valuable cows is an important problem in dairy cattle breeding. To the extent that such treatment is not coded in national milk recording schemes used for genetic evaluation of animals, the statistical models employed for this purpose would probably lead to biased evaluations. A robust model, with a distribution such as equation (3) for describing the sampling process may improve inferences about breeding values, as shown by Stranden and Gianola [36].

In order to illustrate the developments in this paper, a simulation was conducted. Full details are in the work of Stranden and Gianola [36], so only the essentials are given. Milk production records from cows in a multiple ovulation and embryo transfer (MOET) scheme were generated. The nucleus consisted of eight bulls and 32 cows from four herds. In each generation, every cow produced four females and one male (by MOET to recipients) that were available for selection as potential replacements. The data were from four generations of selection for milk yield using BLUP of additive genetic values. The relationship matrix $\mathbf{A}$ in equation (5) was of order 576 x 576. The milk yields of each cow were simulated:

$$y_{ij} = h_i + u_j + \Delta_{ij} + e_{ij} \qquad (27)$$

where $y_{ij}$ is the record of cow j made in herd-year i (i = 1, 2, 3, 4), $h_i$ is a herd-year effect, $u_j$ is the additive genetic value of cow j (j = 1, 2, ..., 544), and $e_{ij}$ is an independent residual. The independent input distributions were $h_i \sim N(0, 3/4)$, $u_j \sim N(0, 1/4)$ and $e_{ij} \sim N(0, 3/4)$. The preferential treatment variable $\Delta_{ij}$ takes values:

[the defining expression for $\Delta_{ij}$ is missing from the source text]

where $\Phi(\cdot)$ is the standard normal cumulative distribution function, $p_{\min} = -5\sigma_h$ ($\sigma_h$ is the standard deviation of herd-year effects) is a constant smaller than the herd-year effect $h_i$ and $w_j = \lambda + (u_j + v_j)/\sigma_w$ is a 'value' function where the independent deviate is $v_j \sim N(0, \sigma_v^2)$, so $w_j \sim N(\lambda, 1)$. The ratio $\sigma_v^2/\sigma_u^2$ describes the uncertainty a herd manager has about the true breeding value of cow j: when the breeder is very uncertain about the additive genetic value of the animal, this ratio of variances should be high. Here, we took $\sigma_v^2/\sigma_u^2 = 1/100$, to illustrate a best case scenario for the robust models. The correlation between $w_j$ and $u_j$ is $(1 + \sigma_v^2/\sigma_u^2)^{-1/2} = 0.995$. The constant $\lambda$ was set to -1.2816, such that about one out of ten cows would receive preferential treatment (non-null value of $\Delta_{ij}$).
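The two constants quoted above can be retraced with a few lines of arithmetic. This sketch uses only the Python standard library and assumes, as an interpretation not stated explicitly in the surviving text, that a cow is treated when its value function $w_j \sim N(\lambda, 1)$ is positive, so that the expected proportion treated is $\Phi(\lambda)$. Variable names are illustrative.

```python
import math
from statistics import NormalDist

lam = -1.2816                      # constant lambda from the text
ratio = 1.0 / 100.0                # sigma_v^2 / sigma_u^2 from the text

# Assumed rule: treated when w_j > 0, with w_j ~ N(lam, 1), so P(treated) = Phi(lam).
prop_treated = NormalDist().cdf(lam)

# Correlation between the value function w_j and the breeding value u_j.
corr_w_u = 1.0 / math.sqrt(1.0 + ratio)

# Proportion realised in the simulated data set (58 of 544 cows).
observed = 58 / 544

print(round(prop_treated, 4), round(corr_w_u, 4), round(observed, 4))
```

Under this reading, the realised proportion of treated cows is close to the one-in-ten target.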
In the simulated data set, 58 of 544 cows were preferentially treated.

4.2. Statistical models and computations
Three statistical models, differing only in the error distributions, were compared using the simulated records. These models were: 1) G: a purely Gaussian model, 2) t-H: a multivariate t-model using herds as clusters and 3) t-1: a model with independent univariate t-distributions for the residuals.

The analytical model was equation (27) without $\Delta$, this being representative of linear models used currently for genetic evaluations of first lactation cows in the dairy industry. In all three models, the multivariate normal distribution $\mathbf{u} \mid \sigma_u^2 \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$ was assumed for the genetic effects; this is the standard assumption in dairy cattle breeding.

Prior distributions of parameters were the same for all three models. Herd effects were assigned a uniform prior, and the multivariate normal density stated earlier was used as a prior distribution for the genetic values. The variance $\sigma_u^2$ of this normal distribution and the scale parameter $\sigma_e^2$ of the t-distributions were assigned independent scaled inverted chi-square distributions with densities as in equations (6) and (7); hyper-parameters were $\tau_e = \tau_u = 4$, $T_e = 1/8$ and $T_u = 3/8$. In the two sampling models involving t-distributions, the residual degrees of freedom parameter was considered unknown. Degrees of freedom states allowed in the herd-clustered t-model were 4, 10, 100 or 1 000, all equally likely, a priori. In the univariate t-model, the degrees of freedom were 4, 6, 8, 10, 12 or 14, all receiving equal prior probability. These values were chosen arbitrarily, for illustration purposes.

Posterior distributions were estimated for the following parameters: residual degrees of freedom, scale of the corresponding t-distribution, and breeding values of a preferentially treated cow, her sire, dam and a full-sister. A Gibbs sampler was constructed to draw from the appropriate conditional posterior distributions described in the preceding section of the paper. Burn-in (the period before actual sampling began) was 7 000 iterates followed by 1 000 000 additional Gibbs cycles. For density estimation only, using a Rao-Blackwell estimator [5], samples were retained every 200th iteration, thus giving a sample of 5 000. Posterior means were estimated using the one million samples for each parameter.
5. RESULTS
In the herd-clustered t-model, the estimated posterior distribution of the degrees of freedom values was $Prob(\nu_e = 4 \mid \mathbf{y}) = 0.07$, $Prob(\nu_e = 10 \mid \mathbf{y}) = 0.18$, $Prob(\nu_e = 100 \mid \mathbf{y}) = 0.43$ and $Prob(\nu_e = 1\,000 \mid \mathbf{y}) = 0.33$, after rounding. Corresponding values for the univariate t-model were such that $Prob(\nu_e \ge 6 \mid \mathbf{y})$ was less than 0.02, and the mode of this distribution was $\nu_e = 4$. Although the posterior distribution in the herd-clustered t-model was not sharp (there were only four clusters), the two sets of results point away from the Gaussian assumption for the residuals, and clearly so in the t-1 model.

Means of the posterior distributions of the dispersion parameters are given in table I; for comparison purposes, restricted maximum likelihood (REML) estimates of $\sigma_e^2$ and $\sigma_u^2$ obtained with a Gaussian model are presented. The REML estimates did not differ very much from posterior means obtained with the Gaussian model, and both sets of estimates were away from the true values. The univariate t-model was the closest, although the residual scale parameter is not directly comparable with the residual variance of the Gaussian model. The estimated posterior densities of the genetic variance and residual scale parameter are shown in figure 1; these were reasonably symmetric and unimodal. In the Gaussian model, the posterior distributions of the residual and of the genetic variance did not include the true values (0.75 and 0.25, respectively) at an appreciable density, illustrating an inability to cope with the 'contamination' created by the preferential treatment simulated.

In the t-models, the residual scale parameters cannot be compared directly with the residual variance of the Gaussian model, as in the former models $Var(e_{ij}) = \sigma_e^2\nu_e/(\nu_e - 2)$. The posterior means of $Var(e_{ij})$ were 1.3742 in the herd-clustered t-model, and 1.7402 in the t-1 process. The extraneous variation produced by preferential treatment is allocated differentially, depending on the model, to the causal components. The expectation is that in a univariate t-model, a higher proportion of such variation would be captured by the residual component than in a model under Gaussian assumptions.
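Because the residual variance of a t-model is a function of both the scale and the degrees of freedom, its posterior mean is naturally computed draw by draw rather than by plugging in posterior means. The sketch below assumes NumPy; the function name `residual_variance` and the three draws are hypothetical and are not output from the analysis reported here.

```python
import numpy as np

def residual_variance(scale2, df):
    """Var(e) = scale2 * df / (df - 2) for a t-distribution; requires df > 2."""
    scale2 = np.asarray(scale2, dtype=float)
    df = np.asarray(df, dtype=float)
    if np.any(df <= 2):
        raise ValueError("the variance is finite only for df > 2")
    return scale2 * df / (df - 2.0)

# Hypothetical retained Gibbs draws of the residual scale and degrees of freedom.
scale2_draws = np.array([0.6, 0.8, 0.7])
df_draws = np.array([4.0, 4.0, 6.0])

var_draws = residual_variance(scale2_draws, df_draws)   # one variance per draw
posterior_mean_var = var_draws.mean()
print(var_draws, posterior_mean_var)
```

With 4 degrees of freedom the variance is twice the scale parameter, which illustrates why the scale and the Gaussian residual variance are not directly comparable.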
The posterior distributions of dispersion parameters in the two t-models were markedly different. In the case of the herd-clustered model, the posterior distribution of $\sigma_u^2$ resembled that of the Gaussian model. For the residual scale parameter, the posterior distribution was much sharper in the t-1 than in the t-H models. Two possible reasons for such differences are: 1) there were four herds only, so cluster parameters ($\nu_e$, $\sigma_e^2$) were estimated imprecisely; or 2) the two models used drastically different states for the values of $\nu_e$.

Posterior means and posterior mode estimates of the breeding value of some animals are given in table II. An upward 'bias' can be seen in the estimated breeding value of the preferentially treated cow and its sire, for both the Gaussian and the herd-clustered t-models. The univariate t-model gave an underestimate of the true breeding value. The breeding values of the dam and of a full-sib of the preferentially treated cow were overestimated by all models. Posterior density estimates of the breeding values of the four selected individuals, for all models, are shown in figure 2. The Gaussian and herd-clustered t-models gave similar density estimates, both in terms of location and spread. Posterior density estimates for the univariate t-model were sharper. The true breeding value of the four individuals had appreciable density in each of the posterior distributions. Mean 'biases' (mean squared errors) of the posterior means of breeding values over all animals were 0.52 (0.71), 0.51 (0.69) and 0.14 (0.17) for the Gaussian, herd-clustered t, and the univariate t-models, respectively. Hence, the individual t-model performed better than the competing models in this data set. These results are consistent with those reported by Stranden and Gianola [36] in a more comprehensive evaluation of the models.

6. DISCUSSION
A Bayesian method for analysis of mixed effects linear models with t-distributed random effects, with emphasis on quantitative genetic applications, was presented. The objective was to obtain inferences that are more robust to departures from assumptions, especially at the level of the residual distributions, than those obtained with a mixed effects Gaussian linear model, the current paradigm in quantitative genetics [19, 21, 27]. Our approach was illustrated with simulated data from a dairy cattle breeding scheme, where cows were subject to fairly prevalent and strong preferential treatment. A univariate t-model for the errors led to more accurate inferences about additive genetic variance than either a herd-clustered t-model or a Gaussian sampling process. The posterior distributions of breeding values of some example animals were sharper in the univariate t-model.

Our model and implementation can be extended in several respects. For example, if the degrees of freedom of the distribution of genetic values need to be assessed, it is possible to cluster the genetic values into 'independent' families and proceed as for the residual variance. However, such clustering would lead to a loss of accuracy in the specification of the genetic variance-covariance structure, because relationships between individuals in different clusters would not be taken into account. Another extension would be to take the degrees of freedom as continuous and use a rejection algorithm or a Metropolis-Hastings walk to draw samples, combined with the Gibbs sampler for the rest of the parameters of the model. Additional random effects, such as permanent environmental effects affecting all records of a cow, can be incorporated, e.g. by taking a univariate t-distribution as prior. Residuals can be clustered in different manners. For example, clustering errors by sire or full-sib families may cope with inadequate genetic assumptions, e.g. unknown major genes may be segregating. In addition, it is possible to allow for heterogeneous variance in the model without major difficulty. Other residual distributions such as the logistic or the slash may be considered as well.
At present, it is not yet possible to apply these methods to the large data sets used for routine genetic evaluation in the dairy cattle breeding industry, where the models can have millions of individual breeding values. Hence, if it is established that models based on the t-distribution improve genetic evaluations, computationally simpler or faster methods should be developed. One possibility would be to employ Laplacian approximations to assess the mode of the joint distribution of the dispersion parameters and then use some form of conditional analysis to obtain point predictors of breeding values. In a model with t-distributed random effects, modal estimates of fixed and random effects, given the scale parameters and the degrees of freedom, can be found using an iterative procedure [35]. This requires solving reweighted mixed model equations several times, as in threshold models. Approximate solutions after a couple of iterations may be adequate for practical purposes. This Laplacian-iteration approach would be counterpart to the standard REML-BLUP analysis in linear models under Gaussian assumptions.
Finally, we would like to observe some interpretative differences between the Gaussian and the t-model from a quantitative genetic point of view. Heritability is the regression of genotype on phenotype, which is $\sigma_u^2/(\sigma_u^2 + \sigma_e^2)$ in a Gaussian model. In the t-models discussed, this regression is

$$\frac{\sigma_u^2\nu_u/(\nu_u - 2)}{\sigma_u^2\nu_u/(\nu_u - 2) + \sigma_e^2\nu_e/(\nu_e - 2)}$$

with the numerator (and the appropriate part of the denominator) being equal to