Original article

Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs

CS Wang*, JJ Rutledge, D Gianola

University of Wisconsin-Madison, Department of Meat and Animal Science, Madison, WI 53706-1284, USA

* Present address: Morrison Hall, Department of Animal Science, Cornell University, Ithaca, NY 14853-4801, USA

(Received 26 April 1993; accepted 17 December 1993)
Summary - Gibbs sampling is a Monte-Carlo procedure for generating random samples from joint distributions through sampling from and updating conditional distributions. Inferences about unknown parameters are made by: 1) computing summary statistics directly from the samples; or 2) estimating the marginal density of an unknown, and then obtaining summary statistics from the density. All conditional distributions needed to implement Gibbs sampling in a univariate Gaussian mixed linear model are presented in scalar algebra, so no matrix inversion is needed in the computations. For location parameters, all conditional distributions are univariate normal, whereas those for variance components are scaled inverted chi-squares. The procedure was applied to solve a Gaussian animal model for litter size in the Gamito strain of Iberian pigs. Data were 1 213 records from 426 dams. The model had farrowing season (72 levels) and parity (4 levels) as fixed effects; breeding values (597), permanent environmental effects (426) and residuals were random. In CASE I, variances were assumed known, with REML (restricted maximum likelihood) estimates used as true parameter values. Here, means and variances of the posterior distributions of all effects were obtained, by inversion, from the mixed model equations. These exact solutions were used to check the Monte-Carlo estimates given by Gibbs sampling, using 120 000 samples. Linear regression slopes of true posterior means on Gibbs means were almost exactly 1 for fixed, additive genetic and permanent environmental effects. Regression slopes of true posterior variances on Gibbs variances were 1.00, 1.01 and 0.96, respectively. In CASE II, variances were treated as unknown, with a flat prior assigned to these. Posterior densities of selected location parameters, variance components, heritability and repeatability were estimated. Marginal posterior distributions of dispersion parameters were skewed, save that of the residual variance; the means, modes and medians of these distributions differed from the REML estimates, as expected from theory. The conclusions are: 1) the Gibbs sampler converged to the true posterior distributions, as suggested by CASE I; 2) it provides a richer description of uncertainty about genetic parameters than REML; 3) it can be used successfully to study quantitative genetic variation taking into account uncertainty about all nuisance parameters, at least in moderately sized data sets. Hence, it should be useful in the analysis of experimental data.

Iberian pig / genetic parameters / linear model / Bayesian methods / Gibbs sampler

Résumé - Bayesian analysis of mixed linear models using Gibbs sampling with an application to litter size in Iberian pigs.
Gibbs sampling is a Monte-Carlo procedure for generating random samples from joint distributions, by sampling from iteratively updated conditional distributions. Inferences about the unknown parameters are obtained by computing summary statistics directly from the generated samples, or by estimating the marginal density of an unknown and then computing summary statistics from that density. All the conditional distributions needed to implement Gibbs sampling in a univariate Gaussian mixed linear model are presented in scalar algebra, so that no matrix inversion is required in the computations. For the location parameters, all conditional distributions are univariate normal, whereas those of the variance components are scaled inverted chi-squares. The procedure was applied to a Gaussian animal model for litter size in the Gamito strain of Iberian pigs. The data were 1 213 records on 426 dams. The model included the fixed effects of farrowing season (72 levels) and parity (4 levels); the individual breeding values (597), the permanent environmental effects (426) and the residuals were random. In CASE I, the variances were assumed known, with the REML (restricted maximum likelihood) estimates taken as the true parameter values. The means and variances of the posterior distributions of all effects were then obtained by solving the mixed model equations. These exact solutions were used to check the Monte-Carlo estimates given by Gibbs sampling, using 120 000 samples. The linear regression coefficients of the true posterior means on the Gibbs means were almost exactly 1 for the fixed, additive genetic and permanent environmental effects. The regression coefficients of the true posterior variances on the Gibbs variances were 1.00, 1.01 and 0.96, respectively. In CASE II, the variances were treated as unknowns, with a uniform prior distribution. The posterior densities of selected location parameters, variance components, heritability and repeatability were estimated. The posterior distributions of the dispersion parameters were skewed, except for the residual variance; the means, modes and medians of these distributions differed from the REML estimates, as expected from theory. It is concluded that: i) the Gibbs sampler converges to the true posterior distributions, as suggested by CASE I; ii) it provides a richer description of the uncertainty about genetic parameters than REML; iii) it can be used successfully to study quantitative genetic variation while taking into account the uncertainty about all nuisance parameters, at least with moderately sized data sets. It should therefore be useful in the analysis of experimental data.

Iberian pig / genetic parameters / linear model / Bayesian analysis / Gibbs sampling

INTRODUCTION
Prediction of merit or, equivalently, deriving a criterion for selection is an important theme in animal breeding. Cochran (1951), under certain assumptions, showed that the selection criterion that maximized the expected merit of the selected animals was the mean of the conditional distribution of merit given the data. The conditional mean is known as the best predictor, or BP (Henderson, 1973), because it minimizes mean square error of prediction among all predictors. Computing BP requires knowing the joint distribution of predictands and data, a requirement that can seldom be met in practice. To simplify, attention may be restricted to linear predictors. Henderson (1963, 1973) and Henderson et al (1959) developed best linear unbiased prediction (BLUP), which removed the requirement of knowing the first moments of the distributions. BLUP is the linear function of the data that minimizes mean square error of prediction in the class of linear unbiased predictors. Bulmer (1980), Gianola and Goffinet (1982), Goffinet (1983) and Fernando and Gianola (1986) showed that under multivariate normality, BLUP is the conditional mean of merit given a set of linearly independent error contrasts. This holds if the second moments of the joint distribution of the data and of the predictand are known.

However, second moments are rarely known in practice, and must be estimated from the data at hand. If estimated dispersion parameters are used in lieu of the true values, the resulting predictors of merit are no longer BLUP.

In animal breeding, dispersion components are most often estimated using restricted maximum likelihood, or REML (Patterson and Thompson, 1971). Theoretical arguments (eg, Gianola et al, 1989; Im et al, 1989) and simulations (eg, Rothschild et al, 1979) suggest that likelihood-based methods have the ability to account for some forms of nonrandom selection, which makes the procedure appealing in animal breeding. Thus, 2-stage predictors are constructed by first estimating variance and covariance components, and then obtaining BLUE and BLUP of fixed and random effects, respectively, with parameter values replaced by likelihood-based estimates. Under random selection, this 2-stage procedure should converge in probability to BLUE and BLUP as the information in the sample about variance components increases; however, its frequentist properties under nonrandom selection are unknown. One deficiency of this BLUE and BLUP procedure is that errors of estimation of dispersion components are not taken into account when predicting breeding values.
Gianola and Fernando (1986), Gianola et al (1986) and Gianola et al (1990a, b, 1992) advocate the use of Bayesian methods in animal breeding. The associated probability theory dictates that inferences should be based on marginal posterior distributions of parameters of interest, such that uncertainty about the remaining parameters is fully taken into account. The starting point is the joint posterior density of all unknowns. From the joint distribution, the marginal posterior distribution of a parameter, say the breeding value of an animal, is obtained by successively integrating out all nuisance parameters, these being the fixed effects, all the random effects other than the one of interest, and the variance and covariance components. This integration is difficult by analytical or numerical means, so approximations are usually sought (Gianola and Fernando, 1986; Gianola et al, 1986; Gianola et al, 1990a, b).
The posterior distributions are so complex that an analytical approach is often impossible, so attention has concentrated on numerical procedures (eg, Cantet et al, 1992). Recent breakthroughs are related to Monte-Carlo Markov chain procedures for multidimensional integration and for sampling from joint distributions (Geman and Geman, 1984; Gelfand and Smith, 1990; Gelfand et al, 1990). One of these procedures, Gibbs sampling, has been studied extensively in statistics (Gelfand and Smith, 1990; Gelfand et al, 1990; Besag and Clifford, 1991; Gelfand and Carlin, 1991; Geyer and Thompson, 1992).

Wang et al (1993) described the Gibbs sampler for a univariate mixed linear model in an animal breeding context. They used simulated data to construct marginal densities of variance components, variance ratios and intraclass correlations, and noted that the marginal distributions of fixed and random effects could also be obtained. However, their implementation was in matrix form. Clearly, some matrix computations are not feasible in many animal breeding data sets because inversion of large matrices is needed repeatedly.
In this paper, we consider Bayesian marginal inferences about fixed and random effects, variance components and functions of variance components in a univariate Gaussian mixed linear model. Here, marginal inferences are obtained, in contrast to Wang et al (1993), through a scalar version of the Gibbs sampler, so inversion of matrices is not needed. Our implementation was applied to and validated with a data set on litter size of Iberian pigs.
THE GIBBS SAMPLER FOR THE GAUSSIAN MIXED LINEAR MODEL
Model
We consider a univariate mixed linear model with several independent random factors, as in Henderson (1984), Macedo and Gianola (1987) and Gianola et al (1990a, b):

y = Xβ + Σ_{i=1}^{c} Z_i u_i + e    [1]

where: y is a data vector of order n × 1; X is a known incidence matrix of order n × p; Z_i is a known matrix of order n × q_i; β is a p × 1 vector of uniquely defined 'fixed effects' (so that X has full column rank); u_i is a q_i × 1 random vector; and e is an n × 1 vector of random residuals. The conditional distribution that generates the data is:

y | β, u_1, ..., u_c, σ²_e ~ N(Xβ + Σ_{i=1}^{c} Z_i u_i, Iσ²_e)    [2]

where I is an n × n identity matrix, and σ²_e is the variance of the random residuals.

Prior distributions
An integral part of Bayesian analysis is the assignment of prior distributions to all unknowns in the model; here, these are β, u_i (i = 1, 2, ..., c) and the c + 1 variance components (one for each of the random vectors, plus the error). Usually, a flat or uniform prior distribution is assigned to β, so as to represent lack of prior knowledge about this vector, so:

p(β) ∝ constant    [3]

Further, it is assumed that:

u_i | G_i, σ²_ui ~ N(0, G_i σ²_ui),  i = 1, 2, ..., c    [4]

where G_i is a known matrix and σ²_ui is the variance of the prior distribution of u_i. In a genetic context, the G_i matrices can contain functions of known coefficients of coancestry. All u_i's are assumed to be mutually independent a priori, as well as independent of β. Note that the priors for u_i correspond to the assumptions made about these random vectors in the classical linear model.

Independent scaled inverted chi-square distributions are used as priors for the variance components, so that:

p(σ²_e | v_e, s²_e) ∝ (σ²_e)^(−(v_e/2 + 1)) exp{−v_e s²_e / (2σ²_e)}    [5]

and

p(σ²_ui | v_ui, s²_ui) ∝ (σ²_ui)^(−(v_ui/2 + 1)) exp{−v_ui s²_ui / (2σ²_ui)}    [6]

Above, v_e (v_ui) is a 'degree of belief' parameter, and s²_e (s²_ui) can be thought of as a prior value of the appropriate variance.
variance.Joint
posterior density
be
0’
without0 z . Further,
letbe the vector of variance
components
other than theresidual;
and
be the sets of all
prior
variances anddegrees
ofbelief, respectively.
Asshown,
forexample, by
Macedo and Gianola(1987)
and Gianola et al(1990a, b),
thejoint posterior density
is in thenormal-gamma
form:Inferences about each of the unknowns
(9, v, !e )
are based on theirrespective
marginal
densities.Conceptually,
each of themarginal
densities is obtainedby
successive
integration
of thejoint density [7]
withrespect
toparameters
other than the one of interest. Forexample,
themarginal density
ofa£ is
It is difficult to carry out the needed integration analytically. Gibbs sampling is a Monte-Carlo procedure to overcome such difficulties.

Fully conditional posterior densities (Gibbs sampler)
The fully conditional posterior densities of all unknowns are needed to implement Gibbs sampling. Each of the full conditional densities can be obtained by regarding all other parameters in [7] as known. Let W = {w_ij}, i, j = 1, 2, ..., N, and b = {b_i}, i = 1, 2, ..., N, be the coefficient matrix and the right-hand side of the mixed model equations, respectively. As proved in the Appendix, the conditional posterior distribution of each of the location parameters in θ is normal, with mean θ̂_i and variance ε̂_i:

θ_i | y, θ_{−i}, v, σ²_e ~ N(θ̂_i, ε̂_i),  i = 1, 2, ..., N    [8]

where

θ̂_i = (b_i − Σ_{j≠i} w_ij θ_j)/w_ii  and  ε̂_i = σ²_e/w_ii
The sampler in [8] is simple to implement because all computations needed for Gibbs sampling are scalar, without any required inversion of matrices. This is in contrast with the matrix version of the conditional posterior distributions for the location parameters given by Wang et al (1993). It should be noted that distributions [8] do not depend on s and ν, because v is known in [8].
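As an illustration of how scalar the updating in [8] is, here is a minimal Python sketch (our own, not the paper's FORTRAN program), assuming the mixed-model coefficient matrix W, right-hand side b, and current values of θ and σ²_e are available as NumPy arrays:

```python
import numpy as np

def gibbs_update_locations(theta, W, b, sigma2_e, rng):
    """One round of scalar Gibbs updates for the location parameters.

    Each theta[i] is drawn from its full conditional [8]: normal with
    mean (b[i] - sum_{j != i} w_ij * theta[j]) / w_ii and variance
    sigma2_e / w_ii.  No matrix is inverted at any point.
    """
    N = len(theta)
    for i in range(N):
        # ith mixed-model equation, excluding the diagonal term
        off_diag = W[i, :] @ theta - W[i, i] * theta[i]
        mean_i = (b[i] - off_diag) / W[i, i]
        var_i = sigma2_e / W[i, i]
        theta[i] = rng.normal(mean_i, np.sqrt(var_i))  # update in place
    return theta
```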
The conditional posterior density of σ²_e is in the scaled inverted chi-square form:

σ²_e | y, θ, v, s, ν ~ ṽ_e s̃²_e χ⁻²_{ṽ_e}    [9]

It can be readily seen that this holds with parameters

ṽ_e = n + v_e  and  s̃²_e = [(y − Xβ − Σ Z_i u_i)′(y − Xβ − Σ Z_i u_i) + v_e s²_e]/ṽ_e

Each conditional posterior density of σ²_ui is also in the scaled inverted chi-square form:

σ²_ui | y, θ, v_{−i}, σ²_e, s, ν ~ ṽ_ui s̃²_ui χ⁻²_{ṽ_ui},  with ṽ_ui = q_i + v_ui and s̃²_ui = (u′_i G_i⁻¹ u_i + v_ui s²_ui)/ṽ_ui    [10]

A set of the N + c + 1 conditional posterior distributions [8]-[10] is called the Gibbs sampler for our problem.
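A draw from a scaled inverted chi-square is obtained by dividing ṽs̃² by a chi-square deviate with ṽ degrees of freedom. A sketch under our own naming, with the parameters of [9] as the example:

```python
import numpy as np

def sample_scaled_inv_chi2(df, scale, rng):
    """Draw from df * scale * inv-chi-square(df)."""
    return df * scale / rng.chisquare(df)

def sample_residual_variance(y, X, Z_list, beta, u_list, v_e, s2_e, rng):
    """Draw sigma2_e from its full conditional [9]."""
    e = y - X @ beta - sum(Z @ u for Z, u in zip(Z_list, u_list))
    df = len(y) + v_e                    # tilde-v_e = n + v_e
    scale = (e @ e + v_e * s2_e) / df    # tilde-s2_e
    return sample_scaled_inv_chi2(df, scale, rng)
```

A draw of each σ²_ui under [10] is analogous, with degrees of freedom q_i + v_ui and scale (u′_i G_i⁻¹ u_i + v_ui s²_ui) divided by those degrees of freedom.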
FULL CONDITIONAL POSTERIOR DENSITIES UNDER SPECIAL PRIORS
The Gibbs sampler with 'naive' priors for all variance components

The Gibbs sampler [8]-[10] given above is based on scaled inverted chi-squares used as priors for the variance components. These priors are proper and, therefore, informative about the variance. A possible way of representing prior ignorance about variances would be to set the degree of belief parameters of the prior distributions for all the variance components to zero, ie, v_e = v_ui = 0 for all i. These priors have been used, inter alia, by Gelfand et al (1990) and Gianola et al (1990a, b). In this case, the conditional posterior distributions of the location parameters are as in [8]:

θ_i | y, θ_{−i}, v, σ²_e ~ N(θ̂_i, ε̂_i),  i = 1, 2, ..., N    [11]

because these distributions do not depend on s and ν. However, the conditional posterior distributions of the variance components no longer depend on the hyperparameters s and ν. The conditional posterior distribution of the residual variance remains in the scaled inverted chi-square form:

σ²_e | y, θ, v ~ ṽ_e s̃²_e χ⁻²_{ṽ_e}    [12]

but now with parameters

ṽ_e = n  and  s̃²_e = (y − Xβ − Σ Z_i u_i)′(y − Xβ − Σ Z_i u_i)/ṽ_e

Each conditional posterior density of σ²_ui is again in the scaled inverted chi-square form:

σ²_ui | y, θ, v_{−i}, σ²_e ~ ṽ_ui s̃²_ui χ⁻²_{ṽ_ui}    [13]

with parameters ṽ_ui = q_i and s̃²_ui = u′_i G_i⁻¹ u_i/ṽ_ui.

It has been noted recently (Besag et al, 1991; Raftery and Banfield, 1991) that under these priors, the joint posterior density [7] is improper because it does not integrate to 1. In the light of this, we do not recommend these 'naive' priors for variance component models.

The Gibbs sampler with flat priors for all variance components
Under flat priors for all variance components, ie p(v, σ²_e) ∝ constant, the Gibbs sampler is as in [11]-[13], except that ṽ_e = n − 2 and ṽ_ui = q_i − 2 for i = 1, 2, ..., c. This version of the sampler can also be obtained by setting v_e = −2 in [9], and ν = (−2, −2, ..., −2)′ and s = (0, 0, ..., 0)′ in [10]. With flat priors, the joint posterior density [7] is proper.
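As a usage note on the earlier sketch, the flat-prior variant amounts to calling the same conditional draw with degree of belief −2 and prior scale 0 (a hypothetical call, reusing sample_residual_variance from above):

```python
# Flat priors: v_e = -2 and s2_e = 0 make [9] collapse to
# tilde-v_e = n - 2 and tilde-s2_e = e'e / (n - 2), as stated in the text.
sigma2_e = sample_residual_variance(y, X, Z_list, beta, u_list,
                                    v_e=-2, s2_e=0.0, rng=rng)
```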
The Gibbs sampler when all variance components are known

When variances are assumed known, the only conditional distributions needed are those for the location parameters, and these are as in [8] or [11].
INFERENCES ABOUT THE MARGINAL DISTRIBUTIONS THROUGH GIBBS SAMPLING
Gibbs sampling: an overview

The Gibbs sampler was first used in spatial statistics and presented formally by Geman and Geman (1984) in an image restoration context. Applications to Bayesian inference were described by Gelfand and Smith (1990) and Gelfand et al (1990). Since then, it has received extensive attention, as evidenced by recent discussion papers (Gelman and Rubin, 1992; Geyer, 1992; Besag and Green, 1993; Gilks et al, 1993; Smith and Roberts, 1993). Its power and usefulness as a general statistical tool to generate samples from complex distributions arising in particular problems is unquestioned.

Our purpose is to generate random samples from the joint posterior distribution [7], through successively drawing samples from and updating the Gibbs sampler [8]-[10]. Formally, Gibbs sampling works as follows:
(i) set arbitrary initial values for θ, v and σ²_e;
(ii) generate θ_i from [8] and update θ_i, for i = 1, 2, ..., N;
(iii) generate σ²_e from [9] and update σ²_e;
(iv) generate σ²_ui from [10] and update σ²_ui, for i = 1, 2, ..., c;
(v) repeat (ii)-(iv) k (the length of the chain) times.
As k → ∞, this creates a Markov chain with an equilibrium distribution that has [7] as its density. We shall call this procedure a single long-chain algorithm.
In practice, there are at least 2 ways of running the Gibbs sampler: a single long chain, and multiple short chains. The multiple short-chain algorithm repeats steps (i)-(v) m times and saves only the kth iterate of each chain as a sample (Gelfand and Smith, 1990; Gelfand et al, 1990). Based on theoretical arguments (Geyer, 1992) and on our experience, we used the single long-chain method in the present study.

Initial iterations are usually not stored as samples on the grounds that the chain may not yet have reached the equilibrium distribution; this is called 'warm-up'. After the warm-up, samples are stored every d iterations, where d is a small positive integer. Let the total number of samples saved be m, the sample size.
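Assembling steps (i)-(v) with a warm-up period and a saving interval d gives a single long-chain driver; a minimal sketch with our own naming, where each update function draws one block from its full conditional [8], [9] or [10] and modifies the state in place:

```python
import numpy as np

def run_single_chain(state, update_fns, n_iter, warmup, d):
    """Single long-chain Gibbs sampler.

    `state` holds arbitrary initial values of (theta, v, sigma2_e);
    after `warmup` iterations, every dth state is saved.  The saved
    states are (dependent) draws from the joint posterior [7].
    """
    rng = np.random.default_rng()
    samples = []
    for k in range(n_iter):
        for update in update_fns:            # steps (ii)-(iv)
            update(state, rng)
        if k >= warmup and (k - warmup) % d == 0:
            samples.append({key: np.copy(val) for key, val in state.items()})
    return samples
```

With the settings of the application below (a chain of 1 205 000, warm-up of 5 000, d = 10), such a driver would save m = 120 000 samples.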
If the Gibbs sampler converges to the equilibrium distribution, the m samples are random drawings from the joint posterior distribution with density [7]. The ith sample

[θ′_i, v′_i, (σ²_e)_i],  i = 1, 2, ..., m    [14]

is then an N + c + 1 vector, and each of the elements of this vector is a drawing from the appropriate marginal distribution. Note that the m samples in [14] are identically but not independently distributed (eg, Geyer, 1992). We refer to the m samples in [14] as Gibbs samples.

Density estimation and Bayesian inference based on the Gibbs samples

Suppose x_i, i = 1, 2, ..., m, is one of the components of [14], ie a realization of a variable x from running the Gibbs sampler. The m (dependent) samples can be used to compute features of the posterior distribution P(x) by Monte-Carlo integration. An integral

u = ∫ g(x) dP(x)    [15]

can be approximated by

û = (1/m) Σ_{i=1}^{m} g(x_i)    [16]

where g(x) can be any feature of P(x), such as its mean or variance. As m → ∞, û converges almost surely to u (Geyer, 1992).
Another way to compute features of P(x) is by first estimating the density p(x), and then obtaining summary statistics from the estimated density using 1-dimensional numerical procedures. If y_i (i = 1, 2, ..., m) is another component of [14], an estimator of p(x) is given by the average of the m conditional densities p(x|y_i) (Gelfand and Smith, 1990):

p̂(x) = (1/m) Σ_{i=1}^{m} p(x|y_i)    [17]

Note that this estimator does not use the samples x_i, i = 1, 2, ..., m; instead, it uses the samples of the variable y through the conditional density p(x|y). This procedure, though developed primarily for identically and independently distributed (iid) data, can also be used for dependent samples, as noted by Liu et al (1991) and Diebolt and Robert (1993).
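For a location parameter, the conditionals entering [17] are the normals [8], so the estimator averages normal densities evaluated at the stored conditional means and variances (as is done explicitly in the application below); a sketch with our own naming:

```python
import numpy as np

def density_from_conditionals(grid, cond_means, cond_vars):
    """Estimator [17]: average m normal conditional densities on a grid."""
    z = np.asarray(grid)[:, None]          # evaluation points (column)
    mu = np.asarray(cond_means)[None, :]   # conditional means (row)
    v = np.asarray(cond_vars)[None, :]     # conditional variances (row)
    dens = np.exp(-(z - mu) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return dens.mean(axis=1)               # average over the m samples
```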
An alternative form of estimating p(x) is to use the samples x_i (i = 1, 2, ..., m) only. For example, a kernel density estimator is defined (Silverman, 1986) as:

p̂(z) = (1/(mh)) Σ_{i=1}^{m} K((z − x_i)/h)    [18]

where p̂(z) is the estimated density at point z, K(.) is a 'kernel function', and h is a fixed constant called the window width; the latter determines the smoothness of the estimated curve. For example, if a normal density is chosen as kernel function, then [18] becomes:

p̂(z) = (1/(mh)) Σ_{i=1}^{m} (2π)^(−1/2) exp{−(z − x_i)²/(2h²)}    [19]

Again, though the kernel density estimator was developed for iid data, the work of Yu (1991) indicates that the method is valid for dependent data as well.

Once the density p(x) is estimated by either [17] or [19], summary statistics (eg, mean and variance) can be computed by a 1-dimensional numerical integration procedure, such as Simpson's rules. Probability statements about x can also be made, thus providing a full Bayesian solution to inferences about the distribution of x.
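A sketch of the normal-kernel estimator [19]; the window width h is a tuning constant (the application below uses the range of the effective domain divided by 75):

```python
import numpy as np

def normal_kernel_density(grid, samples, h):
    """Estimator [19]: normal kernel density estimate with window h."""
    z = np.asarray(grid)[:, None]
    x = np.asarray(samples)[None, :]
    kernels = np.exp(-(z - x) ** 2 / (2 * h**2)) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h    # (1/(m h)) * sum of kernels
```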
Bayesian inference about functions of the original parameters

Suppose we want to make inferences about a function of an original parameter, z = g(x) say. The quantity

z_i = g(x_i),  i = 1, 2, ..., m

is a random (dependent) sample of size m from a distribution with density p(z). Formulae [16], [18] and [19] using such samples can also be used to make inferences about z. An alternative is to use standard techniques to transform from either of the conditional densities p(x|y) or p(y|x) to p(z|y) or p(z|x). Let the transformation be from x|y to z|y; with J = dx/dz the Jacobian of the transformation, the conditional density of z|y is:

p(z|y) = p(x(z)|y) |J|

An estimator of p(z), obtained by averaging the m conditional densities p(z|y_i), is:

p̂(z) = (1/m) Σ_{i=1}^{m} p(z|y_i)
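The sample-based route is the simplest in code: transform the saved draws and reuse the estimators above (g here is any hypothetical function of interest, eg a ratio of variance components):

```python
# Samples of z = g(x): transform the stored Gibbs draws directly,
# then feed them to mc_mean_and_variance or normal_kernel_density.
z_samples = [g(x) for x in x_samples]
```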
APPLICATION OF GIBBS SAMPLING TO LITTER SIZE IN PIGS

Data
Records were from the Gamito strain of Iberian pigs, Spain. The trait considered was the number of pigs born alive per litter. Details about this strain and the data are in Dobao et al (1983) and Toro et al (1988); Perez-Enciso and Gianola (1992) gave REML estimates of genetic parameters. Briefly, the data were 1 213 records from 426 dams (including 68 crossfostered females). There were 72 farrowing seasons and 4 parity classes as defined by Perez-Enciso and Gianola (1992).
Model
A mixed linear model similar to that of Perez-Enciso and Gianola (1992) was:

y = Xβ + Z_1 u + Z_2 c + e

where y is a vector of observations (number of pigs born alive per litter); X, Z_1 and Z_2 are known incidence matrices relating β, u and c, respectively, to y; β is a vector of fixed effects, including a mean, farrowing season (72 levels) and parity (4 levels); u is a random vector of additive genetic effects (597 levels); c is a random vector of permanent environmental effects (426 levels); and e is a random vector of residuals. Distributional assumptions were:

u | A, σ²_u ~ N(0, Aσ²_u),  c | σ²_c ~ N(0, Iσ²_c),  e | σ²_e ~ N(0, Iσ²_e)

where σ²_u, σ²_c and σ²_e are variance components and A is the numerator of Wright's relationship matrix; the vectors u, c and e were assumed to be pairwise independent. After reparameterization, the rank (p) of X was 1 + 71 + 3 = 75; the rank of the mixed model equations was then: N = 75 + 597 + 426 = 1 098.

Gibbs sampling
We ran 2 separate Gibbs samplers with this data set, and we refer to these analyses as CASES I and II. In CASE I, the 3 variance components were assumed known, with REML estimates (Meyer, 1988) used as true parameter values. In CASE II, the variance components were unknown, and flat priors were assigned to them. For each of the 2 cases, a single chain of size 1 205 000 was run. After discarding the first 5 000 iterations, samples were saved every 10 iterations (d = 10), so the total number of samples (m) saved was 120 000. This specification (mainly the length of a chain) for running the Gibbs sampler was based on our own experience with this data set and with others. It may be different for other problems.
Due to computer storage limitations, not all Gibbs samples and conditional means and variances could be saved for all location parameters. Instead, for further analysis and illustration, we arbitrarily selected 4 location parameters, one from each of the 4 factors (farrowing season, parity, additive genetic effect and permanent environmental effect). For each of these 4 location parameters, the following quantities were stored:

{x_i, θ̂_i, v̂_i},  i = 1, 2, ..., m

where x_i is a Gibbs sample from the appropriate marginal distribution, and θ̂_i and v̂_i are the mean and variance of the conditional distribution, [8] or [11], used for generating x_i at each of the Gibbs steps.
In CASE II, we also saved the m Gibbs samples for each of the variance components, and

{s̃²_i},  i = 1, 2, ..., m

where s̃²_i is the scale parameter appearing in the conditional density [9] or [10] at each of the Gibbs iterations.
A FORTRAN program was written to generate the samples, with IMSL subroutines used for drawing random numbers (IMSL, Inc, 1989).
Density estimation and inferences

For each of the 4 selected location parameters (CASES I and II) and the 3 variance components, we estimated the marginal posterior densities with estimators [17] and [19], ie by averaging m conditional densities and by the normal kernel density estimation method, respectively. Estimator [17] of the density of a location parameter was, explicitly:

p̂(x) = (1/m) Σ_{j=1}^{m} (2πv̂_j)^(−1/2) exp{−(x − θ̂_j)²/(2v̂_j)}

where θ̂_j and v̂_j are the conditional mean and variance of the conditional posterior density of x. For each of the variance components, the estimator was:

p̂(σ²) = (1/m) Σ_{j=1}^{m} [(ṽ s̃²_j/2)^(ṽ/2)/Γ(ṽ/2)] (σ²)^(−(ṽ/2 + 1)) exp{−ṽ s̃²_j/(2σ²)}

where ṽ is the degree of belief, s̃²_j is the scale parameter of the conditional posterior distribution of the variance of interest, and Γ(.) is the gamma function. The normal kernel estimator [19] was applied directly to the samples for location and dispersion parameters.
To estimate the densities, we divided the 'effective domain' of each parameter into 100 equally spaced intervals; the effective domain contained at least 99.5% of the density mass. Running through the effective domain, a sequence of pairs (p̂(z_i), z_i), i = 1, 2, ..., 101, was generated. Densities were displayed graphically by plotting the (p̂(z_i), z_i) pairs. For the normal kernel density estimator [19], the window width was specified as: h = (range of effective domain)/75.
For the 4 selected location parameters, the mean, mode, median and variance of each of the marginal distributions were computed as summary features. The mean, median and variance were obtained with Simpson's integration rules by further dividing the effective domain into 1 000 equally spaced intervals and using a cubic spline technique (IMSL, Inc, 1989); the mode was located through a grid search.
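A sketch of such density-based summaries, using Simpson's rule on a fine grid and a grid search for the mode (scipy is our choice here; the paper used IMSL routines):

```python
import numpy as np
from scipy.integrate import simpson, cumulative_trapezoid

def density_summaries(z, p):
    """Mean, variance, mode and median of a density tabulated on grid z."""
    p = p / simpson(p, x=z)                  # renormalize estimated density
    mean = simpson(z * p, x=z)
    var = simpson(z**2 * p, x=z) - mean**2
    mode = z[np.argmax(p)]                   # grid search
    cdf = cumulative_trapezoid(p, z, initial=0.0)
    median = z[np.searchsorted(cdf, 0.5)]    # first grid point past 0.5
    return mean, var, mode, median
```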
For location parameters other than the 4 selected ones, the posterior means and variances were computed directly from the Gibbs samples.
Density estimation for functions of variance components

As mentioned previously, inferences about the posterior density of any function of the original parameters can be made through transformation of random variables, without rerunning the Gibbs sampler, provided appropriate samples are saved. In this section, we summarize methods for density estimation of functions of variance components; see also Wang et al (1993). Let the Gibbs samples for σ²_u, σ²_c and σ²_e be, respectively:

{x_ui}, {x_ci} and {x_ei},  i = 1, 2, ..., m

Also, let the scale parameters of the corresponding conditional densities be:

{s̃²_uj}, {s̃²_cj} and {s̃²_ej},  j = 1, 2, ..., m
Consider first estimating the marginal posterior density of the total variance: σ²_p = σ²_u + σ²_c + σ²_e. Let the transformation be from the conditional posterior density (σ²_e | y, θ, v, s, ν) to (σ²_p | y, θ, v, s, ν). The Jacobian of the transformation is 1. Thus, using [17], the estimator of the marginal posterior density of σ²_p is:

p̂(σ²_p) = (1/m) Σ_{j=1}^{m} p(σ²_e = σ²_p − x_uj − x_cj | y, θ, v, s, ν)

where each conditional density is the scaled inverted chi-square [9] with ṽ = ṽ_e and scale parameter s̃²_ej. Further, z_pi = x_ui + x_ci + x_ei, (i = 1, 2, ..., m), is a sample from the marginal distribution of σ²_p.
Similarly, the estimator of the marginal posterior density of h² = σ²_u/σ²_p is obtained by using the transformation σ²_u → h² in the conditional posterior density of σ²_u. One obtains

p̂(h²) = (1/m) Σ_{j=1}^{m} p(σ²_u = h²(x_cj + x_ej)/(1 − h²) | y, θ, σ²_c, σ²_e, s, ν) |dσ²_u/dh²|    [33]

where ṽ = ṽ_u. For repeatability, r = (σ²_u + σ²_c)/σ²_p, we used the transformation σ²_e → r in the conditional posterior density of σ²_e to obtain the analogous estimator [34], as in [33] but with ṽ = ṽ_e and σ²_e(r) = (x_uj + x_cj)(1 − r)/r.
If one wishes to make inferences about the variance ratio γ = σ²_u/σ²_e, the transformation σ²_u → γ yields the estimator of the marginal posterior density of γ:

p̂(γ) = (1/m) Σ_{j=1}^{m} p(σ²_u = γ x_ej | y, θ, σ²_c, σ²_e, s, ν) x_ej    [35]

where ṽ = ṽ_u. The variance ratio δ = σ²_c/σ²_e is estimated in the same manner as for γ in [35], with the samples s̃²_cj substituted in place of s̃²_uj and ṽ = ṽ_c, using the transformation σ²_c → δ.
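Because the draws of the 3 variance components are saved, the sample route (transforming the draws directly, as in the earlier section) gives posterior samples of all these functions; a sketch:

```python
import numpy as np

def genetic_parameter_samples(x_u, x_c, x_e):
    """Posterior draws of the total variance, h2, repeatability and
    variance ratios, from the m saved draws of the variance components."""
    x_p = x_u + x_c + x_e        # total variance sigma2_p
    h2 = x_u / x_p               # heritability
    r = (x_u + x_c) / x_p        # repeatability
    gamma = x_u / x_e            # variance ratio gamma
    delta = x_c / x_e            # variance ratio delta
    return x_p, h2, r, gamma, delta
```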
6.RESULTS
When variance components are known (CASE I), the marginal posterior distributions of all location parameters are normal (Gianola and Fernando, 1986). The mean (mode or median) of the marginal distribution of a location parameter is given by the corresponding component of the solution vector of the mixed model equations, and the variance of the distribution is equal to the corresponding diagonal element of the inverted mixed model coefficient matrix, multiplied by the residual variance. These are mathematical facts, and do not relate in any way to the Gibbs sampler. We used this knowledge to assess the convergence of the Gibbs sampler, which gives Monte-Carlo estimates of the posterior means and variances. In CASE I, for the data at hand, the posterior distributions can be arrived at more efficiently by direct inversion or iterative methods than via the Gibbs sampler.
Table I contains results of a comparison between the posterior means and variances estimated by Gibbs sampling (GIBBS) and the exact values found by solving the mixed model equations directly (TRUE). Several criteria were used to compare the 2 sets of results: absolute difference (bias) between TRUE and GIBBS; absolute relative bias (RB), defined as bias divided by TRUE; the slopes of the linear regressions of TRUE on GIBBS, and vice versa; and the correlation between TRUE and GIBBS. Of course, GIBBS and TRUE were not exactly the same because GIBBS is subject to Monte-Carlo sampling errors; as m (k) goes to infinity, GIBBS is expected to converge to TRUE. We found excellent agreement between TRUE and GIBBS for all these criteria. The average absolute RB did not exceed 1%, except for the posterior means of additive genetic and permanent environmental effects. For these effects, the RB criterion can be misleading, because the true posterior means were very small in value, so that even small biases made the RB very large. The regression and correlation coefficients between GIBBS and TRUE were all close to 1.0 for both means and variances. All these results indicate that the Gibbs sampler converged in this application.
The true and estimated posterior distributions of the 4 selected location parameters are depicted in figure 1. The true densities are simply normal density plots with means and variances from the TRUE analysis. The estimated densities were obtained with [17] and [19]. The 3 curves overlapped perfectly, indicating convergence of the Gibbs sampler to the true posterior distributions.
Figure 2 depicts the marginal posterior densities of the same 4 selected location parameters for CASE II, ie with unknown variances. For the 2 fixed effects, the distributions were essentially symmetric, and similar to those found for CASE I (fig 1). This indicated, for this application, that replacing the unknown variances by REML estimates, and then completing a Bayesian analysis of fixed effects as if variances were known (Gianola et al, 1986), would give a good approximation to inferences about fixed effects in the absence of knowledge about variances. Note that the variances of the posterior distributions of the fixed effects shown in figures 1 and 2 are similar. In theory, one would expect the posterior variances to be somewhat larger in CASE II. However, it should be borne in mind that the Gibbs variances are Monte-Carlo estimates, and therefore subject to sampling error.
A noteworthy feature of figure 2 was that the densities of the additive genetic and permanent environmental effects were skewed, in contrast to the normal distributions found in CASE I. Further, the posterior densities were sharply peaked in the neighborhood of 0, the prior mean; this is consistent with the fact (as discussed later) that in this data set there was considerable density near 0 for the additive