HAL Id: hal-00894037
https://hal.archives-ouvertes.fr/hal-00894037
Submitted on 1 Jan 1994

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Original article

Bayesian analysis of genetic change due to selection using Gibbs sampling

DA Sorensen 1, CS Wang 2, J Jensen 1, D Gianola 2

1 National Institute of Animal Science, Research Center Foulum, PB 39, DK-8830 Tjele, Denmark;
2 Department of Meat and Animal Science, University of Wisconsin-Madison, Madison, WI 53706-1284, USA

(Received 7 June 1993; accepted 10 February 1994)
Summary - A method of analysing response to selection from a Bayesian perspective is presented. The following measures of response to selection were analysed: 1) total response, in terms of the difference in additive genetic means between the last and first generations; 2) the slope (through the origin) of the regression of mean additive genetic value on generation; 3) the slope of the linear regression of mean additive genetic value on generation. Inferences are based on marginal posterior distributions of these measures of genetic response, so that uncertainty about fixed effects and variance components is taken into account. The marginal posterior distributions were estimated using the Gibbs sampler. Two simulated data sets with heritability levels of 0.2 and 0.5, each comprising 5 cycles of selection, were used to illustrate the method. Two analyses were carried out for each data set: one with partial data (generations 0-2) and one with the whole data. The Bayesian analysis differed from a traditional analysis based on best linear unbiased predictors (BLUP) with an animal model when the amount of information in the data was small. Inferences about selection response were similar with both methods at high heritability values and when all the data were used in the analysis. The Bayesian approach correctly assessed the degree of uncertainty associated with insufficient information in the data. A Bayesian analysis using 2 different sets of prior distributions for the variance components showed that inferences differed only when the relative amount of information contributed by the data was small.

response to selection / Bayesian analysis / Gibbs sampling /
animal model

INTRODUCTION
Many selection programs in farm animals rely on best linear unbiased predictors (BLUP), using Henderson's (1973) mixed-model equations as a computing device, in order to predict breeding values and rank candidates for selection. With the increasing computing power available and with the development of efficient algorithms for writing the inverse of the additive relationship matrix (Henderson, 1976; Quaas, 1976), 'animal' models have been gradually replacing the originally used sire models. The appeal of 'animal' models is that, given the model, use is made of the information provided by all known additive genetic relationships among individuals. This is important to obtain more precise predictors and to account for the effects of certain forms of selection on prediction and estimation of genetic parameters.
A natural application of 'animal' models has been prediction of the genetic means of cohorts, for example, groups of individuals born in a given time interval such as a year or a generation. These predicted genetic means are typically computed as the average of the BLUP of the genetic values of the appropriate individuals. From these, genetic change can be expressed as, for example, the regression of the mean predicted additive genetic value on time or on appropriate cumulative selection differentials (Blair and Pollak, 1984).
In common with selection index, it is assumed in BLUP that the variances of the random effects, or ratios thereof, are known, so the predictions of breeding values and genetic means depend on such ratios. This, in turn, causes a dependency of the estimators of genetic change derived from 'animal' models on the ratios of the variances of the random effects used as 'priors' for solving the mixed-model equations. This point was first noted by Thompson (1986), who showed in simple settings that an estimator of realized heritability, given by the ratio between the BLUP of total response and the total selection differential, leads to estimates that are highly dependent on the value of the 'prior' heritability employed. It is therefore reasonable to expect that the statistical properties of the BLUP estimator of response will depend on the method with which the 'prior' heritability is estimated.

In the absence of
selection, Kackar and Harville (1981) showed that when the variances in the mixed-model equations are replaced by even, translation-invariant estimators that are functions of the data, unbiased predictors of the breeding values are obtained. No other properties are known, and these would be difficult to derive because of the nonlinearity of the predictor. In selected populations, frequentist properties of predictors of breeding value based on estimated variances have not been derived analytically using classical statistical theory.
Further, there are no results from classical theory indicating which estimator of heritability ought to be used, even though the restricted maximum likelihood (REML) estimator is an intuitively appealing candidate. This is so because the likelihood function is the same with or without selection, provided that certain conditions are met (Gianola et al, 1989; Im et al, 1989; Fernando and Gianola, 1990). However, frequentist properties of likelihood-based methods under selection have not been unambiguously characterized. For example, it is not known whether the maximum likelihood estimator is always consistent under selection. Some properties of BLUP-like estimators of response computed by replacing unknown variances by likelihood-type estimates were examined by Sorensen and Kennedy (1986) using computer simulation, and the methodology has been applied recently to analyse designed selection experiments (Meyer and Hill, 1991). Unfortunately, sampling distributions of estimators of response are difficult to derive analytically, and one must resort to approximate results, whose validity is difficult to assess. In summary, the problem of exact inferences about genetic change when variances are unknown has not been solved via classical statistical methods.

However,
this problem has a conceptually simple solution when framed in a Bayesian setting, as suggested by Sorensen and Johansson (1992), drawing from results in Gianola et al (1986) and Fernando and Gianola (1990). The starting point is that if the history of the selection process is contained in the data employed in the analysis, then the posterior distribution has the same mathematical form with or without selection (Gianola and Fernando, 1986). Inferences about breeding values (or functions thereof, such as selection response) are made using the marginal posterior distribution of the vector of breeding values, or from the marginal posterior distribution of selection response. All other unknown parameters, such as 'fixed effects' and variance components or heritability, are viewed as nuisance parameters and must be integrated out of the joint posterior distribution (Gianola et al, 1986). The mean of the posterior distribution of additive genetic values can be viewed as a weighted average of BLUP predictions, where the weighting function is the marginal posterior density of heritability. Estimating selection response by giving all weight to an REML estimate of heritability can thus be viewed as an approximation to the mean of this posterior distribution. However, this approximation may be poor if the experiment has little informational content on heritability.
A full implementation of the Bayesian approach to inferences about selection response relies on having the means of carrying out the necessary integrations of joint posterior densities with respect to the nuisance parameters. A Monte-Carlo procedure to carry out these integrations numerically, known as Gibbs sampling, is now available (Geman and Geman, 1984; Gelfand and Smith, 1990). The procedure has been implemented in an animal breeding context by Wang et al (1993a; 1994a,b) in a study of variance component inferences using simulated sire models, and in analyses of litter size in pigs. Application of the Bayesian approach to the analysis of selection experiments yields the marginal posterior distribution of response to selection, from which inferences can be made, irrespective of whether or not the variances are known. In this paper, we describe Bayesian inference about selection response using animal models, where the marginalizations are achieved by means of Gibbs sampling.
MATERIALS AND METHODS

Gibbs sampling
The Gibbs sampler is a technique for generating random vectors from a joint distribution by successively sampling from the conditional distributions of all random variables involved in the model. Each component of the vector so generated is a random sample from the appropriate marginal distribution. This numerical technique was introduced in the context of image processing by Geman and Geman (1984) and has since received much attention in the statistical literature (Gelfand and Smith, 1990; Gelfand et al, 1990; Gelfand et al, 1992; Casella and George, 1992).
To illustrate the procedure, let us suppose there are 3 random variables, X, Y and Z, with joint density p(x, y, z); we are interested in obtaining the marginal distributions of X, Y and Z, with densities p(x), p(y) and p(z), respectively. In many cases, the necessary integrations are difficult or impossible to perform algebraically, and the Gibbs sampler provides a means of sampling from such a marginal distribution. Our interest is typically in the marginal distributions, and the Gibbs sampler proceeds as follows. Let (x_0, y_0, z_0) represent an arbitrary set of starting values for the 3 random variables of interest. The first sample is:

x_1 drawn from p(x | y = y_0, z = z_0)

where p(x | y = y_0, z = z_0) is the conditional distribution of X given Y = y_0 and Z = z_0. The second sample is:

y_1 drawn from p(y | x = x_1, z = z_0)

where x_1 is updated from the first sampling. The third sample is:

z_1 drawn from p(z | x = x_1, y = y_1)

This cycle is repeated k times, where k is known as the length of the Gibbs sequence. As k -> oo, the points of the kth iteration (x_k, y_k, z_k) constitute 1 sample point from p(x, y, z) when viewed jointly, or from p(x), p(y) and p(z) when viewed marginally.
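As an editorial illustration of the 3-variable scheme above (not part of the original analysis), the cycle can be sketched for an equicorrelated trivariate normal with correlation 0.5, chosen purely because its full conditionals are normal in closed form: each variable given the other two is N((sum of the other two)/3, 2/3).

```python
import math
import random

def gibbs_trivariate(n_iter, warmup=1000, seed=1):
    # Equicorrelated standard trivariate normal, rho = 0.5 (illustrative choice):
    # X | Y=y, Z=z ~ N((y + z)/3, 2/3), and symmetrically for Y and Z.
    rng = random.Random(seed)
    sd = math.sqrt(2.0 / 3.0)
    x = y = z = 0.0                      # arbitrary starting values (x0, y0, z0)
    samples = []
    for t in range(n_iter):
        x = rng.gauss((y + z) / 3.0, sd)  # first sample of the cycle
        y = rng.gauss((x + z) / 3.0, sd)  # second, using the updated x
        z = rng.gauss((x + y) / 3.0, sd)  # third, using the updated x and y
        if t >= warmup:
            samples.append(x)             # draws from the marginal of X
    return samples

xs = gibbs_trivariate(21000)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
```

After the warm-up, the retained draws of X behave as (correlated) samples from its N(0, 1) marginal, so their average and variance converge to 0 and 1.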
In order to obtain m samples, Gelfand and Smith (1990) suggest generating m independent 'Gibbs sequences' from m arbitrary starting values, and using the final value at the kth iterate of each sequence as the sample values. This method of Gibbs sampling is known as a multiple-start sampling scheme, or multiple chains. Alternatively, a single long Gibbs sequence (initialized therefore once only) can be generated, and every dth observation extracted (eg, Geyer, 1992), with the total number of samples saved being m. Animal breeding applications of the short- and long-chain methods are given by Wang et al (1993, 1994a,b).
Having obtained the m samples from the marginal distributions, possibly correlated, features of the marginal distribution of interest can be obtained by appealing to the ergodic theorem (Geyer, 1992; Smith and Roberts, 1993):

S_m = (1/m) Σ_{i=1,...,m} g(x_i)   [1]

where x_i (i = 1, ..., m) are the samples from the marginal distribution of x, u is any feature of the marginal distribution, eg, the mean, the variance or, in general, any feature of a function of x, and g(.) is an appropriate operator. For example, if u is the mean of the marginal distribution, g(.) is the identity operator; a consistent estimator of the variance of the marginal distribution is obtained analogously (Geyer, 1992). Note that S_m is a Monte-Carlo estimate of u, and the error of this estimator can be made arbitrarily small by increasing m. Another way of obtaining features of a marginal distribution is first to estimate the density, using [1], and then compute summary statistics (features) of that distribution from the estimated density.
There are at least 2 ways to estimate a density from the Gibbs samples. One is to use the random samples (x_i) to estimate p(x). We consider the normal kernel estimator (eg, Silverman, 1986):

p̂(x) = (1/m) Σ_{i=1,...,m} (1/(h √(2π))) exp[-(x - x_i)² / (2h²)]   [2]

where p̂(x) is the estimated density at x, and h is a fixed constant (called the window width) given by the user. The window width determines the smoothness of the estimated curve.
Another way of estimating a density is based on averaging conditional densities. Given knowledge of the full conditional distribution, an estimate of p(x) is obtained from:

p̂(x) = (1/m) Σ_{i=1,...,m} p(x | y = y_i, z = z_i)   [3]

where (y_i, z_i), i = 1, ..., m, are the realized values of the final Y, Z samples from each Gibbs sequence. Each pair (y_i, z_i) constitutes 1 sample, and there are m such pairs. Notice that the x_i are not used to estimate p(x). Estimation of the density of a function of the original variables is accomplished either by applying [2] directly to the samples of the function, or by applying the theory of transformation of random variables to an appropriate conditional density, and then using [3]. In this case, the Gibbs sampling scheme does not need to be rerun, provided the needed samples are saved.

Model
In the present paper we consider a univariate mixed model with 2 variance components, for ease of exposition. Extensions to more general mixed models are given by Wang et al (1993b; 1994) and by Jensen et al (1994). The model is:

y = Xb + Za + e   [4]

where y is the data vector of order n by 1; X is a known incidence matrix of order n by p; Z is a known incidence matrix of order n by q; b is a p by 1 vector of uniquely defined 'fixed effects' (so that X has full column rank); a is a q by 1 'random' vector representing individual additive genetic values of animals; and e is a vector of random residuals of order n by 1. The conditional distribution that pertains to the realization of y is assumed to be:

y | b, a, σ²_e ~ N(Xb + Za, H σ²_e)   [5]

where H is a known n by n matrix, which here will be assumed to be the identity matrix, and σ²_e (a scalar) is the unknown variance of the random residuals. We assume a genetic model in which genes act additively within and between loci, and that there is effectively an infinite number of loci. Under this infinitesimal model, and assuming further initial Hardy-Weinberg and linkage equilibrium, the distribution of additive genetic values conditional on the additive genetic variance is multivariate normal:

a | A, σ²_a ~ N(0, A σ²_a)   [6]

In [6], A is the known q by q matrix of additive genetic relationships among animals, and σ²_a (a scalar) is the unknown additive genetic variance in the conceptual base population, before selection took place. The vector b will be assumed to have a proper prior uniform distribution:

p(b) ∝ constant (within finite bounds, so that the distribution is proper)   [7]

To
complete the description of the model, the prior distributions of the variance components need to be specified. In order to study the effect of different priors on inferences about heritability and response to selection, 2 sets of prior distributions will be assumed. Firstly, σ²_a and σ²_e will be assumed to follow independent proper prior uniform distributions of the form:

p(σ²_i) ∝ constant,  0 < σ²_i < σ²_i,max  (i = a, e)   [8]

where σ²_a,max and σ²_e,max are the maximum values which, according to prior knowledge, σ²_a and σ²_e can take. In the rest of the paper, all densities which are functions of b, σ²_a and σ²_e will implicitly take the value zero if the bounds in [7] and [8] are exceeded. Secondly, σ²_a and σ²_e will be assumed to follow a priori scaled inverted chi-square distributions:

p(σ²_i | v_i, S²_i) ∝ (σ²_i)^-(v_i/2 + 1) exp[-v_i S²_i / (2σ²_i)]  (i = a, e)   [9]

where v_i and S²_i are parameters of the distribution. Note that a uniform prior can be obtained from [9] by setting v_i = -2 and S²_i = 0.

Posterior distribution of selection response
Let α represent the vector of parameters associated with the model. Bayes theorem provides a means of deriving the posterior distribution of α conditional on the data:

p(α | y) = p(y | α) p(α) / p(y)   [10]

The first term in the numerator of the right-hand side of [10] is the conditional density of the data given the parameters, and the second is the joint prior distribution of the parameters of the model. The denominator in [10] is a normalizing constant (the marginal distribution of the data) that does not depend on α. Applying [10], with α = (b', a', σ²_a, σ²_e)', to the product of [5], [6] and [9] ([11]) gives the joint posterior density:

p(b, a, σ²_a, σ²_e | y, v, S) ∝ (σ²_e)^-(n + v_e + 2)/2 exp{-[(y - Xb - Za)'(y - Xb - Za) + v_e S²_e] / (2σ²_e)}
  × (σ²_a)^-(q + v_a + 2)/2 exp{-[a'A⁻¹a + v_a S²_a] / (2σ²_a)}   [12]

The joint posterior under a model assuming the proper set of prior uniform distributions [8] for the variance components is simply obtained by setting v_i = -2 and S²_i = 0 (i = e, a) in [12]. To make the notation less burdensome, we drop from now onwards the conditioning on v and S.

Inferences about response to selection can be made working with a function of the marginal posterior distribution of a. The latter is obtained integrating [12] over the remaining parameters:

p(a | y) = ∫ p(a | b, E, y) p(b, E | y) db dE = E_{b,E}[p(a | b, E, y)]   [13]

where E = (σ²_a, σ²_e)' and the expectation is taken over the joint posterior distribution of the vector of 'fixed effects' and variance components. This density cannot be written in a closed form. In finite samples, the posterior distribution of a should be neither normal nor symmetric.
Response to selection is defined as a linear function of the vector of additive genetic values:

R = K'a   [14]

where K is an appropriately defined transformation matrix and R can be a vector (or a scalar) whose elements could be the mean additive genetic values of each generation, or contrasts between these means, or, alternatively, regression coefficients representing linear and quadratic changes of genetic means with respect to some measure of time, such as generations. By virtue of the central limit theorem, the posterior distribution of R should be approximately normal, even if [13] is not.

Full conditional posterior distributions (the Gibbs sampler)
Using results from Lindley and Smith (1972), the conditional distribution of θ = (b', a')' given all other parameters is multivariate normal:

θ | σ²_a, σ²_e, y ~ N(θ̂, C⁻¹ σ²_e)   [15]

where, with λ = σ²_e/σ²_a and W = [X Z],

C = [ X'X  X'Z ; Z'X  Z'Z + A⁻¹λ ]   [16]

and θ̂ satisfies:

C θ̂ = W'y   [17]

which are the mixed-model equations of Henderson (1973). Let θ and C be partitioned conformably as θ' = (θ'_1, θ'_2) [18]; using standard multivariate normal theory, it can be shown for any such partition that:

θ_1 | θ_2, σ²_a, σ²_e, y ~ N(θ̂_1, C⁻¹_11 σ²_e)   [19]

where θ̂_1 is given by:

C_11 θ̂_1 = w'_1 y - C_12 θ_2   [20]

with w_1 the columns of W corresponding to θ_1. Expressions [19] and [20] can be computed
in matrix or scalar form. Let b_i be a scalar corresponding to the ith element in the vector of 'fixed effects', b_-i be b without its ith element, x_i be the ith column vector in X, and X_-i be that part of matrix X with x_i excluded. Using [19], we find that the full conditional distribution of b_i given all other parameters is normal:

b_i | b_-i, a, σ²_a, σ²_e, y ~ N(b̂_i, (x'_i x_i)⁻¹ σ²_e)   [21]

where b̂_i satisfies:

(x'_i x_i) b̂_i = x'_i (y - X_-i b_-i - Za)   [22]

For the 'random
effects', again using [19] and letting a_-i be a without its ith element, we find that the full conditional distribution of the scalar a_i given all the other parameters is also normal:

a_i | b, a_-i, σ²_a, σ²_e, y ~ N(â_i, (z'_i z_i + A^ii λ)⁻¹ σ²_e)   [23]

where z_i is the ith column of Z, A^ii = (Var(a_i | a_-i))⁻¹ σ²_a is the element in the ith row and column of A⁻¹, and â_i satisfies:

(z'_i z_i + A^ii λ) â_i = z'_i (y - Xb - Z_-i a_-i) - λ Σ_{j≠i} A^ij a_j   [24]

The full conditional distribution of each of the variance
components is readily obtained by dividing [12] by p(b, a, E_-i | y), where E_-i is E with σ²_i excluded. Since this last distribution does not depend on σ²_i, this becomes (Wang et al, 1994a):

p(σ²_e | b, a, σ²_a, y) ∝ (σ²_e)^-(ṽ_e/2 + 1) exp[-ṽ_e S̃²_e / (2σ²_e)]
p(σ²_a | b, a, σ²_e, y) ∝ (σ²_a)^-(ṽ_a/2 + 1) exp[-ṽ_a S̃²_a / (2σ²_a)]   [25]

which has the form of an inverted chi-square distribution, with parameters:

ṽ_e = n + v_e,  S̃²_e = (e'e + v_e S²_e)/ṽ_e,  where e = y - Xb - Za
ṽ_a = q + v_a,  S̃²_a = (a'A⁻¹a + v_a S²_a)/ṽ_a   [26]

When the prior distribution for the variance components is assumed to be uniform, the full conditional distributions have the same form as in [25], except that, in [26] and subsequent formulae, v_i = -2 and S²_i = 0 (i = e, a).
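As a practical note added in editing: a draw from a scaled inverted chi-square distribution such as [25] can be obtained from a chi-square (gamma) deviate, since σ² = ṽS̃²/χ²_ṽ. A minimal sketch, with arbitrary illustrative parameter values:

```python
import random

def sample_scaled_inv_chi2(rng, df, scale):
    # sigma^2 = df * scale / chi2_df, where chi2_df ~ Gamma(df/2, 2)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return df * scale / chi2

rng = random.Random(3)
df, scale = 50.0, 4.0                       # illustrative values, not from the paper
draws = [sample_scaled_inv_chi2(rng, df, scale) for _ in range(20000)]
mean = sum(draws) / len(draws)
# E[sigma^2] = df * scale / (df - 2) for df > 2, ie 50*4/48 here
```

This is how steps (iv) and (v) of the sampler described below can be implemented.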
Generation of random samples from marginal posterior distributions using Gibbs sampling

In this section we describe how random samples can be generated indirectly from the joint distribution [12] by sampling from the conditional distributions [21], [23] and [25]. The Gibbs sampler works as follows: (i) set arbitrary initial values for b, a and E; (ii) sample from [21], and update b_i, i = 1, ..., p; (iii) sample from [23] and update a_i, i = 1, ..., q; (iv) sample from [25] and update σ²_a; (v) sample from [25] and update σ²_e; (vi) repeat (ii) to (v) k (length of chain) times. As k -> oo, this creates a Markov chain with an equilibrium distribution having [12] as density. If, along the single path of length k, m samples are extracted at intervals of length d, the algorithm is called a single-long-chain algorithm. If, on the other hand, m independent chains are implemented, each of length k, and the kth iteration is saved as a sample, then this is known as the short-chain algorithm.
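Steps (i)-(vi) can be sketched end-to-end for a deliberately simplified model chosen in editing to keep the code short: one 'fixed' effect μ, unrelated animals (A = I, so [23]-[24] collapse), r repeated records per animal, and flat priors (v_i = -2, S²_i = 0). None of these simplifications are the paper's design, which uses the full relationship matrix and BLUP selection; the data below are simulated placeholders.

```python
import math
import random

rng = random.Random(42)

# Toy data: q animals, r records each, y_ij = mu + a_i + e_ij (assumptions, not the paper's)
q, r = 100, 5
true_mu, true_va, true_ve = 0.0, 2.0, 8.0
a_true = [rng.gauss(0.0, math.sqrt(true_va)) for _ in range(q)]
y = [[true_mu + a_true[i] + rng.gauss(0.0, math.sqrt(true_ve)) for _ in range(r)]
     for i in range(q)]
N = q * r

def chi2(df):
    return rng.gammavariate(df / 2.0, 2.0)

# (i) arbitrary starting values
mu, a, va, ve = 0.0, [0.0] * q, 1.0, 1.0
keep_va, keep_ve = [], []
for it in range(2000):
    # (ii) mu | rest ~ N(mean of (y_ij - a_i), ve / N), the scalar version of [21]
    mu = rng.gauss(sum(y[i][j] - a[i] for i in range(q) for j in range(r)) / N,
                   math.sqrt(ve / N))
    # (iii) a_i | rest ~ N(r (ybar_i - mu) / (r + lam), ve / (r + lam)), cf [23] with A = I
    lam = ve / va
    for i in range(q):
        ybar = sum(y[i]) / r
        a[i] = rng.gauss(r * (ybar - mu) / (r + lam), math.sqrt(ve / (r + lam)))
    # (iv)-(v) variances | rest: scaled inverted chi-square draws, cf [25]-[26]
    sse = sum((y[i][j] - mu - a[i]) ** 2 for i in range(q) for j in range(r))
    ve = sse / chi2(N - 2)
    va = sum(ai * ai for ai in a) / chi2(q - 2)
    if it >= 400:                 # (vi) iterate; discard a warm-up
        keep_va.append(va)
        keep_ve.append(ve)

post_va = sum(keep_va) / len(keep_va)
post_ve = sum(keep_ve) / len(keep_ve)
```

With the true values σ²_a = 2 and σ²_e = 8 (phenotypic variance 10, as in the paper's simulations), the posterior means land near the simulated values.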
If the Gibbs sampler has reached convergence, then for the m samples (b, a, E)_1, ..., (b, a, E)_m we have:

b_j ~ p(b | y),  a_j ~ p(a | y),  E_j ~ p(E | y),  j = 1, ..., m

where ~ means distributed with marginal posterior density p(. | y). In any particular application, interest may focus on some of the univariate marginal distributions p(b_i | y) and p(a_i | y). In order to estimate marginal posterior densities of the variance components, in addition to the above quantities, the following sums of squares must be stored for each of the m samples:

e'e = (y - Xb - Za)'(y - Xb - Za)  and  a'A⁻¹a

For estimation of selection response, the quantities to be stored depend on the structure of K in [14].
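Once the relevant draws are stored, each Gibbs sample yields one draw of R = K'a, and posterior summaries follow from [1]. A minimal sketch (the stored samples below are simulated placeholders so the snippet is self-contained; in practice they come from the sampler):

```python
import random

rng = random.Random(5)

# Hypothetical stored Gibbs output: m samples of the additive genetic values of
# n1 first-generation and nf final-generation animals (placeholder numbers).
m, n1, nf = 4000, 60, 60
a_first = [[rng.gauss(0.0, 1.0) for _ in range(n1)] for _ in range(m)]
a_final = [[rng.gauss(1.8, 1.0) for _ in range(nf)] for _ in range(m)]

# One draw of total response per Gibbs sample: TR = mean(final) - mean(first)
tr = [sum(af) / nf - sum(a1) / n1 for af, a1 in zip(a_final, a_first)]

post_mean = sum(tr) / m                          # Monte-Carlo estimate, as in [1]
s = sorted(tr)
lo, hi = s[int(0.025 * m)], s[int(0.975 * m)]    # equal-tail 95% interval
```

The same pattern applies to any linear contrast K'a, eg, regression slopes of genetic means on generation.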
Density estimation

As indicated previously, a marginal density can be estimated, for example, using the normal density kernel estimator [2] with the m Gibbs samples from the relevant marginal distribution, or by averaging over conditional densities (equation [3]). Density estimation by the former method is straightforward, by applying [2]. Here, we outline density estimation using [3].
The formulae for variance components and functions thereof were given by Wang et al (1993, 1994b). For the additive genetic variance component, the conditional density estimator is:

p̂(σ²_a | y) = (1/m) Σ_{j=1,...,m} p(σ²_a | b_j, a_j, σ²_e,j, y)   [27]

and for the error variance component:

p̂(σ²_e | y) = (1/m) Σ_{j=1,...,m} p(σ²_e | b_j, a_j, σ²_a,j, y)   [28]

The estimated values of the density are obtained by fixing σ²_a and σ²_e at a number of points, and then evaluating [27] and [28] at each point over the m samples. Notice that the realized values of σ²_a and σ²_e obtained in the Gibbs sampler are not used to estimate the marginal posterior densities in [27] and [28].
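Since the full conditionals [25] are scaled inverted chi-square densities, the averaging in [27]-[28] only requires that density in closed form. A sketch (parameter values illustrative; log-space is used for numerical stability):

```python
import math

def inv_chi2_density(x, df, scale):
    # Scaled inverted chi-square density with parameters (df, scale), the [25] form.
    log_c = (df / 2.0) * math.log(df * scale / 2.0) - math.lgamma(df / 2.0)
    return math.exp(log_c - (df / 2.0 + 1.0) * math.log(x) - df * scale / (2.0 * x))

def cond_avg_density(x, dfs_scales):
    # Estimator [27]/[28]: average the full conditional density over Gibbs samples,
    # each sample contributing its own (df, scale) pair from [26].
    return sum(inv_chi2_density(x, df, s) for df, s in dfs_scales) / len(dfs_scales)
```

Each density in the average integrates to 1, so the estimate is itself a proper density.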
To estimate the marginal posterior density of heritability, h² = σ²_a/(σ²_a + σ²_e), the point of departure is the full conditional distribution of σ²_a. This distribution has σ²_e as a conditioning variable, and therefore σ²_e is treated as a constant. Since the inverse transformation is σ²_a = σ²_e h²/(1 - h²), and the Jacobian of the transformation from σ²_a to h² is σ²_e/(1 - h²)², the full conditional density of h² is obtained by substitution in [25]; averaging it over the m Gibbs samples, as in [3], yields the estimate of the marginal posterior density of h² [29].
The estimation of the marginal posterior density of each additive genetic value follows the same principles:

p̂(a_i | y) = (1/m) Σ_{j=1,...,m} p(a_i | b_j, a_-i,j, E_j, y)   [30]

where, for the jth sample, the full conditional density on the right-hand side is the normal [23], with mean and variance evaluated at the realized values of that sample [31]. Equally spaced points in the effective range of a_i are chosen, and for each point an estimate of its density is obtained by inserting, for each of the m Gibbs samples, the realized values of σ²_e, b, a_-i and the variance ratio λ in [30] and [31]. These quantities, together with a_i, need to be stored in order to obtain an estimate of the marginal posterior density of the ith additive genetic value. This process is repeated for each of the equally spaced points.
Estimation of the marginal posterior density of response depends on the way it is expressed. In general, R in [14] can be a scalar (the genetic mean of a given generation, or the response per time unit) or it can be a vector, whose elements could describe, for example, linear and higher order terms of changes of additive genetic values with time or with the unit of selection pressure applied. We first derive a general expression for estimation of the marginal posterior density of R, and then look at some special cases to illustrate the procedure.
Assume that R contains s elements and we wish to estimate p(R | y) as:

p̂(R | y) = (1/m) Σ_{j=1,...,m} p(R | b_j, a_-i,j, E_j, y)   [32]

In order to implement [32], the full conditional distribution on the right-hand side is needed. This distribution is obtained by applying the theory of transformations to p(a_i | b, a_-i, E, y), where a_i now denotes a vector of s additive genetic values. The s additive genetic values in a_i must be chosen so that the matrix of the transformation from a_i to R is non-singular. We can then write [14] as:

R = K_i a_i + K_-i a_-i   [33]

so that we have expressed R as a part which is a function of a_i and another part which is not. The matrix K_i is non-singular of order s by s, and from [33], the inverse transformation is a_i = K_i⁻¹(R - K_-i a_-i). Since the Jacobian of the transformation from a_i to R is det(K_i⁻¹), letting V_i = Var(a_i | b, a_-i, E, y), and using standard theory, we obtain the result that the conditional posterior distribution of response is normal:

R | b, a_-i, E, y ~ N(K_i â_i + K_-i a_-i, K_i V_i K'_i)   [34]

where, using [19] and [20], it can be shown that:

â_i = (Z'_i Z_i + C_i λ)⁻¹ [Z'_i (y - Xb - Z_-i a_-i) - C_i,-i a_-i λ]   [35]

and

V_i = (Z'_i Z_i + C_i λ)⁻¹ σ²_e   [36]

In [35] and [36], C_i is the s by s block of the inverse of the additive genetic relationship matrix whose rows and columns correspond to the elements in a_i, and C_i,-i is the s by (q - s) block associating the elements of a_i with those of a_-i. With [34] available, the marginal posterior distribution of R can be obtained from [32].
As a simple illustration, consider the estimation of the marginal posterior density of total response to selection, defined as the difference in average additive genetic values between the last generation (f) and the first one:

TR = (1/n_f) Σ_j a_fj - (1/n_1) Σ_j a_1j   [37]

where a_fj and a_1j are additive genetic values of individuals in the final and first generations, and n_f and n_1 are the numbers of additive genetic values in the final and first generations, respectively. We will arbitrarily choose a_f1 and carry out a linear transformation from p(a_f1 | b, a_-f1, E, y) to p(TR | b, a_-f1, E, y). We write [37] in the form of [33]:

TR = (1/n_f) a_f1 + [(1/n_f) Σ_{j≠1} a_fj - (1/n_1) Σ_j a_1j]   [38]

so that the matrix of the transformation is the scalar K_i = 1/n_f, and the marginal posterior density
is estimated as follows:

p̂(TR | y) = (1/m) Σ_{j=1,...,m} p(TR | b_j, a_-f1,j, E_j, y)   [39]

As a second illustration, we consider the case where response is expressed as the linear regression (through the origin) of additive genetic values on time units; thus R is a scalar. We assume as before that there are f time units, and the response, expressed in the form of [33] and [38], is:

R = Σ_{t=1,...,f} t ā_t / Σ_{t=1,...,f} t²   [40]

where ā_t is the average of the additive genetic values at time unit t and n_t is their number. A little manipulation shows that R can again be written as a non-singular linear function of a single chosen additive genetic value plus a term not involving it, so that [34] and [32] apply directly [41].

EXAMPLES
To illustrate, the methodology was used to analyse 2 simulated data sets. Genotypes were sampled using a Gaussian additive genetic model. The phenotypic variance was always 10, and base population heritability was 0.2 and 0.5 in data sets 1 and 2, respectively. Records were generated as the sum of a fixed effect (30 fixed effects in the complete data set, but these values are of no importance here), an additive genetic effect and a residual term. Both additive genetic and residual effects were assumed to follow normal distributions with null means and variances based on the 2 heritability levels. In the generation of Mendelian sampling effects, parental inbreeding was taken into account. Five cycles of single-trait selection based on a BLUP animal model were practised. Each generation resulted from the mating of 8 males and 20 females, and each female produced 3 offspring with records and 2 additional female offspring without records, which were also considered as female replacements. Males were selected across generations (the best 8 each cycle) and females, which bred only once, were selected from the highest scoring predicted breeding values available each generation. The total number of animals at the end of the program was 968, of which 720 had records. The total numbers of sires and dams in each data set were 42 and 241. Details of the simulation, including a description of the generation of fixed effects and additive genetic values, can be found in Sorensen (1988).
Two analyses were carried out for each data set: an analysis based on all records, and one using data from generations 0-2 only. The rationale for this is that it is not uncommon in the literature to find reports of selection experiments in progress. Hence, we can illustrate in this manner how inferences are modified as additional information is collected.
For each analysis, marginal posterior densities for the additive genetic variance (σ²_a), residual variance (σ²_e), phenotypic variance (σ²_p) and heritability (h²) were estimated. In addition, marginal posterior densities of 2 expressions for selection response were estimated. These were: 1) the difference between final and initial average additive genetic values, or total response (TR); and 2) the slope (through the origin) of the linear regression of additive genetic values on generation. Unless otherwise stated, the analyses reported below were performed assuming that the variance components follow a priori independent uniform distributions, as shown in [8].
The upper bounds of the additive and environmental variances (σ²_a,max and σ²_e,max) were arbitrarily chosen to be 10 and 20, respectively.
The Gibbs sampler was implemented using a single chain of total length 1 205 000, with a warm-up (initial iterations discarded) of length 5 000, and a sub-sampling interval between samples of d = 10. Thus, a total of m = 120 000 samples were saved. Calculations were programmed in Fortran in double precision. Random numbers were generated using IMSL subroutines (IMSL Inc, 1989).
Plots of the estimated marginal posterior densities of σ²_a, σ²_e, h², TR and the regression slope were obtained both with the average of conditionals and with the normal kernel density estimator; those for σ²_p, β_0 and β_1 were generated using the normal kernel density estimator only, for technical reasons. The window used for the normal kernel estimator was the range of the effective domain of the parameter, divided by 75. The effective domain covered 99.9% of the density mass. Each of the plots was generated by dividing the effective domain of that variable into 100 evenly spaced intervals. Summary statistics of distributions, such as the mean, median and variance, were computed by Simpson's integration rules, after further dividing the effective domain into 1 000 evenly spaced intervals using cubic-spline techniques (IMSL Inc, 1989). Modes were located through grid search. For illustration, and not as a formal comparison, a classical two-stage analysis was also carried out.
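The summary-statistic step can be sketched as follows (editorial illustration: the gridded density below is a discretized normal standing in for an estimated marginal posterior, and no spline interpolation is attempted):

```python
import math

def simpson(fx, h):
    # Composite Simpson's rule on an evenly spaced grid; len(fx) must be odd.
    return (fx[0] + fx[-1] + 4.0 * sum(fx[1:-1:2]) + 2.0 * sum(fx[2:-1:2])) * h / 3.0

# Discretized density on a grid over its effective domain (here N(2, 1) on [-2, 6]).
h = 8.0 / 1000.0
grid = [-2.0 + k * h for k in range(1001)]
dens = [math.exp(-0.5 * (x - 2.0) ** 2) / math.sqrt(2.0 * math.pi) for x in grid]

area = simpson(dens, h)                                   # should be close to 1
mean = simpson([x * d for x, d in zip(grid, dens)], h)    # first moment
mode = grid[max(range(len(dens)), key=dens.__getitem__)]  # mode by grid search
```

Higher moments (and hence the variance and median) follow by integrating the appropriate functions of x against the gridded density in the same way.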
First, REML estimates of variance components were obtained. These REML estimates were then used in lieu of the true values in Henderson's mixed-model equations to obtain estimates of additive genetic values. Finally, estimated genetic responses to selection were computed based on appropriate averages of the estimated additive genetic values. Results of this classical analysis are reported later under the heading ST in figures 5-8.

The estimated marginal
5-8. The estimatedmarginal
posterior
densities of the variancecomponents
and ofheritability
are shown infigures
1 and 2 for theanalysis
based onpartial
(PART)
and whole(WHOLE)
data,
respectively,
when basepopulation
heritability
was 0.2.Corresponding
plots
For the data simulated with h² = 0.2, the posterior distributions reflected considerable uncertainty about the values of the parameters. As expected, the variances of the posterior distributions were smaller in WHOLE (fig 2) than in PART (fig 1). In particular, the PART analysis of σ²_a and h² showed highly skewed distributions; the mean, the mode and the median differed from each other. The REML estimates in the PART analysis for the additive genetic variance and heritability were very close to zero. Asymptotic confidence intervals were not computed, but they would probably include a region outside the parameter space.

The
Bayesian analysis indicated with considerable probability that both parameters were greater than zero. With the whole data, uncertainty was reduced, but point estimates in the Bayesian analysis of the genetic variance and of heritability were different from the REML estimates. For example, the posterior mean, mode and median of heritability were 0.13, 0.08 and 0.12, respectively, in comparison with the REML estimate of 0.07. The lack of symmetry of the posterior distributions of σ²_a and h² clearly suggests that the simulated selection experiment does not have a high degree of resolution concerning inferences about genetic parameters. This point would not be as forcefully illustrated in a standard REML analysis, where, at best, one would get joint approximate confidence regions based on asymptotic theory.
In the case of h² = 0.5 (figs 3 and 4), the distributions were less skewed than for h² = 0.2, although still skewed for PART (fig 3). Both the Bayesian and the REML analyses suggested a moderate to high heritability, but the fact that the posterior distribution of heritability for PART (fig 3) was very skewed indicates that more data should be collected for accurate inferences about heritability. The WHOLE analysis (fig 4) yielded nearly symmetric posterior distributions of heritability with smaller variance, centered at 0.55 (the REML estimate was 0.543). It should be noted that the residual variance was poorly estimated in PART (fig 3), but the Bayesian analysis clearly depicted the extent of uncertainty about this parameter. This also illustrates that a residual variance can be poorly estimated, contrary to the common belief in animal breeding that the residual variance is always well estimated.
The assessment of response to selection for the population simulated with h² = 0.2 is given in figures 5 and 6 for the PART and WHOLE data analyses, respectively. The true response in the simulated data at a given generation, computed as the average of true genetic values within the generation in question, was TR = 0.643 units and TR = 1.783 units for PART and WHOLE, respectively. The regressions of true response on generation were 0.321 and 0.343 for PART and WHOLE, respectively.
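The measures of response defined in the Summary can be computed directly from per-generation means of additive genetic values. The sketch below is generic and not the authors' code; the function name, and fitting the through-the-origin slope to deviations from the base-generation mean, are our assumptions:

```python
def response_measures(gen_means):
    """Three summaries of response from per-generation mean additive
    genetic values: total response (last minus first generation), the
    slope through the origin of the regression of mean on generation
    (fitted here to deviations from the base generation), and the
    ordinary least-squares slope."""
    n = len(gen_means)
    total = gen_means[-1] - gen_means[0]                      # TR
    dev = [m - gen_means[0] for m in gen_means]
    slope_origin = (sum(t * d for t, d in enumerate(dev))
                    / sum(t * t for t in range(n)))
    tbar = (n - 1) / 2.0
    mbar = sum(gen_means) / n
    slope_ols = (sum((t - tbar) * (m - mbar) for t, m in enumerate(gen_means))
                 / sum((t - tbar) ** 2 for t in range(n)))
    return total, slope_origin, slope_ols
```

Applied to the true generation means of a simulated replicate, these give the quantities reported as TR and the regressions on generation.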
The hypothesis of no response to selection cannot be rejected in the PART analysis (fig 5). The classical approach gave an estimated response of zero. However, the Bayesian analysis documents the extent of uncertainty and assigns considerable posterior probability to the proposition that selection may be effective; the posterior probability of response (eg, of a positive regression slope, or of TR = t₂ − t₀ > 0) was much larger than that of the complement.
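Such a posterior probability is estimated from the Gibbs output simply as the fraction of draws satisfying the event. A minimal sketch, with a hypothetical layout of the draws:

```python
def posterior_prob_response(gen_mean_draws):
    """Monte-Carlo estimate of Pr(TR > 0 | data) from Gibbs output.
    gen_mean_draws: one list per Gibbs round, holding the sampled
    per-generation mean additive genetic values for that round;
    TR is the last-generation mean minus the base-generation mean."""
    tr_draws = [draw[-1] - draw[0] for draw in gen_mean_draws]
    return sum(1 for tr in tr_draws if tr > 0) / len(tr_draws)
```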
The strength of the additional evidence brought up in the WHOLE analysis (fig 6) supports the hypothesis that selection was effective. For example, the difference in genetic mean between generations 0 and 5 was centered around 1.852, though the variance of the posterior distribution was as large as 1.075. The classical analysis gave an estimated value of ST = 1.296 units, but no exact measure of variation can be associated with this estimate. Approximate results could be obtained, for example, using Monte-Carlo simulation, but this was not attempted in the present work.

In the simulation with h² = 0.5, the true response in the simulated data, computed as the average of true genetic values from the relevant generations, was TR = 3.258 units and TR = 7.148 units for the PART and WHOLE data sets, respectively. The associated regressions on generation were 1.629 and 1.436, respectively.
The Bayesian and classical analyses indicated response to selection beyond reasonable doubt (figs 7 and 8). Note, however, that the posterior distribution of TR = t₂ − t₀ in PART was skewed (fig 7). In figure 8, all the posterior distributions were nearly symmetric, and the classical and Bayesian analyses were essentially in agreement. In spite of this symmetry, considerable uncertainty remained about the true rate of response to selection. For example, values of t₅ − t₀ ranging from 3 to 11 units have appreciable posterior density (fig 8). In the light of this, results from a similar experiment in which realized genetic change is, say, 50% higher or lower than our posterior mean of about 6.7 units could not be interpreted as yielding conflicting evidence.

In order to
study the influence of different priors on inferences about heritability and response to selection, part of the above analyses, using the same data, was repeated assuming that the variance components followed, a priori, inverse chi-square distributions [9]. The 'degree of belief' parameters νᵢ and the a priori values of the variance components Sᵢ² were assessed by the method of moments from an independently simulated data set with base population heritability of 0.5 and consisting of 3 cycles of selection. An REML estimate of heritability from these data was 0.46, with an asymptotic variance of 2.94 × 10⁻².
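For a scaled inverse chi-square prior, moment matching amounts to solving the expressions for the prior mean and variance of the variance component for ν and S². The sketch below is a generic illustration of that idea, not necessarily the exact procedure of Wang et al (1994b):

```python
def inv_chisq_from_moments(m, v):
    """Return (nu, S2) of a scaled inverse chi-square distribution with
    mean m and variance v, using the standard moments
        mean = nu*S2/(nu - 2),
        var  = 2*nu^2*S2^2/((nu - 2)^2*(nu - 4)),
    so that var/mean^2 = 2/(nu - 4) gives nu, and then S2 follows."""
    nu = 4.0 + 2.0 * m * m / v
    s2 = m * (nu - 2.0) / nu
    return nu, s2
```

Note how a small prior variance v relative to m² yields a large ν, ie, a strong 'degree of belief'.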
The method of moments, as described in Wang et al (1994b), produced the following estimates: νa = 11; νe = 464; Sa² = 4.50; Se² = 5.19. A reanalysis of the PART and WHOLE replicate (heritability = 0.5) yielded the results shown in table I. The figures in the table illustrate the input contributed by the prior. Thus, in the PART data set, the uniform prior and the informative prior (which implies a prior value of heritability of 0.46) lead to clear differences in the expected value of the marginal posterior distribution of heritability and of selection response. However, in the WHOLE data set, inferences using either prior are very similar. The figures also illustrate how the variance of marginal posterior distributions is reduced when an informative prior is used. The examples indicate the power of the Bayesian analysis to reveal uncertainty when the amount of information about the parameters is small (eg, the PART analysis at h² = 0.2).
Other plots, such as the marginal posterior distributions of generation means, are possible and could illustrate the evolution of the drift variance as the experiment progresses.

DISCUSSION

We have presented a way of analysing response to selection in experimental populations. A central element of the approach is the construction of a
marginal posterior distribution of a measure of selection response which, in this study, was defined as a linear function of additive genetic values. The marginal distribution can be viewed as a weighted average of an infinite number of conditional distributions, where the weighting function is the marginal posterior distribution of the variances and other parameters, which are not necessarily of interest. The approach relies heavily on the result (Gianola and Fernando, 1986) that, if the data on which selection was based are included in the analysis, the joint posterior or any marginal posterior distribution is the same with or without selection. This is a particular case of what has been known as 'ignorable selection' (Little and Rubin, 1987; Im et al, 1989), and it implies that selection can be ignored if the information on any data-based selection is contained in the Bayesian model. Furthermore, inferences pertain to parameters, eg, heritability, of the base population.
As shown with the simulated data, the marginal posterior distributions of variances and functions thereof are often not symmetric. This fact is taken into account when computing the marginal posterior distributions of measures of selection response. In this way, the uncertainty associated with all nuisance parameters in the model is taken into account when drawing inferences about response to selection. This is in marked contrast with the estimation of response obtained using BLUP with an animal model, which assumes the variance ratio known and gives 100% weight to an estimate of this ratio.

The
analysis of the simulated data used uniform prior distributions for the fixed effects and variance components. The choice of appropriate prior distributions is a contentious issue in Bayesian inference, especially when these priors are supposed to convey vague initial knowledge. An injudicious choice of noninformative prior distributions can lead to improper posterior distributions, and this may not be easy to recognize, especially when the analysis is based on numerical methods, as in the present work. The subject is still a matter of debate, and an important concept is that of reference priors (Bernardo, 1979; Berger and Bernardo, 1992), which have the property that they contribute minimum information to the posterior distribution while the information arising from the likelihood is maximized. In the single-parameter case, reference priors often yield the Jeffreys priors (Jeffreys, 1961). The uniform prior distributions we have chosen for the fixed effects and the variance components represent a state of little prior knowledge, but they are clearly not invariant under transformations and must be viewed as simple ad hoc approximations to the appropriate reference prior distributions. We have illustrated, however, that when the amount of information contained in the data is adequate, inferences are affected little by the choice of priors. It is not generally a simple matter to decide when one is in a setting of adequate information, and it may therefore be revealing to carry out analyses with different priors to study how inferences are affected (Berger, 1985). If the use of different priors leads to very different results, this indicates that the information in the likelihood is weak, and more data ought to be collected in order to draw firmer conclusions.

The richness of the information that can be extracted
using the Bayesian approach comes at the cost of computational demands. The demand is in terms of computer time rather than of programming complexity. However, an important limitation of iterative simulation methods is that drawing from the appropriate posterior density takes place only in the limit, as the number of draws becomes infinite. It is not easy to check for convergence to the correct distribution, and there is little developed theory indicating how long the Gibbs chain should be. The rate of convergence can be exceedingly slow, especially when random variables are