HAL Id: hal-00894199
https://hal.archives-ouvertes.fr/hal-00894199
Submitted on 1 Jan 1998
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Bayesian estimation of dispersion parameters with a
reduced animal model including polygenic and QTL
effects
Marco C.A.M. Bink, Richard L. Quaas, Johan A.M. van Arendonk
To cite this version:
Original
article
Bayesian
estimation of
dispersion
parameters
with
a
reduced
animal
model
including polygenic
and
QTL
effects
Marco
C.A.M.
Bink
Richard
L.
Quaas
Johan A.M. Van Arendonk
a
Animal
Breeding
and GeneticsGroup, Wageningen
Institute of AnimalSciences,
Wageningen Agricultural University,
PO Box338,
6700 AHWageningen,
the Netherlandsb
Department
of Animal Science CornellUniversity, Ithaca,
NY14853,
USA(Received
21April
1997;accepted
29 December1997)
Abstract - In animal
breeding,
Markov chain Monte Carloalgorithms
areincreasingly
used to draw statistical inferences about
marginal
posterior distributions of parameters ingenetic
models. The Gibbssampling algorithm
is mostcommonly
used andrequires
full conditional densities to be of a standard form. In thisstudy,
we describe aBayesian
method for the statistical
mapping
ofquantitative
trait loci((aTL),
where theapplication
of a reduced animal model leads to non-standard densities fordispersion
parameters. TheMetropolis Hastings algorithm
is used to obtainsamples
from these non-standard densities. Theflexibility
of theMetropolis Hastings algorithm
also allows uschange
theparameterization
of thegenetic
model.Alternatively
to the usual variance components,we use one variance component
(= residual)
and two ratios of variance components, i.e.heritability
andproportion
ofgenetic
variance due to the(aTL,
toparameterize
thegenetic
model. Prior
knowledge
on ratios can moreeasily
beimplemented, partly by
absence ofscale effects. Three sets of simulated data are used to
study performance
of the reduced animalmodel,
parameterization of thegenetic model,
andtesting
the presence of theQTL
at a fixed
position. ©
Inra/Elsevier,
Parisreduced animal model / dispersion parameters
/
Markov chain Monte Carlo/
quantitative trait loci
*
Correspondence
andreprints
E-mail:marco.bink@alg.vf.wau.nl
Résumé - Estimation Bayésienne des paramètres de dispersion dans un modèle animal réduit comprenant un effet polygénique et l’effet d’un QTL. En
génétique
animale,
lesalgorithmes
de Monte-Carlo par chaînes de Markov sont utilisés deplus
endu modèle
génétique. L’algorithme d’échantillonnage
de Gibbs est utilisélargement
etdemande la connaissance des densités
conditionnelles,
dans une forme standard. Danscette
étude,
on décrit une méthodeBayésienne
pour lacartographie statistique
d’un locusà effet
quantitatif
((aTL),
oùl’application
d’un modèle animal réduit conduit à des densités deparamètres
dedispersion, qui
n’ont pas de forme standard. On utilisel’algorithme
deMetropolis-Hastings
pourl’échantillonnage
de ces densités non standard. Lasouplesse
de
l’algorithme
deMetropolis-Hastings
permetégalement
dechanger
laparamétrisation
du modèlegénétique :
au lieu des composantes de varianceshabituelles,
on peut utiliserune composante de variance
(résiduelle)
et deux rapports de composantes de variance : l’héritabilité et laproportion
de la variancegénétique
dûe auQTL.
Il estplus
facile despécifier
l’information apriori
sur desproportions,
enpartie
parcequ’elle
nedépend
pasde l’échelle. Trois fichiers de données simulées sont utilisés pour étudier la
performance
dumodèle animal
réduit,
par rapport au modèle animal strict, l’effet deparamétrisation
du modèlegénétique
et laqualité
du test de laprésence
d’unQTL
à une position donnée.©
Inra/Elsevier,
Parismodèle animal réduit
/
paramètres de dispersion/
méthode de Monte-Carlo par chaînes de Markov/
locus quantitatif1. INTRODUCTION
The wide
availability
ofhigh-speed
computing
and the advent of methods basedon Monte Carlo
simulation,
particularly
thoseusing
Markov chainalgorithms,
haveopened powerful pathways
to tacklecomplicated
tasks in(Bayesian)
statistics[9,
10].
Markov chain Monte Carlo(MCMC)
methodsprovide
means forobtaining
marginal
distributions from acomplex
non-standardjoint density
of all unknownparameters
(which
is not feasibleanalytically).
There are avariety
oftechniques
forimplementation
[9]
of which Gibbssampling
[11]
is mostcommonly
used in animalbreeding.
Theapplications
include univariatemodels,
thresholdmodels,
multi-traitanalysis,
segregation
analysis
andQTL mapping
[15,
17, 29,
31,
33].
Because Gibbs
sampling requires
directsampling
from full conditionaldistribu-tions,
dataaugmentation
[22]
is often used so that ’standard’sampling
densities areobtained.
Often,
however,
this is at the expense of a substantial increase innum-ber of
parameters
to besampled.
Forexample,
the full conditionaldensity
for agenetic
variancecomponent
becomes standard(inverted
gammadistribution)
whena
genetic
effect issampled
for each animal in thepedigree,
as in a(full)
animalmodel
(FAM).
Thedimensionality
increases even morerapidly
when the FAM isapplied
to theanalysis
ofgranddaughter designs
[34]
inQTL mapping experiments,
i.e. markergenotypes
ongranddaughters
are not known and need to besampled
aswell. In
addition,
absence of marker datahampers
accurate estimation ofgenetic
effects withingranddaughters,
which form themajority
in agranddaughter design.
This
might
lead to very slowmixing properties
of thedispersion
parameters
(see
also Sorensen et al.
!21!).
The reduced animal model
(RAM,
Quaas
andPollak,
[19])
isequivalent
to theFAM,
but cangreatly
reduce thedimensionality
of aproblem
by eliminating
effects[14, 18!,
however,
a standarddensity
is notrequired,
infact,
thesampling
density
needs to be knownonly
up toproportionality.
Another alternative for the FAM is theapplication
of a sire model whichimplies
thatonly
sires are evaluated based on progeny records. With a siremodel,
thegenetic
merit of the dam of progeny isnot accounted for and
only
thephenotypic
information onoffspring
is used. The RAM offers theopportunity
to include maternalrelationships, offspring
with known markergenotypes
and information ongrandoffspring.
As a result the RAM is bettersuited for the
analysis
of data with acomplex pedigree
structure.The
flexibility
of the MHalgorithm
also allows for agreater
choice of theparam-eterization
(variance
components
or ratiosthereof)
of thegenetic
model. If Gibbssampling
is to beemployed,
theparameterization
is often dictatedby
mathemat-icaltractability
to obtain thesimple sampling density.
The MHalgorithm readily
admits muchflexibility
inmodelling
prior
beliefregarding
dispersion
parameters,
which is an
advantageous
property
inBayesian
analysis
!16!.
In this paper, we
present
MCMCalgorithms
that allowBayesian
linkage analysis
with a RAM. We
study
two alternativeparameterizations
of thegenetic
model anduse a test statistic to
postulate
presence of aQTL
at a fixedposition
relativeto an informative marker bracket. Three sets of simulation data
using
atypical
granddaughter design
are used.2. METHOD
2.1. Genetic model
The additive
genetic
variance(o,2)
underlying
aquantitative
trait is assumed to be due to twoindependent
randomeffects,
due to aputative
QTL
and residualindependent polygenes.
TheQTL
effects(v)
are assumed to have aN(0,
GO,2)
prior
distribution where G is thegametic
relationship
matrix[2, 8],
andui
is
the variance due to asingle
allelic effect at theQTL.
Matrix Gdepends
upon oneunknown
parameter,
the mapposition
of theQTL
relative to the(known)
positions
ofbracketing
(informative)
markers. Here we consider the location of theQTL
to be known. Thepolygenic
effects(u)
have aN(0,
Au u 2)
prior
distribution,
where Ais the numerator
relationship
matrix. Thegenetic
modelunderlying
thephenotype
of an animal iswhere b is the vector with fixed
effects,
vi
andv?
are the two(allelic)
QTL
effects foranimal i,
and ei !N(0,
lo,2). e
(QTL
effects within individual areassigned
according
to markeralleles,
asproposed by Wang
et al.[32]).
The sum of the threegenetic
effects is the animal’s
breeding
value(a).
In addition togenetic effects,
locationparameters
comprise
fixed effects that are, apriori,
assumed to follow the proper uniform distribution:f (b) -
U[b
n
,;
n
,
bmax! !
wheren
b
i
m
andb
max
are the minimumand maximum values for elements in b. 2.2. Reduced animal model
(RAM)
with neither descendants nor marker
genotypes,
i.e.ungenotyped
non-parents.
Thephenotypic
information on these animals caneasily
be absorbed into theirparents
without loss of information.
Absorption
ofnon-parents
that have markergenotypes
becomes more
complex
whenposition
ofQTL
isunknown;
it is therefore better to include themexplicitly
in theanalysis.
In the remainder of the paper, it is assumed that markergenotypes
onnon-parents
are not available. Thegenetic
effects ofnon-parents
can beexpressed
as linear functions of theparental genetic
effectsby
thefollowing
equations
[4],
and
where each row in P contains at most two non-zero elements
(= 0.5),
and each row inQ
has at most four non-zero elements[32],
the terms wnon-parcnts and§non-parents
pertain
toremaining genetic
variance due to Mendeliansegregation
of alleles. In agranddaughter design,
the P andQ
forgranddaughters,
nothaving
markergenotypes
observed noraugmented,
have similarstructures,
where Q9 denotes the Kronecker
product,
and J is aunity
matrix[20].
Thisequality
does not hold if markergenotypes
areaugmented,
sincephenotypes
containinformation that can alter the marker
genotype
probabilities
forungenotyped
non-parents
[2].
The
phenotypes
for aquantitative
trait can now beexpressed
as,for row vectors
P
i
andQ
i
(possibly null),
andwhere u)
i
reflects the amount of total additivegenetic
variance that ispresent
inE
.
2
Based on thepedigree,
fourcategories
of animals aredistinguished
in theR
A
M
(table 1).
The vectorsP
i
andQ
i
containpartial regression
coefficients. Forparents,
theonly
non-zero coefficientspertain
to the individual’s owngenetic
effects(ones);
fornon-parents,
the individual’sparents’ genetic
effects(halves).
Note thatP
i
andGZ
i
are null for anon-parent
with unknownparents,
and thatnon-parents’ phenotypes
in thiscategory
contribute to the estimation of fixed effects andphenotypic
(residual)
varianceonly.
2.3. Parameterization
Let B denote the set of location
parameters
(b,
u andv)
anddispersion
We consider the
following
twoparameterizations
for thedispersion
parameters
where
and
In the
first,
0
v
c ,
theparameters
are the variancecomponents
(VC).
This is theusual
parameterization.
Adifficulty
with this is that it isproblematic
for an animalbreeder to elicit a reasonable
prior
of thegenetic
VC. Animalbreeders,
it seemsto us, are much more
likely
tohave,
and be able tostate,
prior opinions
about suchthings
as heritabilities.Consequently,
inO
RT
,
parameter h
2
is theheritability
of atrait,
andparameter &dquo;(
is theproportion
of additivegenetic
variance due to theputative QTL.
Thisparameterization
allows more flexiblemodelling
ofprior
knowledge
becauseh
2
and -y do notdepend
on scale. Theobald et al.[23]
useda variance
ratio,
a u/Ue 2 2,
parameterization
but noted that the animal breeder mayprefer
to think in terms ofheritability.
Weprefer
thepart-whole
ratiosh
2
and
y.The
components
or2
and
<7! can
beexpressed
in terms ofQ
e,
h
2
2 and
and
2.4. Priors
used to describe
uncertainty
on VC. The inverted gamma(IG)
distribution,
or itsspecial
case the invertedchi-square distribution,
is common because it is often theconjugate prior
for the VC if the FAM(or
siremodel)
isapplied.
Hence,
the full conditional distribution for VC will then be aposterior
updating
of a standardprior
!9!.
Thissimplifies
Gibbssampling.
We will use the IG as theprior
for0
’; - though
with a RAM it is not
conjugate,
where x =
e, u, or v. The rhs of
(10)
constitutes the kernel of thedistribu-tion. The mean
(p)
of an3)
(
IG(o:,
is((a -
1)(3)
-1
,
and the varianceequals
((a- 1)!-2)/!)’B
Van Tassell et al.[29]
suggest
setting
a = 2.000001 and/3
!(!)-1
for an ’almost flat’prior
with a meancorresponding
toprior expectation
(p,).
The IG distributions for three differentprior expectations
aregiven
infigure
1. When theprior expectation
is close to zero(p,
=5.0),
the distribution is morepeaked
and has less variance because mass accumulates near zero. When theprior
expectation
isrelatively high
(p,
=60),
theprobability
ofor2
being
equal
to zerois very
small,
whichmight
be undesirableand/or
unrealistic forui.
An alternativeprior
distribution foror2
is
which is a proper
prior
forufl
with a uniformdensity
over apre-defined large,
finiteinterval,
forexample
from zero to 200(figure 1).
Theseprior
distributions for VCare used
mainly
torepresent
prior uncertainty
!21,
30,
31!.
Corresponding
to(10) (11)
there is anequivalent
prior
distribution forA(!y).
However,
because neither(10)
nor(11)
were chosen for any intrinsic’rightness’
weprefer
asimpler
alternative ofusing
Beta distributions for the ratioparameters
A
and -y
torepresent
prior
knowledge,
where x =
h
2
or-y. When
prior
distributionparameters
ax and/
3
x
are bothset
equal
to1,
theprior
is a uniformdensity
between 0 and 1(figure 2),
i.e.flat
prior.
Alternatively,
ax and!3!
can bespecified
torepresent
prior
expecta-tions for
parameters
of interest(figure 2).
Forexample,
one can centre theden-sity
forheritability
of ayield
trait indairy
cattle around theprior expectation
(= 0.40),
with arelatively
flat(Beta
(2.5, 3.75))
orpeaked
(Beta
(30.0, 45.0))
distribution when
prior certainty
is moderate orstrong,
respectively. Furthermore,
2.5. Joint
posterior
density
animals of
category
i(table 1),
the total number of observationsbeing given
asN,
and let q denote the number animals with
offspring,
i.e.parents.
Then, 2q
is the number ofQTL
effects(two
allelic effects peranimal).
WithO
v
c,
Under
B
RT
,
dispersion
parameters,
andpriors thereof,
are different from0
vc
the2.6. Full conditional densities
From the
joint posterior
densities(13)
and(14),
the full conditionaldensity
for each element in B can be derivedby
treating
all other elements in 0 as constants andselecting
the termsinvolving
theparameter
of interest. When this leads to the kernel of a standarddensity,
e.g. Normal for locationparameters
or an IGdistribution,
e.g. variance
components
withFAM,
Gibbssampling
isapplied
to drawsamples
for that element in 0.
Otherwise,
the full conditionaldensity
is non-standard andsampling
needs to be doneby
othertechniques.
(All
full conditional densities aregiven
in theAppendix).
2.7.
Sampling
non-standard densitiesby Metropolis-Hastings
algorithm
Sampling
a non-standarddensity
can be carried out avariety
of ways,including
various
rejection
sampling techniques
[6,
7, 12,
13!,
andMetropolis-Hastings
sam-pling
within Gibbssampling
!6!.
We use theMetropolis-Hastings algorithm
(MH).
Let
!r(x)
denote thetarget
density,
the non-standarddensity
of aparticular
element in0,
and letq(x, y)
be the candidategenerating
density.
Then,
theprobability
ofmove from current value x to candidate value y
for O
i
is,
When y is not
accepted,
the value for0
z
remainsequal
to x, at least until the nextupdate
for0z .
Chib andGreenberg
[6]
described several candidategenerating
densities for MH. We use the random walk
approach
in whichcandidate y
is drawnfrom a distribution centred around the current value x. To ensure that all
sampled
parameters
are within theparameter
space thesampling
distribution,
q(x, y),
wasU(B
L
,
B
u
)
withwhere t is a
positive
constant determinedempirically
for eachparameter
togive
acceptance
rates between 25 and 50%
[6, 24].
For each of the non-standarddensities,
a univariate MH was used. We
perform
univariate MH iterations(ten
times)
within a MCMCcycle
to enhancemixing
in the MCMCchain,
assuggested
by
Uimari et al.[26].
2.8.
Comparison
to a full animal model(FAM)
From the conditional densities
presented,
twohybrid
MCMC chains can be usedto obtain
samples
of all unknownparameters
(O
vc
orB
RT
)
using
a RAM. Forand
B
RT
).
The conditional densities for the FAM are aspecial
case of RAM(see
table
I ):
all animals are incategory
4 and wz = 0. In case ofO
vc
the conditionaldensities for
o, 2, o
l
2,
and<7!
are nowrecognizable
IG distributions and Gibbssampling
can be used to drawsamples
from these densitiesdirectly.
In the caseof
O
RT
the conditional densities forh
2
and
q remain non-standard and MH is used to drawsamples.
Table IIgives
the four constructed MCMCsampling
schemes.2.9. Post MCMC
analysis
Depending
on thedispersion parameterization
(O
vc
orq
RT
),
three of fiveparameters
weresampled
(table II).
In each MCMCcycle, however,
theremaining
two were
computed, using
(6)
and(7)
or(8)
and(9),
to allowcomparison
of results of differentparameterizations.
Forparameter
X,
the auto-correlation of a sequencem-l
of
samples
was calculatedas
-
1
:
[(a! — /!)(.E,+i — j
i
z )]
/s!
where m = numberm i= l
’
x
of
samples, ji
z
andA
x
areposterior
mean and standarddeviation,
respectively.
The correlation among
samples
forparameters
x and z, within MCMCcycles,
wasI rn
computed
as - E
[(x
i
-
)]
[Lz
-
Zi
)(
Lc
[
1[(!xg,].
For eachparameter
an effective mz= i i
sample
size(ESS)
wascomputed
which estimates the number ofindependent
samples
with information contentequal
to that of thedependent
samples
!21!.
The null
hypothesis that -y
= 0 - theQTL explains
nogenetic
variance - was.. mode
{p(r)}
}
.tested via an odds ratio
p(! - 0) >
20following
Janss et al.(17!.
They
suggest
P(-! = 0)
3. SIMULATION
In this
study, granddaughter designs
weregenerated by
Monte Carlo simulation.The unrelated
grandsire
families each contained 40 sires that were half sibs. Thenumber of families was 20
except
in simulation III wheredesigns
with 50 familieswere simulated as well
(table
IIB.
Polygenic
andQTL
effects forgrandsires
weresampled
fromN(0,
Q
u)
andN(0,
er!),
respectively.
Thepolygenic
effect for sireswas simulated as
US
=!(UGs) 2
+4lz ,
where UGS is thegrandsire’s polygenic
effect,
and*!t,
Mendeliansampling,
is distributedindependently
asN(0, Var (4l
j
))
withVar(4lz)
= 0.75 xQ
u
(no inbreeding).
Each sire inherited oneQTL
at random from its(grand)
sire. Thematernally
inheritedQTL
effect for a sire was drawnfrom
N(0,
er!).
Each sire had 100daughters
withphenotypes
observed,
that weregenerated
aswhere p is a
0/1
variable. In all simulations thephenotypic
variance and theheritability
of the trait were 100 and0.40,
respectively.
Theproportion
ofgenetic
variance due to theQTL
(= 1’
)
wasby
default0.25,
or 0.10 in simulation III(table
777).
Twogenetic
markersbracketing
theQTL
position
at lOcM(Haldane
mapping
function)
were simulated with five alleles at eachmarker,
withequal
frequencies
over alleles per marker. Forgrandsires,
the markergenotypes
werefully
informative,
i.e.heterozygous,
and thelinkage phase
between marker alleles is assumed to be known apriori.
Theuncertainty
onlinkage phase
in sires can4. RESULTS AND DISCUSSION
4.1. Simulation I:
comparison
RAM versus FAMFor each of the four MCMC
algorithms
that aregiven
in tableII,
asingle
MCMC chain was run and 2 000 thinned
samples
were used forpost-MCMC analysis
(table III).
In the case of0
v
c , prior
distributions forQ
e,
Q
u
and
a
2
were
’flat’ IGs(figure 1)
withexpected
meansequal
to60,
30 and 5(values
used forsimulation),
respectively.
In the case ofB
RT
,
theprior
fora
was
again
an IG andpriors
forh
2
2and -y
were Beta(2.5, 3.75)
and Beta(0.9, 2.7),
respectively. Figure
3 presents
themixing properties
forparameter
ui
within
the chains for theRAM-0
vc
andRAM-0vc
alternatives andpoints
to slowermixing
whenusing
the FAM. This slowmixing
is also indicatedby high
auto-correlation(! 1)
amongsamples
forparameters
ui
2and
when
the FAM was used(table IV).
With the samethinning,
the auto-correlation amongsamples
in the RAM isx
0.70. The estimates forposterior
mean and coefficient of
variation,
derived fromsamples
of the fourchains,
aregiven
in table V. These estimates are very similar over models(RAM
andFAM)
and
parameterizations
(0
v
c
andB
RT
).
The coefficients of variation forui and
’Yare
relatively large
and indicate that aposteriori
knowledge
on theseparameters
remains
small,
while estimates foroe and h2
are
accurate. Themagnitude
of thesampling
correlation amongparameters
within MCMCcycles
was very similar for both models andparameterizations.
Thesamples
foro,
and
ufl
showed amoderately
high
negative
correlation(-0.7),
while thesampling
correlation betweenh
2
and
7
wasrelatively
low andpositive
(0.2).
The correlation amongsamples
foru
£
and
h
2
was veryhigh
butapparently
did notadversely
affect the auto-correlation of theseparameters.
Taking
100 ESS as a minimum[26]
the MCMC chain was rather shortfor statistical inferences
for -y
inFAM-B
RT
.
However, running
alonger
MCMC chain was notpractical
since theFAM-0
vc
MCMC chain needed 68 593 min CPU(47
days)
on a HP 9000-735(125Hz)
workstation. This was almost 100 times the 11 h that were needed to run the RAM with similar chainlength.
The slow
mixing
ofparameters
for a FAM waslikely
to be due to the lack of marker data ongranddaughters.
Distinction betweenpolygenic
andQTL
effectswithin these animals is
hardly possible. Consequently, they provide
little informa-tionregarding dispersion
but becausethey
are so numerousthey
dominate thedistribution from which the next
sample
for thedispersion
parameter
is drawn.Heuristically,
one firstgenerates
u and v with variancesreflecting
currenta
2
.
Sub-sequently
onesamples
a newo
2
from
apeaked
distribution with a mean near thesample
variance of the u and v. Notsurprisingly
onegets
back aa
2
very
similar to theprevious
one, as a result of which the chain isslowly
mixing.
The data from simulation I were also used to examine the effect of
priors
onposterior
inferences on theproportion
ofQTL
whene
RT
was used. Four differentpriors
for
7
wereused, ranging
from a ’flat’(but
not a’non-informative’)
uniformprior
to adensity peaked
at zero. The latter reflects theprior expectation
that thegenetic
variance due to theQTL
is small orequal
to zero.Figure
4 presents
bothprior
andposterior
densities. The uniform and the’peaked-at-zero’
prior
resulted in thehighest
(0.20)
and lowestposterior
mean estimate(0.10),
respectively.
For thisAll
priors
studied, however,
showedconsistency
for theposterior
probability
of y =0,
i.e. the datasupported
thepresence of a
QTL
at the studiedposition
of the chromosome.4.2. Simulation II:
parameterization
of thegenetic
modelIn simulation
II,
fivereplicates
of data were used tostudy
the effects of alternativeparameterizations
of thegenetic model,
for the RAMonly.
Genetic andpopulation
parameters
were similar to those in simulation I(table III).
Basedon the results for ESS from the initial MCMC chains
(table I!,
the MCMCchains were run for 250 000
cycles
and every 250th wassample
used foranalysis
(m
=1000).
Now,
uniformpriors
for alldispersion
parameters
were used. Thesampling
correlations wereaveraged
over the fivereplicates
and arepresented
intable VI. These correlations are consistent with those from the initial MCMC chains
(table IV);
i.e. auto-correlations werehighest
amongsamples
for0
&dquo;
(in
Ovc)
andq
(in O
RT
),
i.e. around 0.68. Theseparameters
also had lowest and similar ESS(! 230).
These results indicate thatsampling efficiency
is similar for the two studiedparameterizations
(0vc
andO
RT
)
of thegenetic
model and shorter chains may suffice. Theposterior
meanestimates,
averaged
over fivereplicates,
for alldispersion
parameters
were in closeagreement
with the values used for simulation4.3. Simulation III: presence of the
QTL
In simulation
III,
two differentdesigns
(20
or 50grandsire
families)
were studiedin combination with two different sizes of the
QTL
(explaining
either 10 or 25%
of the
genetic
variance).
Two differentpriors
for
were studied with theO
RT
parameterization.
For each combination ofdesign
and-y,
test runspreceding
the 25replicates
were used toempirically
determine values for t in the MHalgorithm,
in order to achieve the desiredacceptance
rates. From themarginal
posterior density
an odds ratio was
computed
and the presence of theQTL
wasaccepted only
if this ratio exceeded a critical value of 20.Using
this test statistic wepostulated
the powerof
detecting
theQTL
forspecific
designs
andusing
differentpriors
(table V77).
The smalldesign
(20
x40)
has low power ofQTL detection,
i.e.only
25%,
for aQTL
that
explains
10%
of thegenetic
variance. Power increased when either theQTL
explained
moregenetic
variance or when alarge design
(50
x40)
was used. Forthe
large
design
with arelatively large
QTL,
power of detection is 100%,
for bothpriors
considered. The use of the’peaked-at-zero’ prior
reduced power in the two intermediate cases but increased power in the smalldesign
with the smallQTL.
Estimates forposterior
mode,
mean and HPD90 wereaveraged
over the 25replicates
and these averages are
presented
infigure
5. When the’peaked-at-zero’
prior
wasused, point
estimates were lowercompared
tousing
the uniformprior.
Thisprior
also led to shorter - and closer to zero - HPD90region
in all combinations ofdesign
and
5. CONCLUSIONS
We
presented
MCMCalgorithms,
using
the Gibbssampler
and the MHparameters
and also bettermixing
of thedispersion
parameters.
Information onindividual
phenotypes
led to accurate estimation of both residual variance andheritability,
as was similar to Van Arendonk et al.!27!.
On thecontrary,
daughter
yield
deviations[28]
may result in poor estimation ofpolygenic
and residual variances[25].
The use ofB
RT
allows a betterrepresentation
ofprior
belief aboutdispersion
parameters
whilesampling
efficiency
was similar to the usual0
vc
parameterization.
Considering
ratios of variancecomponents
rather than variancecomponents
themselves insampling
procedures
has beenpreviously
proposed
!23!.
However,
ourratios can be
interpreted
directly
and haveimplicit
boundaries(zero
andone),
whereTheobald et al.
[23]
needed aspecific
restriction on their ratio. Therepresentations
of
prior
knowledge
in the twoparameterizations
were notequivalent
and differences inposterior
estimates can beexpected.
However,
the use of vaguepriors
(absence
of
prior
knowledge)
in the twoparameterizations
lead to very similar results. In thisstudy, position
of theQTL
was assumed known. Extension of the MCMCalgorithm
to allow estimation ofQTL
position
has been studied andimplemented
(3!.
Currently,
the method of Bink et al.[2]
tosample
genotypes
for asingle
markeris
being
extended tomultiple
markers linked to anormally
distributedQTL. Then,
arobust MCMC method becomes available for
linkage analysis
inmultiple
generation
pedigrees allowing incomplete
information on both traitphenotypes
and markergenotypes.
ACKNOWLEDGEMENT
The authors wish to thank Luc Janss and
George
Casella forstimulating
discussions andsuggestions.
Comments from anonymous reviewers and the editorconsiderably improved
the paper. The first authoracknowledges
financial support from NWO while on researchleave at Cornell
University, Ithaca,
NY. The financial support of Holland Genetics isgratefully acknowledged.
REFERENCES
[1]
Berger J.O.,
Statistical DecisionTheory
andBayesian Analysis,
2nded.,
Springer-Verlag,
NewYork, NY,
1985.[2]
BinkM.C.A.M.,
Van ArendonkJ.A.M, Quaas R.L., Breeding
value estimation withincomplete
markerdata,
Genet. Sel. Evol. 30(1998)
45-48.[3]
BinkM.C.A.M.,
JanssL.L.G., Quaas R.L., Mapping
apoly-allelic
quantitative traitlocus
using
simulatedtempering,
Proc. 6th WorldCongr.
Genet.Appl.
Livest.Prod.,
Armidale, University
of NewEngland,
Armidale NSW2351,
vol.26,
Australia, 1998,
pp. 277-280.
[4]
CantetR.J.C.,
SmithC.,
Reduced animal model for marker assisted selectionusing
best linear unbiasedprediction,
Genet. Sel. Evol. 23(1991)
221-233.[5]
CasellaG., Berger R.L.,
StatisticalInferences, Duxbury
press,Belmont, CA,
1990.[6]
ChibS, Greenberg
E.,Understanding
theMetropolis Hastings algorithm,
Am. Stat. 49(1995)
327-335.[7]
Devroye L.,
Non-uniform Random VariateGeneration,
Springer-Verlag
Inc.,
New[8]
FernandoR.L.,
GrossmanM.,
Marker-assisted selectionusing
best linear unbiasedprediction,
Genet. Sel. Evol. 21(1989)
467-477.[9]
GelfandA.E.,
Gibbssampling
(a
contribution to theencyclopedia
of StatisticalSciences),
TechnicalReport, Department
ofStatistics, University
ofConnecticut,
1994.[10]
GelfandA.E.,
SmithA.F.M., Sampling-based approaches
tocalculating marginal
densities,
J. Am. Statist. Assoc. 85(1990)
398-409.[11]
GemanS.,
GemanD.,
Stochasticrelaxation,
Gibbs distributions and theBayesian
restoration of
images,
IEEE Trans. Pattern. Anal. MachineIntelligence
6(1984)
721-741[12]
GilksW.R.,
WildP., Adaptive rejection sampling
for Gibbssampling, Appl.
Stat. 41(1992)
337-348.[13]
GilksW.R.,
BestN.G.,
TanK.K.C., Adaptive rejection Metropolis sampling
within Gibbssampling, Appl.
Stat. 44(1995)
455-472.[14]
Hastings W.K.,
Monte Carlosampling
methodsusing
Markov chains and theirapplications,
Biometrika 57(1970)
97-109.[15]
Hoeschele I.,Bayesian QTL mapping
via the GibbsSampler,
Proc. 5th WorldCongr.
Genet.Appl.
Livest. Prod.Guelph, University
ofGuelph, Canada,
vol.21, 1994,
pp. 241-244.
[16]
HoescheleI.,
Van RadenP.M., Bayesian analysis
oflinkage
betweengenetic
markers and quantitative traitloci,
I Priorknowledge,
Theor.Appl.
Genet. 85(1993)
953-960.[17]
JanssL.L.G., Thompson R.,
Van ArendonkJ.A.M., Application
of Gibbssampling
for inference in a mixedmajor gene-polygenic
inheritance model in animalpopulations,
Theor.
Appl.
Genet. 91(1995)
1137-1147.[18]
Metropolis, N..,
RosenbluthA.W.,
RosenbluthM.N.,
TellerH.,
TellerE., Equations
of state calculationsby
fastcomputing machines,
J. Chem.Physics
21(1953)
1087-1091.[19]
Quaas R.L.,
PollakE.J.,
Mixed modelmethodology
for farm and ranch beef cattletesting
programs, J. Anim. Sci. 51(1980)
1277-1287.[20]
SearleS.R.,
MatrixAlgebra
Useful forStatistics,
JohnWiley
&Sons,
NewYork,
NY, 1982.
[21]
SorensenD.A.,
AndersenS.,
GianolaD.,
Korsgaard I., Bayesian
inference in threshold modelsusing
Gibbssampling.,
Genet. Sel. Evol. 27(1995)
229-249.[22]
TannerM.A., Wong W.H.,
The calculation ofposterior
distributionsby
dataaugmentation,
J. Am. Stat. Assoc. 82(1987)
528-540.[23]
TheobaldC.M.,
FiratM.Z., Thompson R.,
Gibbssampling, adaptive rejection
sampling
and robustness toprior specification
for a mixed linearmodel,
Genet. Sel. Evol.29
(1997)
57-72.[24]
Tierney L.,
Markov chains forexploring
posterior distributions(with discussion),
Ann. Stat. 22(1994)
1701-1762.[25]
UimariP.,
HoescheleI., Mapping
linkedquantitative
trait lociusing Bayesian
analysis
and Markov chain Monte Carloalgorithms,
Genetics 146(1997)
735-743.[26]
UimariP.,
ThallerG.,
Hoeschele I., The use ofmultiple
markers in aBayesian
method for
mapping quantitative
traitloci,
Genetics 143(1996)
1831-1842.[27]
VanArendonkJ.A.M.,
TierB.,
BinkM.C.A.M.,
BovehuisH.,
Restricted maximum likelihoodanalysis
betweengenetic
markers andquantitative
trait loci for agranddaughter
design,
J.Dairy.
Sci.(1997).
[29]
Van TassellC.P.,
CasellaG.,
PollakE.J.,
Effects of selection on estimates ofvariance components
using
Gibbssampling
and restricted maximumlikelihood,
J.Dairy
Sci. 78(1995)
678-692.[30]
Van TassellC.P.,
VanVleckL.D., Multiple-trait
Gibbssampler
for animal models: flexible programs forbayesian
and likelihood-based(Co)variance
componentInference,
J. Anim. Sci. 74
(1996)
2586-2597.[31]
Wang C.S., Rutledge J.J.,
GianolaD., Marginal
inferences about variancecompo-nents in a mixed linear model
using
Gibbssampling, Genet,
Sel. Evol. 25(1993)
41-62.[32]
Wang C.S., Quaas R.L.,
PollakE.J., Bayesian analysis
ofcalving
ease scores and birthweights,
Genet. Sel. Evol. 29(1997)
117-143.[33]
Wang T.,
FernandoR.L.,
VanderBeekS.,
GrossmanM.,
Van ArendonkJ.A.M.,
Covariance between relatives for a markedquantitative
traitlocus,
Genet. Sel. Evol. 27(1995)
251-272.[34]
WellerJ.I.,
KashiY.,
SollerM.,
Power ofdaughter
andgranddaughter designs
for
determining linkage
between marker loci andquantitative
trait loci indairy cattle,
J.Dairy
Sci. 73(1990)
2525-2537.A1. APPENDIX: !11 conditional densities
Al.l. Location
parameters
The conditional densities for location
parameters
are the same with either set ofdispersion
parameters
(0
v
c
orO
RT
).
Whensampling genetic effects,
the ratiosa2
of VC needed can be
computed
from eitherparameterization,
i.e.a
u
u 1 =2
=or2 e
(
h2
a2
1
h2
aeC (
1
-
y
)
xI
-h
2
anda
-1 v
1 =- 92 e
=
C 2 !
x1
-h2
In thisstudy
we consideredB
1 -h )
o!B!2
2 1 -n//
only
one fixedeffect,
an overall mean m, for which the conditionaldensity
becomeswhere
ji
r
equals
yk corrected forgenetic effects, following
thecategorization
in table L The conditional variance of this overall mean is aweighted
average overcategories.
Again,
forphenotypes
on animals incategories
1 to3,
the residualvariance,
0
’2
e
i
7
containsparts
of thegenetic
variances. The conditionaldensity
for thepolygenic
where
where
y
2
is the ithphenotype
for animalj,
corrected for alleffects,
other thanpolygenic,
y
l
.
is the average ofphenotypes
onnon-parent
l,
also corrected for alleffects other than
polygenic,
op(j)
represents
theoffspring
of animalj,
which areparents
themselves,
o
n
p(j)
represents
theoffspring
of animalj,
which arenon-parents.
Furthermore,
UM,k is thepolygenic
effect of the other(if known)
parent
(mate
of animalj)
ofoffspring k,
nj is the number ofphenotypes
for animalj,
6
j
,
=1,
4/3,
2 when0, 1,
or 2parents
of j
are identified(with
noinbreeding).
(8;
1
1is the fraction
Q
u
term
4>j.)
Finally,
wi is thereciprocal
of the amount of variancepresent
in the residuals ofphenotypes
onanimal l,
and can be calculated aswhere n, is the number of observations on
animal l,
andD¡
=I
2
-
Q
l
xQT
(with
no
inbreeding,
see also Bink et al.[2]).
The conditionaldensity
for the xthQTL
effect ofanimal j
can
begiven
aswhere
y
z
is the ithphenotype
foranimal j,
corrected for all effects other thanQTL,
W
l
.
is the average ofphenotypes
onnon-parent
l,
also corrected for all effects other thanQTL,
dq!’1
is the first element of the xth row ofD7’QT
foranimal j,
and corrects for the covariance at theQTL
betweenparent
andoffspring. Similarly,
dqd!’1
is the first element of the xth row off!!D! 1Q!
for animalj,
and corrects for the covariance betweenparent
and the matebelonging
to aparticular offspring
of thatparent j.
A1.2.
Dispersion
parameters
In the
RAM,
the residuals(e)
have different variances over thecategories
ofanimals
(table 1).
Hence,
conditional densities for VC in0vc
are not standard densities. Forexample,
whenderiving
the full conditionaldensity
foro,,2,
the termW
i (a2+
2a
2
)
is known in the likelihoodpart
of thejoint posterior
density
(13).
Itcan thus be treated as a
constant,
but it does notdrop
out of theequation.
WithB
RT
,
the conditionaldensity
ofu2
is
standard,
but those forh
2
2 and -y
are not. With0vc ,
the conditionaldensity
of variancecomponent
x, for x = e, u or v, iswhere
and