HAL Id: hal-00894262
https://hal.archives-ouvertes.fr/hal-00894262
Submitted on 1 Jan 1999
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Behaviour of the additive finite locus model
Ricardo Pong-Wong, Chris S. Haley, John A. Woolliams
To cite this version:
Original
article
Behaviour of the
additive
finite
locus model
Ricardo
Pong-Wong*
Chris S.
Haley,
John A.
Woolliams
Roslin Institute(Edinburgh),
Roslin,
Midlothian EH259PS, Scotland,
UK(Received
7September 1998; accepted
2April
1999)
Abstract - A finite locus model to estimate additive variance and the
breeding
valueswas
implemented using
Gibbssampling.
Four different distributions for the size of thegene effects across the loci were considered:
i)
uniform with loci of differenteffects,
ii)
uniform with all loci
having equal effects,
iii)
exponential,
andiv)
normal. Stochastic simulation was used tostudy
the influence of the number of loci and the distribution of their effect assumed in the modelanalysis.
Theassumption
of loci with different anduniformly
distributed effects resulted in an increase in the estimate of the additivevariance
according
to the number of loci assumed in the model ofanalysis, causing
biases in the estimatedbreeding
values. When the gene effects were assumed to beexponentially distributed,
the estimate of the additive variance was stilldependent
on the number of loci assumed in the model ofanalysis,
but this influence was muchless. When
assuming
that all the loci have the same gene effects or whenthey
werenormally distributed,
the additive variance estimate was the sameregardless
of the number of loci assumed in the model ofanalysis.
The estimates were notsignificantly
different from either the true simulated values or from those obtained when
using
the standard mixed model
approach
where an infinitesimal model is assumed. The results indicate that if the number of loci has to be assumed apriori,
the most useful finite locus models are thoseassuming
loci withequal
effects ornormally
distributedeffects.
©
Inra/Elsevier,
Paris’
finite locus
model / gene
effect distribution/
Gibbs sampling / infinitesimal modelRésumé - Comportement des modèles additifs à nombre fini de loci. On a
utilisé,
via la méthode del’échantillonnage
deGibbs,
des modèles à nombre fini de loci pour estimer les variancesgénétiques
additives et les valeursgénétiques.
On aconsidéré quatre distributions différentes des effets de
gènes
sur l’ensemble des loci :i)
distribution uniforme avec loci à effetsvariables,
ii)
distribution uniforme avecloci à effets
égaux,
iii)
distributionexponentielle,
etiv)
distribution normale. Lasimulation
stochastique
a été utilisée pour étudier l’influence du nombre de loci et de*
Correspondence
andreprints
la distribution
supposée
de leurs effets.L’hypothèse
d’effets différents et uniformément distribués a entraîné le fait que la variancegénétique augmentait quand
le nombresupposé
de lociaugmentait,
cequi
a causé des biais dans l’estimation des valeursgénétiques. Quand
les effets degènes
ont été distribuésexponentiellement,
l’estimée de la variancegénétique
additive a été encoredépendante
du nombre de locisupposé,
quoiqu’à
un moindredegré. Quand
on asupposé
que tous les loci avaient les mêmeseffets de
gènes
ouquand
ils ont été normalementdistribués,
l’estimée de la variancegénétique
additive a été lamême, quel
que soit le nombre de locisupposé
dansl’analyse.
Les résultatsindiquent
que si le nombre de loci estsupposé d’après
des considérations apriori,
les modèles à nombre fini de loci lesplus
utiles sont ceuxqui
supposent des loci à effets
égaux
ou à distribution normale.©
Inra/Elsevier,
Paris modèle fini/
distribution d’effets/
échantillonnage de Gibbs/
modèle in-finitésimal1. INTRODUCTION
Genetic evaluation in livestock has
traditionally
been carried outusing
aninfinitesimal
genetic model,
where the trait is assumed to be influencedby
aninfinite number of genes, each with an
infinitesimally
small effect.Although
such a model is
biologically
incorrect,
its use has beenjustified
because itallows the
handling
of the total additivegenetic
effect as anormally
distributedvariable so that standard statistical mixed model
techniques
can beapplied.
Indeed,
solutions from the normalapproximation
appear to be robustenough
forpractical
selection purposes,provided
the trait is not controlledby
a smallnumber of
loci,
fewgenerations
are considered(so
that there are no substantialchanges
in the allelesfrequencies
due to selection ordrift)
and the additivegenetic
effect alone is considered!17!.
The
arguments
justifying
the use of the infinitesimal model are,however,
being
weakenedby
theincreasing
knowledge
about thegenetic
architecture ofquantitative
traits.Single
genes that have arelatively large
effect onquantitative
traits(e.g.
Booroola gene, double muscle gene,Callipyge
gene)
are
expected
to have arapid change
in allelefrequency
due to selection. Under thesecircumstances,
the infinitesimal model wouldwrongly predict
the evolution of thegenetic
variance even when the selected trait is also affectedby
alarge
number of loci with small effects[8].
Moreover,
theassumptions
required
to describe dominance with the infinitesimal model are unclear[25].
Thus,
alternativeapproaches
toincorporating
the extraknowledge
about thegenetic make-up
ofquantitative
traits should be considered.In this paper, an additive finite locus model is defined and
implemented
using
Gibbssampling.
The effects of theassumptions
about the number of loci and the distribution of the size of their effects arestudied, extending
the resultspreviously reported by Pong-Wong
et al.!24!.
The results obtained with the finite locus model arecompared
with those obtainedusing
the mixed model2. MATERIALS AND METHODS
2.1. Finite-locus
genetic
modelA
quantitative
trait is assumed to begenetically
controlledby
L unlinked biallelic loci.Following
the same notation as Falconer[4],
each locusl,
hasan additive
(a,)
effect with afrequency
of the favourable allele in the basepopulation of pi .
The additive varianceexplained
by
locus l is then2P
I
(1-
PI
)af.
Since the loci are assumed to be unlinked and inlinkage
equilibrium
the total additive variance(or a 2)
is the sum over all the loci. The trait is also assumed tobe affected
by
an environmental deviation which isnormally
distributed withmean zero and variance
o,
2
.
Other environmental fixed and random effects mayalso be included in the model
but,
forsimplicity, they
are not considered here. In matrixalgebra
the linear model isexpressed
as:where y is the
(n
x 1)
vector ofphenotypic records,
p the overall mean,a the
(L
x1)
vector of additive(a)
effects for eachlocus,
e the(n
x1)
vector of environmental
deviation,
andW
a
is the(n x L)
matrix of additive effects associated to the individual’sgenotype.
Assuming
that thegenotypes
are denoted as
AA,
AB and BB(BB
the least favourablegenotype),
the valuein column l of
W
a
would be1,
0 or-1,
for aphenotypic
observation froman individual with
genotype
(at
thel locus)
AA,
AB orBB,
respectively.
Thevector a-, is defined the same as a but
excluding
the effect at the locus 1. 2.1.1. Distribution of the size of gene effectsSince the size of the effects across the different loci are assumed to be
different,
anassumption
about how the gene effects are distributed isrequired.
Here,
threepossible
distributions to model the gene effects are examined:i)
uniform,
ii)
exponential,
andiii) (folded-over)
normal.The
probability
density
functions for the distribution of the size of the additive effects(
0
(a))
whenassuming
theuniform, exponential
and the(folded-over)
normaldistributions,
respectively,
are:where
Aa
is the scaleparameter
for theexponential
and the normal distribu-tion. Thedensity
function0(a)
is definedonly
for the range of thepositive
numbers(including
zero)
since ais,
by definition,
the effect of the favourableor
exponentially
distributed is consistent with thegeneral
belief that most ofthe loci
affecting
agiven
quantitative
trait would have a smalleffect,
whileonly
a few genes have amajor
effect on the trait inquestion.
2.2.
Implementation
of the finite locus modelusing
Markov chain Monte CarloGenetic
analyses
assuming
theproposed
finite locus model involve the esti-mation of the gene effect at eachlocus,
theparameter
defining
the distributionof the gene
effects,
thegenotype
probability
for each individual at all the loci and their allelefrequencies.
In the model ofanalysis,
the number of lociaffect-ing
the trait inquestion
as well as the distribution of their effects are assumedknown. The total additive variance is estimated as a linear function of the effect
and allele
frequency
across all the loci(i.e.
er!
=2
2!(1 -p!a!).
Agraphical
i
representation
of the finite locus model ispresented
infigure
1.The main
problem
inimplementing
a finite locusgenetic
modelusing
astandard likelihood
approach
is the calculation of thegenotype
probability
for all the loci. Inpractice
this task iscomputationally
very difficult because of thelarge
number ofpossible
genotype
combinations that need to beconsidered,
anumber which
rapidly
increases with the number of individuals. Thisproblem
becomes further exacerbated with
complex
pedigree
structuresinvolving loops
In order to avoid this
problem,
the finite locus modelproposed
isimple-mented
using
a Markov chain Monte Carlo(MCMC)
approach
based uponGibbs
sampling algorithms previously suggested
forsegregation
studies ofun-typed single
genes incomplex pedigree
structures(e.g.
[16,
18]).
These algo-rithms aresimply
extended to include L lociaccounting
for the entiregenetic
effects. Because all loci are assumed to be unlinked the
sampling
of thegenotype
at each locus is
performed
independently.
A
sampling
protocol
forupdating
the relevantparameters
(conditional
onthe
others)
of a finite locus model in the Markov chain would then be as follow:1)
sample
overall mean;2)
sample
thegenotype
configurations
locusby
locus;
3)
sample
the gene effects locusby
locus;
4)
sample
the scaleparameter
of the assumed distribution of gene effects(not
needed whenassuming
a uniformdistribution);
5)
sample
all other environmental fixed and random effects(not
includedhere);
6)
sample
non-permanent
environmental variance and variance for otherrandom effects.
The
sampling
of the allelefrequencies
for each locus may also be added in thesampling
scheme. In thisstudy, however, they
were not estimated butthey
were fixed to be 0.5.
The full conditional distributions for the gene effects and the scale parame-ter for the distribution of gene
effects,
neededduring
thesampling
process, arepresented
below. The conditional distributions of otherparameters
(e.g.
geno-type
configuration,
environmentalvariance,
other random and fixedeffects)
arenot shown here since
they
have been described inprevious
studiesreported
in the literature. For thedescription
of thealgorithms
used tosample
genotypes
see Guo and
Thompson
[16]
and Janss et al.[18] (the
latteralgorithm
was usedhere,
since it allows a bettermixing
inpedigrees
withlarge
family
sizes).
Forthe use of Gibbs
sampling
in moregeneral
genetic
evaluations and thecondi-tional distributions of other environmental
effects,
see Firat[7]
andWang
et al.[29, 30].
2.2.1. Joint
posterior density
(conditional
on thegenotype
structure)
The full conditional
density
for the effect at each locus as well as the scaleparameter
of the distribution of gene effects are obtained from theirjoint
posterior
density by
extracting
the termscontaining
the variable inquestion.
Thejoint posterior
density
of 0’; , a
andA
a
conditional on thegenotype
structure(considered
as known tosimplify
theexpression)
is of the form:and
P(a§)
are theprior
distributions ofA
a
and0’
;,
respectively.
The respec-tiveconjugate prior
distribution forA
a
whenassuming
the gene effectsbeing
exponentially
andnormally
distributed isproportional
to(A
a
)-
v
-’exp(-vs/
Aa
)
and
-
),
(A
exp(-0.5vs/A
l
a
2
/
,
) -
a
where v is thedegree
of belief and s theprior
value of
A
a
.
Assuming
that v isequal
to zero(i.e.
there is no belief in anyparticular
value ofs)
gives
the ’naive’prior,
which isproportional
to1/Aa-This
prior
denotes a lack ofprior
knowledge
about theparameter
and it hasbeen used as a
prior
for variancecomponents
including
some animalbreed-ing implementations
[9, 29!.
In thisstudy
’naive’priors
were used for bothA
a
and
a 2
2.2.2. Conditional distributions for the
(size
ofthe)
gene effects The conditional distribution of the gene effectsdepends
on theassumption
of how
they
are distributed.!.!.!.1.
Uniform
andindependent
When the additive effects are assumed to be
uniformly
distributed,
theconditional
density
depends only
on the first term ofequation
(5)
(i.e.
thesecond term is a
constant).
Thus,
the conditional distribution for the effect ofthe locus l is
proportional
to:which is
equivalent
to a truncated normal distribution with meanii,
andvariance
or
evaluated
in the range ofpositive
values. The value fora
l
is the solution from the linear modelequal
to(2:
YAA
-
2: Y
BB
)
/(n
AA
+n
BB
),
andQ
Z
its
error varianceequal
toAA
0,2 e /(n
+n
BB
),
where yg is theadjusted
phenotype
of individuals withupdated
genotype
g, and ng is the number ofrecords from individuals with such a
genotype.
The solution of the linear modelâ
l
,
isequivalent
to the coefficient from theregression
(passing
through
theorigin)
of thephenotype
(adjusted
for the effect of other loci and any other environmentaleffects)
on thegenotype
value(i.e.
1,
0 or -1 for the recordfrom an individual
sampled
to havegenotype
AA,
AB orBB,
respectively).
The conditional distribution
resulting
fromassuming
a uniform distributionhas been
generally
used tosample
themajor
gene effect in mixed inheritance models(e.g.
[18]).
2.2.2.2.
Uniform
and constantregression
is on the number of locisampled
as AA minus the number of locisampled
as BB for the individualcontributing
to therecord).
2.2.2.3.
E!ponential
The full conditional distribution of the effect of locus l is
proportional
to:where
a
l
andQ2
are defined as inequation
(6).
Rearranging
theprevious
equation
results in thefollowing:
where the first term is
proportional
to a normal distribution with meana,
l
-
U2.!a
and
varianceQ2
,
and the second term is a constant.Substitut-ing
the valuesa,
anda
as
defined inequation
(6),
the full conditional dis-tribution is a truncated normal defined for thepositive
values with mean(! yAA -
YBB
-
£
0
’;À-
1
)/(n
AA
+)
n
BB
and varianceoe 2 / (n
AA
+n
BB
)
2.2.2.l!.
Folded-over normalExtracting
the termscontaining
a, inequation
(5),
its conditional distribu-tion isproportional
to:and when
substituting
the values ofat
and!2,
theprevious expression
can berearranged
aswhich is
proportional
to a truncated normal with mean(2:
yAA -2
:
YBB)
(n
AA
+ nBB+0’; À;;:-l)
1
and
varianceAA
(n
+ nBB +0’
; À;;:-l )-
1
0
’
;.
2.2.3. Conditional distribution of the scale
parameter
of the gene effect distributioneffects is
being
assumed. The estimation of thisparameter
is notrequired
whenassuming
that the gene effects areuniformly
distributed.The conditional
density
ofA
a
under theassumption
that the gene effects areexponentially
distributed and with ’naive’prior
is:which is
equivalent
to:where ’Y(1,L) is a gamma distribution with scale and
shape
parameters
equal
to1 and
L,
respectively.
Similarly,
when the gene effects arenormally
distributed,
the conditionaldistribution of
Aa
assuming
a ’naive’prior
is:which is a scaled inverted
chi-squared
of the form:2.3. Simulated
population
2.3.1.Population
structureThe structure of the simulated
population
consisted of a basepopulation
of 80 unrelated individuals(40
males and 40females)
plus
five other discretegenerations.
At eachgeneration
five males and 20 females were chosen andrandomly
mated toproduce
fouroffspring
(two
males and twofemales)
perfemale. Selection of
parents
was at random unless otherwise noted in the results. All individuals had onephenotypic
record.2.3.2. Genetic model
The total
genetic
effects were accounted forby
20independent
and diallelicfrequency
was 0.5. Thegenotype
at each locus of the base individuals wassampled
from theexpected
genotype
frequency
of a locus inHardy-Weinberg
equilibrium.
Thegenotype
of individuals from furthergenerations
weresampled
assuming
Mendelian inheritance. The totalgenetic
effects of an individual arethe sum of all the
genotype
effects over all loci.2.3.3. Parameters used
For all the cases the environmental variance was assumed to be
80,
theadditive
genetic
variance 20. In order to account for the totalgenetic variance,
the effect of each locus was simulated in two ways:i)
assuming
that all the 20 loci have the same effect(i.e.
a =J
2);
orii)
that each effect wassampled
from anexponential
distribution with scaleparameter
equal
to 1(which
isexpected
to
yield
the correct totalgenetic
variance).
2.4. Situationscompared
Data sets simulated
using
thepopulation
structureexplained
above wereused to
study
the behaviour of the finite locus model(FIN)
ingenetic
eval-uations. Each data set(replicate)
wasanalysed
with several FINapproaches
varying
in theassumptions
about the distribution of gene effects and thenum-ber of loci taken in the model of
analysis.
These variations in
assumptions
were thefollowing.
i)
The distribution of the gene effects: effects of lociuniformly
andinde-pendently
(FIN-UNI),
uniformly
but constant(i.e.
equal
effects;
FIN-CON),
exponentially
(FIN-EXP)
ornormally
(FIN-NOR)
distributed.ii)
The number of loci:5, 10,
20 or 30.As
previously
stated,
the allelefrequencies
in the basepopulation
for each locus were not estimated in theanalysis.
Insteadthey
were fixed at 0.5.The case when all loci have the same effects
(FIN-CON)
is similar to the finite locus modelproposed
by
Fernando et al.!6!.
The same data sets were also
analysed
using
the standard mixed modelapproach
(MM)
where an infinitesimalgenetic
model is assumed. In orderto make the results
comparable
with those obtained with the FINanalyses,
the MM was alsoperformed
using
a Gibbssampling approach
to obtainthe
marginal
posterior density
of each variancecomponent.
From aBayesian
perspective,
the variance estimates from MMusing
a restricted maximumlikelihood
(REML)
approach
are the mode of theirjoint posterior distribution,
which are not
expected
to coincide with the mode of theirmarginal
distributions[11].
Theimplementation
of the mixed modelusing
Gibbssampling
andits differences from REML
approaches
have been much studied(e.g.
Wang
et al.!30!).
2.4.1. Criteria of
comparison
The criteria of
comparison
were the estimates of the variancecomponents
3. RESULTS
3.1. Gibbs
sampling implementation
The results
presented
below are the summaries of 50replicates.
The varianceestimates of each evaluation within a
replicate
is the mean of a Markov chain of1 000 realisations
sampled
every 50cycles
after aburning
period
of 5 000cycles
(i.e.
totallength
of the chain = 55 000cycles).
Thissampling protocol
ensured that the autocorrelation between consecutive realisations was less than 0.1 forall the
parameters studied
here.3.2. True model: the same gene effects across all loci
(random
selection)
3.2.1. FIN- UNI
The estimates of the variance
components
assuming
that all loci have different effects and areuniformly
distributed are shown in table 7. These results werehighly
dependent
on the number of loci assumed in the model ofanalysis.
The estimate of the additive variance increased when more loci were assumedin the model of
analysis.
This trend wasconsistently
observed across all thereplicates.
The additive variance estimate closest to the true simulated valuewas
produced
whenonly
five loci were assumed in the model ofanalysis,
whichis
substantially
less than the true number used to simulate the data.The increase in the estimated additive variance when
assuming
more loci in the model ofanalysis
was alsoaccompanied by
a decrease in the estimateden-vironmental variance.
However,
this reduction did notcompletely
compensate
for the extra estimated additive
variance,
thusresulting
in an overestimate inthe total
phenotypic
variance. The estimated total variance increased from 105 whenassuming
five loci to 129 when theanalysis
was carried outassuming
30 loci
(the
simulated value was100).
The excess of additive variance which
appeared
whenincreasing
the number of loci hadrepercussions
on the estimatedbreeding
values. Asexpected,
theindividuals with extreme EBV became even more extreme when more loci were
assumed in the model of
analysis. Additionally,
theprediction
error varianceassociated with the EBV also tended to increase with the number of loci: The
mean
prediction
error variance of the EBV whenusing
five loci was 16compared
with 25 when the EBV were obtained
assuming
30 loci(for
the MM the meanprediction
error variance was12.5).
Nevertheless,
it isimportant
to note thatalthough
the EBV were very sensitive to the number ofloci,
the correlationbetween the different estimates was
always
greater
that 0.9(table II).
Thus,
the
ranking
of individuals was little affected.The variance estimates when
assuming
all loci hadthe
same effects is summarised in table III. Under thisassumption
the estimates of the additive variance were the sameregardless
of the number of loci assumed in the modelof
analysis.
The results from FIN-CON were notsignificantly
different from the simulated values or from those obtained with the MM. The EBV and their3.2.3. FIN-EXP
Table IV shows the summary of the variance
components
estimatedassuming
the gene effectsbeing exponentially
distributed.Increasing
the number of loci used in the model ofanalysis yielded
aslight
increase in the estimated additive variance.However,
this trend was very smallcompared
with theresults from FIN-UNI. In contrast to the case of
FIN-UNI,
the estimate ofthe total
phenotypic
variance remained constant. The correlation among the EBV obtained with the FIN-EXPanalyses
with different numbers of loci wasalways higher
that 0.95(results
notshown).
3.2.4. FIN NOR
The results when the gene effects were assumed to be
normally
distributedappeared
not to be affectedby
the number of loci used in the model ofanalysis
(table V).
The EBV were also the sameregardless
of the number of loci usedin the model of
analysis.
The results obtained with FIN-NOR were similar to3.3. True model: gene effects simulated as
exponentially
distributed
The main purpose of
using
simulated dataassuming
the gene effects to beexponentially
distributed was to test whether the observed behaviour of FIN-EXP is the same even when itcorresponds
to the true model.3.3.1.
Population
under random selectionTable VI summarises the results when the
population
was under randomselection. The estimated additive variance showed the same trend to increase when more loci were assumed in the model of
analysis.
The best estimates wereobtained when
using
20loci,
whichcorresponds
to the truegenetic
model usedto simulate the data. For this case, the variance
component
estimates were the same as the values used to simulate the data. The correlations between EBV obtained whenusing
different numbers of loci have a correlationgreater
than 0.99(results
notshown).
3.3.2.
Population
under truncation selectionTable VII shows the results of FIN-EXP when the
population
wassurprisingly,
themagnitude
was smaller than that observed with randomselection. The correlation between EBV calculated
assuming
different numbers of loci wasalways
greater
than 0.97(results
notshown).
4. DISCUSSION
In this paper a
genetic
modelassuming
a finite number of lociaffecting
aquantitative
trait wasimplemented
using
Gibbssampling.
Thebehaviour
of theresults when
changing
the number of loci and the distribution of the gene effects assumed on the model ofanalysis
were studiedusing
stochastic simulation.The use of
genetic
modelsassuming
a finite number of loci has so far beenhardly
studied. Chevalet[3]
proposed
agenetic
model which allows theestima-tion of the effective number of loci
affecting
aquantitative trait,
but this model is still based upon the same Gaussianassumptions
made with the infinitesimalmodel. A model which does not
depend
on normaltheory
wasproposed by
Fernando et al.
[6]
and is known as thehypergeometric
model[20].
In thismodel the calculation of the multilocus
genotype
probability
issimplified by
not
treating
thegenotype
at each locusindependently,
butby identifying
their combinedgenotypes
as the total number of favourable allelespresent
across allloci. This
simplification,
however,
forces theassumption
that all loci must have the sameeffect,
and the model is notstrictly
consistent with Mendelian trans-mission[6, 20!.
However,
the main purpose of thehypergeometric
model has been to mimic the results of the infinitesimal model but with a lowercomplex-ity
whencalculating
thelikelihood,
thereby greatly reducing
thecomputational
difficulties ofsegregation
andlinkage analyses
ofsingle
major
genes(28!.
Morerecently,
Goddard
[14]
andPong-Wong
et al.[24]
studied thefeasibility
ofes-timating
dominanceusing
a finite locus modelassuming
that the gene effects wereuniformly
distributed.The results from this
study
show a remarkable interaction between thedistribution of gene effects and the number of loci assumed in the model of
analysis.
When the gene effects were assumed to follow a uniform distribution(FIN-UNI),
the estimate of the additive variancesharply
increased whenadding
more loci to the model of
analysis.
A lessmarked
trend was also observedwhen
assuming
that the gene effects wereexponentially
distributed(FIN-EXP).
When the model ofanalysis
assumed the allelic effects to benormally
distributed(FIN-NOR)
or constant over all loci(FIN-CON),
the results werethe same
regardless
of the number of loci assumed in the model.However,
despite
thesimilarity
in the trend of the additivevariance,
the results fromFIN-UNI and FIN-EXP are
qualitatively
different.The
slight
increase in theadditive variance observed with FIN-EXP was
only
due to differences in thepartition
of the totalvariance,
whereas with FIN-UNI there was also an increase in the totalphenotypic
variance observed in thesystem.
From thispoint
ofview,
the behaviour of FIN-EXP is more similar to FIN-NOR than to FIN-UNI.This difference in the behaviour of FIN-UNI
compared
with FIN-EXP orFIN-NOR
is,
perhaps,
notsurprising
whenexamining
the statisticalmeaning
of these models. From astrictly
statisticalpoint
ofview,
it can be seen thatthe gene effects in FIN-UNI are treated as fixed effects while with FIN-EXP
the gene effects
using
FIN-UNI isexpected
toyield
different answers to those obtained with FIN-NOR andFIN-EXP,
andthereby,
the total additive variance which is calculated as a linear combination of the gene effect estimates.The consequences of
adding
more loci to the model ofanalysis
whentreating
their effects as fixed appears to create anoverparameterised
model. The extragene effects
(fixed effects)
that need to be estimated in the model result insome of them
being
confounded andexplaining
spurious
effects.Thus,
the moreloci fitted in the
model,
the morespurious
effects areestimated, increasing
both the
genetic
and the total variance.Hence,
the reduction in the number ofparameters
to be estimatedby
assuming
that all the loci have the same effects(only
one effect is estimatedcompared
with L effects estimated whenassuming
all loci
having
differenteffects)
may avoid thisoverparameterisation,
which wouldexplain why
the results of FIN-CON are insensitive to the number of loci. On the otherhand,
thepossibility
ofspurious
effectsarising
whenincreasing
the number of loci in FIN-EXP or FIN-NOR is better controlled since the estimatesof the gene effects
(random
effects)
areregressed
towards zero,restricting
theirdispersion
accordingly
to their scaleparameter
(A
a
).
The difference betweentreating
a variable as fixed or as random is well known in animalbreeding.
Forexample,
the variance of the estimated sire effects obtained aftertreating
themas fixed would be
greater
than the estimated inter-sire variance whenassuming
them to be random normal variables.
The trend of
0’
;
when
using
FIN-EXP was also observed when the true model also assumed the gene effects to beexponentially
distributed.Surprisingly,
this trend was smaller when thepopulation
wasundergoing
selection. Thisconsistency
of results across simulated data setsassuming
differentgenetic
modelssuggests
that the overall trend observed with FIN-EXP is morelikely
to be a true characteristic of this model of
analysis
rather thanbeing
a MonteCarlo error due to a small number of
replicates.
Anotherinteresting
result is thefact that the mean estimate of the
genetic
variance whenassuming
ten loci wasmarginally
closer to the true simulated value than when 20 loci were assumed(table V7).
Intuitively,
the latter would beexpected
toyield
better answers asit
corresponds
exactly
to the model used to simulate the data.However,
thedifference in results is too small to
firmly
conclude which is the better model ofanalysis,
so theirrating
should not be basedonly
on the average estimate(across
thereplicates)
relative to the true simulated value. The estimation of theBayes
factor to assess thegoodness
of fit of these models should also beconsidered before
concluding
which one better describes the data.The difference in the results between FIN-EXP and FIN-NOR
prompts
the need for further studies to evaluate the behaviour of finite locus modelsassum-ing
other distributions of gene effects. Theassumption
of a normal distribution appears toyield
robust/consistent
results,
butideally
the distribution to be assumed should be oneclosely reflecting
thereality
of the trait inquestion.
Although
the characterisation of the distribution of gene effects foreconomi-cally
important
traits in farm animals is stillincomplete,
someknowledge
inthis area may be obtained from studies of mutation effects in
Drosophila.
Forinstance,
Keightley
[19]
suggested
that the gamma distribution may be suitable formodelling
the gene effects since itdepends
on fewparameters
(note
thatbe
parameterised
todisplay leptokurtosis.
Other alternatives have also beenproposed by
Caballero andKeightley
!2!.
An alternative way to avoid the
problem
ofuncertainty
on the true distri-bution of gene effects may be to select the distributionduring
theanalysis.
Using
the Markov chainframework,
Green[15]
proposed
atechnique,
called the reversiblejump,
which allows for model choiceduring
theanalysis.
For thisparticular situation,
a set of distributions may bepredefined
and a Markovprocess built
allowing
the chain to move among these distributionsaccording
to their
probabilities. Using
the sameprinciple,
one may also be able tosample
the number of loci
!27!.
Obviously,
thecomplexity
of a Markov chainimplemen-tation
allowing
for model choice in several of theparameters
would behigher,
andgreater
care should be taken whenassessing
the convergence of the chain as well as wheninterpreting
the results. Another alternative means to conclude which set ofparameters
(e.g.
distribution of geneeffects,
number ofloci)
fit best the data would be the estimation ofBayes
factors!9!.
One of the consequences of
assuming
other distributions of geneeffects,
suchas gamma, is that the
resulting
full conditional distribution may be of unknown form with no standardsampling
routine available. The full conditionaldensity
of the gene effects
resulting
from the three distributions examined in thisstudy
are
proportional
to a truncatednormal,
for which standardsampling
routinesare available. The use of
techniques
such asadaptive rejection
sampling
[12, 13]
and
’slicing-the-density’
[23]
allowsampling
from non-standarddistributions,
but thecomputational
cost is alsoexpected
to increase.Because of the
computational
demand of Gibbssampling implementations,
thestudy
of theproperties
of finite locus models should also becomplemented
with theproposal
of efficientalgorithms
toimprove
themixing
and convergence of the Markov chain. Severalapproaches
toimproving
theefficiency
ofsampling
thegenotype
structure incomplex pedigree
are now available(e.g.
[10,
21,
22]),
and their use may prove beneficial in
reducing
thecomputational
demand of afinite locus model.
In this
study,
allelefrequencies
were not estimated but were assumed to be 0.5.However,
thefrequencies
areonly
fixed in the basepopulation,
so the modelis able to account for
changes
in thegenetic
level due to drift or directional selection.Although
the estimation of the allelefrequencies
does not add much extracomplexity
to themodel, practical problems
in the Gibbsimplementation
were encountered
(unpublished).
Allowing
variable allelefrequencies
can result in slowmixing
withproblems arising
from locibecoming temporally
fixed. Since inferences from MCMC are validonly
when convergence has beenobtained,
poor
mixing requires
thelength
of the chain to beconsiderably
increased,
with consequences incomputing
time. Apreliminary analysis
of a non-selectedpopulation
where the allelefrequencies
were estimated but restricted to between 0.2 and 0.8(to
avoid the Gibbssampling
problem
due tofixation)
yielded
similar results as whenanalysis
wasperformed assuming
thefrequencies
to be 0.5.Obviously,
this restriction on the genefrequency
estimation may need tobe relaxed when
considering populations
withdeeper pedigree
structure andundergoing
selection.A
positive
characteristic of the finite locus modelproposed
here is itsability
to account for the
linkage disequilibrium
between loci built upduring
selectiongeneration
which results in a reduction of the total observedgenetic
variance(i.e.
thegenetic
variance in theoffspring
generation
estimatedusing
their totalgenetic
effects across all loci is smaller than the sum of the variancesexplained
by
each individuallocus).
For the case of selectionpresented
in thisstudy,
the loss in variance due to
disequilibrium
for the first threegenerations
from selectedparents
was6.3,
7.2 and 9.5%,
respectively.
Conversely,
it is also desirable that thegenotypes
of individuals from the basepopulation being sampled
are inlinkage equilibrium
to avoidpotential
bias in the estimation of0
’; (as
the formula used to estimate it assumes nocorrelation between
loci).
Considering
theimpact
on the results iflinkage
disequilibrium
in the basepopulation
being
built up due tosampling,
Q
a
was
also estimated
using
the totalbreeding
value of each individual reconstructedgiven
theirgenotypes
and the gene effects across all loci(thus
accounting
forany correlation
appearing
due tosampling).
With theexception
of the cases ofFIN-UNI
assuming
20 and 30loci,
the estimate of0
’; was
the same as when notaccounting
for anypotential
correlation. Forinstance,
or estimated
from the totalbreeding
valuesusing
FIN-EXPassuming 10,
20 and 30 loci were21.58,
23.36 and
25.12,
respectively,
which are very similar to the valuesreported
in table IV. When theanalysis
wasperformed
using
FIN-UNI with 20 or 30loci,
the estimation of
0’; from
the totalbreeding
valuesyielded
smaller results than whenassuming
the locibeing independent
(i.e.
53.34 and67.4,
respectively,
compared
with results from table7),
but these differencesexplain
less than 10%
of the total overestimation of0’
;.
Thus,
the conclusion that the estimates of both thegenetic
and the total variance whenusing
FIN-UNI arelargely
dependent
on the assumed number of loci still remains valid(i.e.
FIN-UNI is still aquestionable
model to beused).
Here we considered
only
the case of acomplete-additive
genetic
model wherethe results from the mixed model are
expected
to berobust,
and a finite locusmodel would add little
improvement
topractical
genetic
evaluations.However,
there are other situations wheredeparting
from the infinitesimal model mayprove to be beneficial. For
instance,
a finite locus model mayprovide
a morenatural
approach
toextending
marker-assisted selection(MAS)
accounting
formultiple
quantitative
trait loci(QTL).
Genetic maps in most farm animals arebecoming
very dense so that the use of MAS toexploit
information ofonly
one
QTL
at a time seems to be a waste of resources and time.Approaches
tostudying linkage
between aQTL
and agenetic
markerusing
Gibbssampling
have been
suggested
in the literature(e.g.
[16, 18!),
and theirimplementation
in a finite locus model seems to bestraight
forward. From the mixed modelapproach, multiple QTL
may also be accounted forby extending
the MASmethod
using
a BLUP framework[5].
Thismethod, however,
presents
theproblem
that it does not account forchanges
in genefrequency
due to selection.Additionally,
since eachQTL
is modelled with two normalvariables,
the method becomescomputationally complex
as the rank of theresulting
linear model increasesby
twice the number of individuals per eachQTL
included in the model.Another
potential
use of a finite locus model is the estimation of dominance.Although
the mixed model has been used to estimate dominancedeviance,
theassumptions
justifying
thisapproach
are not well understood[25].
Despite
thethe results of a mixed model
analysis
may be difficult tointerpret
(for
anexample,
see!26!).
Preliminary
results have shown that inclusion of dominancein a finite locus model adds very little extra
complexity
to the model whilstmaintaining
arelationship
between both the dominance variance and theinbreeding
depression
(24!.
Finally,
another benefit of finite locus models is thatthey
offer a more’biologically
appropriate’ genetic
model which wouldprovide
agreater
under-standing
ofquantitative
traits. Forexample,
it would bepossible
to examinecharacteristics of these traits
(e.g.
the distribution of the geneeffects,
effective number ofloci).
Moreover,
unlike the infinitesimalmodel,
some of thenewly
gained knowledge
about the architecture of these traits couldeasily
be included in a finite locus model toimprove
theprediction
ingenetic
evaluations.ACKNOWLEDGEMENTS
The authors
acknowledge
financial support from theMinistry
ofAgriculture,
Fisheries andFood,
theBiotechnology
andBiological
Research Council and thePig
Improvement
Company.
We thank S.C.Bishop
and two anonymous readers for usefulcomments on the
manuscript.
REFERENCES
!1!
BulmerM.G.,
The effect of selection ongenetic variability,
Am. Nat. 105(1971)
201-211.
[2]
CaballeroA., Keightley P.D.,
Apleiotropic
nonadditive model of variation inquantitative traits, Genetics 138
(1994)
883-900.[3]
ChevaletC.,
Anapproximate theory
of selection assuming a finite number of quantitative traitloci,
Genet. Sel. Evol. 26(1994)
379-400.[4]
FalconerD.S.,
Introduction toQuantitative Genetics,
3thed., Longman
Sci-entific andTechnical, Essex,
1989.[5]
FernandoR.L.,
Grossman M., Marker-assisted selectionusing
best linearun-biased
prediction,
Genet. Sel. Evol. 21(1989)
467-477.[6]
Fernando R.L. StrickerC.,
ElstonR.C.,
The finitepolygenic
mixed model: analternative formulation for the mixed model of
inheritance,
Theor.Appl.
Genet. 88(1994)
573-580.[7]
FiratM.Z., Bayesian
methods in selection of farm animals forbreeding,
PhDthesis, University
ofEdinburgh,
1995.[8]
Fournet-Hanocq F.,
Elsen M., On the relevance of threegenetic
models forthe
description
ofgenetic
variance in smallpopulation undergoing selection,
Genet. Sel. Evol. 30(1998)
59-70.[9]
GelmanA.,
CarlinJ.B.,
SternH.S.,
RubinD.B., Bayesian
DataAnalysis,
Chapman
and Hall.London, UK,
1995.[10]
Geyer C.J.,
ThomsonE.A., Annealing
Markov chain Monte Carlo withap-plication
to ancestralinference,
J. Am. Stat. Assoc. 90(1995)
909-920.[11]
GianolaD., Foulley J.L.,
Variance estimation fromintegrated
likelihood(VEIL),
Genet. Sel. Evol. 22(1990)
403-417.[12]
GilkW.R.,
Wild P.,Adaptive rejection sampling
in Gibbssampling, Appl.
Stat. 41(1992)
337- 348.[13]
Gilk W.R., BestN.G.,
TanK.K.C., Adaptive rejection Metropolis sampling,
[14]
GoddardM.E.,
Gene based model forgenetic
evaluation - An alternative toBLUP?,
Proc. 6th WorldCong.
Genet.Appl.
Livest. Prod. 26(1998)
33-36.[15]
GreenP.J.,
Reversiblejumping
Markov chain Monte Carlocomputation
andBayesian
modeldetermination,
Biometrika 82(1995)
711-732.[16]
GuoS.W., Thompson E.A.,
A Monte Carlo method for combinedsegregation
andlinkage analysis,
Am. J. Hum. Genet. 51(1992)
111-1126.[17]
HillW.G., Quantitative genetic theory:
Chairman’sintroduction,
Proc. 5th WorldCong.
Genet.Appl.
Livest. Prod. 19(1994)
125-126.[18]
JanssL.L.G., Thompson R.,
Van ArendonkJ.A.M., Application
of Gibbssampling
for inference in a mixedmajor gene-polygenic
inheritance model in animalpopulations,
Theor.Appl.
Genet. 91(1995)
1137-1147.[19]
Keightley P.D.,
The distribution of mutation effects onviability
inDrosophila
meLa!,ogaster,
Genetics 138(1994)
1315-1322.[20]
Lange K.,
Anapproximate
model ofpolygenic inheritance,
Genetics 147(1997)
1423-1430.[21]
LinS., Thompson E.A., Wijsman E.,
Analgorithm
for Monte Carlo estima-tion of genotype probabilities oncomplex pedigrees,
Ann. Hum. Genet. 58(1994)
343-357.
[22]
LundM.,
JensenC.S.,
Multivariateupdating
of genotypes in a Gibbssam-pling algorithm
in the mixed inheritancemodel,
Proc. 6th WorldCong.
Genet.Appl.
Livest. Prod. 25
(1998)
521- 524.[23]
NealR.M.,
Markov chain Monte Carlo methods based on’slicing’
thedensity
function,
Technical report 9722, Dept. Stat, Univ., Toronto, 1997.[24]
Pong-Wong R., Shaw F., Woolliams J.A.,
Estimation of dominance variationusing
a finite-locusmodel,
Proc. 6th WorldCong.
Genet.Appl.
Livest. Prod. 26(1998)
41-44.
[25]
RobertsonA.,
HillW.G., Population
andquantitative genetics
of many linked loci in finitepopulations,
Proc. R. Soc. Lond. B 219(1983)
253-264.[26]
Shaw F., WoolliamsJ.A.,
Variance component analysis of skin andweight
data forsheep subjected
torapid inbreeding,
Genet. Sel. Evol. 31(1999)
43-59.[27]
Stephens D.A.,
FishR.D., Bayesian
analysis
ofquantitative
trait locus datausing
reversiblejump
Markov chain MonteCarlo,
Biometrics 54(1998)
1334-1347.[28]
StrickerC., Fernando R.L., Elston R.C., Linkage analysis
with an alternativeformulation of the mixed model of inheritance: the finite
polygenic
mixedmodel,
Genetics 141(1995)
1651- 1656.[29]
Wang C.S., Rutledge J.J.,
GianolaD., Marginal
inferences about variancecomponents in mixed linear model
using
GibbsSampling,
Genet. Sel. Evol. 25(1993)
41-62.