HAL Id: hal-00894013
https://hal.archives-ouvertes.fr/hal-00894013
Submitted on 1 Jan 1993
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Genetic evaluation for a quantitative trait controlled by
polygenes and a major locus with genotypes not or only
partly known
A Hofer, Bw Kennedy
To cite this version:
Original
article
Genetic
evaluation
for
a
quantitative
trait controlled
by polygenes
and
a
major
locus
with
genotypes
not
or
only partly
known
A Hofer
BW
Kennedy
2
1
Department
of
AnimalSciences,
Federal Instituteof Technology
(ETH),
CH-8092Zvrich, Switzerland;
2
Centre
for
GeneticImprovment of Livestock, University of Guelph,
Guelph, Ontario,
N1 G 2W1, Canada(Received
4 March 1992;accepted
5August
1993)
Summary - For a quantitative trait controlled
by polygenes
and amajor
locus with 2alleles,
equations for the maximum likelihood estimation ofmajor
locus genotype effects andpolygenic breeding values,
as well as major allelefrequency
and major locus genotypeprobabilities,
were derived. Because theresulting
expressions arecomputationally
un-tractable for
practical application, possible approximations
werecompared
with 2 otherprocedures suggested
in the literatureusing
stochastic computer simulation.Although
thefrequency
of the favourable allele wasseriously
underestimated when major locusgeno-types were
entirely unknown,
theproposed
method comparesfavourably
with the 2 otherprocedures
under certain conditions. None of theprocedures compared
cansatisfactorily
separate
major genotypic
effects frompolygenic
effects.However,
theproposed
method has somepotential
forimprovement.
major locus
/
genetic evaluation / segregation analysisRésumé - Évaluation génétique pour un caractère quantitatif contrôlé par des
polygènes et un locus majeur à génotypes inconnus ou seulement partiellement
connus. Pour un caractère contrôlé par des
polygènes
et un locusmajeur
à 2allèles,
leséquations
pour l’estimation du maximum de vraisemblance deseffects génotypiques
au locusmajeur
et des valeursgénétiques polygéniques
ont étédérivées,
permettant aussi d’estimer lafréquence
de l’allèle majeur et lesprobabilités
desgénotypes
à ce locus. Les expressionsobtenues étant incalculables en
pratique,
desapproximations possibles
ont étécomparées
par simulation
stochastique
à 2 autresprocédures proposées
dans la littérature. Bien que lafréquence
de l’allèlefavorable
soit sérieusement sous-estiméelorsque
lesgénotypes
aulocus
majeur
sont entièrementinconnus,
la méthodeproposée
aquelques
avantages sursatisfaisante
pourséparer
l’efJet
desgénotypes majeurs
deseffets polygéniques. Cependant,
la méthodeproposée
estsusceptible
d’être améliorée.locus majeur
/
évaluation génétique/
analyse de ségrégationINTRODUCTION
Statistical methods based on the infinitesimal
model,
theassumption
of manyun-linked loci all with small effects
controlling
quantitative
traits,
have beensuccess-fully applied
inanimal
breeding.
Anincreasing
number ofstudies, however,
havereported
single
locihaving large
effects onquantitative
traits. Such loci are referredto as
major
loci.Examples
are theprolactin
(Cowan
etal,
1990)
and the weaverloci
(Hoeschele
andMeinert,
1990)
indairy
cattle,
and the halothanesensitivity
locus(Eikelenboom
etal,
1980)
and a locusacting
on&dquo;Napole&dquo;
yield
(Le
Roy
etal,
1990),
apork quality
trait,
inpigs.
Only
in the case of the halothane locus has theresponsible
gene been identified andprocedures
for itsgenotyping
become available(l!TacLennan
andPhillips,
1992).
There is no
difficulty
withgenetic
evaluation for traits controlledby
amajor
locus and
polygenes
whenmajor
locusgenotypes
are known. A fixedmajor
locus effect has to be added to the linear model andmajor
locus effects andpolygenic
breeding
values can be estimatedby
the usual mixed modelequations
(Kennedy
etal,
1992).
Whengenotypes
areunknown, however,
satisfactory
statistical methodsare still
lacking.
Selection decisions couldpossibly
be based on animal models thatinclude the
major
locus effects in thepolygenic
part of the model. In cases wherethe
allele
has somepositive
effect on 1 trait butnegative
effects onothers,
it wouldbe desirable to have
separate
estimates of themajor
locus andpolygenic
effects available. The 2 estimates would then be combinedaccording
to thebreeding
objective.
Becausegenotyping
of all the animals of apopulation
islikely
to be tooexpensive
if at allpossible,
statistical methods arerequired
that estimatemajor
locusgenotype
effects as well aspolygenic
effects andmajor
locusgenotype
probabilities
for each candidate.Such a method was first
proposed
in humangenetics
by
Elston and Stewart(1971).
The unknown parameters of the model are estimatedby
maximizing
thelikelihood of the data. For models with both
major
locus andpolygenic
effects exact calculations are veryexpensive
and become unfeasible forpedigrees
with more than! 15 individuals. Several studies
compared
thepower of different
approximations
of the likelihood function to detect a
major
locus in half-sibfamily
structures in animalbreeding
data(Le
Roy
etal, 1989;
Elsen and LeRoy,
1989;
Knott etal,
1992a).
Hoeschele(1988)
developed
an iterativeprocedure
to estimatemajor
locusgenotype
probabilities
and effects as well aspolygenic breeding
values. Theequations produced
for the estimation ofgenotype
probabilities
were derived forsimple population
structures and were based on anapproximation
of the likelihoodfunction.
Kinghorn
et al(1993)
used the iterativealgorithm
of van Arendonkregression
ongenotype
probabilities.
A method wasproposed
to correct for the bias inherent in suchanalyses.
The
objectives
of thisstudy
were:i)
to derive exact maximum likelihood equa-tions to estimatemajor
locusgenotype
probabilities
and effects for aquantitative
trait with mixed
major
locus andpolygenic
inheritance without anyrestrictions
onpopulation
structure;
ii)
to examinepossible
approximations;
andiii)
to compare theseapproximations
with the methods of Hoeschele(1988)
andKinghorn
et al(1993)
by
stochasticcomputer
simulation.METHODS
Model
Consider a
quantitative
trait which is controlledby
1 autosomalmajor
locus with2
alleles,
A and a, and many other unlinked loci with alleles of small effects. Mendeliansegregation
is assumed for all alleles at all loci. The allele with themajor effect, A,
has afrequency
of p in the basepopulation,
which is assumed to beunselected,
not inbred and inHardy-Weinberg
andgametic equilibria.
In the basepopulation
the 3possible
genotypes
at themajor
locus(AA,
Aa andaa),
which will be denoted as1,
2 and 3throughout
this paper, are thereforeexpected
to occur in
frequencies
of p
2
,
2p(1-p)
and(1-p)
2
,
respectively.
Becausegenotyping
of animals
might
beimpossible
or tooexpensive,
we assume for the moment that thegenotypes
at themajor
locus are not known. With 1 observation per animal thefollowing
mixed linear model can be formulated:where y = observation vector
b = vector of
non-genetic
fixed effectsg = vector of fixed
major
locusgenotype
effects[g
1
92g3!!
a = vector of random
polygenic
breeding
valuese = vector of random errors
X,Z =
known incidence matricesT = unknown incidence matrix
indicating
truemajor
locusgenotypes
of all the animals in thepopulation
The
expectation
and variance of the random variables are assumed to be:action of the
polygenes
is assumed but dominance is allowed for at themajor
locus. In order tokeep
the modelsimple,
it is further assumed that the variancecomponents
Q
a
and Q
e
are
known. Thisassumption
implies
that thegenetic
variance causedby polygenes
is known but not thegenetic
variation causedby
thesegregating
major allele,
which is determinedby
themajor
genotype
effects andfrequencies.
This criticalassumption
has to bekept
in mind whendiscussing
tlte simulation results.Likelihood function
The likelihood for mixed model
[1]
was first discussedby
Elston and Stewart(1971).
The likelihood can be written as:
is a normal
density
andPr(Tlp)
is theprobability
of Tgiven
the allelefrequency
p and the
pedigree
information. Because variancecomponents
are assumed to beknown,
cl =(27r)&dquo;°’!&dquo; -
!V !
.ol e 21-1.1,
with no as the number ofobservations,
is a constant.Following
Elston and Stewart(1971), Pr(Tlp)
can becomputed
as aproduct
ofprobabilities:
,,
where N
is
the total number of animals in thepopulation
andPr(! !s!d)
is theprobability
of animali having
genotype
indicatedby
t
i
,
the ith row ofT, given
the
genotypes
of itsparents
s andd,
and is assumed to be known. Elston and Stewart(1971)
give
Pr(ti!t9,td)
for autosomal and sex-linked loci. When theparents
are unknown
Pr(tz!ts,td)
isreplaced
by
thefrequency
of thegenotype t
i
in thebase
population.
Knownmajor
locusgenotypes
can be accomodatedby setting
Pr(! !,!)
to zero whenever ti conflicts with the knowngenotype
of animal i. With the basepopulation
(animals
with unknownparents)
inHardy-Weinberg
equilibrium,
Pr(Tlp)
can be written as:where nl, n2 and n3 are the number of base animals of
genotype
AA,
Aa and aa,respectively,
and nb = nl + n2 + n3 is the total number of base animals.With 3
possible
genotypes
the sum in[2]
is over3
N
elements. For 20 animals the sum isalready
over 3.5 x10
9
possible
incidence matrices T. Whenever T conflicts with thepedigree
informationPr(Tlp)
is zero.Therefore,
depending
on thepedigree
structure,
alarge
number of the elements to sum are zero, but there remains aAs
pointed
outby
Elston and Stewart(1971)
the 3 likelihoods conditional on ananimal’s
genotype t
i
areproportional
to theprobabilities
of animali having
1 of the 3possible
genotypes.
The conditional likelihoods can be obtainedby skipping
animal i in the summation over all
possible
incidence matrices T. Maximum likelihood estimationIn order to maximize
L(y),
we need the first derivatives withrespect
tob,
g and p:The
probability
of Tgiven
the data and theparameters
ofthe
model will be denoted wT and can becomputed
aswhere c2 is the
product
of cl and ascaling
factor such thatE
WT
= 1. Note thatT
without
scaling
this sum isequal
to the likelihoodL(y).
Aftersetting
to zero andrearranging
weget
the 2following
equations:
Solving
forp
in the lastequation
leads to:This
equation
can be rewrittenby
replacing
2n
1
+ n2by
v!.
T.[2 1 0!’,
withv’
a row vector oflength
N with ones for base animals and zeros for the other animals.Because mT
depends
onb,
g and p,equations
[3]
and[4]
have to be solvedtrue values and
Q’ =
L
wTT.
Note that the ikth element ofQ!
at convergence isT
an estimate of the
probability that
animal i is ofgenotype
kgiven
the data and the estimates for the fixedeffects b,
themajor
locus effectsg
and the allelefrequency p.
As mentionedabove,
the same estimate can be obtainedby calculating
likelihoodsconditional on an animal’s 3
genotypes.
Using
thesedefinitions,
equations
[3]
and[4]
can be written as:The solutions for
b
T
,
i’
andp
r
converge to maximum likelihood(VIL)
estimates. Local maxima inL(y)
could pose aproblem
and will be discussed later. Hoeschele(1988)
estimated the allelefrequency
from thegenotype
probabilities
of all animals with records whereas[6]
considersonly
baseanimals,
which is inagreement
with Ott(1979).
Becausegenotype
probabilities
of base animals take information from their descendants into account, all information on the allelefrequency
in the basepopulations
isproperly
usedby
!6J.
Animal breeders are not
only
interested inestimating major
locus effects g and allelefrequency
p but also inpredicting polygenic breeding
values a. This isusually
doneby
regressing
phenotypic
observations corrected for fixed effects:where
Q
isQ!
at convergence.Using
V-
1
=1
[ZAZ
>.-
,
+1]!!
= I -ZMZ’,
whereM =
[Z’Z
+A-
I
>.]-
1
(Henderson, 1984),
a can also becomputed
as:The same solutions for
b,
g
and a are obtainedby
iterating
on thefollowing
equations
together
with[6]
instead ofusing
(5!, [6]
and!7!:
Note that
2.:: wTT’Z’ZT
=diag(v§ .
q[)
=D
r
,
wherevb
is a row vectorT
difficulty
with thisapproach
is that it is not feasible tocompute
Q’
and!
tUy -
*T
T’Z’ZMZ’ZT
forlarge populations.
Approximations
Above
Q
r
was defined as:There are 2
problems
associated with thecomputation
ofC!’’. Firstly,
thesummation is over all
possible
incidence matrices Tand,
secondly,
aquadratic
form
involving
V-’
has to becomputed
for each element in this sum. It can beshown that the
following
is anequivalent
expression
notinvolving
V-
1
:
where
£11
=MZ’(y -
Xbr -)
r
ZTg
(Le
Roy
etal,
1989).
BecauseaT
depends
on
T,
we would have tocompute
fill
for everypossible
T,
which is not feasible. In order tosimplify
thecomputations,
we couldreplace
*11
by
M which does notdepend
on T. Note that âr =L wT’
âT.
Thisapproximation
was also consideredT
by
Hoeschele(1988).
Theapproximated
Q!
is then:Instead of
using
asingle
estimate of thepolygenic
breeding
value for each animalirrespective
of itsgenotype,
we could use 3 values for each animaldepending
onits
genotype
butindependent
of thegenotypes
of all the other animals. A similarapproximation
was consideredby
Elsen and LeRoy
(1989)
and Knott et al(1992a,
1992b)
for a sire model and was found to besuperior
to[9].
We considered thefollowing approximation:
where xi and
t
ik
are the ith rows of X andZT,
a
?3
is theijth
element ofA-
1
,
and cii is the
diagonal
element of the coefficient matrix in[8]
pertaining
to the ith animalequation.
The summation over all
possible
incidence matrices T in[9]
or[10]
can be avoidedby
using
algorithms developed
to estimategenotype
probabilities.
Here,
the iterativealgorithm
of van Arendonk et al(1989)
wasapplied.
Thisprocedure
will bebriefly
described in the next section.
As with
Q!
thedifficulty
withexpression
E w’ -
T’Z’ZMZ’ZT
istwo-fold;
the sum is over allpossible
T,
and thecomputation
of each element in that sum isexpensive.
Let m2! be theijth
element ofZ’ZMZ’Z,
andt
ik
(t
jl
)
be the elements of T for animali(j)
andgenotype
/c(l).
Now,
the klth element ofL
wTT’Z’ZMZ’ZT
can be calculated as:
Note that at convergence
W’ -
tik .<_,;
is an estimate of theprobability
thatT
animal i is of
genotype
k andanimal j
of
genotype
L,
given
the data. Forindependent
animals thisquantity
isequal
toq’
ik
qj’l
theproduct
of thecorresponding
elements inQ’’
and, therefore,
the contributions ofL wTT’Z’ZMZ’ZT
andQ&dquo; Z’ZMZ’ZQ’
T
to B’’ cancel out. For
dependent
animals the contributions to the klth element of B’ are:Now if we
neglect
thedependencies
between animals for thecomputation
ofL
w2..
tik .t
jl
weget:
T
and
[8]
becomes identical to the mixed modelequations
given
by
Hoeschele(1988).
Another way toapproximate
B’’ is to assume that A = I. We thenget:
Estimation of
genotype
probabilities
Van Arendonk et al
(1989)
developed
an iterativealgorithm
to estimategenotype
probabilities
for discretephenotypes. Kinghorn
et al(1993)
applied
thisalgorithm
to continuous traits. Thecomparison
of thisalgorithm
with non-iterative methods revealed some errors in the formulaegiven
in theoriginal
paper(LLG
Janss and JAM vanArendonk, 1991;
CStricker,
1992;
personal
communications).
Weapplied
a corrected version of this
algorithm.
For each
animal,
genotype
probabilities
from 3 different sources of information arecomputed using approximation
[9]
or[10].
One round of iteration involves 3steps.
Firstgenotype
probabilities
arecomputed
using
information fromparents
andcollateral relatives
proceeding
from the oldest to theyoungest
animal. In the secondstep, genotype
probabilities
are calculatedusing
information from the progenyproceeding
from theyoungest
to the oldest animal.Finally,
genotype
probabilities
using
information from each individualperformance
are calculated and the 3 sourcesof information combined. The iteration process is
stopped
when the solutions forgenotype
probabilities
reach agiven
convergence criterion.The
algorithm
works forsimpler pedigree
structures as simulated in thisstudy
but does not allow forloops
in thepedigree,
also known ascycles
(Lange
andElston,
1975).
Loops
in apedigree
occurthrough
genetic paths
(inbreeding loops),
mating
paths,
or a combination of the 2(marriage loops),
eg, a sire mated to 2genetically
related dams. Bothinbreeding
andmarriage
loops
are common in animalbreeding
data. A non-iterative
algorithm
forpedigrees
withoutloops
wasrecently proposed,
which should be more efficient than the one used in this
study
(Fernando
etal,
1993).
Method of Hoeschele
(1988)
Hoeschele
(1988)
used aBayesian approach
to derive an iterativeprocedure
to estimategenotype
probabilities Q,
allelefrequency
p andmajor
locus effects g forsimple
pedigree
structures. Thegenotype
probabilities
were estimatedby
formulae that were
developed
for thespecific pedigree
structures consideredusing
approximation
[9].
In contrast to[6],
Hoeschele(1988)
estimated p from thegenotype
probabilities
of all animals with records:where no is the number of animals with records and
vo
is a row vector with ones foranimals with records and zeros otherwise. The
equations
that estimate the effectsof model
[1]
are the same as[8]
approximated
with[11].
Weapplied
this methodin the simulation
study
using
the iterativealgorithm
described above but withapproximation
[9]
to estimategenotype
probabilities
instead of the formulaegiven
by
Hoeschele.Method
of Kinghorn
et al(1993)
the
least-squares
estimates are biased(see,
forexample,
Johnston, 1984,
p428).
Kinghorn
et al(1993)
treated the unknown incidence matrix T as the unknowntrue
independent
variable and thegenotype
probabilities
Q
as an estimate for T associated with some errors.Using
Q
instead of T in the model leads to biased estimates ofg
*
.
Kinghorn
et al(1993)
derived a correction matrixW,
such thatg
=W!!§* .
Given certainassumptions,
they
showed that W =V!V(,
whereV
t
is a 3 x 3 covariance matrix of elements in the 3 columns of T andV
9
is thecorresponding
covariance matrix of elements in the 3 columns ofQ.
Because(co)variances
inV
Q
aregenerally
smaller than(co)variances
inV
t
,
major
locus effects are overestimated in absolute terms whenusing
Q
instead of T. The(co)variances
inV
9
were calculated from the actual solutions for estimates ofgenotype
probabilities
of all animals with records. Covariances inV
t
werecomputed
as:
where
q
.k
is the averagegenotype
probability
forgenotype
k of all animals with records and can beregarded
as an estimate of thefrequency
of thatgenotype
in thepopulation.
Genotype
probabilities
were estimated with thealgorithm
ofvan Arendonk et al
(1989).
Thisalgorithm
requires
the allelefrequency
p as aninput
parameter.Kinghorn
et al(1993)
kept
the initial value for p constant over alliterations,
ieregarded
the initial p as the true value. But if p wasknown,
Cov(t
k
,t¡)
could also be derived from the
expected
frequencies
of the 3genotypes.
In ourimplementation
Cov(t!,tl)
wascomputed
with[14]
and the allelefrequency
p wasestimated with
(13!,
which is a natural deduction from!14!.
The linear model can be written in matrix notation as:
Kinghorn
et al(1993)
assumed thatVar(a
*
)
=Var(a)
= A -Q
a
andVar(e
*
) =
Var(e)
= I -Q
e.
The matricesQ
and W are not known and haveto
be estimatedfrom the data as described above.
Therefore,
thefollowing
system
ofequations
has to be solvediteratively:
Estimates for g should be unbiased but estimates for b and a are still biased. We
attempted
to correct forthe bias
inb by
adding
(X’X)-
l
X’ZQ(W -
I)g’’
+1
,
theexpected
difference betweenb
r+1
andb
*r+1
under theassumptions
E(T)
=E(Q),
Simulation
The methods of Hoeschele
(1988)
andKinghorn
et al(1993)
werecompared
withthe method
developed
in thisstudy applying
approximations
[10]
and[12]
using
stochasticcomputer
simulation.Phenotypic
observations weregenerated by
using
the
following
mixed model:where
hys
i
is the fixed effect of herd x year x sexi,
g! is the fixed effect ofmajor
locusgenotype
j,
a2!! is thepolygenic
breeding
value and e2!! is the random residual effect. The effects in the model weresampled
as follows:f hys
i
N(0,I
J
fI)
fa
ijk
} -
N(0,A
J§
)
and{e2!! } N
N(0,I
J§
) .
Major
locusgenotypes
were simulatedwith 2
segregating
alleles.Genotypes
of base animals weregenerated
by sampling
2 alleles from a uniform distribution between 0.0 and 1.0 with threshold p, the
frequency
of allele A.Genotypes
of progeny were determinedaccording
to mendeliansegregation.
The effect ofgenotype
3 was set to zero as there is adependency
between fixed herd x year x sex and
major
locus effects.Three different sets of
parameters
were used(table I).
Only
additive effects ofthe
major
locus wereconsidered,
although
all of the methodscompared
allow fordominance. In the first set of
parameters, 50%
of thephenotypic
variance(variance
due tomajor
locus +polygenic
variance + residualvariance)
is due togenetic
effects,
75%
of thegenetic
variance is due to themajor locus,
and25%
is due to thepolygenes.
Thefrequency
of allele A withmajor
effect is25%
in the basepopulation,
which results in an allele substitution effect a of1.0,
iegenotype
effects of 2.0(AA),
1.0(Aa)
and 0(aa).
Inparameter
set2,
the allelefrequency
p is0.5,
but thegenotype
effects and all the otherparameters
are the same as in set 1. Thus the variance due to themajor
locus is increased from 0.375 to0.5,
and thephenotypic
variance
changes
from 1.0 to 1.125. Inparameter
set3,
the allelefrequency
p is
0.25 and50%
of thephenotypic
variance is due togenetic effects,
as inparameter
set1,
but theproportion
ofgenetic
variance due to thepolygenes
is increased from 25 to40%,
which results in an allele substitution effect a of0.8.
simple.
In each of 10herds,
20 base dams each had a record in year 1. A group of 20 base sires each with their own record in a common herd x year(eg
teststation)
was mated with these base dams. Each sire was
randomly
mated with 1 dam in each herd. Eachmating
produced
5 progeny in year 2. The sex of each progeny was determinedby sampling
from a uniform distribution between 0.0 and 1.0 withthreshold 0.5. The
population
size was1220,
made up of 220 base animals and1 000 progeny.
In each of the
alternatives,
the same sequence of random numbers was used.Therefore,
identical data sets wereanalysed
with each of the 3 methods considered. Each alternative wasreplicated
25 times.With each of the 3
methods,
final solutions are obtainedby
repeatedly
computing
genotype
probabilities
andsolving
asystem
ofequations
toget
new solutions formajor
genotype
effects andpolygenic breeding
values. Astopping
criterion of the form:was used for
major
genotype
effects g and the allelefrequency
p.RESULTS
When the
genotypes
of all animals with records areknown,
the estimates formajor
locus effects g are identical for all 3 methods considered
(table II).
Estimates for theallele
frequency
p,however,
differedslightly.
Using
formula[13]
(Hoeschele,
1988;
Kinghorn
etat,
1993)
the standard deviations(SD)
of estimated p werelarger
than estimates
by
[6].
The estimates for g and p agree well with the true values. Estimates of g acrossparameter
sets areconsistently slightly larger
than the truevalues,
which can beexplained by sampling
effects and the fact that for each of the25
replicates,
data for the 3parameter
sets weregenerated
with the same set of random numbers. Asexpected
from theheritabilities,
the correlations between true andpredicted breeding
values were the same forparameter
sets 1 and 2 andslightly
higher
forparameter
set 3. The correlations betweenpredicted breeding
values and estimatedmajor
locus effects were close to zero,showing
that the 2 effects werewell
separated
in all cases.Table III shows the simulation results for the 3
parameter
setsusing
all 3procedures
whenmajor
locusgenotypes
were unknown. Forparameter
sets 1 and2,
estimates ofmajor
locus effects g were close to the true values orslightly
underestimated with
approximated
maximum likelihood(AML),
underestimatedby
about20%
with the method of Hoeschele(1988)
and overstimatedby
25 to30%
with the method ofKinghorn
et at(1993).
Forparameter
set3,
estimates ofmajor
locus effects g were zero for 2replicates
using
AML and for 21replicates
using
the method of Hoeschele(1988).
Non-zero estimatesof g
were biasedupwards
by
14%
with A1VIL andby
47%
with the method ofKinghorn
et at(1993).
Both ANIL and the method of Hoeschele(1988)
showed alarge variability
of the non-zero estimates ofmajor
locus effects forparameter
set 3. When the true alleleAML,
but estimatedquite
well with the other 2 methods. Correlations between true andpredicted
breeding
values were similar for AML and the method of Hoeschele(1988),
but zero for the methodof Kinghorn
et al(199_3).
Forparameter
sets 1 and2,
the correlations between true(Tg)
and estimated(Qg)
major
locus effects weresimilar for all 3 methods. When
major
locus effects were smaller(parameter
set3)
these correlations werelargest
with the methodof Kinghorn
et al(1993).
Predicted
breeding
values werepositively
correlated to estimatedmajor
locus effectsQg
with AML and to alarger
extent with the method of Hoeschele(1988).
Using
the method ofKinghorn
et al(1993)
these correlations werestrongly
negative.
Because poor estimation of p also affects all the other
estimates,
additional simulations were done with the allelefrequency
fixed at the true(expected)
value. Results arereported
in table IV for A1VIL and the method of Hoeschele(1988)
forparameter
sets 1 and 3. All other results were close to those of table III and aretherefore not shown.
Major
locus effects g were underestimated less with AML andthe correlations were similar for both
methods.
Forparameter
set3,
the number ofreplicates
with estimates of zero formajor
locus effects wasagain
muchlarger
withthe method of Hoeschele
(1988).
Table V compares the 3 methods for the case where all sires and
50%
of thedams are
gendtyped
at themajor
locus. There was still atendency
for AML to underestimate the allelefrequency
p when the truefrequency
was 0.25. The method of Hoeschele(1988)
underestimatedmajor
locus effectsconsiderably
more thanAML
(9
to31%
ver.sus 1 to11%),
whereas these effects were overestimatedby
22 to
43%
with the method ofKinghorn
et al(1993).
The accuracies ofpredicted
breeding
values wereagain
similar for AML and the method of Hoeschele(1988)
butmuch lower for the method of
Kinghorn
et al(1993).
The accuracies of estimatedgenetic
values at themajor
locus were similar for all 3 methods with atendency
oflower accuracies for the method of
Kinghorn
et al(1993).
When all the sires butintermediate between the 2 cases of no animals and all sires
plus
50%
of the damsgenotyped.
So
far,
final solutions have beenreported
for iterations wherestarting
valueswere
equal
to true(expected)
values. Table VI shows the number ofreplicates
thatconverged
to the same solutionsusing
differentstarting
values. Lowstarting
valueswere half the true values and
high
starting
values were 1.5 times the true values ofmajor
locus effects g and allelefrequency
p. Whenmajor
locusgenotypes
werenot
known,
none to a fewreplicates
converged
to asingle
set of solutions with all 3 differentstarting
values. For the method of Hoeschele(1988)
withparameter
set3,
most of thereplicates
thatconverged
to the same solutionsconverged
to an estimate of zero formajor
locus effects g. For AML and the method of Hoeschele(1988),
allthe sires
(but
none of thedams)
were known. Thelargest
number ofreplicates
with all 3 solutions different was found with the methodof Kinghorn
et al(1993).
DISCUSSION
The method
proposed
here(AML)
generally
slightly
underestimatesmajor
locus effects g andseriously
underestimates allelefrequency
p when the truefrequency
is 0.25. The underestimationof p
leads to increased estimatesof g, although
not to the extent that the varianceexplained by
themajor
locusstays
constant(tables
III andIV).
This variance ishigher
when the allelefrequency
is fixed at the true value. The allelefrequency
was stillconsiderably
underestimated forparameter
set 1 when thepppulation
size was 10 timeslarger
than considered here(results
notshown).
The allelefrequency
was estimatedby
(6!,
which was derivedby
maximizing
the likelihood of the
data,
whereas the other 2 methods used[13].
Additional simulation runs withparameter
sets 1 and 3 andapproximations
[9]
and[11]
together
with[6]
showedconsiderably
lower estimates of p andhigher
estimates of g than results for the same 2approximations
applied together
with[13],
the method of Hoeschele(1988) (results
notshown).
There seems to be aproblem
inapplying
[6]
together
withapproximations [10]
and[12]
or, to a lesserextent,
with[9]
and(11!.
Nevertheless[6]
is the correctequation
for the estimation of the alleleThe method of Hoeschele
(1988)
consistently
underestimatedmajor
locus effects g which is inagreement
with the simulation results of the same author. For smallerallele effects
(parameter
set3),
although
stillquite
large,
most of the estimates of g were zero,indicating
that thegenotype
effects have to belarge
in order to berecognized.
The same is true forA1VIL,
but to a lesser extent. There was atendency
for theaccuracies
ofpredicted polygenic breeding
values(a)
and estimatedmajor
locus effects(6g)
to beslightly
higher
with AML than with the method of Hoeschele(1988).
In an unselectedpopulation
as simulated here theexpected
correlation between true
polygenic
andmajor
locus effects is zero. The correlations(gametic disequilibrium)
which will makeseparation
of the 2 effects more difficult.For AML and the method of Hoeschele
(1988),
the mean correlations ra,a werelower and
r- o-
werehigher
when the allelefrequency
was 0.5(parameter
set2)
than when the same allele had afrequency
of 0.25(parameter
set1) (tables
III andV).
Although
theproportion
of varianceexplained by
themajor
locus ishigher
withparameter
set 2 it seems to be more difficult to separatepolygenic
andmajor
locus effects with intermediate allele
frequencies.
This was also foundby
Knott etal
(1992a)
for similarapproximations.
Forparameter
sets 1 and2,
both methods showed alarge
reduction of 35 to40%
for ra,a and 25 to32%
for7
-
T
g,
Q
g
when
genotypes
were unknown rather than known(tables
II andIII).
With the method
of Kinghorn
et al(1993),
estimates of the allelefrequency
p weregenerally
closer to the true values than with the other 2procedures. However, major
locus effects were overestimated and the correlations between true andpredicted
breeding
values were close to zero which is inagreement
with their simulation results. The methodattempts
to correct for the bias inherent inmajor
locus estimatesby
regression
on theindependent
variableZQ!,
an estimate from thedata,
which is associated with some error. The termZQ!
ispostmultiplied
by
the correction matrix W!.ZQ’’W
r
is then used the same way as a usual incidencematrix in the mixed model
equations.
Multiplication by
wr increases the variance of theindependent
variable to the varianceexpected
for the unknown term ZT. Because wr is calculated over all animals withrecords,
the new variance is correctonly
on the average. For an animal with knowngenotype,
the elements inQ!
arecloser to an
identity
matrix incomparison
to dams. Inaddition, breeding
valuesestimated
by
[15]
are still biased. These 2problems
areprobably responsible
forthe overestimation of g and very poor
prediction
ofpolygenic breeding
values. Theperformance
of the method was,however,
less affectedby
smaller allele effects(parameter
set3)
than the other 2procedures.
For all 3
procedures
there was aproblem
of different solutions with differentstarting
values whengenotypes
were unknown. For AML and the method ofHoeschele
(1988)
the cause could be themultimodality
of the likelihood function.It seems to be necessary to
compute
approximated
likelihoods which then can beused to select the solutions with the
highest
likelihood. This could of course alsobe done with the method of
Kinghorn
et al(1993)
but this method has no directrelationship
with maximum likelihood.In this
study
variancecomponents
were assumed to be known but inpractice
have to be estimated.Using
incorrect values could lead to biased estimates ofmajor
genotype
effects andfrequencies.
Forexample, using
an underestimatedgenetic
variance
might
result in an overestimation of themajor
genotype
effects. If amajor
allele is known to be
segregating
variance components free ofmajor
genotype
effects would have to be estimated with model!1!.
This could be very difficult because evenwhen the true variance
components
wereused,
all 3 methodsperformed poorly
whenno animals were
genotyped.
Clearly,
none of the methods issatisfactory
for aseparate
genetic
evaluation forthe
major
locus and thepolygenes.
In thisstudy only
large
effects were considered.AML
and,
especially,
the method of Hoeschele(1988)
were unable to detect smaller effects than used withparameter
set 3. Forexample,
the effects estimated for theprolactin
locus in a Holstein sirefamily
(Cowan
etal,
1990)
were much smallerthan considered here. The method
proposed
has somepotential
forimprovement.
Future research should focus on the
development
ofalgorithms
to estimategenotype
probabilities
without any restriction onpedigree
structures. The estimation ofjoint
genotypeprobabilities
for any 2pairs
of animalstogether
with sparse matrixtechniques
tocompute
the elements of M could avoid the need for some of theapproximations
made in thisstudy.
ACKNOWLEDGMENT
This research was conducted while AH was a
visiting
scientist at theUniversity
ofGuelph.
Financial support from the Schweizerischer
Nationalfonds,
Switzerland,
isgratefully
acknowledged.
REFERENCES
Cowan
CM,
Dentine1VIR,
AxRL,
Schuler LA(1990)
Structural variation aroundprolactin
gene linked toquantitative
traits in an elite Holstein sirefamily.
TheorAppl
Genet79,
577-582Eikelenboom
G,
MinkemaD,
van EldikP,
Sybesma
W(1980)
Performance of Dutch Landracepigs
with differentgenotypes
for the halothane-inducedmalignant
Elsen
JVI,
LeRoy
P(1989)
Simplified
versions ofsegregation
analysis
for detection ofmajor
genes in animalbreeding
data.40th
AnnualMeeting of
EAAP, Dublin,
27-31August,
1989Elston
RC,
Stewart J(1971)
Ageneral
model for thegenetic
analysis
ofpedigree
data. Hum Hered21,
523-542Fernando
RL,
StrickerC,
Elston RC(1993)
Scheme tocompute
the likelihood of apedigree
withoutloops
and theposterior genotypic
distribution for every member of thepedigree.
TheorAppl
Genet,
in pressHenderson CR
(1984)
Applications of Linear
Models in. AnimalBreeding. University
ofGuelph, Guelph,
CanadaHoeschele I
(1988)
Genetic evaluation with datapresenting
evidence of mixedmajor
gene andpolygenic
inheritance. TheorAppl
Genet76,
81-92Hoeschele
I,
Meinert TR(1990)
Association ofgenetic
defects withyield
andtype
traits: The weaver locus effect onyield.
JDairy
Sci73,
2503-2515Johnston J
(1984)
Econometric Methods.McGraw-HiIl,
New YorkKennedy
BW,
Quinton M,
van Arendonk JAM(1992)
Estimation of effects ofsingle
genes on
quantitative
traits. J Anim Sci70,
2000-2012Kinghorn
BP, Kennedy BW,
Smith C(1993)
A method ofscreening
for genes ofmajor
effect. Genetics134,
351-360Knott
SA, Haley CS, Thompson
R(1992a)
Methods ofsegregation
analysis
for animalbreeding
data: acomparison
of power.Heredity
68,
299-312Knott
SA,
Haley
CS,
Thompson
R(1992b)
Methods ofsegregation
analysis
for animalbreeding
data: parameter estimates.Heredity
68,
313-320Lange K,
Elston RC(1975)
Extensions topedigree
analysis.
I. Likelihood calcula-tions forsimple
andcomplex pedigrees.
Hum Hered25, 95-105
Le
Roy P,
ElsenJM,
Knott S(1989)
Comparison
of four statistical methods for detection of amajor
gene in a progeny testdesign.
Genet SelEvol 21,
341-357 LeRoy
P,
NaveauJ,
ElsenJNI,
Sellier P(1990)
Evidence for a newmajor
geneinfluencing
meatquality
inpigs.
Genet Res Camb55,
33-40VIacLennan
DH, Phillips
MS(1992)
Malignant hyperthermia.
Science256, 789-794
MortonNC,
MacLean CJ(1974)
Analysis
offamily
resemblance. III.Complex
segregation
ofquantitative
traits. Am J Hum Genet26,
489-503Ott J
(1979)
Maximum likelihood estimationby
counting
methods underpolygenic
and mixed models in humanpedigrees.
Am J Hum Genet31, 161-175
Van Arendonk