HAL Id: hal-00199757
https://hal.archives-ouvertes.fr/hal-00199757
Submitted on 19 Dec 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Alternative models for QTL detection in livestock. II.
Likelihood approximations and sire marker genotype estimations
B. Mangin, B. Goffinet, P. Le Roy, D. Boichard, J.M. Elsen
To cite this version:
B. Mangin, B. Goffinet, P. Le Roy, D. Boichard, J.M. Elsen. Alternative models for QTL detection
in livestock. II. Likelihood approximations and sire marker genotype estimations. Genetics Selection
Evolution, BioMed Central, 1999, 31, pp.225-237. ⟨hal-00199757⟩
Original article
Alternative models for QTL detection in livestock.
II. Likelihood approximations and sire marker genotype estimations

Brigitte Mangin a, Bruno Goffinet, Pascale Le Roy b, Didier Boichard, Jean-Michel Elsen

a Biométrie et intelligence artificielle, Institut national de la recherche agronomique, BP27, 31326 Castanet-Tolosan, France
b Station de génétique quantitative et appliquée, Institut national de la recherche agronomique, 78352 Jouy-en-Josas, France
c Station d'amélioration génétique des animaux, Institut national de la recherche agronomique, BP27, 31326 Castanet-Tolosan, France

(Received 20 November 1998; accepted 7 April 1999)
Abstract - In this paper, we compare four different methods of dealing with the unknown linkage phase of sire markers which occurs in the detection of quantitative trait loci (QTL) in a half-sib family structure when no information is available on grandparents. The methods are compared by considering a Gaussian approximation of the progeny likelihood instead of the mixture likelihood. In the first simulation study, the properties of the Gaussian model and of the mixture model were investigated, using the simplest method for sire gamete reconstruction. Both models lead to comparable results as regards the test power but the mean square error of sib QTL effect estimates was larger for the Gaussian likelihood than for the mixture likelihood, especially for maps with widely spaced markers. The second simulation study revealed that the simplest method for sire marker genotype estimation was as powerful as complicated methods and that the method including all the possible sire marker genotypes was never the most powerful. © Inra/Elsevier, Paris

half-sib family / QTL detection / unknown linkage phase / Gaussian approximation /
log-likelihood ratio test

* Correspondence and reprints
E-mail: elsen@toulouse.inra.fr

1. INTRODUCTION
The present paper deals with the detection of one QTL in half-sib families when no information is available on grandparents.

A general form of the likelihood of detecting QTL in simple pedigree structures such as half-sib or full-sib families when marker information is available on progeny, parents and grandparents was presented by Elsen et al. [2]. This likelihood is a two-level mixture distribution with different possible sire marker genotypes given marker information, and different possible progeny QTL genotypes given sire marker genotype and offspring marker information.

This paper describes simulations carried out to compare simplified likelihoods.

As an alternative to the mixture approach, we suggest simplifying the likelihood by considering only one sire marker genotype. Three solutions were explored: the first one, close to the Knott et al. proposal [7], is the likelihood of quantitative phenotypes conditional on the most probable sire marker genotype given marker information, while in the others, the sire marker genotype is treated as a fixed effect, estimating the likelihood of the quantitative trait observation conditionally or jointly with the sire marker genotype.

These comparisons were performed on a simplified form of the likelihood with regard to the mixture of the progeny QTL genotypes. This simplified likelihood is the one used in interval mapping by linear regression [5, 8] but instead of least squares tests as in the above papers, maximum log-likelihood ratio tests were used. The properties of this simplification are described in the first part of the paper, using the likelihood of the quantitative phenotypes conditional on the most probable sire marker genotype given marker information.

2. COMPARISON OF LIKELIHOOD AND SIMPLIFIED LIKELIHOOD
Most hypotheses and notations are given in Elsen et al. [2]. Notations related to this paper are summarized in table I.

Let $hs$, $\mu^{x1}$, $\mu^{x2}$ denote the vectors of sire marker genotypes $hs_i$ and of phenotypic means of trait distribution $\mu_i^{x1}$, $\mu_i^{x2}$. Let $\Lambda_0$ be the likelihood under the null hypothesis that no QTL is segregating in the pedigree, where $\mu_i$ is the phenotypic mean of sire $i$ offspring. Let $\mu$ be the vector of $\mu_i$.

2.1. Test statistics

The general form of the likelihood presented by Elsen et al. [2], denoted $\Lambda^x$ for a QTL at position $x$, leads to the maximum log-likelihood ratio test $T$, which compares the maximum of $\Lambda^x$ with the maximum of $\Lambda_0$.

Full maximum likelihood for this type of likelihood requires a lot of computation because the number of possible sire marker genotypes $hs_i$, over which the first summation of $\Lambda^x$ runs, grows exponentially with the number of informative markers per sire. Table II presents, for $T$ and the other tests proposed in this paper, the CPU time needed for one simulation. Although our program could certainly be optimized, these results show that computing the $T$ test is possible for one data set but cannot reasonably be considered for simulations, which are generally needed to obtain significance thresholds.

A natural way of dealing with this difficulty is to work in two steps: in the first step a probable marker genotype for each sire is estimated and in the second step the part of the likelihood corresponding only to these probable marker genotypes is maximized.

A possible estimate for the sire marker genotypes, very close to the sire gamete reconstruction proposed by Knott et al. [7], may be based on the most probable genotype given the marker information, $\hat{hs}_i = \arg\max_{hs_i} p(hs_i \mid M_i)$. Let $\hat{hs}$ be the vector of estimated sire marker genotypes.
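As an illustration of this first step, the sketch below picks the most probable sire phase in the simplest non-trivial case: a sire heterozygous at two linked markers, with the paternal haplotype of every offspring assumed to be known without ambiguity and a uniform prior on the two phases. The function names, the phase labels and the recombination rate used in the example are hypothetical and are not taken from the paper; a real analysis would enumerate all genotypes compatible with the marker data over the whole linkage group.

```python
import math


def phase_posterior(paternal_haplotypes, r):
    """Posterior probability of the two phases of a doubly heterozygous sire.

    paternal_haplotypes: (allele at marker 1, allele at marker 2) received
        from the sire by each offspring, coded 1 or 2 (assumed known).
    r: recombination rate between the two markers, 0 < r < 1.
    A uniform prior is put on the phases "11|22" and "12|21".
    """
    log_lik = {"11|22": 0.0, "12|21": 0.0}
    for a1, a2 in paternal_haplotypes:
        same = a1 == a2  # haplotype 11 or 22
        # Haplotypes 11 and 22 are non-recombinant under phase 11|22 ...
        log_lik["11|22"] += math.log((1 - r) / 2 if same else r / 2)
        # ... and recombinant under phase 12|21.
        log_lik["12|21"] += math.log(r / 2 if same else (1 - r) / 2)
    top = max(log_lik.values())  # subtract the maximum for numerical stability
    weights = {h: math.exp(v - top) for h, v in log_lik.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def most_probable_phase(paternal_haplotypes, r):
    """Return the phase maximising p(hs | M) and the full posterior."""
    posterior = phase_posterior(paternal_haplotypes, r)
    return max(posterior, key=posterior.get), posterior


# Hypothetical family: 15 offspring carry haplotype 11 or 22, 5 carry 12 or 21.
haplotypes = [(1, 1)] * 8 + [(2, 2)] * 7 + [(1, 2)] * 2 + [(2, 1)] * 3
best, posterior = most_probable_phase(haplotypes, r=0.2)
print(best, posterior)
```

With few offspring or loosely linked markers the two posterior probabilities move closer to 1/2, which is the situation where reconstruction errors become frequent.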
For the second step, the likelihood is reduced to the part corresponding to $\hat{hs}$, denoted $\Lambda^{x,\hat{hs}}$.

In order to simplify the maximization step, the mixture of distributions in progeny can be approximated by a normal distribution with expectation equal to the expectation of the mixture. Then a linear model is obtained at each position $x$ along the chromosome. Let $\tilde{\Lambda}^{x,\hat{hs}}$ denote this simplified likelihood.

A simulation study was carried out to compare the power of QTL detection, using the maximum log-likelihood ratio tests $T^1$ and $T^2$, based on $\Lambda^{x,\hat{hs}}$ and $\tilde{\Lambda}^{x,\hat{hs}}$, respectively.

2.2. Simulation results

Sire designs with 20 sire families of 50 or 20 descendants per sire were simulated. The linkage group comprised three or eleven equally spaced markers, each with two alleles segregating at equal frequency in the population. Polygenic heritability was fixed at 0.2 and residual variability at 1. The power studies were based on a QTL with two alleles at equal frequency, located either at 5 or 35 cM from one end of the linkage group with additive effect equal either to 0.5 or to 1 and no dominance.
2.2.1. Threshold and power

The null distributions of the test statistics were estimated by simulating data sets with polygenic effects corresponding to the heritability value used in the simulation model. Significance thresholds for $T^1$ and $T^2$ are shown in table III.

The largest difference between the test powers, shown in table IV, was observed for a 20 half-sib progeny design, an 11 marker map and a QTL located at 35 cM with an additive effect equal to 1. In this situation, a gain of about 10% was obtained with the mixture likelihood as compared to the Gaussian likelihood. However, other cases did not show large differences and either the first or the second test may be the most powerful depending on the case studied.

In the back-cross design, these tests have been proven to be asymptotically equivalent when the QTL effect is small [9]. In order to limit computing time, only the Gaussian approximation will be considered in the second part of this paper and in its companion paper [4]. Methods and simulation results given with the Gaussian approximation may be extended to include a mixture of distributions.

2.2.2. Parameter estimates
Despite power results that were quite similar for both methods, it is worthwhile comparing parameter estimates for the QTL location and sib QTL effect.

Mean estimates of position and empirical standard deviations of the position estimate are shown in table V. Because the position estimate is constrained to belong to the chromosome, its bias was larger for a QTL located at the beginning of the chromosome than for a QTL located near the middle of the chromosome, but both methods gave similar bias. Standard deviations of the position estimates were slightly larger for a Gaussian likelihood than for a mixture likelihood for the more widely spaced marker map but they were comparable for the other map studied.

Mean square errors of the within half-sib QTL substitution effect are shown in table VI.
As the bias of $\hat{\alpha}_i$ is small (data not shown), the mean square error is closely related to the variance of $\hat{\alpha}_i$.

Results for the Gaussian likelihood in the 11 equally spaced marker maps may be explained by considering the idealized case where the QTL position is known and located on a marker and for which all sires are heterozygous for this marker. The variance of $\hat{\alpha}_i$ depends only on the number of informative descendants per sire. For a marker with two alleles at equal frequency, the number of informative descendants is roughly $n_i/2$ and the variance of $\hat{\alpha}_i$, a contrast between two groups of about $n_i/4$ descendants each, is then $8/n_i$ times the residual variance. For 50 (respectively 20) descendants per sire and a residual variance equal to 1, a 0.16 (respectively 0.4) mean square error is expected in the idealized case. The unknown QTL position, the distance between the QTL position and heterozygous markers for the sire, the unknown sire marker genotypes and the overestimation of the residual variance when the additive QTL effect is great [10] explain the increase in the mean square error.

Results for the Gaussian likelihood in the three equally spaced marker maps may be explained by considering a second idealized case where the QTL is known to be located at the beginning of the chromosome. As only sires heterozygous at least at one marker are considered, three classes of sires ($c_1$, $c_2$, $c_3$) exist with different variances of $\hat{\alpha}_i$: $c_1$ contains sires that are heterozygous for the first marker, $c_2$ those that are homozygous for the first marker and heterozygous for the second one, and $c_3$ those that are heterozygous only for the last marker. The proportions of sires in the three classes are about 4/7, 2/7 and 1/7. The variance of $\hat{\alpha}_i$ for sires in class $c_k$ depends on $r_{c_k}$, the recombination rate between the first heterozygous marker in class $c_k$ and the QTL located at the beginning of the chromosome. With 50 (respectively 20) descendants per sire and a residual variance equal to 1, a 1.7 (respectively 4.2) mean square error is expected. A more favourable location of the QTL (near the middle of the chromosome) decreases the mean square error.
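The expected mean square errors quoted for the two idealized cases can be reproduced with a few lines of Python. The first function is the $8/n_i$ rule given in the text. The second is a sketch under assumptions that are not stated in this section: the three markers are placed at 0, 50 and 100 cM, recombination rates follow Haldane's mapping function, and the variance for a sire whose first heterozygous marker recombines with the QTL at rate $r$ is taken to be $8\sigma^2 / (n_i (1 - 2r)^2)$, because the marker contrast then estimates $(1 - 2r)\alpha_i$. Under these assumptions, the class proportions 4/7, 2/7 and 1/7 give values that round to the 1.7 and 4.2 quoted in the text.

```python
import math


def mse_qtl_on_marker(n, residual_var=1.0):
    """First idealized case: about n/2 informative offspring, split into two
    groups of about n/4, so Var(difference of group means) = 8 * sigma^2 / n."""
    return 8.0 * residual_var / n


def haldane_r(distance_morgan):
    """Recombination rate under Haldane's mapping function (an assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgan))


def mse_three_markers(n, marker_positions=(0.0, 0.5, 1.0), residual_var=1.0):
    """Second idealized case: QTL at position 0, sires grouped by their first
    heterozygous marker with proportions 4/7, 2/7 and 1/7.
    marker_positions are in Morgans and are assumed, not given in the text."""
    proportions = (4 / 7, 2 / 7, 1 / 7)
    total = 0.0
    for weight, position in zip(proportions, marker_positions):
        r = haldane_r(position)
        total += weight * 8.0 * residual_var / (n * (1 - 2 * r) ** 2)
    return total


for n in (50, 20):
    print(n, mse_qtl_on_marker(n), round(mse_three_markers(n), 1))
```

The third class dominates the average: a sire informative only at the far end of the linkage group contributes a variance many times larger than a sire heterozygous at the marker next to the QTL.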
The estimation of the within half-sib QTL substitution effect with the mixture likelihood does not only use the mean difference between informative descendants carrying allele A at a marker and those carrying allele B, but takes advantage of information from higher moments of the mixture distribution. Even if this information becomes negligible when the number of descendants per sire is large, in a finite population and especially for a widely spaced marker map, it leads to a significant reduction of the mean square error.

3. OTHER METHODS TO DEAL WITH UNKNOWN SIRE MARKER GENOTYPES
Errors in sire gamete reconstruction can decrease the power of both methods. Knott et al. [7] found that in their worst situation only 6% of informative sires were incorrectly reconstructed, but they had studied large half-sib families with 100 descendants per sire.

Table VII shows, for one male, the empirical probability of correct reconstruction based on $\hat{hs}_i$ over 1 000 replications. We confirm a 6% maximum error in large families but found up to 30% errors in smaller families, which led us to study alternative methods.

The rationale of the following alternative methods is that their aim is not to improve the quality of sire gamete reconstructions but to increase the power of QTL detection. It is not necessary to work in two steps and the $hs_i$ marker genotypes can be treated as nuisance parameters.
3.1. Estimations of sire marker genotypes based on conditional likelihood of quantitative phenotypes

The first alternative method is to treat the $hs_i$ parameters as fixed parameters in the likelihood of quantitative phenotypes given the marker information, $\prod_i \Lambda_i^{x,hs_i}$. The full maximum is obtained after a search on a continuous space for the QTL location and effect, within sire mean and variance parameters and on a discrete space for the sire marker genotype parameters. With the Gaussian approximation of the mixture in progeny, the sire marker genotypes are thus estimated by maximizing $\prod_i \tilde{\Lambda}_i^{x,hs_i}$ over the $hs_i$, and the corresponding maximum log-likelihood ratio test is denoted $T^3$.

3.2. Estimations of sire marker genotypes based on weighted conditional likelihood

Estimating the sire marker genotypes by using only the previous likelihood function means neglecting information contained in $p(hs_i \mid M_i)$. Alternatively, the within sire conditional likelihood could be weighted by $p(hs_i \mid M_i)$, giving the weighted conditional likelihood to be maximized, $\prod_i p(hs_i \mid M_i)\, \Lambda_i^{x,hs_i}$.
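The difference between the conditional and the weighted conditional criteria can be sketched for one sire at a fixed position $x$. The code below is an illustration under simplifying assumptions, not the authors' program: each candidate genotype comes with its probability $p(hs_i \mid M_i)$ and with the transmission probabilities of the offspring, the Gaussian likelihood is maximized over the within-sire mean, the substitution effect and the residual variance by ordinary least squares, and all names are hypothetical.

```python
import math


def max_gaussian_loglik(y, p):
    """Log-likelihood of y_j ~ N(mu + (2 p_j - 1) alpha / 2, sigma^2),
    maximised over mu, alpha and sigma^2 (simple linear regression).
    Assumes the residual sum of squares is strictly positive."""
    n = len(y)
    x = [(2 * pj - 1) / 2 for pj in p]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    alpha = sxy / sxx if sxx > 0 else 0.0
    rss = sum((yi - ybar - alpha * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def estimate_genotype(y, candidates, weighted):
    """candidates maps a genotype label to (p(hs | M), [p_j given hs]).
    weighted=False: conditional likelihood only.
    weighted=True: conditional likelihood multiplied by p(hs | M)."""
    def score(item):
        prior, p = item[1]
        bonus = math.log(prior) if weighted else 0.0
        return bonus + max_gaussian_loglik(y, p)
    return max(candidates.items(), key=score)[0]


# Hypothetical data: phenotypes agree with genotype "A", markers favour "B".
y = [1.0, 1.2, -0.9, -1.1, 0.8, -1.0]
candidates = {
    "A": (1e-6, [0.9, 0.9, 0.1, 0.1, 0.9, 0.1]),
    "B": (1 - 1e-6, [0.9, 0.1, 0.9, 0.1, 0.1, 0.9]),
}
print(estimate_genotype(y, candidates, weighted=False))
print(estimate_genotype(y, candidates, weighted=True))
```

A genotype and its symmetrical genotype give the same maximized Gaussian likelihood (only the sign of the estimated effect changes), so candidates are best listed up to this symmetry.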
With the Gaussian approximation of the mixture in progeny, the sire marker genotypes are then estimated by maximizing $\prod_i p(hs_i \mid M_i)\, \tilde{\Lambda}_i^{x,hs_i}$ over the $hs_i$, and the corresponding maximum log-likelihood ratio test is denoted $T^4$.

3.3. No estimation of sire marker genotypes

The last method is based on the likelihood function $\Lambda^x$ proposed by Elsen et al. [2], using the Gaussian approximation of the mixture in progeny. The corresponding maximum log-likelihood ratio test is denoted $T^5$.

In practice, the three tests proposed should be slightly modified to take into account that the sire marker genotype space grows exponentially with the number of informative markers per sire. This sire marker genotype space could be limited to genotypes that satisfy $p(hs_i \mid M_i)$ greater than a given value, fixed in the simulation study to 0.01.

3.4. Simulation results
Significance thresholds and powers for $T^2$, $T^3$, $T^4$ and $T^5$ are shown in tables VIII and IX. On the whole the compared tests gave very similar power for all of the situations studied, suggesting that the simplest method can be used, to avoid unnecessary computation. This similarity between tests may be attributed to the high percentage of correct sire gamete reconstruction. Only when markers were widely spaced and when family size was limited did estimating sire marker genotypes on the weighted likelihood given the marker information lead to a slightly more powerful test.

4. DISCUSSION AND CONCLUSION
When the marker map is known, $\Lambda^x$ is proportional to the joint likelihood of quantitative and marker observations considering the $hs_i$ as random parameters with uniform prior distribution, so joint or conditional likelihoods give the same results in terms of QTL parameter estimates and detection test. The joint likelihood was first used by Georges et al. [3] to map QTL in dairy cattle by considering only sire-by-sire analyses. Then Jansen et al. [6] computed the conditional likelihood and considered pooled sire analysis. As mentioned by Georges et al. [3], the problem with this likelihood is due to the fact that only the $|\mu_i^{x1} - \mu_i^{x2}|$ can be estimated when there is no information on grandparents. Indeed, using the alternative parametrization $\mu_i^{x1} = \mu_i + \alpha_i^x/2$, $\mu_i^{x2} = \mu_i - \alpha_i^x/2$, it has been proved (in the Appendix) that the sign of $\alpha_i^x$ cannot be estimated. This is not important for an objective of QTL detection but shows the limit of this method, if an objective is to pursue QTL effect estimation simultaneously.
For all the methods studied, the empirical significance thresholds were obtained by simulations of sire and progeny marker genotypes and of the quantitative performances. In practice a permutation test [1] or a Monte-Carlo simulation taking account of the correct marker structure should be used. This could lead to slightly different threshold values. However, because very high correlations between tests were observed, we can guess that the conclusions concerning the different tests should not depend on the chosen significance threshold.
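A permutation threshold in the spirit of Churchill and Doerge [1] could be obtained along the following lines. The sketch assumes that phenotypes are shuffled within each sire family, so that family means and the marker structure are preserved; this within-family scheme, the `statistic` callback and the data layout are assumptions made for the illustration and are not described in the paper.

```python
import random


def permutation_threshold(families, statistic, n_perm=1000, level=0.05, seed=0):
    """Empirical (1 - level) quantile of a test statistic under permutation.

    families: list of (phenotypes, marker_data) pairs, one per sire.
    statistic: function taking such a list and returning the test statistic,
        for example the maximum over positions of a log-likelihood ratio.
    Phenotypes are shuffled within each family; marker data are left untouched.
    """
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_perm):
        permuted = []
        for phenotypes, marker_data in families:
            shuffled = list(phenotypes)  # copy, the input is not modified
            rng.shuffle(shuffled)
            permuted.append((shuffled, marker_data))
        null_values.append(statistic(permuted))
    null_values.sort()
    # Index of the (1 - level) empirical quantile, e.g. the 950th of 1000.
    index = max(0, n_perm - int(level * n_perm) - 1)
    return null_values[index]


def toy_statistic(families):
    """Hypothetical statistic: largest absolute difference between the summed
    phenotypes of offspring carrying paternal marker allele 1 and allele 0."""
    return max(
        abs(sum(y for y, m in zip(ph, mk) if m == 1)
            - sum(y for y, m in zip(ph, mk) if m == 0))
        for ph, mk in families
    )


data = [([0.1, 1.3, -0.4, 0.9, 0.2, -1.1], [1, 0, 1, 0, 1, 0]),
        ([2.0, 1.5, 0.3, 0.8], [1, 1, 0, 0])]
print(permutation_threshold(data, toy_statistic, n_perm=200))
```

Because the statistic is recomputed on every permuted data set, the cost is `n_perm` times that of one analysis, which is one more reason to prefer the computationally light tests.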
Modelling progeny quantitative observations with mixture distributions is a more computationally demanding approach than methods using Gaussian distributions. Previous studies [5, 10] have compared in a single family the estimates obtained using mixture or Gaussian models. They concluded that the estimation accuracy is similar in both models, except for the residual variance when a QTL of large effect is mapped in a widely spaced marker map. Our study on multiple families showed that the accuracy of within half-sib QTL substitution effect estimates decreased significantly for the Gaussian model compared to the mixture model, especially in a more widely spaced marker map and even if the QTL effect was not large, although the test power and the accuracy of the QTL position remained comparable.
Our comparison of alternative methods for handling the problem of unknown sire marker linkage phases showed clearly that a simple method of reconstructing the sire genotype is almost as powerful as more complex methods, especially the one that takes into account all the possible sire marker genotypes since the $T^5$ test was never the most powerful test.

REFERENCES
[1] Churchill G.A., Doerge R.W., Empirical threshold values for quantitative trait mapping, Genetics 138 (1994) 963-971.
[2] Elsen J.M., Mangin B., Goffinet B., Le Roy P., Boichard D., Alternative models for QTL detection in livestock. I. General introduction, Genet. Sel. Evol. (1999) 213-224.
[3] Georges M., Nielsen D., Mackinnon M., Mishra A., Okimoto R., Pasquino A.T., Sargeant L.S., Sorensen A., Steele M.R., Zhao X., Womack J.E., Hoeschele I., Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing, Genetics 139 (1995) 907-920.
[4] Goffinet B., Le Roy P., Boichard D., Elsen J.M., Mangin B., Alternative models for QTL detection in livestock. III. Assumption about the model, Genet. Sel. Evol. (1999) in press.
[5] Haley C.S., Knott S.A., A simple regression method for mapping quantitative trait loci in line crosses using flanking markers, Heredity 69 (1992) 315-324.
[6] Jansen R.C., David L.J., Van Arendonk J.A.M., A mixture model approach to the mapping of quantitative trait loci in complex populations with an application to multiple cattle families, Genetics 148 (1998) 391-400.
[7] Knott S.A., Elsen J.M., Haley C.S., Methods for multiple-marker mapping of quantitative trait loci in half-sib populations, Theor. Appl. Genet. 93 (1996) 71-80.
[8] Martinez O., Curnow R.N., Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers, Theor. Appl. Genet. 85 (1992) 480-488.
[9] Rebai A., Goffinet B., Mangin B., Comparing power of different methods for QTL detection, Biometrics 51 (1995) 87-99.
[10] Xu S., A comment on simple regression method for interval mapping, Genetics 141 (1995) 1657-1659.

APPENDIX: Proof of the nonestimability of the sign of the within half-sib QTL substitution effect

The likelihood $\Lambda^x$ is a function of
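parameters $\mu_i^{x1}$ and $\mu_i^{x2}$. Using the alternative parametrization $\mu_i^{x1} = \mu_i + \alpha_i^x/2$, $\mu_i^{x2} = \mu_i - \alpha_i^x/2$, $\Lambda^x$ is a function of $\alpha_i^x$ and we denote by $\Lambda_i^{x,hs_i}(\alpha_i)$ the terms corresponding to descendants of sire $i$.

Consider a genotype $hs_i = \{hs_i^1, hs_i^2\}$ and its symmetrical $hs_i' = \{hs_i^2, hs_i^1\}$. When there is no ancestry information, it is obvious that

$$p(hs_i \mid M_i) = p(hs_i' \mid M_i).$$

Using the relation $p(d_{ij}^x = q \mid hs_i, M_i) = 1 - p(d_{ij}^x = q \mid hs_i', M_i)$ it can easily be proven that

$$\Lambda_i^{x,hs_i}(\alpha_i) = \Lambda_i^{x,hs_i'}(-\alpha_i).$$

In $\Lambda^x$, terms corresponding to $hs_i$ and $hs_i'$ can be grouped to obtain

$$\Lambda^x = \prod_i \sum_{\{hs_i, hs_i'\}} p(hs_i \mid M_i) \left[ \Lambda_i^{x,hs_i}(\alpha_i) + \Lambda_i^{x,hs_i}(-\alpha_i) \right],$$

where the sum runs over pairs of symmetrical genotypes. Because $\Lambda^x$ is a symmetrical function of the $\alpha_i$ parameters, their sign cannot be estimated.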