HAL Id: hal-02693287
https://hal.inrae.fr/hal-02693287
Submitted on 1 Jun 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Sire design power calculation for QTL mapping experiments
A. Carta, Jean Michel Elsen
To cite this version:
A. Carta, Jean Michel Elsen. Sire design power calculation for QTL mapping experiments. Genetics
Selection Evolution, BioMed Central, 1999, 31 (2), pp.183-189. �hal-02693287�
Note
Sire design power calculation for QTL mapping experiments
Antonello Carta a Jean-Michel Elsen
a Istituto Zootecnico e Caseario per 1a
Sardegna, Bonassai,
07040 Olmedo(SS), Italy
b Station d’amélioration
génétique
desanimaux,
Inra, BP 27,31326 Castanet-Tolosan
cedex,
France(Received
25May 1998; accepted
2February 1999)
Abstract - Estimates of sire
design
power forQTL mapping
experiments obtained using three different methods ofalgebraic
approximation wereanalysed by comparing
them with the results of data simulations. Even when the binomial
probability
thatany number of sires out of the total number of sires are
jointly heterozygous
at themarker and the
QT
loci was taken intoconsideration,
thealgebraic
approximationsoverestimated powers. However,
they
could be used to rankdesigns differing
in thenumber of sires if the total size of the
experiment
isgiven.
The results werediscussed, focusing
on the assumptions made about the number of informativeoffspring,
thebalance between the two
offspring sub-groups
which receive the same marker allele from the sire and the distribution of the statistic. Given that a fullalgebraic approach
would be
computationally costly,
data simulation can be considered a useful tool inestimating
the power ofQTL
detection siredesigns. © Inra/Elsevier,
Paris QTL/power/ simulation/
protocol designRésumé - Calcul de la puissance de détection de QTL dans un modèle « père ».
Trois méthodes
analytiques
pour l’estimation de la puissance du protocole fille pour la détection des(!TLs
à l’aide d’un marqueurflanquant
ont été étudiées en comparaisonavec des résultats obtenus par simulation. Ces estimations sont
surestimées,
même quand est prise en compte la distribution de probabilité du nombre de pères doublehétérozygotes
au marqueur et auQTL. Cependant,
elles peuvent être utilisées pour classer desprotocoles
defaçon relative,
à taille depopulation
totale fixée. Les résultats sont discutés en référence auxhypothèses
sur le nombre de descendantsinformatifs,
labalance entre les descendants selon l’allèle marqueur reçu de leur
père,
et la nature desdistributions.
Compte
tenu du coûtnumérique
élevé d’un calculanalytique complet,
les simulations demeurent un outil efficace pour l’estimation de la
puissance
de cesprotocoles
de détection deQTL. @ Inra/Elsevier,
ParisQTL/
puissance/simulation/
planification expérimentale*
Correspondence
andreprints
E-mail: izcszoo@tin.it
1. INTRODUCTION
The use of
genetic
markers to locate genes whosepolymorphism partly explains
thegenetic variability
ofquantitative
traits wasproposed by
Sax[3]
and further detailed
by
Neimann-Sorensen and Robertson[2]
and others. Theprinciple
is toidentify,
in theoffspring
of anindividual,
those which receivedone or other of the two chromosomal
fragments surrounding
the marker inquestion.
If aquantitative
locus is located on thisfragment,
and if theparent
is
heterozygous
at both the marker andQTL (quantitative
traitlocus),
thena
systematic
difference is observed between the twosub-groups
of progeny.With the
development
of molecular markers based on DNAvariations,
theapplication
of these ideas has become feasible on alarge
scaleparticularly
inlivestock
populations,
wherelarge
families areroutinely
recorded. Thedesign
of such
experiments
has been studied in detailby
a number ofauthors,
inparticular
Soller and Genizi[4]
and Weller et al.[6].
In order tooptimize
these
designs,
it is necessary to estimate their power.Focusing
onsimple population structures,
Soller and Genizi[4],
as well as Weller et al.[6],
approached
this power estimationconsidering fully
balancedpopulations,
andusing approximate
distributions of the test statistic. In theseearly
papers, markers were studied oneby
one, and the test statisticsapplied
weresimple
ANOVA
methods, modelling
trait means as linear combinations of sire and marker within sire effects. In theirapproximation,
these authors worked with theasymptotic X 2 or
normalapproximation
of the Fstatistic,
and consideredsimply
the mean contrastaveraging
differentpossibilities
for the sire andoffspring genotypes
at theQT
and marker loci. The power of suchdesigns,
as well as more
complex experiments involving
two or threegenerations
andmixing
half- and full-sibfamilies,
was further studiedby
van der Beek et al.!5).
In their paper, these authors considered the mixture ofsub-populations,
ascharacterized
by
the number ofheterozygous
sires at theQTL,
rather than themean.
Alternatively,
the estimate of thedesign
power may be obtainedby
sim-ulating heterogeneous populations
andapplying
studied test statistics to thegenerated
sets ofdata,
without anyapproximation,
but at the expense of morecomputing
time. Thisapproach
was followedby
LeRoy
and Elsen[1]
in astudy addressing
the relative value of ANOVA and maximum-likelihood methods forQTL
detection.The aim of this
study
is to evaluate thevalidity
ofapproximate
siredesign
power
estimates, by comparing
threealgebraic
methods withsimulating
data.2. HYPOTHESES AND COMPARED METHODS
2.1.
Hypotheses
Powers were calculated for a
single
markeranalysis.
Multiallelic marker loci(with
na = 4alleles)
were studied. AllelesM i
were assumed to be distributed withfrequencies
in ageometric
series(f,
=f, 2 f
=o f, f 3
=a 2 f, ...,
withf
=1/(1+cr+a2 ...)).
In thissituation,
theparameter
a wasobtained, given
themean
heterozygosity
of the marker(E( f hm)), solving
theequation E( f hm) _
1
_
:i . 2 I:i( i o: -l ) )2 /(I l - i : o
This marker wassupposed
to betotally
linked tothe
QTL.
Thedesign
wasorganized
with np half-sib familiescomprising
noprogenies
per sire. mp was theexpected
number of sires for which a markercontrast can be
computed,
i.e. theexpected
number ofheterozygous
sires atthe marker
locus,
andlp,
theexpected
number ofheterozygous
sires at bothmarker and
QT
loci. mo was theexpected
effectivefamily size,
i.e. the mean number ofoffspring
per sire for which the marker allele received from the sire is identified. This effectivefamily
size is linked to the allelefrequencies by
the relation: mo =
£ i j 2 f z f, (1 - 0.5(f i
+/,))/E, j 2/,/ j .
The firsttype
error cx(accepting
a linkedQTL
when it does notexist)
was fixed at 1%.
2.2.
Compared
methodsThe
following
threeapproximations
were studied.1)
Theapproximation
usedby
Weller et al.!6!:
in thisapproximation, only
mean sire and
daughter
numbers were considered. The power wasgiven by
Pl =
P !F(NC(lp), mp, mp(mo - 2)) > f],
whereF (NC(lp), mp, mp(mo - 2))
is a non-central F variable with a
non-centrality parameter NC(lp)
and mpand
mp(mo - 2) degrees
of freedom. The thresholdf corresponds
to the(1 - 0 :) percentile
of the central F distribution. TheNC(lp)
iscomputed
as:
NC(lp)
=lpE 2 (MC)/SE 2 (MC),
whereE 2 (MC)
is the square of theexpectation
of a marker contrast andSE 2 (MC)
is the square of the standarderror of the marker contrast. If a sire is
heterozygous
at theQTL,
thenE 2
(MC)
=GE 2 (1 - r) 2 ,
whereis GE z
the square of the gene effect and rthe recombination rate between the marker locus and the
QTL.
For a half-sibfamily
SE is calculated as(4 - h 2 )/mo where h 2 is
thepolygenic heritability
of the trait
(within QTL genotype).
2)
Theapproximation
followedby
van der Beek et al.!5!:
in thisapproxima- tion,
thevariability
in number ofheterozygous
sires at theQTL
is considered.The power was
given by:
where xp is the number of
heterozygous
sires at theQTL
andPr(xp/mp)
is thebinomial
probability
that xp out of mp(the expected
number ofheterozygous
sires at the marker
locus)
areheterozygous
also at theQTL.
3)
Anapproximation
where variation at both the sire marker and theQT
loci are considered. The power was
given by:
where yp is the number
of heterozygous
sires at the marker locus andPr(yp/np)
is the binomial
probability
that yp out of np sires areheterozygous
at themarker locus.
4)
In order to test thereliability
of the threealgebraic
methodsabove,
thedesign
power was also estimatedby simulating
data andapplying
the standardF test. For each power calculation 10 000
replicates
were used under the null and the alternativehypotheses.
The variance ratio for the classic hierarchical ANOVA was calculated as:where
Zi M1h; (resp. Zi M2k )
are thequantitative performances
of thejth daughter
of anheterozygous
M1M2 sirei,
which received marker allele All(resp. M2),
and T!Mi(resp. ni l’ ln)
is their number. The power was estimatedby
the ratio between the number ofreplicates
under the alternativehypothesis
whose statistic exceeds a certain threshold and the total number of
replicates.
The threshold was the
(1 - a) percentile
of the 10 000replicates
under the nullhypothesis. Thus,
noassumptions
about the distribution of the statistic weremade.
3. RESULTS
Table I
reports
the power estimates of siredesigns
with a half-sibfamily
structure for a gene effect
(GE)
of 0.5 or 1phenotypic
standard deviation(< 7 p),
for various numbers of
sires,
for two totalexperiment
sizes(tno equal
to 500 or1000
daughters),
for a constantpolygenic heritability h z of
0.25 andassuming
a recombination rate
(r)
of 0.Expected heterozygosities
at bothloci,
markerand
QT,
are assumed to be 0.5. Four alleles aresegregating
at the marker locus withfrequencies 0.664, 0.229,
0.079 and 0.028. Note that the totalheritability (including
the variation at theQTL) equals
0.375 if GE =0.5,
0.75 if GE = 1.0.It is shown that when the gene effect is one half ap and the total
experiment
size is 500
daughters,
the threealgebraic
methodsgive
similar resultsand, considering
that the power is low in thissituation,
theseapproximations only slightly
overestimated the power ascompared
to the simulated data. The results for the same GE but with a totalexperiment
size of 1000daughters,
confirmthat no
significant
differences exist betweenalgebraic
methodsexcept
when the number of sires is low in which case Plgreatly
overestimated the power.The overestimation of
algebraic
methods withrespect
to simulations is moreimportant
here than it is with a totalexperiment
size of 500daughters.
As
regards
the GE of 1.0 Qr&dquo; when the totalexperiment
size is 500daughters algebraic
results continued to overestimate powerexcept
forP3,
when the number of sires isequal
to2,
in which case PIgives particularly high
powercompared
to the otheralgebraic
and simulation methods. For a totalexperiment
size of 1 000
daughters,
PIgreatly
overestimated power for any considered number ofsires,
while P2 and P3give
results more similar to simulated data.Power estimates for a constant total
experiment
size and number of sires(1000
and10, respectively),
for two GE values(0.5
and 1O p )
with variousexpected frequencies
ofheterozygosity
at the marker locus(E( f hm))
and atthe
QTL (E( f hq))
are shown in table 11.When GE is
equal
to 0.5 andE( f hm)
is low(0.25-0.5)
the differences betweenalgebraic
methods arenegligible
and there is evidence that the overestimation ofalgebraic
methods tends to become moreimportant
asE( f hq)
increases.Algebraic
results are more realistic whenE( f hm)
is 0.75which
corresponds
toequal frequencies (0.25)
for the four alleles at the markerlocus. The same trends can be
pointed
out for a GE of 1 up.Nevertheless,
inthis case P1 tends to estimate
higher
powers than otheralgebraic
methods and the differences between simulations andalgebraic
methods become verylarge.
4.
DISCUSSION/CONCLUSION
These results showed that
important
differences exist between power calcu- lated withalgebraic approximations
andsimulating
data. Even if the binomialprobability
that any number of sires out of the total number of sires arejointly heterozygous
at both the marker and theQT
loci is taken intoaccount,
asin the P3
method, algebraic approximation
cannotalways
be used to estimatethe power of different sire
designs
forQTL
detection when the totalexperiment
size is
given. However,
eventhough they
overestimate power, P2 and P3 could be used to rankdesigns differing
in the number of sires when the total size of theexperiment
isgiven.
On thecontrary,
it seems to beinadequate
not to in-clude the binomial
probability
and to use theexpected
number ofheterozygous parents
also in order tooptimize
the choice of the number of siresmainly
whenthe total
experiment
size isgiven,
the gene effect islarge
and theexpected frequencies
ofheterozygotes
at the marker and at theQT
loci are close to 0.5.The same conclusions can be drawn from an
analysis
carried outconsidering
a diallelic marker locus
(unpublished data).
Probably, part
of the difference between thealgebraic
and simulation resultscan be attributed to
assumptions
made about the number of informativeoffspring
persire,
the balance between the twooffspring sub-groups
whichreceive the same marker allele from the
sire,
and the distribution of the statistic.As
regards
the distribution of thestatistic,
it should be noted that the use ofx
2 distribution
instead of F did notsignificantly change
thealgebraic
estimatesobtained in this work
(unpublished data).
All in
all,
it would beprogramming
andcomputing costly
to consider all eventualitiesconcerning
theoffspring sub-group
sizesusing
a fullalgebraic approach. Thus, simulating
the data can still be considered in these situationsas the most useful tool for
estimating
the power ofQTL
detection siredesigns.
REFERENCES
[1]
LeRoy
P., ElsenJ.M.,
Numerical comparison between powers of maximum- likelihood andanalysis
of variance methods forQTL
detection in progeny testdesigns:
the case
of monogenic
inheritance, Theor.Appl.
Genet. 90(1995)
65-72.[2]
Neimann-SorensenA.,
RobertsonA.,
The association between blood groups and severalproduction
characteristics in three Danish cattlebreeds,
ActaAgric.
Scand. 11
(1961)
163--196.[3]
SaxK.,
The association of size differences with seed coat pattern andpigmen-
tation in Phaesolu.s
vulgarus,
Genetics 8(1923)
552-560.[4]
Soller M., GeniziA.,
Theefficiency
ofexperimental designs
for the detection oflinkage
between a marker locus and a locusaffecting
a quantitative trait insegregating populations,
Biometrics 34(1978)
47-55.[5]
van der BeekS.,
van ArendonkJ.A.M.,
GroenA.F.,
Power of two- and three-generation
QTL mapping
experiments in an outbredpopulation containing
full-sib orhalf-sib