HAL Id: hal-00894267
https://hal.archives-ouvertes.fr/hal-00894267
Submitted on 1 Jan 1999
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Random model approach for QTL mapping in half-sib
families
Mario L. Martinez, Natascha Vukasinovic, Freeman Gene (a.E.)
To cite this version:
Original
article
Random model
approach
for
QTL
mapping
in half-sib families
Mario L.
Martinez,
Natascha Vukasinovic*
Gene
(A.E.)
Freeman
Department
of AnimalScience,
Iowa StateUniversity, Ames,
IA 50011, USA(Received
7April
1998;
accepted
9 June1999)
Abstract - An interval
mapping procedure
based on the random modelapproach
wasapplied
to investigate itsappropriateness
and robustness forQTL
mappingin
populations
withprevailing
half-sibfamily
structures. Under a randommodel,
QTL
location and variance components were estimatedusing
maximum likelihoodtechniques.
The estimation of parameters was based on thesib-pair approach.
Theproportion
of genesidentical-by-descent
(IBD)
at theQTL
was estimated fromthe IBD at two
flanking
marker loci. Estimates forQTL
parameters(location
andvariance
components)
and power were obtainedusing
simulateddata,
andvarying
thenumber of
families, heritability
of the trait,proportion
ofQTL
variance, number of marker alleles and number of alleles atQTL.
The mostimportant
factorsinfluencing
the estimates of
QTL
parameters and power wereheritability
of the trait and theproportion
ofgenetic
variance due toQTL.
The number ofQTL
alleles neither influenced the estimates ofQTL
parameters nor the power ofQTL
detection. Witha
higher heritability, confounding
betweenQTL
and thepolygenic
component wasobserved. Given a sufficient number of families and informative
polyallelic markers,
the random model
approach
can detect aQTL
thatexplains
at least 15 % of thegenetic
variance withhigh
power andprovides
accurate estimates of theQTL position.
For fine
QTL
mapping and proper estimation ofQTL
variance, moresophisticated
methods are, however,
required. ©
Inra/Elsevier,
Paris QTL/
randommodel / interval
mapping/
sib-pair methodRésumé - Approche en modèle aléatoire pour la détection de QTL des familles de
demi-frères
(soeurs).
Uneprocédure
decartographie
basée surl’approche
en modèlealéatoire a été
appliquée
de manière à examiner sapertinence
et sa robustesse pourla détection de
(aTLs
dans lespopulations
oùprévaut
la structure en familles dedemi-frères. Dans un modèle aléatoire, la position du
QTL
et les composantes devariance ont été estimées en utilisant les techniques de maximum de vraisemblance.
*
Correspondence
andreprints:
AnimalBreeding Group,
Swiss Federal Institute ofTechnology,
Clausiusstr. 50, 8092Zurich,
SwitzerlandL’estimation des
paramètres
a été basée surl’approche
par les pairesd’apparentés.
La
proportion
degènes identiques
par descendance(IBD)
auQTL
a été estimée àpartir de l’IBD à deux loci de marqueurs
flanquants.
Les estimées desparamètres
pour le
QTL
(position
et composante devariance)
et la puissance ont été obtenus enutilisant des données simulées et en faisant varier le nombre de
familles,
l’héritabilité ducaractère,
laproportion
de variance auQTL,
le nombre d’allèles au marqueuret le nombre d’allèles au
QTL.
Les facteurs lesplus
importantsinfluençant
les estimées deparamètres
auQTL
et la puissance ont été l’héritabilité du caractèreet la
proportion
de variancegénétique
due auQTL.
Le nombre d’allèles auQTL
n’ainfluencé ni les estimées des
paramètres
auQTL
ni la puissance de détection duQTL.
À
une héritabilité élevée, on a observé une confusion entre la composanteQTL
et lacomposante
polygénique.
S’il y a un nombre suffisant de familles et de marqueurspolyallèliques informatifs, l’approche
du modèle aléatoire permet de détecter avec une puissance élevée unQTL
quiexplique
au moins 15 % de la variancegénétique
et d’estimer
précisément
la position de ceQTL.
Pour une détectionprécise
et uneestimation correcte de la variance au
QTL,
des méthodesplus sophistiquées
sontcependant
nécessaires.©
Inra/Elsevier,
ParisQTL
/
modèle aléatoire/
cartographie par intervalle/
méthode des paires d’apparentés1. INTRODUCTION
The
development
oflinkage
maps withlarge
numbers of molecular markers has stimulated the search for methods to map genes involved inquantitative
traits. The search for
QTL
has been most successful inplants
andlaboratory
animals for which data are available for backcross andF
2
generation
frominbred lines. With such
data,
theparental
genotypes,
thelinkage phases
of theloci,
and the number of alleles at theputative
QTL
are knownprecisely.
Additionally,
data fromdesigned
experiments
can be considered as onelarge
family,
because all individuals share the sameparental
genotypes.
As aresult,
the effect of
QTL
substitution and dominance can bedirectly
estimated[14,
18, 24! .
In most livestock
species,
especially
indairy cattle,
data from inbred lines and their crosses are not available. An outbredpopulation
is assumed to bein
linkage equilibrium.
In the absence oflinkage disequilibrium,
thelinkage
phase
between theQTL
and the markers will differ fromfamily
tofamily, and,
therefore,
theanalysis
of themarker-(aTL linkage
has to be made within afamily
[17].
Thefamily
size,
however,
isusually
notlarge enough
to enableaccurate
analysis
within asingle pedigree.
Additionally,
the number of(aTLs
affecting
traits of interest isuncertain,
as well as the number of alleles at eachQTL.
With the presence of a biallelicQTL
with codominantinheritance,
thedistribution of
genotypic
values is a mixture of three normal distributions.But,
with more alleles at theQTL,
the number ofpossible
genotypes
increasesand the
analysis
becomescomplicated
and tedious. With an unknown numberof
QTL
alleles it isimpossible
to determine the exact number ofgenotypes,
i.e. the number of normal distributions that build up the overall distribution
of
genotypic
values. In suchsituations,
the detection oflinkage relationships
between aputative QTL
and the marker loci canonly
be based on robustmodel-free
(non-parametric)
andcomputationally rapid linkage methods,
suchThe random model
approach
is based on thephenotypic similarity
(or
covariance)
betweengenetically
related individuals. The covariance betweentwo relatives
comprises
apolygenic
and aQTL
component.
Thepolygenic
component
depends
on thegenetic relationship
betweenanimals,
whereas theQTL
component
depends
on theproportion
of allelesidentical-by-descent
(IBD)
that two individuals share at theQTL.
Thepolygenic
component
consists of many genes with small effects.Thus,
it is assumed that theaverage
proportion
of alleles IBD sharedby
two individualsequals
thegenetic
relationship
coefficient between therelatives,
i.e.1/2
for full-sibs and1/4
for half-sibs. For the same kind ofrelationship, however,
the IBDproportion
at theQTL
differs from onepair
of relatives to another. Because the actualproportion
of alleles IBD at theQTL
is notobservable,
theproportion
of alleles IBD at theQTL
sharedby
two relatives(
7
rq)
must be inferred from the observedgenotypes
at linked marker loci.
Haseman and Elston
[16]
proposed
a robustsib-pair approach
based onsimple
linearregression
ofsquared phenotypic
differences between two sibs within afamily
on theproportion
of alleles IBD sharedby
the two sibs atthe
QTL.
The Haseman-Elstonsib-pair
method has beenproved
to be robustagainst
avariety
of distributions of data andindependent
of the actualgenetic
model of the
QTL. However,
this method islimited,
because thegenetic
effect of theQTL
and the recombination fraction between theQTL
and a marker locus are confounded. It canonly
detectlinkage
between a marker and aQTL,
but cannot estimate whether this is due to a
QTL
with alarge
effect at alarge
distance,
or to aQTL
with a small effectclosely
linked to the marker.Fulker and Cardon
[8]
developed
asib-pair
intervalmapping
procedure using
two markers to
separate
the location of aQTL
from its effect and to estimatethe
specific position
of aQTL
on a chromosome. This results in ahigher
statistical power, but it is still a
least-square-based
methodand, therefore,
does not
optimally
utilize all information that could be extracted from the distribution of thespecific data,
as a maximum likelihood(ML)
method woulddo.
Goldgar
[10]
developed
amultipoint
IBD method based on the MLapproach
to estimate the
genetic
varianceexplained by
aparticular
chromosomalregion.
This method has been extended
by
Schork[19]
tosimultaneously
estimate variances of several chromosomalregions
and the common environmental effectshared
by
all sibs. Both methods takeadvantage
of the distributionalproperties
of the dataand, therefore,
are morepowerful
than the Haseman-Elston method.However, they only
estimate variance ofQTL
and not the exactQTL position.
Xu and
Atchley
[22]
extended theGoldgar’s
ML method to intervalmapping.
They
developed
an efficientgeneral
QTL mapping procedure, assuming
asingle
normal distribution of
QTL genotypic
values andfitting
aQTL
as a randomeffect
along
with apolygenic
component.
They
showedthat,
using
the randommodel
approach,
aQTL
can besuccessfully
mapped
and its variance estimatedin full-sib families.
The ML-based random model
approach
forQTL mapping using
thesib-pair
method has been well established forlinkage analysis
in humans[3, 22]
anda)
to extend the random modelapproach
forQTL mapping
based on asib-pair
method to half-sibfamilies;
b)
to test theappropriateness
and robustness of a random modelapproach
for
QTL mapping
in half-sib families with differentsample
sizes,
heritabilities of thetrait,
QTL
variances,
number of alleles at marker loci and number of alleles at theQTL using
stochastic simulation.2. THEORY
2.1.
Estimating
theproportion
of IBD in half-sib familiesIf the markers are
fully informative,
theproportion
of alleles IBD(
7
i)
sharedby
two sibs at a locus can be0,
1/2
or 1 ifthey
share zero, one or twoparental
alleles,
respectively.
Forhalf-sibs,
theproportion
of alleles IBD at a locus canbe either 0 or
1/2,
sincethey
only
have one commonparent
andtherefore,
assuming
unrelateddams, they
can share either zero or oneparental
allele.If the markers are not
fully informative,
the !ris at the markers cannotbe observed and must be
replaced by
theirexpected
values conditional onmarker information available on sibs and their
parents.
Haseman and Elston[16]
proposed
asimple
method to calculate !r.l aswhere
f
i2
andf,,
l
are theprobabilities
that the sibs share two or one allele ata
locus, respectively,
conditional on observedgenotypes
of the sibs and theirparents.
Analogously,
7r,
for two half-sibs can be estimated asThe
proportions
of alleles IBD at marker loci are used to calculate theproportion
of alleles IBD at theQTL,
because twooffspring
that receive thesame marker allele are
likely
to receive the same allele at a linkedQTL.
Haseman and Elston[16]
showed that theexpected proportion
of IBD at onelocus is a linear function of the
proportion
of IBD at another locus. Fulker and Cardon[8]
used theproportions
of IBD at twoflanking
markers to calculate the conditional mean of theproportion
of IBD at theQTL
(
7
q),
which is alsoa linear function of %s at two
flanking
markers:where 7rl and !r2 are IBD values for two
flanking
markers. The/
3
weights
aregiven by
the normalequation:
2 and the
putative QTL, respectively, replacing
all 7rS with1/4,
all variances(V(
7
r
;
))
with1/16,
and all covariances(Cov(!ri,
!r!))
with(1 —
2!)!/16,
andsolving
(4),
the estimates of(
3
values can be obtained as follows[2,
7,
8!:
2.2.
Mapping procedure
under the random modelA
general
form of the random model has been definedby
Goldgar
[10]
aswhere yij is the
phenotypic
value of the trait in thejth
offspring
of the ith half-sibfamily;
p is thepopulation
mean; gij is the random additivegenetic
effect of theQTL
with mean = 0 and variance =or2;
aij is the random additivepolygenic
effect with mean = 0 and variance =er!;
e2! is the random environmental deviation with mean = 0 and variance =
u!.
All random effects in the model are assumed to be
normally
distributed.However,
ifQ
a
and
af
are
large enough
to make the distribution of the datanormal,
the normal distribution of theQTL
effects is notabsolutely required.
In a half-sib
family,
the variance of y2!assuming
alinkage equilibrium
is:and a covariance between two non-inbred
half sibs j
and
j’
is:with !rq = the
proportion
of alleles IBD at theputative QTL
sharedby
twohalf-sibs.
The coefficient of the
polygenic
variance is1/4
because,
by
expectation,
twonon-inbred half-sibs share
1/4
alleles IBD. Theproportion
of IBD at theQTL
(!rq)
will be different for each half-sibpair.
7rq is a variable that ranges from 0to
1/2
in half-sib families.For the estimation of variance
components,
7rq inequation
(9)
isreplaced by
its estimated valuetrq
fromequation
(3).
With k sibs in each
family,
C
i
is a k x k matrix.We define
h9
=u.!/ U2
as theheritability
of aputative
QTL,
h’
=u;/ u2
asthe
heritability
of apolygenic
component,
andht =
(!9
+
u;) / u
2
as the totalheritability. Assuming
a multivariate normal distribution of the data(
Yij
),
wehave a
joint
density
function of the observations within a half-sibfamily:
where y2 =
[
Y
i
l
Yi2 yZ3...
yZ!;!!
is a k x 1 vector of observedphenotypic
valuesfor k half-sibs within the ith
family,
and 1 = k x 1 vector with all entriesequal
to 1.
The overall
log
likelihood for nindependent
families isThe likelihood function relates to the
position
of theQTL
flankedby
twomarkers
through
ri. The unknownparameters
that have to be estimated arep,
Q
z
,
h9,
ha
andq.
1
0
Inmaximizing
L,
the commonpractice
in the intervalmapping procedure
is to treat the recombination fraction between the first marker and aputative
QTL
(0
1
,)
first as a knownconstant,
thengradually
increase0
1
,
and decrease the distance between theQTL
and theright
marker(0q2 )
throughout
the entire interval between theflanking
markers,
andrepeat
theprocedure
in every intervaluntil, eventually,
the whole genome is screened.The maximum likelihood estimate of the
QTL
position
is determinedby
the value of0
1
,
in theappropriate
interval that maximizes Lthrough
the entirechromosome.
The null
hypothesis
is thath! =
0,
i.e. that noQTL
ispresent
in the tested interval. The ML under nullhypothesis
is denotedby
L
o
.
The likelihood ratio(LR)
test statistics isThe LR statistics under
H
o
follows thex2
distribution
with a number ofdegrees
of freedom(df)
between 1 and 2. With asingle QTL,
one df is due tofitting
h9 and
theremaining
df forfitting
theQTL position.
Theremaining
dfdepends
on the distance between two markers and is less than one becausewe search for the
QTL only
within aninterval,
rather than in the entire genome(chromosome).
If theH
o
is that noQTL
is present in the whole genome(chromosome)
coveredby
themarkers,
the df underH
o
is =N 2!22!.
3. SIMULATION AND ANALYSES
The Monte Carlo simulation
technique
was used togenerate
genotypic
andphenotypic
data.Mapping QTL
were considered in a 100 cMlong
chromosomalsegment
coveredby
sixmarkers, equally
distributedalong
the chromosomesame
frequency.
Asingle QTL
with several codominant alleles with the samefrequency
and additive effects was simulated in the middle of the chromosomalsegment
(i.e.
at 50cM).
Parents were
generated
by
the random allocation ofgenotypes
at eachlocus
assuming
aHardy-Weinberg equilibrium.
Parentallinkage phases
wereassumed unknown.
Offspring
weregenerated
assuming
nointerference,
so that a recombination event in one interval does not affect the occurrence of arecombination event in an
adjacent
interval. Recombination fractions for eachlocus were calculated
using
the Haldane map function!13!.
Normally
distributedphenotypic
data with mean = 0 and variance = 1 weregenerated
according
to thefollowing
model:where y2! is the
phenotypic
value of theindividual j in
the half-sibfamily
i;
pis the
population
mean; qi! is the effect of theQTL
genotype
of individualj;
si is the sire’s contribution to the
polygenic value;
d
ij
the dam’s contributionto the
polygenic
value;
4>ij
is the effect of Mendeliansampling
on thepolygenic
value;
and eij the residual error.Phenotypic
values were assumedpre-corrected
for fixed environmentalef-fects.
Family
structure was chosen to accommodate atypical
situation in acommercial
dairy population.
Forsimplicity,
sires were assumed to beunre-lated. Each sire was mated to 25
randomly
chosen unrelated dams toproduce
one
offspring
permating.
The values of the simulated
parameters
varieddepending
on themajor
purpose of the simulation.
To test the behavior of the random model
approach
under different heri-tabilities of the trait and differentproportions
of varianceexplained
by
theQTL
(i.e.
different size of the(aTL),
seven different values ofheritability
wereassumed: the
heritability
of the trait was varied from 0.10 to 0.70 insteps
of 0.10. The totalgenetic
variance consisted of aQTL
component
and an unlinkedpolygenic
component.
The additive allelic effect of theQTL
was set so that theQTL
variance accounted for10,
50 and100 %
of the totalgenetic
variance.The number of alleles at the
QTL
was 5. All of the six markers had six alleles with the samefrequency.
To test the influence of marker
polymorphism
on theperformance
of therandom model
approach,
each of six marker loci was assumed to have two,four,
six or ten alleles with anequal frequency.
Two different heritabilities ofthe trait were considered: 0.10 and 0.50. The number of alleles at the
QTL
wasfive. The total
genetic
variance was accounted forby
theQTL,
i.e. nopolygenic
component
was simulated.To test the robustness of the random model
approach against
the number of alleles at theQTL,
theQTL
was simulated withtwo,
five or nineequally
frequent
alleles with additive effects.Again,
thephenotypic
trait was simulatedassuming
two different heritabilities: 0.10 and0.50,
with thecomplete genetic
variance due to the
QTL.
Each of six marker loci had sixequally
frequent
alleles.
In each simulation two different
sample
sizes were considered: 50 and 100The ML interval
mapping procedure
wasapplied
to the simulated data. The chromosome was searched insteps
of 2 cM from the left to theright
end. Unknownparameters
h!,
h!
andu
2
were estimatedsimultaneously.
The likelihood function was maximized withrespect
to theseparameters
using
thesimplex algorithm provided by
Xu(pers. comm.).
The testposition
with thehighest
LR wasaccepted
as the mostlikely
position
of theQTL.
For eachparameter
combination the simulation andanalysis
wererepeated
100times. The accuracy of estimation was
judged
according
to anempirical
95 %
symmetric
confidenceinterval,
estimated from the observedbetween-replicate
variation and calculated as
2t,
/2
,
99
times theempirical
standard error.The
empirical
distribution of the LR test statistics wasgenerated
in the same manner for eachparameter
combination under the nullhypothesis,
i.e.assuming
no
QTL
in the entiresegment.
Asignificance
level of 0.95 was chosen for allanalyses.
Theempirical
threshold value was defined as the 95thpercentile
of theempirical
distribution of the LR test statistics underH
o
.
The power was defined as apercentage
ofreplications
in which the nullhypothesis
wasrejected
at the5 %
significance
level. The distribution of the maximum LR values obtained underH
o
forheritability
of the trait 0.10 and 0.50 is illustrated infigure
1.4. NUMERICAL RESULTS
4.1.
Heritability
of the trait andproportion
ofQTL
varianceEstimates for the
QTL
location,
averaged
over 100replicates,
withWhen the
QTL explained
the entiregenetic
variance,
the estimates for theQTL
position
were close to the trueparameter
value of 50 cM. When theQTL
explained
50 %
of thegenetic
variance,
the estimates were close to the trueQTL position
when theheritability
of the trait was 0.30. When theQTL
explained only
10 %
of thevariance,
the average estimates were biased and close to the true valueonly
with a veryhigh heritability
of the trait and asample
of 100 families.When the
genetic
variance iscompletely
due to theQTL,
the accuracy of theQTL position
estimates,
given
as a width of the 95 %empirical
confidenceinterval,
wasstrongly
influencedby
theheritability
of the trait and thenumber of families. When
heritability
increased from 0.10 to0.20,
the accuracy of the estimates increasedby
approximately
40 %
(the
confidence interval decreased from 10.9 to 6.3 cM and from 7.9 to 4.9 cM for 50 and 100families,
respectively).
With a further increase inheritability
to0.70,
the confidence interval decreased to 1.8 and 0.6 cM for 50 and 100families, respectively.
Relative
improvement
in accuracy was smaller when theQTL explained
asmaller
proportion
of thegenetic
variance. When50 %
of thegenetic
variancewas
explained by
theQTL,
the increase inheritability
of the trait from 0.10 to0.20 resulted in a reduction of the confidence interval
by
20%.
With aQTL
explaining
only
10 %
of thegenetic
variance,
theimprovement
in accuracy with increasedheritability
of the trait was verysmall,
regardless
of thesample
size.However,
generally,
more accurate estimates of theQTL position
were obtainedEstimates for
QTL
(h 2),
polygenic
(h’)
and total(hn
heritability
aregiven
in table Il. Estimates for total
heritability,
whichrepresents
a sum ofQTL
andpolygenic heritability,
wereequal
or very close to the trueparameter
values. When theQTL explained
10%
of the totalgenetic variance,
the estimatedh2
was
relatively
close to the true value oronly
slightly
overestimated for theheritability
of the trait = 0.10. With an increase inheritability
from 0.10 to0.40,
h9
was
overestimated. With further increase inheritability
(over 0.40),
the bias became
smaller,
so that the estimatedhy was
close to the true value. Thispattern
is visible infigure
2a. When 50%
of thegenetic
variance wasexplained by QTL,
the estimates ofh9 followed
a differentpattern
(figure 2b).
For low
heritability
of thetrait,
0.10 and0.20,
the estimates were close to thetrue values of the
parameter.
With further increase inheritability,
the estimatesbecame
biased,
andfinally
considerably
underestimated when theheritability
of the trait reached 0.70. Even more severe downward bias was encounteredin the
parameter
combinations in whichQTL
accounted for the entiregenetic
variance
(figure 2c).
As theheritability
of the traitincreased,
the estimated values ofh9
became
more and more biased. Thisinability
of the random modelto
’pick up’
alarger
QTL
variance was observedindependently
of thesample
size.
The
empirical
power ofQTL detection,
defined as thepercentage
ofobtained
by
data simulation underH
o
,
isgiven
in table 111. The power to detectQTL
washighly dependent
on theheritability
of the trait. With aheritability
of
0.10,
the maximum power was 32%
(with
100 families and thecomplete
genetic
variance accounted forby
theQTL).
Withincreasing
heritability
of thetrait,
the power increasedrapidly.
A further factor with astrong
influenceon power was the
proportion
ofgenetic
variance due toQTL.
When theQTL
explained
only
10%
of the totalgenetic variance,
the power increased from 6 to 27%
and from 6 to 34%
forsamples
of 50 and 100families,
respectively,
explained
50 %
of the totalgenetic variance,
the power increased much faster and reached over 90%
already
with aheritability
of the trait of 0.40-0.50. Evenfaster increase in power could be observed in
parameter
combinations in which theQTL explained
the entiregenetic
variance.Figure 3
shows the LRprofiles averaged
over 100replicates
for differentproportions
ofgenetic
variance due toQTL,
heritability
of the trait = 0.10 andsample
size = 50 families. The LRprofiles
for differentQTL
effects with thesame parameter combination and
heritability
of the trait = 0.50 are shown infigure
4.
Bothfigures
show a flatprofile
whenQTL
accounts foronly
10 %
genetic
variance,
regardless
of theheritability
of the trait. With ahigher
heritability
andgreater
proportion
ofgenetic
variance due to theQTL,
the LRprofile
indicates theQTL
location veryprecisely.
With theheritability
of the trait =0.10,
the location of theQTL
isclearly
indicatedonly
when theQTL
accounts for thecomplete
genetic
variation.But,
the average LR inthis situation did not exceed a value of
2.3,
which is far below ourempirical
threshold value of 5.47.
4.2. Number of alleles at marker loci
The effect of the number of alleles at marker loci on the estimates of the
sizes and heritabilities of the
trait,
assuming
thecomplete genetic
variance dueto
QTL,
is shown in table IV. The mean estimates forQTL
location wereconsistent for all
parameter
combinations and close to the trueparameter
value
(50
cM),
regardless
of the number of marker alleles. The confidence intervals were,however,
narrower forpolyallelic
than for biallelicmarkers,
whichindicated more accurate estimates when markers were
polymorphic.
Increasing
the number of alleles from four to six and ten did not affect the confidence interval. The heritabilities of the trait showed a
significant
influence on thewas
considerably
wider with the lowheritability
of the trait.Increasing
thenumber of families also resulted in narrower confidence intervals and thus more
accurate estimates for the
QTL
location.Estimates for
QTL,
polygenic
and totalheritability
for different numbers of markeralleles,
heritability
of the trait andsample
size aregiven
in table V.Estimates for total
heritability
were close to simulated values for almost allparameter
combinations,
except
for the situations with biallelic markers inwhich ht was
overestimated. Forheritability
of the trait =0.10,
estimates for bothQTL
andpolygenic heritability
wererelatively
close to the truevalues,
regardless
of the number of marker alleles and otherparameters.
Forheritability
of the trait =0.50, QTL
heritability
wasagain
severely
biased downwards. Theestimated
polygenic
component,
although
notsimulated,
accounted for almostThe
empirical
power for the sameparameter
combinations isgiven
intable VI. As
expected,
the power to detectQTL strongly depended
on theheritability
of the trait. For aheritability
of the trait =0.50,
power was close
to 100 for all
parameter
combinations.Therefore,
differences in power to detectQTL
causedby
parameters
other thanheritability
could be observedonly
forparameter
combinations withheritability
of the trait = 0.10. Thepower
mostly
increased when the number of marker alleles increased from two to four. With
a further increase in the number of marker
alleles,
the power did notchange
considerably.
Power was alsosignificantly
increased with increasedsample
size.With 100
families,
the power was almost twice that with 50 families for allparameter
combinations. Adrop
in power from 42 to 32%
when the number of marker alleles increased from four to sixmight
be due to thehigher
threshold value obtained for thisparameter
combination.Figures
5 and 6 show the LRprofile
averaged
over 100replicates
fortwo,
four and ten marker
alleles, sample
size of 50 families andheritability
of thetrait = 0.10 and
0.50,
respectively. Figure
5 shows that theQTL
location wasnot
clearly
indicated with a lowheritability
and a low number of marker alleles.Increasing
the number of marker alleles to tenimproved
the estimate of theQTL
location. With theheritability
of 0.50(figure
6),
the estimates of theQTL
position
weresignificantly improved.
LR also increased withincreasing
markerpolymorphism, especially
when the number of marker alleles increased fromtwo to four.
4.3. Number of alleles at
QTL
trait and
sample sizes, assuming
thecomplete
genetic
variance as due to theQTL,
are shown in table VII. For allparameter
combinations, regardless
ofany
parameter,
the estimates for theQTL
position
were close to the simulated value of 50 cM.Empirical
confidence intervalsdepended
on theheritability
ofthe trait and
sample
size. The confidence interval wasconsiderably
decreasedby increasing
heritability
of the trait from 0.10 to 0.50.Increasing sample
size from 50 to 100 families also had a certainpositive
influence on the accuracyof estimation. The number of alleles at the
QTL
does not seem to have anyTable VIII shows estimates for
h9,
ha
and
h; for
different numbers ofQTL
alleles,
different heritabilities of the traitand
differentsample
sizes. As inthe
previous
analyses,
a severe downward bias in the estimates forh9 and
acorresponding upward
bias in the estimates forha
were encountered witha
heritability
of the trait = 0.50.Obviously,
this bias was not causedby
the number of alleles at theQTL,
because it was found in allparameter
combinations in which the simulated true
heritability
of the trait was 0.50.Power of
QTL
detection for different numbers ofQTL
alleles,
different heritabilities of the trait and differentsample
sizes isgiven
in table IX. Thepower was 100
%
with theheritability
of the trait =0.50,
regardless
of any5. DISCUSSION
In the first
part
of thisstudy,
weinvestigated
the effects of theproportion
ofgenetic
variance due toQTL, heritability
of the trait andsample
size onthe estimates of
QTL
parameters -QTL
location and variancecomponents,
and power. The results of the simulation
study
showedsignificant
effects ofproportion
ofgenetic
variance due toQTL
on the estimates forQTL
location,
heritabilites and power. A
QTL
with a smalleffect,
which accounts foronly
1
%
of the totalphenotypic variance,
is veryunlikely
to beprecisely located,
especially
when thesample
comprises
only
50 families. The location of the smallQTL
cannot beclearly indicated,
as the estimates forQTL
location aredistributed
along
thechromosome,
and the average estimate over thereplicates
takes almost a random value. On the
contrary,
aQTL
with alarge effect,
accounting
for 10%
of the totalphenotypic variance,
can beaccurately
locatedwith
only
50 families.Estimation of
QTL
position
yields
better results with alarger
proportion
ofgenetic
varianceexplained by QTL
and ahigher heritability
of the trait. Theempirical
confidence interval forQTL
location shows that accuracy decreasedsignificantly
when theproportion
ofgenetic
variance due toQTL
decreased from 100 to 10%,
especially
withhigh heritability
of the trait.Sample
size has little influence on the averageestimates,
butlarger samples
enable somewhatmore accurate estimates. These results are consistent with those obtained
by
Xu and
Atchley
!22!,
who used the sameapproach
to estimateQTL
location andgenetic
parameters
in full-sib families.The estimates of variance
components,
hereingiven
as heritabilities(ht ,
hg
2and
ha)
highly depend
on theheritability
of the trait and theproportion
ofgenetic
varianceexplained
by
theQTL. Although
the estimates of the totalgenetic
variance(expressed
ash t
2
)
are very close to the trueparameter
valuesin all
parameter
combinations,
the properpartition
of theQTL
andpolygenic
a
larger
biasaccompanying
alarger QTL.
However,
an underestimatedQTL
variance is
always
accompanied by
an overestimatedpolygenic
variance,
so thatthe sum of
h9 +
ha is
conserved at a value very close to a simulated true value of totalheritability, indicating
a successfulpartitioning
ofgenetic
and residualvariance.
Confounding
betweenh) and
ha has
been observedby
Gessler and Xu!9!,
who
explained
thisphenomenon by
differences in the models used for data simulation and estimation.They
simulated datausing
amonogenic model,
and,
because the simulatedh2
was zero, apartitioning
intoha and
h9 under
the conserved
sum h9 +
ha tended
to reduceh 9 2,
and thus the estimates forh2
were
biased downwards. In anotherstudy,
Xu and Gessler[23]
found anoverestimation of the
QTL
component
under a modelincluding
a non-zeropolygenic
component.
Theirfinding
is,
therefore,
opposite
of what we found in ourstudy. Nevertheless, confounding
between variancecomponents
has beenconsidered to be a
general
difficulty
of thesib-pair
approach
!1,
4,
9!.
Recently,
Xu
[21]
proposed
a method to correct the bias in the estimates of theQTL
variance
using
aquadratic approximation
of the LR test statistic. Thisproblem,
however, requires
further research.The power of
QTL
detection,
ingeneral, depends mostly
on theheritability
of the trait and theproportion
ofgenetic
varianceexplained by QTL.
A smallQTL
in a smallsample
is very difficult to detect withcertainty.
Alarge QTL
can,
however,
be detected with ahigh
power, even in a smallsample. Increasing
the number of families does notsignificantly improve
the power when theQTL
is small.
Generally,
it can be concluded that aQTL
thatexplains
at least 30%
of thephenotypic
variance can be detected with 100%
power in anyexperimental
design.
To reach asatisfactory
power of 70-80%
in asufficiently
large sample,
aQTL
must account for at least 15%
of thephenotypic
variance. The secondpart
of thisstudy
focused on the influence of markerpolymor-phism
onQTL
parameter
estimates and power,assuming
low andhigh
her-itabilities of the trait
(h
2
= 0.10and h
2
=0.50),
andusing
small andlarge
samples
(50
and 100 families with 25 half-sibseach).
The results showed that the mean estimates of the
QTL
location were notaffected
by
any of theparameters
in thestudy
(heritability
of thetrait,
sample
size and number of alleles at each of six marker
loci).
However,
the accuracy ofestimation,
given
as a 95%
empirical
confidenceinterval,
wasmarkedly
influenced
by
theheritability
of the trait and the number offamilies,
and alsopartially by
the number of marker alleles.Several
previous
studies found that the accuracy ofQTL
location ismostly
influenced
by
the size of theQTL
effect andsample
size. Otherparameters,
such as marker mapresolution,
have little effect!6!.
The results from ourstudy
also showed
positive
effects oflarger samples
on the confidence interval in allparameter
combinations.Furthermore,
the accuracy of estimates forQTL
locationimproves
with anincreased number of marker loci. For markers with
four,
six or tenequally
frequent
alleles and lowheritability
of the trait(h
2
=0.10),
50 half-sib familiesgive
the same accuracy as for markers with two alleles and double the number of families. An increased accuracy of the estimates for theQTL
location withpolyallelic
markers was alsoreported by
Knott andHaley
(17!.
This indicatespolymorphic
markers are used in theanalysis.
The reduction of the number of animals to begenotyped
wouldsignificantly
reduce the costs ofQTL analysis,
one of the
major
limitations inmapping
andutilizing QTL
!5!.
In
general,
estimated values for heritabilities are similar to those from the firstpart
of thestudy. Only
for biallelicmarkers,
the value ofh
is
biasedupwards,
which indicates that biallelic markers do notprovide
enough
information to infer 7r
properly.
For markers with > fouralleles,
the estimated heritabilitiesh9,
h2
andht
are
close to the simulated values in allparameter
combinations when theheritability
of the trait = 0.10. With theheritability
ofthe trait =
0.50,
hv and
h2 are
strongly
confounded,
and the sum ofh! +
ha is
relatively conserved,
for allparameter
combinations.’
Apart
from theheritability
of the trait and the number offamilies,
another factor that influences power,especially
when theheritability
of the trait islow,
is theheterozygosity
of marker loci. With anincreasing
number of alleles atmarker
loci,
one canexpect
ahigher
power ofQTL
detection!11,
17,
20!.
The results of thisstudy
indicate that power increasesapproximately by
20%
when the number of marker alleles increases from two to four. This is consistentwith the
expectation
that a linkedQTL
can be detectedonly
if theparent
isheterozygous
for the marker locus. With biallelicmarkers, only
1/2
of the parents isexpected
to beheterozygous.
On the otherhand,
with four markeralleles,
theproportion
ofparents
heterozygous
for individual marker loci will be0.75,
which results in an increasedproportion
of informativesib-pairs.
A furtherincrease in marker
heterozygosity
(from
four to six to tenalleles)
does notresult in a
significant
increase in power, because theproportion
ofheterozygous
parents
and informative half-sibpairs
does notchange drastically.
Variations in power with four marker alleles found in ourstudy
can beregarded
as random.The third
part
of thestudy
focused on the influence of the number ofQTL
alleles on estimates for
QTL
position,
variancecomponents
and power. Theresults of the simulations
proved
theinsensitivity
of the random modelapproach
against
the number of alleles at theQTL.
The estimates of theQTL position
are very similar for biallelic and for multiallelic
QTL. Also,
the accuracy of theestimates is affected
only by
theheritability
of thetrait,
theproportion
of thegenetic
varianceexplained
by
QTL
andsample
size,
but notby
the number ofQTL
alleles.Other authors who
compared performance
of the random modelapproach
in
analyses
of biallelic and multiallelicQTL
in full-sib families[22]
andmultigenerational pedigrees
[12]
reported comparable
results. This underlines the mainadvantage
of the random modelapproach
over otherparametric
methods: its
flexibility regarding
the actual number of alleles at theQTL.
The estimates of the variancecomponents,
expressed
ashy,
ha and h t , 2
arevery similar to those from the
previous analyses.
With ahigher heritability
of the
trait,
hy is
severely
biaseddownwards,
andha is,
accordingly,
biasedupwards.
The same bias can be observed forQTL
with two, five and ninealleles. This shows that the bias in estimates of the
QTL
variance is not causedby
deviation of the distribution ofQTL
effects fromnormality,
as in the case ofa
biallelic, and,
partly,
five-allelicQTL.
Even with nineQTL alleles,
when theassumption
of the normal distribution of theQTL
effectfully
holds(with
ninecodominant alleles there are 45 different
genotypes),
the bias in the estimatesis
obviously
due to ageneral
frailty
of a random model based on thesib-pair
approach. Grignola
et al.[12]
who used a residual maximum likelihood method based on amultigenerational pedigree
did not obtain biased estimates ofQTL
andpolygenic
variances.Also,
the power to detect aQTL
shows little differences amongdesigns
witha
QTL
withtwo,
five or nine alleles anddepends only
on theheritability
of thetrait,
proportion
ofQTL
variance andsample
size.6. CONCLUSIONS
In this
study
we showed that the intervalmapping procedure
based onthe random model
approach,
initially designed
forQTL mapping
in humanpopulations
(22!,
can beapplied
todairy
cattlepopulations
withlarge
half-sib families.QTL
withrelatively large
effects can be detected withhigh
power andaccurately located, especially
if alarger
number of families andpolymorphic
markers are used.The random model based on a
sib-pair approach requires
marker dataonly
on progeny and their
parents,
which can be seen as anadvantage
when markerdata on older ancestors are not available.
However,
the method can beeasily
extended to make use of available data from
general pedigrees.
This wouldprovide
better estimates of !rs because information from all relatives would bejointly
used rather thanjust using
data from apair
of individuals and theirparents.
Therelationships
among animals andinbreeding
would be taken intoaccount.
Furthermore,
in the case ofmissing parental
genotypes,
it would bepossible
to infer 7rS from the information available on other relatives.Because of its robustness and
simplicity,
the random modelapproach
isrecommended for
rapid screening
of the whole genome, followedby
a refinedanalysis applied
to those chromosomalsegments
that show somesignals
ofQTL
presence,
using
moresophisticated
methods.Also,
moresophisticated
methodsshould be used to estimate
QTL
variance,
because the random modelapproach
cannot
partition QTL
andpolygenic
varianceproperly.
Furthermore,
certainrecently
developed
methods based on residual maximum likelihood[12]
may beconsidered as a
possible
alternative tosib-pair
based methods.ACKNOWLEDGMENTS
The authors want to thank the EMBRAPA and
CNPq,
Brazil(M.L.M.)
and theSwiss National
Foundation,
Switzerland(N.V.)
for financial support, the ISUCom-putational
Center forproviding
resources, and Dr S. Xu forproviding
programs and invaluablesuggestions.
This is JournalPaper
no. J-17132 of the IowaAgriculture
and Home EconomicExperiment Station, Ames, Iowa, Project
no. 3146, andsupported
by
the Hatch Act and State of Iowa funds.REFERENCES