HAL Id: hal-02702386
https://hal.inrae.fr/hal-02702386
Submitted on 1 Jun 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Optimal design for the detection of a major gene segregation in crosses between 2 pure lines
Jean Michel Elsen, Pascale Le Roy
To cite this version:
Jean Michel Elsen, Pascale Le Roy. Optimal design for the detection of a major gene segregation in crosses between 2 pure lines. Genetics Selection Evolution, BioMed Central, 1995, 27 (3), pp.275-285.
�hal-02702386�
Original article
Optimal design for the detection
of a major gene segregation in crosses
between 2 pure lines
JM Elsen P Le Roy
1
Institut national de la rechercheagronomique,
station d’améliorationgénétique
des animaux,
BP!7,
31326 Castanet-Tolosancedex;
2
Institut national de la recherche agronomique, station degénétique
quantitativeet
appliquée,
78352Jouy-en-Josas cedex,
France(Received
15 June1994; accepted
15 December1994)
Summary - A simulation method was used to compare different
experimental designs
fortheir power to detect a
major
geneusing
a maximum likelihoodapproach.
Theoptimal design
is most often theproduction
of F2 as theonly segregating genetic
type, with a limited effect of the relative numbers of F2s andnon-segregating
groups(parentals
andF1)
on the power. Dominant genes were moreeasily
detected than additive ones. A modeldealing
with theheteroskedasticity
of thepolygenic
component was also studied.major gene
/
optimization/
maximum likelihood/
homozygous lineRésumé - Protocoles optimaux pour la détection d’un gène à effet majeur en ségrégation dans des croisements entre 2 lignées pures.
Différents protocoles expéri-
mentaux ont été
comparés
par simulation sur leurpuissance
pour la détection d’ungène
à l’aide d’un test du maximum de vraisemblance. Le
protocole optimal
est leplus
souventcelui pour
lequel
le seul typegénétique
où legène
est enségrégation
est la F2, avec unfaible effet
de laproportion
de F2 par rapport aux typesgénétiques
sansségrégation (parentaux
etFl).
Lesgènes
dominants sont détectésplus facilement
que lesgènes additifs.
Un modèleconsidérant l’hétéroscédasticité de la composante
polygénique
est aussi étudié.gène
majeur / optimisation / maximum
devraisemblance / lignée
homozygoteINTRODUCTION
The
genetic
mapspresently
underdevelopment
will soon be agreat help
inthe detection of
quantitative
trait loci.Nevertheless,
as statedby
Gofhnet et al(1994), evidencing major
genesegregation
without marker information will remainimportant
for various reasons:i) genetic
maps may not be available for allspecies;
ii) systematic
use of molecular markers is verycostly; iii)
statisticalanalysis
ofphenotype
distributions is a usefulpreliminary analysis
of availabledata;
andiv) retrospective
studies of oldexperiments
without marker information may be valuable.The basis for
population genetics
was establishedby Mendel,
who used crossesbetween pure lines of peas to observe the
segregation
of genescontrolling
the colourand appearance of seeds in F2 and backcrosses. Since that
time,
a number ofcrosses between
homozygous
lines and even betweenheterogeneous subpopulations
have been conducted in
plants
and animals as tests of amajor
genesegregation
between these lines or
subpopulations (the parental groups),
eg, Hanset(1991)
andBoujenane
et al(1991).
Thesubpopulations
may often be considered asindependent samples (eg,
Bradford andFamula, 1984;
Duchet-Suchaux etal, 1992;
Loisel etal,
1994).
The
underlying hypothesis
isusually
that theparental
groups(PI
andP2)
arehomozygous
inopposite
states(AA
andBB)
at aparticular
locusgoverning
themeasured trait. Under this
hypothesis,
the first cross(Fl)
ishomogeneous
with allanimals
AB;
the F2s(crosses
between Flparents)
may beAA,
AB or BB withprobabilities
of1/4, 1/2
and1/4 respectively;
the backcrosses(either BC1,
crossesbetween Fl and
PI,
orBC2,
crosses between Fl andP2)
are alsoheterogeneous
AA or AB animals
(BC1)
and AB or BB animals(BC2)
withproportions 1/2, 1/2.
The statistical
analysis
of the data obtained from thesepopulations
wasclearly
described
by
Elston and Stewart(1973)
and Stewart and Elston(1973). They
showed how a maximum likelihood
approach
could be used to test variousgenetic hypotheses differing
in gene numbers andtypes (additive/dominant, autosomal/sex- linked).
Alternative methods were describedby
Mode and Gasser(1972)
and Weber(1959).
The power of thistype
ofexperiment
has beenrecently investigated by
Janssand Van der Werf
(1992), limiting
theirstudy
to the case of F2populations.
In this paper, we describe a
study
of theoptimal
structure of thepopulation
defined
by
the relative and absolute numbers ofsubgroups (PI, P2, Fl, F2,
BC1and
BC2).
Different structures werecompared using
simulations and their power to detect amajor
gene in a maximum likelihoodapproach
wasinvestigated.
Someinformation about a more robust model is also
provided.
The use of simulations for the evaluation of the statisticalproperties
of the likelihood ratio test isjustified by
the non-observation of classical
asymptotic
distributions in theparticular
contextstudied
(Goffinet
etal, 1992;
Loisel etal, 1994).
METHODS Model
Two
hypotheses
werecompared. H o
assumes that the difference between theparental
lines PI and P2 is due to alarge
number of genes, each with a small effect incontrolling
the traitmeasured,
andH l
assumes thatbeyond
thispolygenic difference,
amajor
gene is fixed atopposite homozygous
states(AA
andBB)
inthe
parental
lines.Y2!
is theperformance
of thejth
individual of the ithgenetic type.
Sixgenetic types
are considered(PI, P2, F1, F2, BC1, BC2)
with i = 1 to 6respectively.
Thenumber of individuals in the ith group is ni. Under
H o ,
theperformance x j
was modeled as:where p
is thegeneral
meanand l i
thegenetic type
i effect which can be detailedusing
Dickerson’scrossbreeding parameters (Dickerson, 1973).
In thisstudy,
theonly parameters
considered were the direct individual additive effects(r
and s forthe
parental populations
PI and P2respectively)
and the direct heterosis effect(h):
e
ij is the residual effect which is
normally
distributedN(0, <r!).
Under
H l ,
theperformance l oj
is modeled as:y
ti
= J1-i- l i
+ gk + e2! withprobability
Pikwhere g,! is the
major genotype
k effect(k
= 1 forAA,
2 for AB and 3 forBB)
and pik is the
probability
of the kthgenotype
in the ithgenetic type.
Under the
preceding
fixed alleleshypothesis:
The case where the
within-major-genotype
variance varies between groups may be studiedsimply by replacing u with c, 2..
In oursimulations,
this has beenexplored
for a limited range ofpopulation
structures.Test statistic .
The
hypothesis H o
was testedusing
the likelihood ratio test £ =-21n(L o/ L 1 )
where:
It must be
emphasized that,
in thismodel,
no familialrelationships
are consideredbetween the measured individuals.
The
H o hypothesis (no major
genesegregating
in F2sand/or backcrosses)
wasrejected
if the test statistic C exceeded a threshold A. Due to non-observation ofregulatory conditions,
theasymptotic
distribution of G underH o
isprobably
notthe classical
x2
with a number ofdegrees
of freedomequal
to the difference between the number ofparameters
to be estimated underH l
andH o (Goffinet
etal, 1992;
Jans and Van der
Werf, 1992). Moreover,
for a limited number ofindividuals,
thetrue
asymptotic
distribution may not be attained. To cope with thesedifficulties, empirical rejection
thresholds were obtained from simulations.Cases studied
First,
the power was evaluated for differentpopulation structures, given
a totalnumber of 180 individuals measured. These situations are
given
in table I. In all cases,PI,
P2 and Fl were inequal proportions.
In the Cl cases, the backcrosseswere not
produced
and thesegregation
of themajor
gene was visibleonly
in theF2. In the C2 cases, the F2 was absent and the 2 backcrosses were
present
inequal proportions.
TheC3,
C4 and C5 cases described the situations where both F2 and backcrosses werepresent.
Theproportion
t of individualsbelonging
to the’segregating groups’
increased between C10 andC19,
C20 andC26,
and C3 andC5. The
proportion
of F2s to backcrosses increased between C30 andC35,
C40 andC44,
and C50 and C54. Themajor
gene was characterized for each of these casesby
an effect of 2 residual standard deviations between the means ofhomozygotes,
either additive
(g l
=0,
g2 = 1 and g3 =2, ie,
a= (g 3 - g l )12
=1)
or dominant(
9i
= g2 = 0 and g3 =2 ,
ied= g2 - (9 1
+9s)/2 = -1).
Secondly,
the effects of the wholepopulation
size(E i
ni = 30 to 480individuals)
and of the
major
gene effect(4
values for a between 0.25 andla e ,
and d = 0 or- a)
were evaluated in the case where half of thepopulation
was made up of F2 individuals. The other half wasequally
divided betweenPI,
P2 and Fl individuals.Finally, considering
thesetypes
ofmajor
genes, the likelihood was modified to consider the case where thewithin-group
variance differs between the F2(a 2
andthe
non-segregating subpopulations (a2N ).
Simulations wereperformed F2) and
the
non-segregating subpopulations !).
Simulations wereperformed considering
!FZ
= 1 andaN S
=!FZ, cr!/1.25
orcrj!/1.5,
for the structures C10 to C19 and theirequivalent
with the total number of measured individuals doubled.Numerical
techniques
The results were obtained from simulations.
Appropriate
subroutines from the NAGlibrary
were used for thegeneration
ofgenotypes
and normal values(G05CCF,
G05DDF, G05CAF).
The maximization of the likelihood wasperformed using
aquasi-Newton algorithm (E04JBF
from the NAGLibrary). Only
1starting point
was tested for each maximization.
The
rejection
thresholds underH o
were estimated from the10% empirical
quantiles
of the test statisticdistribution,
for eachpopulation
structurestudied,
defined
by
the group sizes ni. The power at the10%
level wassimply
estimated for each case studiedby taking
the number of test statistic values that exceeded thecorresponding H o quantile.
Two thousand simulations wereperformed
in each ofthe
H o
andH l
cases.RESULTS AND DISCUSSION
Optimal
structure under the homoskedastic modelFigure
1gives
the power of situations Cl and C2 as a function of the ratio t of thesegregating population (F2
or the 2backcrosses)
size to the totalpopulation
size.Whereas the 2
types
ofdesigns (F2
or BCalone) give
a similar power for a dominant gene, the F2 must be used in the case of an additive gene, with a powervarying
between 60 and
70% against
30 to40%
for the backcross. In the Cl situations the maximum power isalways
reached for anequal proportion
ofsegregating (n 4
=90)
and
non-segregating populations (n l
= n2 = n3 =30),
ie with a t ratio of1/2.
In
contrast,
in the C2situations,
thisoptimal proportion
seems to differaccording
to whether a dominant
(where
theoptimum
is about 3 times more in backcrossindividuals than in
non-segregating individuals)
or an additive gene(the
maximumpower
being
attained with the minimum number of backcross individualsstudied)
is considered.
Figure
2 describes the case where the F2 and backcross groups were bothproduced (C3,
C4 andC5).
The power isgiven
as a function of the ratio u ofthe number of F2s to the number of F2 + backcross
individuals,
for the 3 situations considered withrespect
to thet parameter: 1/2 (C3
cases, nl = n2 = n3 =30), 2/3 (C4
cases, nl = n2 = n3 =20)
and5/6 (C5
cases, nl = n2 = n3 =10).
Thepower
appeared
to be very insensitive to the ratio u for a dominant gene and whenconsidering
an additive gene with a small number ofparental
individuals(t
=5/6).
In situations with an additive gene with a
larger proportion
ofparental
individuals(t
=1/2
or2/3),
the maximum power was attainedby maximising
theproportion
of F2s.
Evidence for a
major
gene comes from the detection of a mixture of subdistribu- tions within theglobal
distribution of either F2and/or
backcrosses. Inprinciple,
thetest statistic used
(the
likelihood ratiotest)
makes use of the wholenon-normality
of the
global
distribution. Thisnon-normality
isgreater
when the means of the subdistributions are more extreme. Thisphenomenon probably explains
the lack ofpower of the backcross cases as
compared
to the F2 cases when an additive gene was studied. In thissituation,
the difference between distributioncomponents
means of theglobal
F2 distribution was twice as ahigh
as the difference in either the BC1or the BC2.
When a
hypothesis
can be made about thetype
ofdominance,
before theexperiment
isdesigned,
then maximum power will be attainedby limiting
thesegregating subpopulation
to thesingle
backcrossshowing segregation. However,
the power of such a
design
will be zero if the true dominance is in theopposite
direction. Table II compares the power of this
design
with the power of an F2 whena total of 180 individuals were
measured,
half of which were in thenon-segregating (PI,
P2 andFl) populations.
All these results may also be
directly
related to theproportion
of the variance of the trait due to themajor
gene in thesegregating
groups(table III);
thisproportion
increases with the differences between subdistributions means.
Size of the
design
The minimum number of individuals to be measured in order to have a
90%
power for the detection of a gene effect a = 1 standard deviation is 150 whenconsidering
adominant gene
(d
=-a)
and about 500 whenconsidering
an additive gene(d
=0) (fig 3). Larger populations
arerequired
for smaller gene effects. Thechanges
incurve
shape
with the gene effect a must beemphasized.
These curves arenearly
linear for power under
70% and,
in this linearpart,
theslope (ie
thegain
in powerper extra
individual measured)
increases with a. Theresulting
increase in size of thedesign required
for a70%
power does not appear to be linear in1/a.
Janss and Van der Werf
(1992)
considered a 1 standard deviation additivegene
effect
(a
=1)
and a5% significance
level and found a12%
power whenonly F2
individuals were measured
(1000 individuals)
but a100%
power when 500 Fls were added to these 1000 F2s. From oursimulations,
thefurther inclusion
ofparental
P1 land P2
performances
in theanalyses appears
to beextremely
useful. We confirmed these results at the10%
level with some simulationsperformed
with F2individuals only.
The power ofdetecting
an additive 2 standard deviations gene with 1 000F2s
reachedonly 24%,
a value attained withonly
30 individualswhen
theparental subgroups
wereincluded.
’
Robustness to
heteroskedasticity
Janss and
Van
der Werf(1992) argued
that the inclusion of Fl data decreases the robustness of theanalysis,
a falsemajor
genebeing easily
detectedwhen,
the F2group variance is
higher
than in the F1population (100%
false detection with a50%
varianceincrease).
As describedabove,
thisheteroskedasticity
can be includedin the model without
difficulty.
Figure
4 shows the power of such a heteroskedastic model for variouspopulation sizes,
when theperformances
are simulated witha!2
=2
Additive anddominant genes of a 1 standard deviation effect were considered. The results obtained with
a!2
=1.25o NS
anda!2
=OrNs 2
were very similar. The detectionpower for additive genes was low and
nearly independent
of thepopulation
size andstructure. In
contrast,
in the case of a dominant gene, the power increasedstrongly
with
population
size and reached its maximum when all individualsbelonged
tothe F2
population,
which is theopposite
of the homoskedastic case where the non-segregating populations
were useful.This result shows that the information in the
non-segregating population
derivesfrom the level of the
within-group
variance. This variance for the F2 can be estimated in theparental
and Fl groups in the homoskedasticmodel,
but notin the heteroskedastic model. In the
latter,
themajor
genesegregation
wasonly
tested
through
thenon-normality
of the F2 group, while in theprevious
model theincrease of variance between Fl and F2 also contributed to this
testing.
CONCLUSION
In
general,
thegeneration
of backcrosses does notcompete
with theproduction
ofF2s alone as a
segregating population.
This isparticularly
true for an additive gene.The power of the detection test seems to be
poorly
sensitive to theproportion
ofF2s in the whole
population.
Theoptimum
appears to be50%
of F2s withequal proportions
ofPI,
P2 and F1.Large
dominant genes areeasily
detected in suchsmall
populations (fewer
than 200 individuals for a 2 standard deviations geneeffect).
Additive genes are lesseasily
detected.These results were obtained
by comparing
mixed withpolygenic
inheritance in the homoskedastic case. Toprevent
a lack of robustness due toheteroskedasticity,
a model
including
variance differences between F2s andparental populations
may be used. In this case, themajor
gene is detectedthrough
thenon-normality
ofthe
F2,
with a loss of power. Another extreme situation may be found if the differences betweengenetic types
are dueonly
to thesegregation
at themajor
locus.Comparing
thismonogenic hypothesis
to thepolygenic
one causesdifficulty
since these
hypotheses
are not nested. This may be solvedsimulating empirical quantiles
as done in thisstudy
orusing
the Akaike(1973)
criteria.REFERENCES
Akaike H
(1973)
Informationtheory
and an extension of MLprinciple.
In: 2nd Interna-tional
Symposium
onInformation theory (BN Petrov,
FCsahi, eds),
AkKiado,
Bu-dapest, Hungary,
267-281Boujenane I,
BradfordGE,
Famula TR(1991)
Inheritance of litter size and its componentsin crosses between D’Man and Sardi breeds of
sheep.
J Anim Sci69,
517-524Bradford
GE,
Famula TR(1984)
Evidence for amajor
gene forrapid postweaning growth
in mice. Genet Res Camb
44,
293-308Dickerson GE
(1973) Inbreeding
and heterosis in animals. In: Proc AnimBreeding
GenetSym
P
in honourof
Dr JLLush,
Am Soc Anim SciAssoc, Champaign, IL, USA,
54-77Duchet-Suchaux
M,
MenanteauP,
Le RouxH,
ElsenJM, Lechopier
P(1992)
Geneticcontrol of resistance to
enterotoxigenic
Escherichia coli in infant mice. MicrobiolPathogenesis 13,
157-160Elston
RC,
Stewart J(1973)
Theanalysis
ofquantitative
traits forsimple
models fromparental,
Fl and backcross data. Genetics73,
695-711Goffinet
B,
LoiselP,
Laurent B(1992) Testing
in normal mixture models when theproportions
are known. Biometrika79,
842-846Goffinet
B,
BekmannJ,
Boichard D et al(1994)
M6thodesmath6matiques
pour 1’etude desgenes
contr6lant des caract6resquantitatifs.
Genet Sel Evol26,
9s-20sHanset R
(1991)
Themajor
gene of muscularhypertrophy
in theBelgium
Blue cattle breed. In:Breeding for
Disease Resistance in the Farm Animal(JB Owen,
RFEAxford, Wallingford, eds),
CABInternational, UK,
467-478Janss
LLG,
Van der Werf JHJ(1992)
Identification of amajor
gene in Fl and F2 data when alleles are fixed in theparental
lines. Genet SelEvol 24,
511-526Loisel
P,
GoffinetB,
MonodH,
Montes de Oca G(1994) Detecting
amajor
gene in an F2population.
Biometrics 50, 512-516Mode
CJ,
Gasser DL(1972)
A distribution free test formajor
gene differences inquantitative
inheritance. Math Biosci14,
143-150Stewart
J,
Elston RC(1973)
Biometricalgenetics
with 1 or 2 loci: the inheritance ofphysiological
characters in mice. Genetics 73, 675-693Weber E
(1959)
Thegenetical analysis
of characters with continuousvariability
on amendelian basis. I.