HAL Id: hal-00894269
https://hal.archives-ouvertes.fr/hal-00894269
Submitted on 1 Jan 1999
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Marker assisted estimation of breeding values when
marker information is missing on many animals
Theo H.E. Meuwissen, Mike E. Goddard
To cite this version:
Original
article
Marker assisted estimation of
breeding
values when marker
information
is
missing
on
many
animals
Theo
H.E. Meuwissen
Mike E.
Goddard
Research Institute of Animal Science and
Health,
Box 65,8200 AB
Lelystad,
the Netherlandsb
Institute of Land and Food Resources, University of Melbourne,
Parkville,
Victoria 3052, Australia(Received
15 December1998;
accepted
4 June1999)
Abstract - Two methods are
presented
that use information from alarge population
of commercial
animals,
which have not beengenotyped
forgenetic
markers, tocalculate marker assisted estimates of
breeding
value(MA-EBV)
for nucleus animals,where the commercial animals are descendants of the marker
genotyped
nucleusanimals. The first method reduced the number of mixed model equations per
commercial animal to
one, instead
of oneplus
twice the number of markedquantitative
trait loci in conventional MA-EBV equations. Without this reduction, the time taken
to solve the mixed model equations
including
markers could be verylarge especially
if the number of commercial animals and the number of markers islarge.
The solutions of the reduced set of equations were exact and did notrequire
more iterations than the conventional set ofequations.
A second method wasdeveloped
for the situation where the records of the commercial animals were notdirectly
available to thenucleus
breeding
programme but conventional non-MA-EBVs and their accuracieswere available for nucleus animals from a
large
scale(e.g. national)
breeding
valueevaluation,
which uses nucleus and commercial information.Using
thesenon-MA-EBV,
the MA-EBV of the nucleus animals wereapproximated.
In anexample,
theapproximated
MA-EBV were very close to the exactMA-EBV. ©
Inra/Elsevier,
Parismarker assisted selection
/
breeding value estimation/
quantitative trait loci /DNA markers
Résumé - Évaluation génétique assistée par marqueurs quand l’information sur
les marqueurs est rare. On
présente
deux méthodes d’utilisation de l’informationprovenant d’une
grande population
d’animaux commerciaux, nontypés
pour desmarqueurs, en vue de l’évaluation
génétique
d’animauxtypés
dans les noyaux de*
sélection qui sont à
l’origine
des populations commerciales. Lapremière
méthode limite à une seuleéquation
du modèle mixte pourchaque
animal commercial au lieu de uneplus
deuxfois,
le nombre de locimarqués, quand
on utilise leséquations classiques
du BLUP assisté par marqueurs. Ceci permet de réduire substantiellement le temps
de calcul
quand
le nombre d’animaux commerciaux et le nombre de marqueurs sontgrands.
Les solutions de cesystème
réduit sont exactes et ne demandent pasplus
d’itérations que le
système classique d’équations.
La seconde méthode estproposée
quand
les données des animaux commerciaux ne sont pas directement accessibles aux sélectionneurs du noyau de sélection alors que leurs évaluationsclassiques
(non
assistées par
marqueurs)
le sont. Ces évaluations tiennent alors compte des données des animaux du noyau et hors noyau. Dans ce cas, la méthode estapprochée.
Sur unexemple,
cetteapproximation
a été trouvée trèsproche
de l’évaluation exacte assistéepar marqueurs.
©
Inra/Elsevier,
Parissélection assistée par
marqueurs
/ évaluation génétique
/ loci
à caractèresquantitatifs
/ marqueur
à ADN1. INTRODUCTION
Fernando and Grossman
[3]
presented
a method to calculate the best linear unbiasedpredicted-estimates
ofbreeding
values(BLUP-EBV)
using
the information that DNA markers are linked to aquantitative
trait locus((aTL).
Goddard[4]
extended the method to the use offlanking
marker information.Although,
these methods arerelatively
easy to use, the number ofequations
rapidly
becomeslarge
when there are many animals. Even withonly
one markedQTL,
there are threeequations
per animal: twoestimating
bothgametic
effects at theQTL
and one for thepolygenic
effect(the
joint
effect of thebackground
genes).
Every
extra markedQTL
increases the number ofequations
per animal
by
two.Moreover,
when theflanking
markers are close to theQTL,
theprobabilities
of double cross-overs become small and theequations
close to
singular,
and thus difficult to solve[13].
Meuwissen and Goddard[8]
avoided thesesingularity problems by assuming
anegligible probability
ofdouble recombinations within the
flanking
markers.As
genetic
markers become morefrequently
used in comnrercialbreeding
programmes, the situation will
commonly
arise whereonly
a small fraction ofthe animals have been
genotyped.
Thephenotypes
ofnon-genotyped
animalsmay,
however,
be vital to the calculation of the effects of markedQTL
as,for
instance,
in agranddaughter design
whereonly
bulls aregenotyped
butonly
cows arephenotyped.
Calculation of twoQTL
effects for each marker formany
non-genotyped
animals is wasteful and may inhibit theimplementation
of marker assisted selection. Hoeschele
[7]
greatly
reduced the number of equa-tions in verygeneral
population
structures,
but this method iscomplicated
and therefore difficult toapply
inpractice,
mainly
because it eliminates asmany
equations
aspossible.
A moresimple breeding
structure such as ageno-typed
nucleus andnon-genotyped
commercialpopulation
structure cangreatly
simplify
the elimination ofequations.
In some situations theorganisation
con-trolling
the nucleusbreeding
programme may not have access to the recordsThe aim of this paper is to
present
a method that reduces the number ofmarker assisted
breeding
value estimationequations
in apopulation
where thenucleus animals are marker
genotyped
and the commercial animals are notgenotyped.
The reductionmainly
eliminates theequations
ofnon-genotyped
animals.Furthermore,
anapproximate
method ofcalculating
MA-EBVs onnucleus animals is
presented,
which usesonly
the conventional non-MA-EBVsof nucleus animals from a national
genetic
evaluation torepresent
the data from commercial animals.2. METHODS
2.1.
Reducing
the number ofequations
The
population
wassplit
into nucleus and commercial animals.Here,
thedefinition of a commercial animal is: an animal that is not marker
genotyped
and has no descendants that aregenotyped.
The nucleus animals are allmarker
genotyped
animalsplus
their ancestors. The method will still work if a commercial animal iserroneously
considered as a nucleusanimal,
although
the number of
equations
will not be reduced for such an animal. The methodwill
fail, however,
if a nucleus animal iserroneously
considered as a commercialanimal. For
simplicity
weignored
fixed effectequations,
butincluding
them isstraightforward. Similarly,
we assumed hereonly
one markedQTL,
since theinclusion of more marked
QTL
isstraightforward.
Partitioning
thepopulation
into nucleus and commercialanimals,
the model can be written as:where yi
(y
2
)
is the vector ofphenotypic
records of nucleus(commercial)
animals;
al(a
2
)
is the vector ofpolygenic
effects of nucleus(commercial)
animals;
ql is the vector of markedQTL
effects of the nucleusanimals;
q2(q
3
)
is the vector ofpaternally
(maternally)
derivedQTL
effects of the commercialanimals;
el(e
2
)
is the vector of environmental effects of nucleus(commercial)
animals;
Z
l
is the incidence matrix ofpolygenic
effects of nucleusanimals;
Z
2
is the incidence matrix ofQTL
effects of the nucleusanimals;
andZ
3
is the incidence matrix ofpolygenic
effects of the commercial animals. Note thatZ
3
is also used as the incidence matrix of the
paternally
and of thematernally
derived
QTL
effects of the commercialanimals,
because these effects have thesame incidence matrix as the
polygenic
effects of the commercial animals. TheZ
2
matrix can differsubstantially
fromZ
l
when the inheritance ofQTL
effectsis traced from
parent
tooffspring by
the markers[8].
In order to solve the BLUPequations,
we need the inverses of the(co)variance
matrix of[a’ a’]
and of
[q’
q’
q’],
which are obtainedusing
the methods ofQuaas
[10,
11!
andFernando and Grossman
!3!,
respectively.
In order to reduce the number of
equations
of the commercialanimals,
the ’reduced animal model’approach
ofQuaas
and Pollak[12]
wasadopted.
Thisthe
equations
ofnon-parents
in a model withQTL
andpolygenic
effects. Were-write
equation
(1)
as:where U2 ! az + qz + q3. For the mixed model
equations
that follow fromequations
(2),
we need the inverse of the(co)variance
matrix of[a’ q!
1 U/ 2
1.
Following Quaas
[10, 11!,
we will assume that the animals within the nucleusand within the commercial are sorted from old to young.
Next,
we write everyelement of
[at
1 qf
1 uf 2 in
terms of its’parental’
elementsplus
anindependent
deviation from the
’parental’ elements,
where’parental’
elements denote theai, ql or U2 elements of the
parents
of the current animal:where P is an indicator matrix of the
parents
of ai, such thatP
ij
= 0.5 ifanimal j
is aparent
of animali,
and otherwiseP
ij
=0;
Q2! = B2!
ifQTL,
is withprobability
O
ij
a direct copy ofQTL,,
whereQTL
J
was one of the two’parental’
QTL
alleles ofQTL,,
with’parental’ denoting
thatQTL
j
was involved in theMendelian
sampling
process that resulted inQTL
I
,
and for all other i andj:
Qij
=0;
Rij
= 0.5 if nucleusanimal j is
a parent of commercial animali,
andotherwise
Ri!
=0;
Si!
= 0.5 if one of the twoQTL
of commercial animal i is adirect copy of the nucleus
gamete j
with aprobability
of 0.5(the
probability
isalways
0.5 because commercial animals are not markergenotyped),
otherwiseS
j
=0;
T
ij
= 0.5 when commercialanimal j
is aparent
ofi,
otherwiseT
ij
= 0.The elements
of E
l
,
E2 and E3 are allindependent,
unless the markers are notcompletely
informative,
i.e. it is notalways possible
to trace which marker isinherited from the sire and which from the dam. In the latter case, the elements
of E2 may be correlated and the method of
Wang
et al.[14]
can be used to set up(the
inverseof)
the(co)variance
matrix of theQTL
effects of the nucleus animals. The calculation of the(co)variance
matrix of theQTL
effects of the nucleus animals becomes even morecomplex
when ancestors of nucleus animals havemissing
markergenotypes;
however,
for thissituation,
Wang
et al.provide
anapproximate
method to set up the(co)variance
matrix ofQTL
effects. Wewill
ignore
thesecomplications
ofobtaining
the inverse of the(co)variance
matrix of the
QTL
effects of the nucleus animalshere,
because the method that is used to obtain the inverse of this(co)variance
matrix does not affect thesetting
up of the inverse of the(co)variance
matrix of the u2equations.
This is because the situation of uninformative marker information andungenotyped
ancestors of
genotyped
animals did not occur within the group of commercialanimals,
since none of the commercial animals weregenotyped.
where
D
l
is adiagonal
matrix withD
iii
equal
toQ
a, 0.75
Q
a
or0.5
Q
a
when
no, one or both
parents
are known of nucleus animali,
respectively; D
2
is adiagonal
withD
2ii
equal
toa) or
2Bi!(1 -
0g )a) when
gamete
i is a foundergamete
or is derived fromgamete j
with
probability
Bi!
!3!,
respectively;
andD
3
is adiagonal
withD
3ii
equal
toQ
u, 0.75
Q
u
or0.5u!,
when no, one or bothparents
of commercial animal i areknown, respectively,
where0
’
=a2
+2
Q
q.
Next we solveequation
(3)
for v’ =[a’ q’ u’]
to obtain:Taking
variances on both sidesyields,
Finally
the inverse ofVar(v)
isG-
1
which is obtained as:Similar to
Quaas
(10, 11!,
thefollowing
rules can be found to set upG-
1
.
1)
For thepolygenic
effects of the nucleus animalspart
ofG-
1
:
followQuaas’
rules(multiply
by
I/or2
to
account for the different variances in differentparts
of
G-1).
2)
For theQTL
effects of the nucleus animalspart
ofG-
1
:
follow the rules of Fernando and Grossman[3]
(multiply
by
1/
Q9
).
3)
For thegenetic effects,
u2, of commercial animal i:- if both
parents
are unknown: add1/
Q
u
to
position
(i, i);
-
if one
parent
s is known withQTL
alleles al and a2 add to the indicatedIf there are no
equations
for theQTL
alleles al and a2, i.e. s was a commercialanimal,
the additions to theirpositions
arecancelled,
and the additionssimplify
to the
original
rule ofQuaas
[10, 11!;
- if both
parent
s and d of animal i are known withQTL
alleles al and a2of s and alleles a3 and a4 of
d,
add to the indicatedpositions:
If there are no
equations
for theQTL
alleles ai, , a2 a3and/or
a4 theadditions to their
positions
are cancelled. When all alleles ai, a2, a3and/or
a
4 have no
equations,
the additionssimplify
to theoriginal
rule ofQuaas
[10, 11].
As can be seen from the above
additions,
the commercial animals add thesame values as in
Quaas’
rules to the elements of theirparents,
but if theparents
are nucleus animals these values are added to theirpolygenic
andQTL
effects.After
setting
up theG-
1
matrix,
we can set up and solve Henderson’s[6]
mixed model
equations:
and
Q
e
is the environmental variance.These
equations
willyield
exact solutions of the estimates ofpolygenic
(a
l
)
and
QTL
effects(q
l
)
of the nucleusanimals,
and of the sum of thepolygenic
andQTL
effects of the commercial animals(u
2
)
(unless
approximations
haveto be
applied
forsetting
up the(co)variance
matrix of theQTL
effects of the nucleus animalsowing
tomissing
markergenotypes
of ancestors of nucleusanimals).
A smallexample
of the calculation of theG-
1
matrix isgiven
inAppendix
A.2.2. The use of conventional EBV to
predict
MA-EBVIn the case of cattle
breeding
schemesespecially,
the commercial animalsmay not be owned
by
thebreeding
organisation
and thisorganisation
may notnational
breeding
value evaluation. We would like to use this information toimprove
the accuracy of the marker assistedbreeding
value estimates in the nucleus. Thisproblem
is similar to that ofincorporating
AI sire evaluations into intraherdbreeding
valuepredictions by
Henderson(5!,
and ourapproach
will therefore also be similar to that of Henderson.
The first step is to absorb the commercial animal
equations
into the nucleusequations,
which will reveal which information from the commercial animals is needed. The full mixed modelequations
are[writing
outequations
(8)
and(5)]
see
(8bis)
in thefollowing
page.Absorption
of the commercial animalequations
(u
z
)
yields equation
(9),
shown in the
following
page, where B =D-’ - D3l(I -
T)(Z3Z
3
+(I
-T
/
)D3
l
(I -
T)]-l(I - T/)D3l,
and b =D3
l
(I -
T)(Z3Z
3
+(I -
T
/
)D3
l
(I-T)]!Zgy2.
Note thatequation
(9)
reduces to the MA-EBVequations
of the nucleus animals withoutaccounting
for any information of commercialanimals,
if B and b are set to zero. The term R’BR leads to additions to theequations
of the nucleusparents
of the commercial animals.Similarly,
S’BS leads toadditions to the
equations
of theQTL
that are carriedby
the nucleusparents
of the commercial animals.
Further,
R’BS leads to additions to the animal *QTL
block of theequation
(9)
of the nucleusparents
(of
commercialanimals)
and theirQTL
effects. The terms R’b and S’b result in additions to theright
hand side of theequations pertaining
to theparents
of nucleus animals and theirQTL effects, respectively.
We willapproximate
these termsR’BR, S’BS,
R’BS,
R’b and S’busing
the results from a conventional national evaluationof
breeding
values.The solutions of EBV of nucleus animals of the conventional national evaluation should
equal
the solutions from theequations
of the nucleus animals afterabsorption
of the commercial animals. The conventionalequations
for nucleus animals afterabsorption
of commercial animals are:where EBV is a vector of conventional EBV of nucleus animals
(known
from nationalevaluation),
M =[Z’Z
l
+(I -
P)’D-’(1 -
P)!e u!/u!],
whichis the coefficient matrix of the conventional mixed model
equations
whenonly
information from nucleus animals is used(note
that(I -
P)
/
D1l (I
-P)
l
a§
equals
the inverse of therelationship
matrix of the nucleusanimals).
Note also that the additions R’BR and R’b are the same as those in the MA-EBVequation
(9).
Hence,
if we obtainapproximations
for R’BR andR’b
inequation
(10)
we canapproximate equation
(9).
We know the EBVand their
accuracies,
ri, which result fromequation
(10).
Let the matrixC =
(M
+R’BR)-
1
,
then thediagonal
elements of C are:where
only
thediagonal
elements of C are known. Thediagonal
elements ofA,
!ii, yield
the effective number of records that should be added to a nucleus animali,
such that the accuracy of its EBV isequal
to the accuracy when the commercial animals were included. A similar effective number of records wasderived
by
Henderson!5!,
but in his situation the animals within the herd didnot contribute
significantly
to the EBV of the sire.Here,
we used thefollowing
iteration scheme to
disentangle
the information that came from the nucleusanimals,
which isrepresented by
the matrixM,
and the information that comesfrom the commercial
animals,
which isrepresented by
the matrix A.Newton’s iteration
algorithm
was used to calculate thediagonal
matrixA such that
diag((M
+0)-
1
)
=diag(C),
wherediag(X)
denotes a vectorcontaining
thediagonal
elements of the matrix X. Let the vector 6 =diag(A).
The iteration scheme estimates b
by:
step
1: a firstapproximation
A
[O]
or,equivalently,
6
[o]
is obtained from:step
2:improve
6by
Newton-Raphson
iteration:where
[p]
denotes thepth
iteration;
and H is a matrix of derivatives ofdiag((M
+D)-’)
withrespect
tob,
which can be shown toequal
- (M
+A)-’
*(M
+A)-’,
where * denotes elementby
elementmultiplica-tion.
Given the
approximated
mixed model coefficient matrix of the nucleus animals afterabsorption
of the commercialanimals,
M+ A,
anapproximation
of the
right
hand side ofequation
(10),
is obtained from:where ARHS is an
approximation
of the term R’b inequation
(10).
Since,
EBV and
Ziy
l
areknown,
ARHS can be calculated from the aboveequation.
Next we will calculate the absorbed coefficient matrix of the marker assisted mixed model
equation
(9),
and theirright
hand side. From theprevious
sectionwe concluded that we could
approximate
R’BR
I
by D
ii
,
whereR
i
is the ith column of R. The vectorR
i
indicates which commercial animal is anoffspring
of nucleus animal
i by containing
a1/2
if the commercial animal is anoffspring
of i or a 0 otherwise. If al is one of the
QTL
alleles of nucleus animali,
thea
l
th
column of
S, S
al
,
contains a1/2
if the commercial animal is anoffspring
ofanimal i. If every nucleus animal has two
unique
QTL
alleles,
as in the model of Fernando and Grossman!3!,
it follows thatR
i
=S
al
=S
a2
,
with al and a2denoting
theQTL
alleles of animal i. Hence:where ax denotes al or . 2a
Thus,
the additionD.ti
to thediagonal
of thepolygenic
equation
of the nucleus animal i should also be added to theoff-diagonal
of thepolygenic equation
i andQTL
alleleequation
al and a2; to thediagonal
of bothQTL equations
al and a2; and to theoff-diagonal
elements ofa
l and a2. And the term
ARHS
I
should be added to theright
hand side of theequation
of animali,
and of theQTL
equations
al and a2. Inconclusion,
to account for the information of commercial
animals,
for every nucleus animali we add to the coefficient matrix of the MA
equations
of the nucleus animals thatignores
information of commercial animals:where al and a2 denote the
equations
for theQTL
effects of animali;
and weadd to the
right
hand side of these nucleusequations
for every nucleus animal i:Thus,
the additions(11)
and(12)
result in anapproximation
of the marker assisted nucleusequations
(9)
using only
the EBV and accuracies to accountfor the information of commercial animals.
The
equality
ofR
i
toS
a£
requires
that theQTL
allele a!. isonly
present
in one animal i.However,
in the model of Meuwissen and Goddard!8!,
QTL
allelesmight
be traced fromparent
tooffspring
withcertainty,
becauseflanking
markers were used and double recombinations wereignored.
In this model different animals may carry the sameQTL
allele a!, andS
ax
=Ei,
7
Ax
Ri,
where the summation is over all animals i that carry
QTL
allele ax. Thiscomplication
ofS
ax
being
the sum of severalR
i
terms does not affect the additions inequations
(11)
and(12)
which are due to terms that are linear inS
ax
,
because the correct additions are stillperformed
as all the animalscontributing
toS
ax
are evaluated.However,
the additions to theQTL
allele*
QTL
allele block ofequation
(11),
are due to second order terms ofS
ax
,
whichimplies
that more offdiagonal
terms of theabsorption
matrix B haveto be added. We will
ignore
these extra offdiagonal
terms ofB,
which are dueto the second order terms of
Sa,!,
andperform
the additions as described inequation
(12),
which adds another level ofapproximation
to this method.In the
above,
the fixed effect structure of the nucleus animal data wasignored,
but can be accounted forby absorbing
the fixed effectequations
into theequations
of the nucleusanimals,
i.e. the matrix M would be the conventional mixed model coefficient matrix afterabsorption
of fixed effects.Alternatively,
ifabsorption
of fixed effects iscomputationally
toodemanding,
the
following
steps
can be undertaken to account for fixed effects:(M
+![1’]) -1
isreplaced by
the animal * animal block of:where X is the
design
matrix of the fixed effect structure of the nucleusdata;
step
2: if the fixed effect solutions are not available from the nationalbreeding
valueevaluation,
solve for the fixed effectsolutions, /3, using:
step
3: calculate ARHSusing:
The above methods that account for fixed effects assume that different fixed
effects are estimated in the nucleus than in the commercial
animals,
which will be the case in most situations. A briefexample
of the use of non-MA-EBV inthe estimation of MA-EBV of nucleus animals is
given
inAppendix
A.2.3. Simulation
A data set was simulated to test whether the reduced number of marker
as-sisted mixed model
equations
indeedyielded
the same solutions as theoriginal
full set of
equations,
to compare the number of iterations needed to solve the reduced set ofequations
and theoriginal equations
(which
might
be morediag-onally
dominant),
and to test theapproximate absorption
of allequations
for commercial animals. The data set resulted from fivegenerations
of simulation of a nucleus and a female commercialpopulation,
where the nucleus animals areselected on conventional
BLUP-EBV,
and the unselected commercial females are mated to the selected sires of the nucleus. Theparameters
of the simulated data set arepresented
in table I. MA-EBVs were calculated in a manner similarto that of Meuwissen and Goddard
[8]
in which it is assumed that if markerscannot trace the inheritance of
QTL
alleles fromparent
tooffspring,
then theQTL
allele inherited is treated asequally likely
to be either of the two alleles inthe
parent
(the
possibility
that asegregation
analysis
of the marker datamight
improve
thisprediction,
wasignored).
Theprobability
that the markers couldnot trace the inheritance of the
QTL
was assumed to be0.1,
which occurred in158 instances in the nucleus. In these instances a new
QTL
effect waspostu-lated and estimated.
Including
the 400 founderQTL
effects(=
2 * 200 founder nucleusanimals),
the number ofQTL
effects of nucleus animals was 558. Thecommercial animals were not marker
genotyped
and so noQTL
effects could betraced. If no
equations
wereeliminated,
this would result in 10 000(= 2 *
5 000commercial
animals)
commercialQTL
effects.The
equations
were solvedby
Gauss-Seidel iteration. The convergencewhere SS is the sum of squares of the deviations of the left hand side from the
right
hand side of theequations;
SST is the sum of squares of the solutions. Thenumber of iterations needed to reach this convergence criterion is a
(imperfect)
measure of howeasily
theequations
could be solved. This measure is notperfect
because the solution vectors of both methods are not the same, and SS may be
3. RESULTS
Table II shows the results of the EBV calculations. Without any reduction in
the number of
equations,
the total number ofequations
is: 16 558(6 000
animal and 10 558QTL
effects).
When theQTL equations
of the commercial animalswere
eliminated,
the number ofequations
was reducedby
10 000. Inpractice,
this
figure
will often be muchlarger,
because the commercialpopulation
will be muchlarger
than in the simulation.Furthermore,
the number of iterations thatwas needed to reach the convergence
criterion,
was smaller with the reducedset of
equations.
Thissuggests
that the reduced set ofequations
was not anyharder to solve than the
original large
set ofequations.
The solutions to bothsets of
equations
werevirtually
identical(result
notshown),
except
that the reduced set did notyield
estimates of individualQTL
effects of commercial animals. If the estimates of theQTL
effects of the commercial animals wererequired,
they
could be obtainedby
backsolving
(see
Appendix
B).
Table IIIshows the results of the
approximate
methodusing
conventional EBV of nucleus animals to estimate the MA-EBV of the nucleus animals. When we want topredict
the totalbreeding
value of theanimals,
ui, the use of conventional EBVof nucleus
animals,
instead of thephenotypes
of commercialanimals,
leads tovery accurate
predictions
of MA-EBV in the simulated data set. Thepredictions
of the individualQTL effects,
qi, were alsoaccurate,
i.e. the correlation andregression
is close to 1. In thisapproximate
method the number ofequations
was
only
1 558.4. DISCUSSION
4.1. Reduction in the number of
equations
A
simple
method waspresented
that reduced the number ofequations
of non-markergenotyped
commercial animals from three to one, where the latterequation
estimates the totalbreeding
value(polygenic
plus
QTL)
of the commercial animals. The reduction in the number ofequations
waslarge
whenthe number of commercial animals was
large,
and the reduced set ofequations
An alternative to this method is the use of a reduced animal model
[1, 2!,
which would eliminateequations
for commercial animalsonly
ifthey
arenot
parents.
Thus the methodpresented
eliminates moreequations
than the reduced animal modelapproach
but less than Hoeschele(7!.
Theimportance
ofusing
the data on commercial animals whencalculating
MA-EBVs for the nucleus animals is illustratedby
the case of agranddaughter design
where allphenotypic
data come from the commercialgranddaughters.
The methodproposed
could be used in a nationalgenetic
evaluation where the number of additionalequations
would beproportional
to the size of the nucleus.Extension of the method to more marked
QTL
isstraightforward.
Let theparental QTL
alleles of the first markedQTL
be denotedby
ai, az(a
3
,
a
4
),
and those of the second marked
QTL by b
l
,
b
2
(b
3
,
b
4
),
where the elements between the brackets are needed when a second nucleusparent
is known. Nowextra rows and columns for
b
l
,
b
2
,
3
(b
b
4
)
areaugmented
to the additions inequations
(6)
and(7),
where the values in theaugmented
rows and columns are the same as those in the rows and columns of , ia a2(a
3
,
a
4
).
Also,
the offdiagonal
elements betweenaj
and bh are the same as those between aland a2 for
all j
and k.Hence,
with n markedQTL,
the number ofequations
of commercial animals reduces from 2n, + 1 tojust
one. In many marker assisted selectionschemes,
there may also be many nucleus animals that arenot
genotyped,
namely
the old ancestors of the nucleus that were born before markergenotyping
started. Somecryo-conserved
semen of old sires may stillbe available for
genotyping,
but many old ancestors will remain non-markergenotyped.
Insituations,
where the old ancestors result in acomputationally
unmanageable
number ofequations,
theapproach
of Hoeschele[7]
eliminates allQTL
equations
ofnon-genotyped
ancestors that are not on agenetic
pathway
between two marker
genotyped
animals.Although
more difficult toapply,
this method can result in a substantial reduction in the number ofQTL
equations
ofnon-genotyped
ancestors. In thepresent
simulated dataset,
all nucleus animalswere
genotyped
and a further reduction in the number ofequations
was notobtained
by using
theapproach
of Hoeschele(7!.
The rules
presented
forsetting
up theG-
1
matrix,
did not account for theinbreeding
of the animals. Ifinbreeding
is notnegligible, Quaas’
[10, 11!
rulescan be used to account for
inbreeding
in the nucleus animal * nucleus animalpart
of theG-
1
matrix,
which results inreducing
the elements of theD
1
matrixby
a fractionequal
to the averageinbreeding
coefficient of theparents.
Also,
Wang’s
[14]
method accounts forinbreeding,
where theinbreeding
coefficientis calculated at the
QTL given
the marker information. In theequations
of the commercialanimals,
the averageinbreeding
coefficients can be accounted forby
reducing
theo,2
term
in additions(6)
and(7)
by
a fractionequal
to the averageinbreeding
coefficients of theparents.
The latter will beslightly
biased because the averageinbreeding
coefficient of theparents
will be different at theQTL.
This bias can be corrected
by using
aweighted
averageinbreeding coefficient,
where the conventional
inbreeding
coefficientaveraged
over theparents,
theinbreeding
at theQTL
of thesire,
and that at theQTL
of thedam,
areweighed
4.2. Use of conventional EBV to
predict
MA-EBVA method was
developed
that uses the information of conventional EBV ofnucleus animals and their accuracies instead of the data on commercial animals
to increase the accuracy of MA-EBV of nucleus animals. In the simulated data
set,
theapproximate
MA-EBV,
which used conventionalEBVs,
were very close to theoriginal
MA-EBV based on the full set ofequations
which used thephenotypic
records from the commercial animals. The method wasalso tested in a
granddaughter design
!15!,
where agenotyped grand
sire hasgenotyped
sons which have conventional EBVs based ondaughter
records. Inthis
situation,
theprediction
of theQTL
effects of the sons and thegrand
sires was identical to when the
original
records of thedaughters
had been used(result
notshown).
This method could be usedby
abreeding
organisation
controlling
the nucleusbreeding
programme withoutincluding
any informationon commercial animals. The method can be
compared
to the use ofdaughter
yield
deviations to represent data on the(commercial)
daughters
of a nucleusbull.
However,
this method uses all commercial descendents of a nuclear animal(via
its conventionalEBV)
and avoids doublecounting
of the information from descendents in the nucleus. The method could be extended forQTL
detection studies based on REML estimation of the variance due to aQTL,
bracketedby
the markers.The
approximate
method toincorporate
EBVs from the commercial animalsinto the nucleus MA-EBV is similar to the use of
foreign
EBV in the national evaluation of acountry.
Except
that theforeign
EBV calculation does(almost)
not use local
information,
such that theforeign
EBVyield independent
extrainformation.
Hence,
the accuracy of theforeign
EBV can bedirectly
converted into an effective number of records(or daughters)
that is added to thediagonals
of the coefficientmatrix,
and intoderegressed proofs,
i.e. extra records(or
daughters)
are invented that contain the information of theforeign
EBV.Here,
the situation was morecomplicated
because the conventional EBV of the nucleus animalalready
contained the information of the commercialanimals,
which made it more difficult to determine the extra information.The calculation of the MA-EBV
using
conventional EBV of commercial animals reliesstrongly
on the accuracy of the conventional EBV. In thepresent
simulationstudy,
the accuracies were calculatedby
inversion of the conventional mixed modelmatrix,
i.e. exact accuracies were used. In the case of a nationalevaluation of
EBV,
the number ofequations
is toolarge
for direct inversion and the accuracies have beenapproximated.
Theseapproximations
are oftengood
(e.g. !9!).
However,
poorapproximations
of the accuracies of the conventionalEBV will
probably
reduce the accuracies of the MA-EBVsubstantially.
In anycase, the method
presented
here seems to make as much use aspossible
ofthe conventional EBV of national evaluations to
improve
the accuracies of theMA-EBV of the nucleus animals.
REFERENCES
[1]
BinkM.C.A.M., Quaas R.L.,
Van ArendonkJ.A.M., Bayesian
estimation ofdispersion
parameters with a reduced animal modelincluding polygenic
andQTL
[2]
CantetR.J.C.,
SmithC.,
Reduced animal model for marker assisted selectionusing best linear unbiased
prediction,
Genet. Sel. Evol. 23(1991)
221-233[3]
Fernando R.L., Grossman M., Marker-assisted selection using best linearun-biased
prediction,
Genet. Sel. Evol. 21(1989)
467-477.[4]
GoddardM.E.,
A mixed model foranalyses
of data onmultiple genetic
markers, Theor.
Appl.
Genet. 83(1992)
878-886.[5]
HendersonC.R.,
Use of AI relatives in intraherdprediction
ofbreeding
valuesand
producing
abilities, J.Dairy
Sci. 58(1975)
1910-1916.[6]
HendersonC.R., Applications
of linear models in animalbreeding,
Can. Catal.Publ. Data, Univ.
Guelph, Guelph,
1984.[7]
HoescheleI.,
Elimination of quantitative trait lociequations
in an animalmodel
incorporating genetic
marker data, J.Dairy
Sci. 76(1993)
1693-1713.[8]
Meuwissen T.H.E., Goddard M.E., The use of markerhaplotypes
in animalbreeding
schemes, Genet. Sel. Evol. 28(1996)
161-176.[9]
Misztal L,Wiggans G.R., Approximation
ofprediction
error variances inlarge-scale
animalmodels,
J.Dairy
Sci. 71(Suppl. 2) (1988)
27-32.[10]
Quaas
R.L.,Computing
thediagonal
elements of alarge
numeratorrelation-ship
matrix, Biometrics 32(1976)
949-953.[11]
Quaas
R.L., Additive genetic model with groups andrelationships,
J.Dairy
Sci. 71
(1988)
1338-1345.[12]
Quaas
R.L., Pollak E.J., Mixed modelmethodology
for farm an ranch beefcattle
testing
programs, J. Anim. Sci. 51(1980)
1277-1287.[13]
Ruane J., Colleau J.J., Marker-assisted selection for a sex-limited character in a nucleusbreeding population,
J.Dairy
Sci. 79(1996)
1666-1678.[14]
Wang T.,
FernandoR.L.,
Van der BeekS.,
Grossman M., Van ArendonkJ.A.M.,
Covariance between relatives for a marked quantitative trait locus, Genet.Sel. Evol. 27
(1995)
251-274.[15]
WellerJ.L,
Kashi Y., SollerM.,
Power ofdaughter
andgranddaughter
de-signs
fordetermining linkage
between marker loci and quantitative trait loci indairy
cattle,
J.Dairy
Sci. 73(1990)
2525-2537.APPENDIX A
An
example
illustrating
the calculation of the inverse of thegenetic
covariancematrix, G,
for the reduced set ofequations
The
pedigree
and markergenotypes
of five animals aregiven
in table AI. The markergenotype
of animal 5 isunknown,
and this animal has no descendantswith known marker
genotypes.
Hence, given
the definition in the maintext,
animal 5 is a commercial
animal,
and itsQTL
alleleequations
will be absorbed.Let the
polygenic
andQTL
allelic variance beequal
to one, i.e.Q
a
=Q9
=1,
and
a
=o
l2
+
2a)
= 3. The firststep
is to setup the inverse of the
polygenic-nucleus animal
part
of thegenetic
covariancematrix,
G-
1
,
where the nucleus animals are1, 2,
3 and 4. Thispart
of theG-
1
matrix is obtainedby using
The second
step
is to set up theQTL
effectspart
of theG-
1
matrix. Forsimplicity,
we will assume here that the recombination rate between the markerand the
QTL
is zero.(This
situation is similar to the situation whereflanking
markers are used to trace theQTL alleles,
and the double recombination rateis assumed
negligible.)
Because the recombination rate is zero, the inheritanceof the
QTL
alleles is identical to that of the markeralleles,
i.e. there are fourQTL
alleles ql, q2, q3and , q4and the presence of marker alleleA
x
also indicatesthe presence of
QTL
allele q!, where x =1, 2,
3 or 4. TheQTL
allelepart
of theG-
1
matrix is calculatedusing
the rules of Fernando and Grossman[3]
orWang
et al.!14!,
andyields
anidentity
matrix here because theQTL
allelesq l
, q2, q3 and q4 are all founder alleles. After this
step,
thepolygenic
andQTL
allele
part
of the nucleus animals of theG-
1
matrix is:The third
step
is to add theequations
for the commercial animals to nucleus animalequations
using
additions(6)
or(7)
of the main text.Here,
only
animal5 is a commercial animal and it has both
parents
known(namely
animals 3 and4)
such that addition(7)
applies.
Note that1/
Q
u
is
1/3,
such that thecomplete
Use of conventional EBV to
predict
MA-EBVLet us consider
again
theexample
of tableAI,
and make use of theconven-tional non-MA-EBV that are calculated for all
animals,
but areonly
availableon the nucleus animals 1-4
together
with their accuracies. These EBVs and their accuracies are calculatedassuming
an error variance ofQ
e
= 3. The firststep
is to obtain a firstapproximation
of the extra information due to the commercial animalsby calculating
b
[ol
=diag(C-
1
) -
diag(M),
where the ithdiagonal
element ofC-’
is!/(1 -
r?),
with A =U2/0
,2
=1;
and M is thecoefficient matrix of the conventional animal model
equations
for the nucleus animals:Hence,
6
[o]
= diag(C-
1
)-diag(M)
_
[-0.6326
-0.6326 -0.2222-0.2222]’.
In
step
2 we first set up the H - matrix of derivatives ofdiag((M
+!) -1 )
with
respect
to 6:where
0!!!
= adiagonal
matrix with the elements of6!0!
on thediagonals.
Next we calculate:The
update
of 6 is now obtained from:After three more
updates
of6 by
equations
(A2)
and(A3)
the values of 6converged
to[0.03
0.03 0.360.36!’,
which areequal
to theA,
in addition(11).
Next we set up the marker assisted mixed modelequations
of the nucleusof these
equations
is obtained from(Al),
such that the coefficient matrix of theseequations
is:where
Z
l
is thedesign
matrix ofpolygenic effects,
i.e. a(4 * 4)
identity matrix;
Z
2
is thedesign
matrix of theQTL effects,
i.e.Z
2
=[1
1 00; 0 0 1 1; 1 0 1 0;
0101];
and the factor 3 is due toQ
e
= 3.Using
the estimates of 6 or,equivalently, A
ii
,
we nextperform
the additions inequation
(11)
to obtain the coefficient matrix of the mixed modelequations
that does account for the information of commercial animal 5:Next,
thepolygenic
*polygenic
part
of theseequations,
i.e. the(1:4, 1:4)
block,
should bemultiplied by
the non-MAS-EBV(see
tableAI)
to obtain thenew
right
hand side. This newright
hand side deviates from theoriginal
handside,
i.e.Z!y,
by
ARHS =[0.03 -4.03 0 4.36]’.
This ARHS is used toperform
additions
(12)
to obtain theright
hand side of the MA-mixed modelequations
of the nucleus animals:The solutions from the coefficient matrix
(A4)
and theright
hand side(A5)
are
[0.887 -
0.887 0.177 0.823 - 0.759 1.20 - 0.20 -0.241]’,
which are theAPPENDIX B
Back-solving
for theQTL
effects of the commercial animals in the reduced set ofequations
The reduced set
of
equations
yields
estimates of v’ =[a!
ql u2! (see text).
Given the estimate
V,
we can solve for the Mendeliansampling
components:
s =
Vv,
where V and E are defined inequation
(4).
Theequations
of the commercial animalsyield
no information toseparate
the Mendeliansampling
components
of the u2effects,
E3, into thecomponents
due toQTL
effects and due topolygenes.
Hence,
thesplitting
of thesecomponents
isproportional
totheir variance
components,
i.e.where
e
31
ande
32
are the Mendeliansampling
components
of a2 and q2 effectsof the commercial
animals,
with q2denoting
theQTL
effects sorted such that thepaternal
QTL
effect of an animal isalways
followedby
its maternalQTL
effect. Next the estimates of a2 and q2 are obtained
by solving:
where
S
*
(T
*
)
is matrix with element(i, j)
equal
tohalf,
if nucleus(commercial)
QTL,
is a’parental’ QTL
of commercialQTLi,
and zerootherwise; a
l
andq
l
are known from the MA-EBV evaluation of the reduced set of
equations;
theelements of