HAL Id: hal-00894093
https://hal.archives-ouvertes.fr/hal-00894093
Submitted on 1 Jan 1995
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Covariance between relatives for a marked quantitative
trait locus
T Wang, Rl Fernando, S van der Beek, M Grossman, Jam van Arendonk
To cite this version:
Original
article
Covariance between
relatives
for
a
marked
quantitative
trait
locus*
T Wang
1
RL Fernando
S
vander
Beek
M
Grossman
JAM
vanArendonk
1
University
of Illinois, Department of
AnimalSciences, 1207,
WGregory Drive, Urbana,
IL61801, USA;
2
Wageningen
Agricultural University, Department of
AnimalBreeding,
PO Box338,
6700 AH
Wageningen,
The Netherlands(Received
6April 1994; accepted
lstFebruary
1995)
Summary - Best linear unbiased
prediction
(BLUP)
can beapplied
to marker-assisted selection. Thisapplication requires computation
of the inverse of the conditional covariance matrix(G
v
)
of additive effects for thequantitative
trait locus(QTL)
linked to the marker locus(ML),
given
marker genotypes. This paper presentstheory
andalgorithms
to constructG
v
and to obtain its inverseefficiently.
Thesealgorithms
aresuf&ciently
general
to accommodate situations(1)
wherepaternal
or maternalorigin
of marker alleles cannotbe determined and
(2)
where the marker genotypes of some individuals in thepedigree
are unknown.genetic marker
/
marker-assisted selection/
best linear unbiased prediction/
co-variance between relatives
/
gametic relationship*
Supported
in partby
the IllinoisExperiment Station,
HatchProjects
35-0345(RLF)
and 35-0367(MG).
**
Correspondence
and reprints.Résumé - Covariance entre apparentés pour un locus de caractère quantitatif marqué. La meilleure
prédiction
linéaire sans biais(BL UP)
s’applique
à la sélection assistée par marqueur. Cela demande d’inverser la matrice(G
v
)
des covariances entreapparentés
deseffets génétiques additifs
du locusquantitatif
lié au locus marqueur,covariances conditionnelles aux
génotypes
du marqueur. Cet articleprésente
la théorie et lesalgorithmes
pour établirG
v
et pour obtenir son inverse d’une manièreef!îcace.
Cesalgorithmes
sont assezgénéraux
pourprendre
en compte des situationsi)
oùl’origine
paternelle
ou maternelle des allèles marqueurs ne peut pas êtredéterminée,
ii)
où legénotype
marqueur de certains individus dans lepedigree
n’est pas connu.marqueur génétique
/
sélection assistée par marqueur/
meilleure prédiction linéaireINTRODUCTION
Theory
for covariance between relativesprovides
the basis for use of data from rel-atives ingenetic
evaluation. Atpresent,
genetic
evaluations in animalpopulations
areprimarily
obtainedby
best linear unbiasedprediction
(BLUP;
Henderson1973)
using
traitphenotypes
(T-BLUP).
Due to advances in molecularbiology, genetic
markers arebecoming increasingly
available for use ingenetic
evaluation. Severalapproaches
for use ingenetic
evaluationusing
markergenotypes
and traitpheno-types
have been discussed(Geldermann,
1975;
Soller,
1978;
Soller andBeckmann,
1982;
Smith andSimpson,
1986;
Kashi etal,
1990).
Inaddition,
Fernando and Grossman(1989)
described how BLUP can be used forgenetic
evaluationusing
marker
genotypes
and traitphenotypes
(TM-BLUP).
Somestrategies
have beenproposed
to make TM-BLUPcomputationally
efficient(Cantet
andSmith, 1991;
Hoeschele,
1993;
van Arendonk etal,
1994).
TM-BLUP has also been extended to accommodatemultiple
markers(Goddard,
1992;
van Arendonk etal 1994).
TM-BLUP
requires
computation
of the inverse of the conditional covariance matrix(G
v
)
of additive effects for thequantitative
trait locus linked to the markerlocus, given
markergenotypes.
Tocompute
thisinverse,
Fernando and Grossman(1989)
provided
analgorithm
thatrequired
information on theparental
(paternal
ormaternal)
origin
of markeralleles,
in addition to information on markergenotypes.
Theparental
origin
of marker alleles in anindividual, however,
is notalways
known. Forexample,
if 2parents
and theiroffspring
each hasgenotype
A
l
A
2
at the same markerlocus,
marker alleleA
1
in theoffspring
could have descended from either of theparents,
thus theparental origin
ofA
1
in theoffspring
is unknown.The
objective
of this paper is topresent
theory
andalgorithms
tocompute
the conditional covariance matrix and its inverse whenparental origin
of the marker alleles may not be known.Theory
andalgorithms
aredeveloped
forpedigrees
where the markergenotype
of each individual is known(complete
markerdata).
Application
of thistheory
isgiven
forpedigrees
where the markergenotype
of someindividuals is unknown
(incomplete
markerdata).
Wang
et al(1991)
presented,
withoutproof,
a recursiveequation
to constructG
v
and an efficientalgorithm
tocompute
its inverse. This recursiveequation
has been usedby
van Arenonk et al(1994)
and Hoeschele(1993).
In thepresent
paper, we prove that the recursiveequation
holds when marker data arecomplete,
but does not holdgenerally
when marker data areincomplete.
Chevalet et al
(1984)
have described a method tocompute
G
v
given
markerphenotypes.
This method does notrequire knowing
theparental origin
of marker alleles and can accommodatemissing
markerphenotypes.
Themethod,
however,
is notcomputationally
feasible for thelarge pedigrees typically
encountered in animalbreeding. Computation
of the conditional covariance matrix and its inverse become feasibleby conditioning
on markergenotypes
instead of markerphenotypes.
NOTATION AND ASSUMPTIONS
Consider a
single
polymorphic
marker locus(ML)
closely
linked to aquantitative
trait locus((aTL),
which will be referred to as the markedQTL
(MQTL).
Assumedenote 2 alleles at the
ML,
and letQI
andQ;
denoteMQTL
alleles linked toM/
andMl
as shown belowIf the 2 marker alleles for individual i are
known,
thenthey
will bearbitrarily
labelled asMi
andM
2
.
Forexample,
suppose individual i has marker allelesA
3
andA
l
,
thenA
3
can be labelled asMi
andA
1
asM?,
orA
1
can be labelled asM!
andA
3
asM?.
If the 2 marker alleles for individuali,
however,
areunknown,
Mi
can be any of the marker allelessegregating
in thepopulation,
andM2
can also be any of the marker alleles. Forexample,
suppose there are 3 marker alleles(A
l
,
A
2
,
and3
)
A
segregating
in thepopulation,
thenM
2
1
can be,
A
l
A
2
,
orA
3
,
andM
2
can also beA
1
,
A
2
,
orA
3
.
Further,
letvl
andv?
be the additive effects ofQ!
and
Q2,
and letw
=Var( vI) =
Var(v!)
be theirvariance,
for i =1, ... , n.
Observed markergenotypes
are denoted
by
Gobs.
COVARIANCE OF
MQTL
EFFECTS GIVEN COMPLETE MARKER DATAThe conditional covariances of additive effects of
MQTL
alleles will be derivedseparately
for alleles between individuals and for alleles within an individual.Covariance between individuals
Suppose
s and d areparents
ofi,
and j
is not a direct descendant ofi (fig 1).
The conditional covariance of the additive effects ofMQTL
allelesQk
i
andQ
in individuals i andj, given
the observed markergenotypes
(Gobs),
iswhere k
i
andk
j
can be 1 or2,
andPr(Q7
i
==
Q)&dquo; )
Gobs)
is the conditionalprobability
thatQ7
i
is identicalby
descent toQ/
given Gobs
(eg,
Fernando andGrossman,
1989).
Because individuals s and d are
parents
ofi,
Q7
i
can be identicalby
descent toQ
in 1 of 4 ways:1.
Q7
i
descended fromQ!
andQ;
was identicalby
descent toQ)! ,
denotedby
(Q7i {= Q;, Q; == Q!j)
2.
Q7
i
descended fromQ!
andQ;
was identicalby
descent toQ!j,
denotedby
(Q7i {= Q;, Q; == Q!j)
3.
Q7
i
descended fromQà
andQà
was identicalby
descent toQ!j,
denotedby
Fig
1. Chromosomefragments containing
the ML and theMQTL
for individuals s,d,
iand
j.
4.
Q!i
descended fromQ!
andQ§
was identicalby
descent toQ!j,
denotedby
!!..<-
r)2 r)2 —
r)!-’’)
Therefore,
theprobability
in[1]
can be written asBecause
individual j
is not a direct descendant of individuali,
and markergenotypes
of s and d areknown,
the conditionalsampling
ofQ7
i
from s or d isindependent
of allelesin j
being
identicalby
descent to alleles in s or d(fig 1),
given Gobs. Thus,
theprobability
in[1]
can becomputed
recursively
asEquation
[3]
was firstgiven
by
Wang
et al(1991).
It will be shown later that[3]
does not holdgenerally
when marker data areincomplete.
Generalizing
in(3!,
Pr(Q7
i
«Q;
P
IG
obs
)
is the conditionalprobability
that alleleQ/°
inoffspring
i descended from alleleQp
inparent
p = s or d fork
i
,
k
P
= 1 or 2.This conditional
probability
will be referred to as theprobability
of descent for afor
k
i
= 1 or 2 and p = s ord,
where p = r whenk
P
= 1 andp = 1 - r when
k
P
=2,
and where r is the recombination rate between the ML andMQTL.
Further,
Pr(Miki !
M;
P
¡G
obs
)
is the conditionalprobability
that marker alleleMk’
inoffspring
i descended from marker alleleM!’
inparent
p,given
thepedigree
and markergenotypes.
This conditionalprobability
will be referred to as theprobability
of descent for a marker allele(PDM).
There are 8 PDMs for eachindividual,
and theircomputations
areexplained
inAppendix
A. Note that the PDMs andPD(as
associated with the unknownparent(s)
are undefined.Equation
[4]
explicitly
shows therelationship
betweenPDQs
and PDMs in scalar notation. Forconvenience,
it is rewritten in matrix notation aswhere
Covariance within an individual
The conditional covariance between additive effects
vi
t and v?
of
MQTL
alleles(!
z andQ?
in
individual i withparents
s andd, given
Gobs,
can be written from[1]
aswhere
fi
=Pr(Ql
> Q/ )Gobs)
is the conditionalprobability
that 2homologous
alleles at the
MQTL
in individual i are identicalby
descent, given Gobs.
Thus, f
i
is the conditionalinbreeding
coefficient of individual i for theMQTL, given
Gobs.
This is different fromWright’s inbreeding coefficient,
which is the conditionalprobability
that 2homologous
alleles at any locus in individual i are identicalby descent, given
only
thepedigree.
The
pair
of 2homologous
alleles at theMQTL,
Ql
andQ?,
in individual i descended from 1 of thefollowing
parental pairs:
(Qs,
Qd),
(Q9,
Qd),
8
Q’)
ors
Q§) .
LetT,!skd
denote the event that thepair
of alleles in i descended from theBecause
(QI = Q2!Tks!d> Gobs)
implies
(QSS - Qdd !Gobs),
[10]
becomesThe
Pr(T
k
g
kdI
G
obs
)
can beexpressed
in terms ofPD(as
(see
Appendix
G!
asFor
example,
where
B
i
(l, k)
are elementsof B
i
in(5!.
If 1 of the denominators in!12!
is zero, then the entirecorresponding
term is set to zero.Tabular method to construct covariance matrix
G
v
The conditional covariance matrix
(G
v
)
between additive effects ofMQTL
alleles can bewritten,
from[1]
and!9!,
aswhere A is the matrix of conditional
probabilities
thatthe
2homologous
alleles
atMQTL
are identicalby descent, given Gobs.
The matrix A includes a row and column for each of the 2MQTL
alleles in each individual. Thus the order of A is2n,
where n is the number of individuals in thepedigree.
This matrix is the conditionalgametic
relationship
matrix(Smith
andAllaire,
1985),
given Gobs.
It follows that eachdiagonal
element of this matrix isunity.
The tabular method to construct A isexplained
below.Following
Henderson(1976),
individuals are ordered such thatparents
precede
their progeny, and individuals 1through
b are considered to be unrelated and non-inbred.Thus,
the upper left submatrix of A is anidentity
matrix of order2b,
which isexpanded sequentially by
the 2 rows and 2 columnscorresponding
to individuali,
for i =b + 1, ... , n,
as follows:Let 81
=2(i — 1)
+ 1 and6f
=2(i — 1)
+ 2 be the row indices of Acorresponding
to the 2MQTL
allelesQl
and
Q2
of
individual i. From!3!,
the elements of the 2rows 6/
and6i ,
corresponding
to the 2MQTL
alleles of individual i withparents
sfor j
=
61 -
1,
whereB
i
(L,
k)
were defined in!6!.
ElementÀ
p
61
=fi,
wheref
i
isgiven
in!11!.
Elements of columns6!
and
8;
are obtainedby
symmetry.
If 1parent
isunknown,
termsinvolving
the unknownparent
aredropped
from!14!.
Forconvenience,
the tabularalgorithm
described above can be written in matrix notation. LetA
i
-
i
be the upper left submatrix of Aexpanded
up to i - 1. For individuali,
withparents
s andd, A
i
_
1
isexpanded
toAi
aswhere
and
In
(17!, q’
is a 2 x2(i-1)
matrix with at most 8 non-zeroelements,
which are fromB
i
and are located in columns6s, 8;,
6d
and
6d.
The above tabular
algorithm
to construct A is similar to that used to construct the numeratorrelationship
matrix(Emik
andTerrill, 1949; Henderson,
1976).
Further,
Aplays
the same role inprediction
ofMQTL
effects as the numeratorrelationship matrix, A,
does inprediction
ofbreeding
values.ALGORITHM TO INVERT COVARIANCE MATRIX OF
MQTL
ALLELE EFFECTSTheory
Tier and S61kner
(personal
communication,
1994)
and van Arendonk et al(1994)
usedpartitioned
matrixtheory
todevelop
rules to invert the numeratorrelationship
matrixefficiently
forpopulations
with unusualrelationships.
A similarapproach
is used here to invert Aefficiently.
From
[13],
Gj!
=A -1 /a!.
Ingeneral,
the inverse ofA
i
,
partitioned
as in!15!,
can be obtained aswhere
Di
=Ci - q§Aj- i qi
is 2 x 2 matrix(Searle, 1982).
From!18!,
the contributionof individual i to
Ai
is
given by
the second term on theright-hand
side of thisBecause of the sparse structure of qi as shown in
(17!, qiA
i
_
l
qi
can be written asB
ics
,
d
B§,
whered
,
s
C
is the 4 x 4 conditionalgametic relationship
matrix forparents
ofi,
s andd,
the elements of which are in1
,
A
_
Z
andB
i
is the matrix ofPDQs
defined in(6!.
ThusIf
fi, f
s
andf
d
arenulle,
thenwhere
1
2
is a 2 x 2identity
matrix.The submatrix
qiDilq!
in[18]
is a square matrix of order2(i — 1)
that containsonly
16 non-zeroelements,
which aregiven
by
l
Bi.
i
Dz
B
The submatrixDi
l
qi
is a matrix of order 2 x2(i -
1)
that containsonly
8 non-zeroelements,
which aregiven
by
DilB!.
Thus,
there is a total of 36 non-zero elementscontributing
toAil
i from individual i. Forconvenience,
these 36 non-zero elements are collected into a 6 x 6 matrix:Because
W
i
contains all contributions tol
i
A
from
individuali,
we refer to it as the ’contribution matrix’. Theposition
of contribution elementW
i
(l,
k)
isgiven
by
elementIl
i
(1, k),
so we define thecorresponding
’position
matrix’ forW
i
aswhere 6b
=2(a-1)+b
for a = s,d,
or i and b = 1 or 2. If bothparents
of individuali are
known,
then all elements inIi
i
are defined. If at least 1parent
isunknown,
then elements inII
i
associated with the unknownparent(s)
are not defined.Because qi has at most 8 non-zero
elements,
and thepositions
of these elements aresimple
functions of s andd,
[18]
leads to an efficientalgorithm
to invertA,
where the number of arithmeticoperations
forinverting
isproportional
to2n,
the size of A. It isnoteworthy
that anysymmetric positive
definite matrix can be invertedusing
!18!.
Unless qi is sparse and thepositions
of the non-zero elements can be determinedeasily,
thisapproach
will not be efficient. Note that[19]
requires
C
s
,
d
,
which is fromA
i
_
1
.
Thus for an inbredpedigree,
C
s
,
d
needs first to becomputed,
similar to the situation whereinbreeding
coefficients need first to becomputed
when Henderson’s
rapid algorithm
(Henderson,
1976)
is used to invert a numeratorAlgorithm
1. Set
A-
1
equal
to the null matrix. 2. For individuali,
i =1, ... ,
n:(a)
if bothparents
areunknown,
then add Is toA
6i
l
bi
andA
a
i
162
(b)
if at least 1parent
isknown,
then:i)
compute B
i
according
to[5]
ii)
compute
D
i
according
to[19]
forinbreeding
or[20]
fornon-inbreeding
iii)
compute W
i
according
to[21]
iv)
for each ’defined’ element inII
i
,
add elementW
i
(l, k)
toA-’
at theposition given by
Hi (I, k)
NUMERICAL EXAMPLE WITH COMPLETE MARKER DATA Consider the
pedigree
of 5 individuals in table I. These 5 individuals are numberedsequentially
so thatparents
precede
theiroffspring,
and are assumed to be from apopulation
with marker allelefrequencies
ofp(A
d
=
0.7,
p(A
2
)
=0.1,
andp(A
3
)
= 0.2. Forconvenience,
we assumed thatJ
fl =
1.0 and r = 0.1. For thisexample,
genotype
A
Z
A
2
isassigned
to individual2,
so that marker data arecomplete.
Computing
PDMsThe PDMs are undefined for individuals 1 and
2,
because theirparents
are unknown. Individual 3 hasparents
1 and 2.Thus,
as shown inAppendix A,
the 8 PDMs for individual 3 can becomputed
asfor
k
3
, kp, p
= 1 or2,
where,
l
G
G
2
,
andG
3
represent
markergenotypes
of individuals1, 2,
and 3. Theright-hand
side of[23]
can becomputed
from MendelianFor individual
4,
thepaternal
parent
is unknown.Thus,
PDMs for individual 4 can becomputed
asfor
k
4
, k
2
= 1 or 2 whereG
u
=AiA!
is the ordered markergenotype
for the unknownpaternal
parent.
The upper limit of the summation is the number of marker allelessegregating
in thepopulation.
Theresulting
PDMs areThe first 2 columns in
S
4
are undefined because thepaternal
parent
is unknown. For individual5,
bothparents
are known.Thus, computation
of PDMs for individual 5 is similar to that for individual3,
and theresulting
PDMs areConstructing
AIndividuals 1 and 2 are unrelated and non-inbred
(table I),
thus the upper left submatrix of the conditionalgametic
relationship
matrix A is anidentity
matrix oforder 4. This submatrix will be
expanded
by
the tabular method for individuals3,
4,
and5,
as shown below.The matrix
B
3
ofPD(as
for individual 3 withparents
1 and 2 iscomputed
using
S
3
according
to[5]:
Now,
from[14],
elementsA5
,
j
and,
,
j
A6
for j
=l, ... , 4,
whichcorrespond
toindividual
3,
arecomputed
as linear functions of elements in the first 4 rows, whichcorrespond
to theparents
1 and 2:Diagonal
elementsA
5
,
5
and6
,
6
A
for individual 3 areunity.
Off-diagonal
element.!6,5,
which is defined as the conditionalinbreeding
coefficient in!10!,
is null because theof elements
A
5
,
j
for j =
1, ... ,
5 and!6,!
for j
=
l, ... ,
6 areThe
corresponding
column elements are obtainedby
symmetry.
ThePD(as
for individual 4 arecomputed using
!5!:
For individual
4,
numerical values of elementsA
7
,
j
for j
=
1, ... ,
7 and)..8,
j
forj = 1,...,8 are
The
PD(as
for individual 5 arecomputed
using
!5!:
To
compute
f
5
defined in!10!,
we need-
3
Pr(Q3
Q!4IGobs)
andPr(T
kak4
IG
obs
)
fork
3
, k
4
= 1 or 2.Probabilities,
Pr(Q3
3
-
Q!4IGobs),
havealready
beencomputed
asProbabilities,
Pr(r!![Go6s);
can be obtainedaccording
to[12]
asSimilarly,
Pr(Tl2!Gobs)
=41/100,
Pr(T2l!Gobs)
=41/100,
andPr(T
22I
G
obs
)
For individual
5,
numerical values of elements>’9,
j
for j =
1, ... ,
9 andA
l
o,j
forj = 1,...,10 are
The conditional
gametic relationship
matrix(A)
isInverting
ASet
A-
1
to the null matrix. For each of the 5individuals,
the contribution matrixW
i
andcorresponding
position
matrixH
i
arecomputed
as described below. The inverse of A is obtainedby
adding
elementsW
i
(l,
k)
toA-
1
atpositions
indicatedby
elementsIIi(l, k).
For the first 2
individuals,
theparents
are unknown.Thus,
add Is toAIL A2!,
A3!
andA4!.
For individual3,
PD(as
(B
3
)
can be obtained as shown earlier. Because individual 3 is notinbred, D
3
=I
2
-
B3B!,
from(20!.
MatrixW
3
is intable II and
II
3
is in table III.Similarly,
for individual4,
matricesW
4
and11
4
are in tables II and III. Note that 1parent
of individual 4 is unknown. Those elements inW
4
and11
4
associated with the unknownparent
are undefined.From the
previous section,
individual 5 is inbred( f
5
=0.045).
Thus,
[19]
is usedto obtain
D
5
=C5 - B5C3,4B!,
whereC
5
andC
3
,
4
werecomputed
in theprevious
section:
The
A-
1
matrix isCOVARIANCE OF
MQTL
EFFECTS GIVEN INCOMPLETE MARKER DATAAlgorithms
to construct and invert the conditionalgametic relationship
matrix(A),
given complete
markerdata,
are based on the recursiveequation
!3).
Inderiving
[3]
from!2!,
it wasassumed, given complete
markerdata,
that eventsQ7; {::::
Q§
andQs -
Q
i
kj
for example,
areindependent. They
may notalways
beindependent,
however,
when markergenotypes
of theparents
are unknown.Thus,
although
[2]
holds forcomplete
andincomplete
markerdata,
[3]
may not hold forincomplete
marker data.Therefore, algorithms developed
forcomplete
marker data cannot bedirectly applied,
ingeneral,
topedigrees
withincomplete
marker data. In thissection,
we first demonstrate that[3]
may not hold when markergenotypes
ofparents
are unknown. Astrategy
to accommodatepedigrees
withincomplete
marker data is thenpresented.
The
pedigree
in table I is used to demonstrate that[3]
may not hold when markergenotypes
of theparents
are unknown. In thispedigree,
markergenotype
of individual2,
the maternalparent
of individuals 3 and4,
is unkown.Thus,
as shownbelow,
Pr(Q4 -
Q2)
cannot becomputed using
!3!.
From
!2!,
.The last 2 terms in
[26]
are null because theQTL
alleles in the unknownparent
of individual 4 cannot be identicalby
descent toQTL
alleles in individual 3. Inderiving
[3]
from!2!,
it wasassumed, given Gobs,
thatQ4 ! Q’
andQz = Q2,
forexample,
areindependent,
ieBecause the marker
genotype
for the maternalparent
of individual 4 isunknown,
Given the
parents’
genotypes,
thegenotypes
ofoffspring
areindependent.
Therefore,
Pr(Qi « Q’, Q’
=!w3I!’robs)
can becomputed
by conditioning
on thegenotype
of individual 2(parent
of individuals 3 and4)
asThe
probabilities required
in the abovecomputation
areFrom the above
table,
Pr(Q4 ! Q)]G
obs
)
andPr(Q!
=Q3IG
obs
)
can also beThe values of
Pr(<! 4=
Q2!Gobs)
=1/24,
Pr(Q2
=Q3IGobs)
=1/2,
andPr
(Q’ 4 - <- Q2, Q2
=Q3!Gobs) =
3/400
illustrate thatPr(! 4= Q
,
Ql
-
l
Q2
i
G
ob,,) =
A
Pr(‘‘!4 ! Q
(
w2 =
‘
2l
Pr
obs)
G
Q
2
3
ob.,)
Because
[3]
may not hold when markergenotypes
ofparents
s and d areunknown,
the tabularalgorithm
forcomplete
marker data cannot beapplied directly
to constructA,
given incomplete
marker data. The tabularalgorithm
can beused,
however,
to construct Agiven incomplete
markerdata,
as described below..Let S2 be the set of all
possible
markergenotype
configurations
for individuals with unknowngenotypes,
and letGobs
be the observed markergenotypes
for individuals with knowngenotypes.
The conditionalgametic relationship
matrixgiven
incomplete
markerdata,
A
IGobs’
can then becomputed
aswhere
A
lw
,G
Ob8
is the conditionalgametic
relationship
matrixgiven
markergeno-types
w for individuals with unknowngenotypes
andGobs
for individuals with knowngenotypes,
andPr(w
I
G
obs
)
is the conditionalprobability
of individuals with unknowngenotypes
having
markergenotypes
w,given
markergenotypes
Gobs
for individuals with knowngenotypes.
The matrixA
lw
,
GOb8
can be constructedusing
the tabular methodgiven
complete
markerdata,
and theprobability
Pr!Go!)
can becomputed
aswhere
Pr(w, G
obs
)
can becomputed efficiently
(Elston
andStewart, 1971;
Bonney,
1984).
The
conditionalgametic relationship
matrix(A)
for thepedigree
in tableI,
computed
using
!27!,
isunknown
genotypes.
Further,
an efficientalgorithm
to invertAI
Gobs
has not been found.Therefore,
2approximate
methods tocompute
A¡
Gobs
and its inverse arepresented:
1)
We havealready
shown that[3]
may not hold forincomplete
marker databecause,
given
Gobs,
Q7i !
Q!
andQ9 =
Q
ki
in[2],
forexample,
may not beindependent.
If weignore
thisdependency,
then[15]
and(18!,
which are based on(3!,
can be used toapproximate
A and its inverse. Thisapproximation
willrequire
PDMs for individuals withincomplete
marker data. For individuali,
with unknown markergenotypes
forparents
s andd,
PDMs can becomputed
aswhere each summation is over all
possible
genotypes
at the ML. IfG
s
,
G
d
,
orG
i
is notmissing,
then thecorresponding
summation should bedropped
from!28!.
Thecomputation
ofPr(G,,Gd,GilG!b,)
can be verytime-consuming
when alarge
number of individuals have unknown markergenotypes.
Anapproximation
for
Pr(G
s
,
G
d
,
Gi ] Gobs)
can beobtained,
however,
by conditioning only
on markerinformation of ’close’ relatives of
i,
s andd, where,
forexample,
a set of ’close’ relatives for an individual could be itsparents,
sibs andoffspring.
The conditionalgametic relationship
matrix(A),
for thepedigree
in tableI, using
thisapproxima-tion is
The consequence of this
approximation
is that the summation in[27]
has beenbrought
into inside of A andperformed
onB
i
(or
S
i
,
see[5]).
2)
Let wmax be thegenotype
configuration
in S2 with thelargest probability.
Given wmax and
Gobs,
[15]
and[18]
can be used toapproximate
A and its inverse. Sheehan et al(1993)
proposed
asampling
scheme
tocompute
theprobability
ofgenotype
configurations.
For
thepedigree
in tableI, given
Gobs,
G
z
=A
i
A
2
has
the
largest
.conditionalprobability
(2/3)
among allpossible
genotypes
forG
2
, ie
w
max =
(Gz
=AiA
2
).
Thus,
[15]
can be used to construct A withG
Z
=A
i
A
2
.
TheThe consequence of this
approximation
is that theresulting
A is conditional on wmax ·A measure of how well an
approximation
compares to the exact method is the correlationscoefficient,
r’exact,a.pprox;between upperoff-diagonal
elements ofAI
Gob
s ,
computed
exactly
by
!27!,
andcorresponding
elementscomputed by
approximate
methods. For thepedigree
in tableI,
Texact,approxl = 0.9877 forapproximation
1 and ?’exact,approx2 = 0.8735 forapproximation
2.To further examine these
approximations
rexa!t,apProxi and rexa!c,appTOx2 werecomputed
for apedigree
of 99 individuals with 3generations.
The firstgeneration
consisted of 3grandsires,
each mated with 12granddams.
The secondgeneration
consisted of 2 sires and 10 dams from eachgrandsire
for a total of 6 sires and 30 dams. Each sire wasrandomly
mated with 4dams, avoiding
full-sib and halfsibmatings.
The thirdgeneration
consisted of 2grandsons
and 2granddaughters
from each sire for a total of 12grandsons
and 12granddaughters.
Markergenotypes
were assumedmissing
for the 30 maternalgranddams.
Thus covariances wereonly
computed
for theremaining
69 individuals in thepedigree.
Markergenotypes
for these 69 individuals weregenerated randomly. Granddaughters
and dams without progeny wereassigned
missing
markergenotypes
withprobability
0.6.Exact and
approximate
covariances werecomputed
for 20randomly generated
markergenotype
configurations.
The average for rexact,approxi was 0.8923 and for!’exact,approx2 was 0.8939. The effect of these
approximations
on marker-assistedgenetic
evaluation needs to be studied.DISCUSSION
Theory
andalgorithms
arepresented
here to construct the conditional covariance matrix between relatives for a markedquantitative
trait locus(G
v
=Au v 2)
andconstruct
A!Gobs
forincomplete
marker data may not be efficient forlarge pedigree.
Therefore,
we
presented
2 alternativestrategies
toapproximate
A!Gobs
and its inverse. Simulation results indicate that the 2approximations
are similar becausethey
have similar correlations with the exact method(
?’exact
,
a
pp
rox
i =
0.8923,
rexact,approx2 !0.8939).
Approximation
(1)
ispreferred, however,
because it may be difficult to search for wmax when alarge
number of individuals have unknown markergenotypes.
We also
presented
analgorithm
tocompute
the conditionalinbreeding
coefficient( fi)
for aQTL
given Gobs,
which is different fromWright’s
inbreeding
coefficient. This conditionalinbreeding
coefficient is theprobability
that the 2homologous
alleles at theMQTL
in an individual are identicalby
descentgiven
thepedigree
and marker
information,
whereasWright’s inbreeding
coefficient is the conditionalprobability
that the 2homologous
alleles at any locus in an individual are identicalby
descentgiven
only
thepedigree.
A numericalexample
is used to show thatequation
!3!,
which is the basis of tabular method to constructG
v
,
does not holdgenerally
when marker data areincomplete.
In most
practical situations,
marker information will not be available on distant ancestors.Thus,
TM-BLUP cannot becomputed.
One of the 2approximations
presented
in this paper,however,
can beemployed
tocompute
A
IG
o
bs’
Thus available marker information can be used to obtainimproved
genetic
evaluationsby approximate
TM-BLUP.Further,
ingeneral,
information on distant ancestors has littleimpact
ongenetic
evaluations.If the ML and
MQTL
are inlinkage disequilibrium,
marker dataprovide
information on the first moment ofMQTL
effects. In thissituation, regression
techniques
can be used forgenetic
evaluationusing
marker and trait information(Lande
andThompson,
1990;
Zhang
andSmith,
1992).
If the ML andMQTL
are inlinkage
equilibrium,
marker data do notprovide
information on the first moment of theMQTL
effects. Even withequilibrium, however,
marker data doprovide
information on covariances ofMQTL
effects. In thissituation,
TM-BLUP can be used forgenetic
evaluationby fitting MQTL
effects as random effects within animal(Fernando
andGrossman,
1989;
Cantet andSmith,
1991;
Goddard,
1992; Hoeschele,
1993).
Genetic evaluation
by
TM-BLUPrequires
knowledge
ofgenetic
parameters,
such as r ando, v 2.
This is also true forT-BLUP,
whichrequires
knowledge
ofgenetic
variances and covariances. Inpractice,
true values ofgenetic
parameters
are unknown and estimates are used in theirplaces.
Both restricted maximum likelihood and maximum likelihoodapproaches
can be used to estimateparameters
required
for TM-BLUP(Weller
andFernando,
1991).
Ideally,
marker-assisted selection will be based onmultiple
marker loci. When thelinkage
phase
betweenflanking
marker loci is known in addition to theparental
origin
of markeralleles,
the methodpresented
by
Goddard(1992)
formultiple
markers can be used for TM-BLUP. Further research is needed for TM-BLUP
using
multiple
markers when both thelinkage
phase
betweenflanking
marker loci and theACKNOWLEDGMENT
The authors would like to thank an anonymous reviewer for
pointing
out an error in themanuscript.
REFERENCES
Bonney
GE(1984)
On the statistical determination ofmajor
gene mechanisms in contin-uous human traits: regressive models. Am J Med Genet 18, 731-749Cantet
RJC,
Smith C(1991)
Reduced animal model for marker-assisted selectionusing
best linear unbiasedprediction.
Genet SelEvol 23,
221-233Chevalet
C,
GilloisM, Khang
JVT(1984)
Conditionalprobabilities
ofidentity
of genesat a locus linked to a marker. Genet Sel
Evol 16,
431-444Elston
RC,
Stewart J(1971)
Ageneral
model for thegenetic analysis
ofpedigree
data. Hum Hered 21, 523-542Emik
LO,
Terrill CE(1949)
Systematic procedures
forcalculating inbreeding
coefficients. J Hered40,
51-55Fernando
RL,
Grossman M(1989)
Marker-assisted selectionusing
best linear unbiasedprediction.
Genet SelEvol 21,
467-477Geldermann H
(1975)
Investigations
on inheritance ofquantitative
characters in animalsby
gene markers. I. Methods. TheorA
P
pI
Genet46,
319-330Goddard ME
(1992)
A mixed model foranalyses
of data onmultiple genetic
markers. TheorAppl
Genet 83, 878-886Henderson CR
(1973)
Sire evaluation andgenetic
trend. In: Anim Breed GenetSymp
in Honorof
Dr J LLush,
Am Soc Anim Sci and AmDairy
SciAssoc, Champaign, IL,
USA,
10-41Henderson CR
(1976)
Asimple
method forcomputing
the inverse of a numeratorrelationship
matrix used inprediction
ofbreeding
values. Biometrics32,
69-83 Hoeschele I(1993)
Elimination ofquantitative
trait lociequations
in an animal modelincorporating genetic
marker data. JDairy
Sci76,
1693-1713Kashi
Y,
Hallerman EM, Soller M(1990)
Marker-assisted selection of candidates sires for progenytesting
programs. Anim Prod51, 63-74
Lande
R, Thompson
R(1990)
Efficiency
of marker-assisted selection in theimprovement
of
quantitative
traits. Genetics124,
743-756Searle SR
(1982)
MatrixAlgebra Useful for
Statistics. JohnWiley
&Sons,
NewYork,
USA SheehanN,
Thomas A(1993)
On theirreducibility
of a Markov chain defined on a spaceof genotype
configurations by
asample
scheme. Biometrics49,
163-175Smith
C, Simpson
SP(1986)
The use ofgenetic polymorphisms
in livestockimprovement.
J Anim Breed Genet 103, 205-217
Smith
SP,
Allaire FR(1985)
Efficient selection rules to increase non-linear merit:applica-tion in mate selection. Genet Sel
Evol 17,
387-406Soller M
(1978)
The use of loci associated with quantitative traits indairy
cattleimprovement.
Anim Prod 27, 133-139Soller
M,
Beckmann JS(1982)
Restrictionfragment length polymorphisms
andgenetic
improvement.
In: 2nd WorldCongress
GenetAppl
LivestProd, Madrid,
EditorialGarsi,
Madrid, Spain,
vol 6, 396-404van Arendonk
JAM,
TierB, Kinghorn
BP(1994)
Use ofmultiple genetic
markers inprediction
ofbreeding
values. Genetics137,
319-329Weller
JI,
Fernando RL(1991)
Strategies
for theimprovement
of animalproduction using
marker-assisted selection. In: GeneMapping: Strategies, Techniques
andApplications
(LB
Schook,
HALewin,
DGMcLaren,
eds),
MarcelDekker,
NewYork, USA,
305-328Zhang W,
Smith C(1992)
Computer
simulation of marker-assisted selectionutilizing
linkage disequilibrium.
TheorAppl
Genet 83, 813-820APPENDIX A
Theory
forcomputation
of PDMsLet
G
s
=M; M;,
G
d
=
MIM2
andG
i
=Mi M2
be the markergenotypes
of 2parents
s and d and theiroffspring
i. GivenC
s
,
G
d
andG
i
,
theprobability
thatMik
i
descended fromMp!
does notdepend
on other information in thepedigree.
Thus,
Pr(Mf° «
MpP!Go6s)
=Pr(Miki !
M;
P
ICs, C
d
, Ci),
which can be obtainedas
The numerator and denominator of
[All
areeasily computed
from Mendelianprinciples.
Forexample,
if 2parents
and theiroffspring
each has markergenotype
A
l
A
2
,
ieG
s
=Ms Ms
=A
A
2
,
I
G
d
=MlM2 -
2
A
A
l
andG
i
=Mi M
2
=A
l
A
2
,
thenThus,
Pr(M1 ! M
;
IC
s
,
G
d
,
G
i
)
=1/2.
Other
examples
are listed below.Eight
PDMs for each individual i are collected into matrixS
i
,
which is defined in!7!.
APPENDIX B
Theory
forcomputation
of PDQs
Because
Qf°
andM ki
are on the same chromosome of individuali,
each must have descended from the sameparent.
Thus,
Pr M,&dquo; 4--
Mp&dquo;54,, Qk° ! QPP !Gobs)
is null.Now,
There are 2
probabilities
on theright-hand
side in!B2!.
The firstprobability
is a PDM for individual i
(see
[All
for itscomputation).
The secondprobability
can be
expressed
in terms of PDMs and of the recombination rate r between the ML and theMQTL
asexplained
below.Given
Mf°
«
M:!,
theprobability
thatQf°
descended fromQP
P
does notdepend
on other information in thepedigree.
Thus,
If
k
§
=kp,
then recombination has not takenplace,
so thatIf
k’ p 54
kp,
then recombination has takenplace,
so thatThe
PD(as,
Pr( Q7
i
{=
Q
k,
!Gobs),
forki,
k
P
=1, 2,
can be obtainedby using
theabove in
!B2!:
where p = s or d.
In summary,
for
k
i
= 1 or 2 and p = s ord,
where p = r whenkp
= 1 andp = 1- r when
kp
= 2. Note thatPDC!s
are nowexpressed
in terms of PDMs and r.APPENDIX C
Theory
forcomputation
ofPr(TkskdIGobs)
The event that the
pair
of alleles(<!,<3!)
in individual i descended fromparental
pair
(Qs
s
,
Qd
d
)
(fig 1),
is denotedby
kskd
T
fork
s
k
d
= 1 or 2. This event can occur in 1 of 2 ways:1.
Q¡
descended fromQ!s
andQ/
fromQd
d
,
denoted(Q! 4= Q!8, Q
7
.;=
Q!d)
2.Ql
descended fromQk
d
andQ2
fromQk
s
denoted(Q$ «
0!,Q! !=
Qs
s
)
Given the
pedigree
and markergenotypes,
theprobability
ofT
kskd’
which is denotedby
Pr(T!!!!Go!),
can be written asConsider the first
probability
on theright-hand
side in[Cl],
which can beexpressed
aswhere
Pr(Q2 !
Q!!G!)
is aPDQ
andPr(Q/ « Q!Q? !
Qa
d
, G
obs
)
can beexpressed
in terms ofPDQs
for individuali,
asexplained
below.Observe that event
Q/ «
s isimplied by
Q} {= Qss;
therefore,
Further,
Thus,
[C3]
can be rewritten in terms ofPD(as
asAfter
substituting
[C4]
in!C2!,
the firstprobability
on theright-hand
side in[Cl]
can be written in terms ofPD(as.
The sameapproach
isapplied
to the secondprobability
in!C1!.
Then,
obs
)
G
kskJ
Pr(T
can beexpressed
in terms ofPD(as
asIf 1 of the denominators in