HAL Id: hal-00893829
https://hal.archives-ouvertes.fr/hal-00893829
Submitted on 1 Jan 1990
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Genotypic covariance matrices and their inverses for
models allowing dominance and inbreeding
Sp Smith, A Mäki-Tanila
To cite this version:
Original
article
Genotypic
covariance matrices
and their
inverses for
models
allowing
dominance
and
inbreeding
SP Smith
A Mäki-Tanila*
Animal Genetics and Breeding Unit, University
of
NewEngland,
Armidade, NSW 2351, Australia(Received
11 June 1988; accepted 3 July1989)
Summary - Dominance models are parameterized under conditions of
inbreeding.
Theproperties of an infinitesimal dominance model are reconsidered. It is shown that mixed-model
methodology
is justifiable as normality assumptions can be met. Tabular methods for calculating genotypic covariances among inbred relatives are described. These methodsemploy 5 parameters required to accommodate additivity, dominance and
inbreeding.
Rules for calculating inverse genotypic covariance matrices are presented. These inverse matrices can be used directly to set up the mixed-model equations. The mixed-modelmethodology allowing
for dominance andinbreeding
provides a powerful framework tobetter explain and utilize the observed variation in quantitative traits.
dominance / inbreeding / infinitesimal models / inverse / mixed model / recursion
Résumé - Matrices de covariances génotypiques et leurs inverses dans les modèles incluant dominance et consanguinité. Les modèles de génétique quantitative incluant la dominance sont considérés dans des conditions de consanguinité.
Après
une discussion des propriétés du modèle infinitésimal, on montre que la méthodologie des modèles mixtespeut être appliquée à cette situation, dans la mesure où les hypothèses de normalité
peuvent être satisfaites. On décrit des méthodes tabulaires pour calculer les covariances
génotypiques parmi des apparentés consanguins, dont l’emploi nécessite l’introduction de 5 composantes de variances. On
présente
lesrègles
du calcul direct de l’inverse de ces matrices de covariancesgénotypiques,
connaissant la généalogie, et ces 5 composantes de la variance. Ces matrices inverse peuvent être utilisées directement pour établir les équationsdu modèle mixte. La méthodologie du modèle mixte, prenant en compte les interactions
de dominance et la
consanguinité, fournit
un cadre pour une meilleure explication et une meilleure utilisation de la variabilité des caractèresquantitatifs.
dominance / consanguinité / modèle infinitésimal / inverse / modèle mixte /
algorithme récursif
*
Permanent address: Department of Animal Breeding, Agricultural Research Centre (MTTK), 31600 Jokioinen, Finland.
**
INTRODUCTION
The mixed linear model has
enjoyed widespread
acceptance in animalbreeding.
Mostapplications
have been restricted to models whichdepict
additive gene action.However,
there is also concern with non-additive effects within and between breeds and crosses(eg,
Hill,
1969;
Kinghorn, 1987;
Miki-Tanila andKennedy,
1986).
Henderson
(1985)
provided
astatistical framework
formodelling
additive and non-additivegenetic
effects when there is noinbreeding.
Withinbreeding,
the mixed model allows statisticalanalysis, however,
considerabledevelopmental
workremains.
Inbreeding
complicates
covariance structures(Harris, 1964).
Moreover,
inbreeding
depression
is a manifestation of interactions like dominance andepistasis.
Models which include
only
additive effects and covariates forinbreeding
(eg,
Hudson and VanVleck,
1984)
arerough
approximations.
The proper treatment of
inbreeding
and dominance involves 6genetic
parameters(Gillois,
1964; Harris,
1964).
These parameters define the first and second momentsof
genotypic
values in the absence ofepistasis.
Agenetic analysis
ispossible by
repetitive
sampling
of lines derived from onepopulation
through
a fixedpedigree
(eg,
Chevalet andGillois,
1977).
However,
we should like toperform
ananalysis
where the
pedigrees
are realized with selectionand/or
randommating.
This could be done if an infinitesimal model was feasible and we couldapply
normaltheory
and the mixed model.Furthermore,
it would be useful to build covariance matricesand inverse structures
easily,
to enable use of Henderson’s(1973)
mixed-modelequations.
This paper shows how tojustify
andimplement
these activities. It isan extension of Smith’s
(1984)
attempt togeneralize
models with dominance andinbreeding.
DOMINANCE MODELS Finite loci
In this section we introduce the 6
genetic
parameters needed to modeladditivity,
dominance andinbreeding depression.
These parameters are functions of genefrequency
(p
i
for thei
th
allele)
in much the same way thatheritability depends
on genefrequency
forpurely
additive traits.First,
consider thegenotypic effect,
gij for 1 locusrepresented by
where p
is the mean, ai and aj are the additive effects for thei
th
andj
th
allele,
andd
ij
is thecorresponding
dominance deviation.Equation
(1)
represents a system ofr(r
+1)/2
equations
in r + 1 +r(r
+1)/2
unknows(ie,
!, ai, aj,d
ij
)
where r isthe number of alleles. To
uniquely
determine p, aiandd
ij
requires
additional r + 1 constraintsgiven
as:These constraints are derived from effectual definitions
applied
topopulations
inIt follows that in
populations undergoing
randommating,
the additive variance is:and the dominance variance is:
To accommodate
inbreeding
requires
3 additional parameters:(i)
thecomplete
inbreeding
depression:
(ii)
the dominance variance amonghomozygotes:
and
(iii)
the covariance between additive and dominance effects among homozy-gotes:It is convenient to work with the
parameter 6
2
=06 + 6
which
is a second moment.The
symbol
&dquo;&dquo;’&dquo;is a reminder that the associated
parameter refers
to 1 locus. When there are nloci,
then the
parameters ofinterest,
say v,are
thesingle
locusterms v
= (8fl, Qa, u6, 7.2
82
-2 -
&dquo;summed&dquo; over loci. All parametersterms
fol2,
fa2 a,
2, a2 - d, Ub, 2, U67 1 62
62
ora2 bi
aa6l
&dquo;summed&dquo; over loci.All parameters
in v =
{!a,
a-d,
U6
,
U
6
6
2
ora-6,
aa
6)
are formal sums. The column vector ofinbreeding
depression
(u
6
)
which is defined as a list ofE
6
for loci1, 2, ... , n,
isalso
very
useful.
Among
theparameters,
we have thedependancies
ua
=u u
6
andor 6 2 = 62 _ U2 6*
The parameters v describe a
hypothetical population
of infinite sizeundergoing
random
mating
andinlinkage
equilibrium.
Thispopulation
is sometimes referredto as the base
population,
but we find this usagemisleading.
In thespirit
ofBulmer
(1971),
let us introducesegregation
effects
defined asdeviations
frommid-parent
values. Infact,
both additive and dominance effects havemid-parents
values,
as will be seen later. Now we can define v as parameters that determine thestochastic
properties
ofsegregation
effects for an observedsample
of animals from aknown
pedigree.
Whether or not thesesegregation
effects arerepresentative
of someancestral
population
(perhaps
severalgenerations
old)
is,
of course,questionable.
Indeed,
ancestral effects associated with asample
of animals can be treated as fixed(Graser
etal,
1987)
and, hence, segregation
effects and estimates of v can be farremoved from the ancestral base. This
interpretation
is robust underselection,
with the addedassumption
thatlinkage
disequilibrium
in onegeneration
influences thenext
generation only
through
themid-parent
values. Ourassumption
needonly
beapproximately
correct over a fewgenerations
(perhaps
far removed from thebase).
It is
important
topoint
out that these views are definitional and no method ofThe
disruptive
forces ofgenetic
drift on our usage of v areprobably
ofnegligible
importance;
a smallpopulation
isjust
anotherrepetition
of a fixedpedigree sampled
from the base
population.
Infinite loci
It is feasible to define an infinitesimal model with dominance
(Fisher,
1918).
Whenthere is directional
dominance,
wemight
observe1
U6
I
going
toinfinity
orU2
and
a!d
d2going
to zero(Robertson
andHill,
1983).
However,
it is our belief that thisproblem
is characteristic of
particular
infinitesimalmodels,
not all infinitesimal models. Toshow
this,
we have constructed a counterexample.
Because
o, 2,a
2
,
a2,
aa6 and U6 are formal sums, it is necessary(but
notsufficient)
for the contributions from
single
loci to be of the ordern-
1
where n is the numberof
loci; ie,
if the limit of v is finite.Whereas,
itmight
seem reasonable torequire
location effects like
U
6
toapproach
0 at a rate ofn-
1
/
1
,
this is not necessary and it may result in infiniteinbreeding
depression.
Now let us
imagine
an infinite number ofloci,
each with 4possible
alleles,
thatcould be
sampled
withequal
likelihood.Assuming
that the dominance deviationsfor each locus are as
given
in TableI,
these deviations are consistent with constraints(2).
In thisexample
it ispossible
to use any additive effects also consistent with(2),
wherea2
is
proportional
to n-1. For aparticular locus,
theinbreeding
depression
and dominance variances are:U
6
=-1/(2n);
a’
=1/(4n
2
)
+3/(8n);
a2
=1/(4n
2
)
+1/(2n).
Summed over n loci thesebecome:
u6 =-1/2;
Qd
=
1/(4n)
+3/8;
a
6
=1/(4n)
+ 1/2.
Letting
n drift toinfinity
gives
thefollowing
non-trivial parameters: u6 =
-1/2;
a
§
=
3/8;
or2
=1/2.
Thisprovides
our counterexample.
There does not not seem to be ananalogous
example
involving
only
two alleles.However,
the biallelic situation isuninteresting
because itimplies
asingularity:
-2
= a2a2
smgu arlty:
Uab - uau8’ 6* >The above demonstration may seem artificial because it is
spoiled by
global
changes
in genefrequency
(WG
Hill, 1988,
personal
communication).
However,
we can construct other more elaborate counterexamples.
Forinstance,
let loci vary in their contribution to the parameters. Let there be infinite loci indexed1, 2, ... ,
n,where
Qd
,
!6 !
0,
and there is no directionaldominance;
ie,
u6 = 0.Among
where k
2
< n
<(k
+1)
2
.
By
redefining
the contributions fromsingle
loci toE
6
=-n-
l
/
2
+u
s
anda2 a
2
<
n-
1
/
2
(perhaps
by altering
gene
frequency),
we notice that ub =
-1,
andQd
and!6 are
non-zero at the limit when n goes toinfinity.
We can create yet anothersubsequence
with indices2, 5, ... ,
k2+ 1,
wherek
2
+1 <_
n <(k+ 1)
2
+ 1.
Taking
theprevious
alterations andassuming
for the lattersequence
E
6
=1/2n-
1
/
2
+ E
6
,
the limit valueof u
6
is-1/2.
It ispossible
to select afinite number of
subsequences
and make similar alterations. Each of these sequencesbecomes
infinitely long
as napproaches infinity. Hence,
infinitesimalmodels,
where0 <
J
U61
< 00,U2
q 0
0 andQa 5!
0,
are feasible.With an infinitesimal
model,
v is a function of summary statistics that involves genefrequency.
Individual genefrequencies
have little or no effect on v.Moreover,
genotypic
effects summed over loci follow a normal distribution. Thisimplies
that selection and
genetic
drift can be accommodatedby
the mixedmodel,
assuggested by
Bayesian
arguments
(eg,
Gianola andFernando,
1986).
Inparticular,
theassumption
about the influence oflinkage
disequilibrium,
discussedearlier,
is valid under the infinitesimal model.The real issue is not whether
[u
6
is
infinite or dominance variances are zero, butwhether
normality
andlinearity
areappropriate
assumptions
given
a finite numberof loci. If
[u
6
is estimated from realdata,
it will be found to beinfinite,
although
itmay be very
large.
Furthermore,
if dominance variances arefound
to be non-zero,and if many loci are
involved,
then it would seem that a contrived infinitesimalmodel
(like
the onesabove)
isappropriate.
Normalapproximations
areadequate
under most realistic models for
genetic
variation;
therebeing
a small number ofmajor
loci and alarge
number of minor loci(Robertson,
1967).
However,
with a very small number ofloci,
theseapproximations
become lessadequate
with eachadditional
generation
of selection.GENOTYPIC COVARIANCE STRUCTURES
Harris
(1964)
developed
recursion formulae forevaluating
theidentity
coefficients needed to determine covariances among inbred relatives. In a later paper,Cocker-ham
(1971)
elaborated on these methods.Using zygotic
networks,
Gillois(1964)
also devised a scheme to evaluate
identity coefficients,
and Nadot andVaysseix
(1973)
published
analgorithm
forimplementing
Gillois’sprocedure.
In this paper, tabular methods for
evaluating
second moments arepresented.
These
techniques
allow the exact evaluation ofgenotypic
covariances withoutcal-culating
individualidentity
coefficients. The first class of methods areconceptually
easy and are modelled after thegenomic
table describedby
Smith and Allaire(1985).
The second class(those
based oncompression)
areconceptually
moreMethods based on
gametes
Each animal in a
pedigree
receives 1genomic
half or gamete from each of its parents.Thus,
every animal has 2genomic
halves and the total number of such halves isr =
2s,
where s is the number of animals.Let ai be a column vector of additive
effects,
such that theIt
h
element of aiequals
the additive contribution of thel
th
locus in thei
th
gamete. If there are nloci,
then aihaslength
n. Under an infinitesimalmodel,
aiisinfinitely long.
Defined
ij
as a vector of dominancedeviations, typical
of the union ofgametes
i andj.
Theit
h
element ofd
ij
equals
the dominance contribution of theE!h
locus. If i andj
aregenomic
halves from differentanimals,
the vectord
ij
depicts
the dominance deviations for aphantom
animal.Like
animals,
gametes have apedigree; genomic
halves in one animal form aparental pair
forproducing
gametes. Let us assume the gametes are ordered suchthat i >
j,
if gamete i is a descendant of gametej.
Furthermore,
let us assumei
> j
implies
thatgamete j
is a basepopulation
gamete
if i is.Next,
imagine
the ordered sequence:where I is an
identity
matrix of order n. This is a verylong
listcomprising
of(r+1)(r+2)/2
arrays.Fortunately,
we needonly
select a much smallersubsequence,
G =
{I,
gl, g2,...,gp}
from thislist;
ie,
the arrays that areactually
needed forrecursive calculations. An
algorithm
forextracting
G ispresented
inAppendix
A. The elements of G are used as row and columnheadings
in a tabledepicting
thesecond moments
E{G’G}
which isrepresented
by:
This table is referred to as the extended
genomic
table(cf
Smith andAllaire,
1985),
and is denotedby
E.Elements of E are
computed by
recursion.Starting
with the first row, elementsare evaluated from left to
right.
When the first row iscompleted,
the first column isfilled in
using
symmetry. Theremaining
elements in the second row are evaluatedfrom left to
right
and the second column is then filled inusing
symmetry.
Thisprocess is continued for each additional row and column. The recursions used to
compute E are listed
below,
where B is defined as the index set of all basegametes,
i >
j, k,
m, k > m, and parent gametes of i are x and y. Theproofs
of these formulaeare due to
properties
involving
sums ofexpectations
and conditionalexpectations.
where i - x or y represents the event that the
t
th
locus ofgamete
i is identicalby
descent to that x or y,respectively.
Theproduct
gig
w
is intended to involve thegametes
i, j, k
and m(v
and w are used toidentify
the associated columns ofG).
(i)
First n rows:(ii)
Subsequent
rows:(a)
Additive and additive(c)
Dominance and dominancethe recursive formulae in
(c)
appears in Smith(1984).
When
ie0,
the above recursions are initializedassuming
thatgametes
aresampled
at random from asingle population.
For this case, we have additionalsimplification
for all values of i:Now that the recursive structure of E has been
shown,
it ispossible
to describe theal
f
orithm
ofAppendix
A. Definef (v)
as the youngestgamete
associated withthe vtcolumn of
G,
say g&dquo;. The matrix or list G is said to be closed undergametic
recursion if the terms used toexpand
any gvby
parent gametes off (v)
are alsoof G. More
formally,
(g
v
I f (v) =
x)
and(g
v
I f (v)
=y)
are columns of G whenf(v);(),
and has parentgametes
x and y. Thealgorithm
inAppendix
A is calleda
depth-first
search and itproduces
sets of vectors closed under recursion.Any
element needed to evaluate any recursion canalways
befound
in E. Thealgorithm
of
Appendix
A can also be used to define thesubsequence
G
introduced below.It is
possible
to combine additive and dominance effects intogenotypic effects,
say
i
ij
= ai + aj+
d
ij
,
and use these as row andcolumn
headings
of a new table.The
headings
are ordered as somesubsequence,
sayG,
ofThe recursions for
E{G’G}
areexactly
asthey
are forEfd
ij
}
andEld
’
3d
k
,,,},
After
building
a matrix of second moments, the(co)variance
matrix(for
genetic
effects summed over
loci)
is obtainedby
absorbing
the first n rows. Theresulting
array is a function
of u
6
only through
u2
=u6 U{j.
Thevw
th
element of the absorbedarray E is:
which reflects the
assumption
that genotypes are additive overindependently
segregating
loci. Thisassumption
can berelaxed,
aslinkage disequilibrium
cansometimes be accommodated via conditional
(ie, Bayesian)
analyses.
In
practice,
we never evaluate the entire array E orE{G’G}.
Inparticular,
thefirst n rows and columns can be
represented
implicitly
by
one row and column: rows of:are
simple multiple
of each other. Our purpose is to show structuralproperties
that allow inversion rules.Nevertheless,
the above recursions arehelpful
inevaluating
particular
moments; eg, those needed to compute the inverse. This can beaccom-plished by
adapting
Tier’s(1990)
recursivepedigree algorithm:
one calculatesonly
needed moments and avoids redundant calculation. We may add to our
recursions,
shortcuts for
particular
degenerate
cases:These remarkable results do not
depend
on i >j, k,
m ork >
m.They
aredue
to theprinciple
of conditionalindependence
and to the rule thatprobabilities
areadditive for
mutually
exclusive events. The first ruleappeared
in Maki-Tanila andKennedy
(1986).
It is similar to a rule in Crow and Kimura(1970,
p134)
basedon additive
relationship,
although
rule(i)
is more robust underinbreeding.
We alsowhere 77
=QabQa
2r
’
Y=
82U;
;
2,
a=o-!o-!!
and
p=ubQ! 2.
Because
E,
excluding
the first n rows andcolumns,
is at most of the orderr
2
/2
by
r
2
/2,
where r is the number of gametes, onemight incorrectly
conclude thatproposed
calculations areprohibitive
(of
the orderr
4
/8)
and of nopractical
value.Recursive
algorithms,
like thedepth-first
search inAppendix A,
can besurprisingly
fast. The value
of r
4
/8
should beregarded
as an upperboundary
that protects thealgorithm
from combinatorialexplosion -
the kind ofexplosion
thatmight
occurwhen
enumerating genetic pathways
in apathological pedigree.
Compressed
tablesThe
genomic
tablegiven by
Smith and Allaire(1985)
can becompressed.
We mayadd
together
elements in 2by
2 blockscorresponding
toanimals,
andmultiply
this matrixby
1/2,
togive
the numeratorrelationship
matrix.Compression
of E
isalso
feasible,
and hasalready
been demonstrated above for a caseinvolving
G. Ingeneral,
E iscompressed by
combining
columns of G to create a new matrix C. Tobe
useful,
C should be smaller than G and containpertinent
effects.It is
possible
to devise recursive methods forevaluating
E{C’C},
when Cis not closed under recursion.
However,
methods become more meticulous. Forexample,
since the vector of additive merits for animals is not closed undergametic
recursion,
we need to addinbreeding
coefficients to thediagonals
whencalculating
the numeratorrelationship
matrix.Whereas,
whencompression
is defined as the addition of all Gcolumns,
it ispossible
todo
thisstochastically,
as Harris(1964)
has done. Forexample, Harris,
by
preferring
azygotic analysis
over agametic analysis,
devised a scheme where entities were createdby
a randomsampling
of genes fromexisting
genotypes.Compression
is animportant
area and it needs to bedeveloped
further. Someconcepts will be illustrated later
by
anexample.
INVERSE STRUCTURES General rule
Matrix E contains second moments and not
(co)variances
asrequired by
themixed model
equations. However,
deleting
the first n rows and columns of E-1gives precisely
an inverse matrix of(co)variances.
The extendedgenomic
tableis characterized
by
blocksalong
thediagonal.
By inspecting
labels attached to vectors inG,
it is seen thatthey
come in groups. Forexample,
the group associatedwith gamete i is a
subsequence
of ai,,
d
li
d2it ...
d
ii
.
Likewise,
whenconsidering
E =
E{G’G}
we find blocksalong
thediagonal
associated with gametes. Recursionsabove the
diagonal
blocks are functions of column indices and not of row indices.Now consider a submatrix
A
k
wherefor some L and
A
k
contains the first k + 1 blocks. The matrix L is asimple
matrix definedby
column indices. If k = 1 note that:for some
L
o
,
whereA
o
corresponds
to the baseassignments,
andB
o
is the second block.Let us assume that
Ao
is given
(perhaps
without the first n rows andcolumns)
and note that:
-
-With
(B
o
—
LoA
o
L
o
)-
1
evaluated,
we find thatAll
is
asimple
function ofAÕ
1
.
Given
Ao
1
,
it ispossible
to computeA2
1
,
whereand
B
1
is the third block. Ingeneral,
given
A-’
we
can evaluateA-1
1
,
whereand
B
k
is block k + 2. Thegeneral
inversion formula is:To evaluate
E-
1
,
apply
this rulerecursively starting
with k = 0.It is
hoped
thatB
k
-
k
L
k
L[A
will besufficiently
small orsufficiently
sparse
sothat its inversion is feasible
(eg,
Tier andSmith,
1989).
Forevaluating
E-
1
,
theworst scenario is that the order of
B
k
-
LkA!L,!
is r +1.However,
this occurrenceis
unlikely.
Note that Henderson’s(1975)
rule forcalculating
the inverse numeratorThere are some notable
simplifications
when E-1 is to be evaluated.First,
evaluation ofAo
is
best doneby
absorbing
the first n rows and thendeleting
the first n rows and columns. The
resulting
matrix is somepermutation
of a blockdiagonal
matrixinvolving
2by
2 matrices:and 1
by
1 matricesa§ and
ad.
This is a trivial matrix to invert.Second,
k
-
B
k
L
k
A
k
L
has apeculiar
structure that can be identifiedby
examining
the recursive definition in Section IIIA. If blockB
k
corresponds
togamete i which has parent gametes x and y, then
L
k
is a matrix that&dquo;picks&dquo;
appropriate
terms fromA
k
that involve x and y.Moreover, B
k
is also definedby
terms that involve x and y. Assume that the columnheadings
for Bk are:It
might
be that i =j
m
.
Now define the columnheadings
where
x
=
H
(Fi
I
i =x) =
{a!, d
d
x
l’
j
Xj
2’
....
d!,,.}
}
andHy = (Fi
I
-y) _
{ay,
dy
Ù’
d
Y
j
2
7 ...
dyj&dquo;,
I
In the definition of
H
x
andH!,
it is understood thatd
Xj
&dquo;,
=d!!
andd!j&dquo;,
=dyy,
if i =
j&dquo;,.
Select elements fromA
k
and build thematrix,
where
M!!
=E{HxH!}, M
x
y
=M!!
=E{H
x
Hy},
andMyy
=EIH’Hy}.
A direct
application
of the recursionsgives:
Futhermore,
asL
k
is a matrix that&dquo;picks&dquo;
terms underheadings
H!
andH!:
and thus:
Equation
(5)
can also be derivedif B,!-L!A,!Lk
isrecognized
as the(co)variance
matrix for the
segregation
effects due to recombination ofgametes x
and y in theformation of gamete i. The
mid-parent
values ofF
i
are the column vectors of1/2H!
+1/2Hy.
As thesegregation effects,
S =F! - 1/2H! - 1/2H!,
have a meanof zero, the
(co)variance
matrix is:where S =
1/2H! -
Finally,
in rule(3) L
k
(B
k
-
L!A,!Lk)-1L!,
k
-
k
(B
-L
L’A
k
L
k
)-
l
and(B! - Lj!A!L!) !
are contributions added under Hby
Hheadings,
Hby F
i
head-ings,
andF
i
by
F
i
headings,
respectively.
Existence of inversesWhen there is no
inbreeding,
E-1 can be shown to exist.First,
we present thefollowing
Lemma:Lemma 1: In the absence of
inbreeding,
there
exists a matrixM
k
,
which is asubmatrix of
A
k
,
and there exists a matrixX
k
of full column rank such that:where
B!,,L,!
andA
k
are associated with E.Proof.
Becauseequ(5)
isgiven,
weonly
prove that suchM!
andX
k
can be foundwhere XkM
k
X
k
=1/4(M!! -
M!! - M!!
+Myy).
The matrixM
k
,
definedby
eqn(4),
will be ’a submatrix ofA
k
if there are no indicesj
m
= i,
j
v
= xand j
w
= yused in the
definition
ofF
i
.
Thealgorithm presented
inAppendix
A will not createindices
j&dquo;,,
= i, j
v
= xand j
w
=y when there is no
inbreeding.
For this case, we can takeM
k
=M
k
,
and ’
=1/2{I, -I},
where theidentity
matrix I has orderm + 1. Matrix
X
k
has full column rank((a.E.D).
Theorem 1: If E is constructed
by applying
the recursion rules to some finite and non-inbredpedigree,
then E-1exists,
provided:
Proof.
The matrixA
o
isnon-singular
when the condition of the theorem holds. Now assume thatA-’
exists,
thenby
the inversion rule(3)
A!+1
exists if(B
k
-
L’A
k
L
k
)-’
exists.By
the aboveLemma,
B
k
-
LkA
k
L
k
=3CkM
k
Xk-BecauseM
k
is a submatrix ofA
k
,
it isnon-singular. Therefore,
(X!M!X!)-1
exists becauseX
k
has full column rank. We conclude that the existence ofAk
1implies
the existenceof A!+1.
AsAÕ1
exists,
the theorem followsby
mathematical induction(Q.E.D).
The reader
might
think that the concurrence of identical twins would contradictTheorem 1.
However,
this is not the case, as thetheory
assumes that gametes aredistinct and can be ordered
using
indices.Thus,
for identicaltwins,
the recursiveformulae
presented
earlier areincomplete.
This is not apractical
problem,
asidentical
gametes
can berepresented only
once in G.Henderson
(1985)
considered
a non-inbredpopulation
and studied a dominancerelationship
matrix D. He used D-1in many formulae withoutproof
of its existence.However,
as D is a submatrix ofE,
the theoremimplies
that D-1 exists.When there is
inbreeding,
thealgorithm
inAppendix
A willproduce
labels likedii,
di.
anddiy,
wherei!9
and has parent gametes x and y. Ingeneral,
E issingular
Even with
inbreeding,
4 labels like those ineqn(6)
need not occurtogether
asheadings
of E.However,
it ispossible
to merge theidentity
(6)
with the recursionand remove
d
ii
forI(0
from the sequence G to getG -
seeAppendix
A. The vectorG
is closed under recursion ifeqn(6)
is used to redefined
ii
fori18.
The matrix of second momentsE
=Eli7x4U}
has no row and columnheading
d
ii
forI(0
and it isnon-singular.
To provethis,
we introduce Lemma 2 forE.
First, let
usimagine
that
B
k
-
LkAkL!
corresponds
to aparticular
absorbed block ofE.
Lemma 2: There exists a matrix
M
k
,
which is a submatrixof A
k
,
and there existsa matrix
X!
of full-columnrank,
such thatB
k
-
k
k
L
L’A
=-
-
4
whereB
k
,
L
k
andA
k
are associated withE.
Proof.-
Because thepedigree
isinbred,
we expectindices j,
= x andj
w
= y(x,
y parents ofi)
in the definition ofF
i
- otherwise,
follow theproof
of Lemma 1.By
construction,
there is no indexj
m
= i. Thisimplies
that there will be vectorsd
x
y,
d
xx
=dxx, +dxx,,-dx!x&dquo;
anddyy
= d!yx+d!yy-dyzyv
inH,
where x
x
,
xy andy x
, yy are parent gametes of x and y,
respectively.
With no loss ingenerality,
switchthe vectors aiwith
d
ix
,
andd
i
y
withdi!&dquo;,
inF
i
.
So as to maintainconsistency
with the definition H ={H
x
=(Fi I
i =x),
Hy = (F
i
i
=y)},
columns of H may be alteredby
switching:
These
permutations
preserve theidentity
where
X
k
=1/2{I, -I}
andM
k
=E{H’H}
if we furtherstipulate
that selectedcolumns of
G
have also beenrearranged.
Note that columns and rows m + 1 andm+2 2 in
M
k
areredundant,
asthey
are bothrepresented
by
d
x
y
=dy
x
.
Therefore,
we may delete row and column m + 2 to create
M
k
(delete
column m + 2 inH)
and note that
where
and I has order m — 2.
Clearly,
i
k
is offull-column
rank and is a candidate forX
k
.
When x and y are base
gametes
(ie,
, xxx!, Yxand yy
areunknown),
we are finished as we can takeM!
=R
k
,
which is a submatrix ofA
k
.
Unfortunately,
when atleast one of the
zygotes
(x
x
,xy)
and,yy)
Yx
(
areknown,
M
k
is not a submatrix ofA
k
because of thecomposite
vectorsd!x
and/or dy
.
.
We assume that both(x
x
,
yy)
and
(y
x
, y
x
)
are known in the remainder of theproof
(when
only
1zygote
isknown,
It
might
be thatd
xx
&dquo;
andd!!&dquo;
existalready
between columns 2 and m in theredefined H.
Likewise,
d
.
y
.
andd!!&dquo;
may existbeyond
position
m. If any of thelabels xx!, xxy7yyx and yyy do not
exist,
we introduce them as dominance vectorsin H.
Further,
we may redefine the first and last columns to represent labels zzzyand yxyy,
respectively.
Thisgives
us a modified matrixBecause
G
isc_losed
underrecursion,
all columns ofH
are inG.
Moreover,
all columns of
H
areunique;
XxX
y =1=
yxy!, because animals cannot mate withthemselves. We conclude that the matrix
is a submatrix of
A
k
.
Now define the matrix/.. . I ... r. B.
where m and f are null vectors, except for ones at
positions
corresponding
to zzz
, Xxyl yyx and yyy; M and F are like
identity matrices,
except that rowscorresponding
to zzz, xxy, YYx and yyy would be deleted if thecorresponding
columns were introduced into H. Now
X
k
has full-column rank. Sincethe Lemma is
proved
(Q.E.D).
Theorem 2: If
E
is constructedby applying
recursion modifiedby
equ(6)
to somefinite
pedigree,
thenE
exists,
provided
Proof.
With the first n rowsabsorbed, A
o
isnon-singular
when the condition of the theorem holds.Therefore,
A
o
isnon-singular
when the first n rows are intact.The remainder of the
proof
is identical to that of Theorem1,
except that referenceis made to Lemma
2,
rather than Lemma 1(Q.E.D).
We should expect
singularities
wheninbreeding
coefficientsapproach
unity,
asoccurs with the numerator
relationship
matrix.Indeed,
the matrixS
=1/2H
x
-1/2Hy
becomes a null matrix when gametes xand y are identically
equal.
AsE{S’S}
equals
eqn(5),
the absorbed block forgamete
i is a matrix of zeros andE
is
singular.
Our construction ofE
assumes that the basepopulation
is a random set of unrelated gametes which have beensampled
from some infinitepopulation;
that is even
though
the basepopulation
isfinite,
inbreeding
in the base does notexist. Under
diploidy, inbreeding
coefficients cannot beunity
with finitepedigrees.
Feasibility
It is
beyond
the scope of the present paper to demonstrate the calculation of E-1pedigree
borrowed from a beef cattle selectionexperiment
conducted at theAgri-cultural Research
Centre,
Trangie,
NSW(PF
Parnell(1988),
personal
communica-tion).
Thepedigree
involved 1 122 animals with records and 625 ancestors withoutrecords. The average number of
generations
on the female side was 9 and the maxi-mum was 16. On the maleside,
the average was 8generations,
with a maximum of13. There were
only
55 base gametes, and the averageinbreeding
of thepopulation
was 0.1
(the
maximum was0.26).
The order of E and
E
was about 190 000. MatrixA
o
was of the order 1595.However,
excluding
A
o
,
the maximum block sizes were 321 and323,
respectively.
The distribution of block sizes is
presented
in Table II. Most of the blocks were ofthe order 2 and 3. Given the absorbed
blocks,
andignoring possible singularities,
matrices E-1 and
E
can
be evaluated. Thecomputations
are nottrivial,
but arefeasible.
There are other
approaches
tomodelling
dominance that do not involve E-1or
E-
1
.
Forexample,
wemight
extract from Eonly
those elements that involve animals or animals with records.If the
extracted matrix is put into a matrixD,
then we could attempt to evaluate
D-
1
using
sparse matrixabsorption
(Tier
andSmith,
1989).
ThenU-
1
could be used in the mixed-modelequations.
Alternatively,
D
could be useddirectly
in the same way that Henderson(1985)
used D. BecauseE is an
enlarged
matrix with unrealized combinations of gametes(the
so-calledphantom
animals),
alternatives toE-
1
are attractive.However,
breeding
schemes that utilize dominance variation are bound torequire
theprediction
of dominance effects thatcorrespond
to untriedgamete pairs
(say,
among gametes of parents that contribute genes to the nextgeneration).
Further research is needed to evaluate allILLUSTRATION
In this section we demonstrate our
methods, using
thesimple
pedigree
displayed
in
Fig.
1. Thispedigree
involves 4 animals or 8genomic
halves. Forsimplicity,
we assumeor2
=or =
1;
U2
=a
2
= 0. Given that gametespairs 12, 35,
64 and 78 represent dominance vectors for animals and are to be included inG,
thealgorithm
of
Appendix
A creates 14 additionalpairs -
ignoring
first array I and additive vec-tors. The matrix G isgiven
as row and column labels of Table III. Second momentsof Table III are derived
by applying
the recursion of formulae. Forexample,
theelement
Eld’2ld5il
was calculated as1/2EId’
l
d
ll
}
+1/2EId’
l
d
2ll
= 0 + 1/2
= 1/2.
To evaluate E-1
requires
the determination of(A
2
-
LiBiL!)-1
for i =0,1, 2, 3.
The absorbed blocks can be evaluated
recursively,
but for now, the reader mayobtain these
by applying
Gaussian eliminationdirectly
to Table III. The inverse blocks are:The matrices
L!
for
i =0,1,2,3,
aredisplayed
below thediagonal
in Table IV. In accordance with formulae(3),
the elements ofE-
1
aregiven
above thediagonal.
We can now compress E
by defining
d
b
=d
u
+d
22
(with
labels,
this is notatedas b: 11 +
22),
d, =d
31
+,
32
d
d
d
= d41 +d
42
,
d
f =
dsi
+d
52
,
di
=d
74
+d
76
.
Thesedefinitions,
as well as othersymbolic definitions,
aregiven
as row labels forthe
compressed
matrix of Table V. As anexercise,
thecompressed
table can beevaluated
by inspecting
Table III andadding
together
appropriate
elements. ForThe real
challenge
is to findsimplifying
recursions that allow evaluation ofcom-pressed
tables. In order toapply
inversion formulae(3),
recursions to theright
of blocksalong
thediagonal
should be maintainedexclusively
as a function of columnindex,
as in the presentexample.
These recursions aregiven implicitly
below thediagonal
in Table VI as matricesL
i
for i =0,1, 2, 3.
Forexample,
the elementE(d )di )can
also be obtainedby adding
The blocks
(A
i
-
LiB
i
L
il
-
1
for i =0,1, 2,
are:Inverse elements of the
compressed
matrix aregiven
above thediagonal
inTable VI.
DISCUSSION
Although
we have notemphasized
the evaluation of variousidentity
coefficients,
gametic
recursion of such coefficients is feasible.Moreover, gametic recursions,
likethose used to define
E,
are very easy to understand and mayprovide
anapproach
useful in
teaching,
whilst there may be morecomplicated
(but
perhaps numerically
moreefficient)
alternatives. Butgametic
recursion need not benumerically
ineffi-cient and iscertainly
notsubject
to combinatorialexplosion. Depth-first searches,
like the
algorithm
ofAppendix A,
can be used toimplement
gametic
recursion in anefficient way. For
example,
to evaluateinbreeding
coefficients viagametic
recursion,
one would first conceive of a
symmetric
table withr
2
/2
elements.However,
r
2
/2
operations
are notrequired
to evaluateinbreeding
coeflicients viagametic
recur-sion. All that is needed is to
identify
labelsi j
of the dominance vectorsd
ij
in G.By moving
down a list of suchlabels,
it ispossible
to evaluate the coefficientsusing
one work vector. We need
only modify
this scenario forgeneral identity
coefficients. Thesimplest explanation
of the twocommonly
found andcomplementary
phe-nomena
(heterosis
andinbreeding
depression)
is that there is dominance of allelesat many loci.
Therefore,
dominance should beregarded
as an essential feature ofgenetic
models for lociaffecting
quantitative
traits. Characteristicsclosely
con-nected with
fitness,
such asreproduction,
wouldmostly
haveonly
non-additive variationleft,
as the additive has been exhausted(Robertson,
1955).
There are alsosuggestions
that dominance of alleles that maintain normal enzymeactivity,
is auniversal biochemical property
(Kacser
andBurns,
1981).
It is clear that
additivity,
dominance andinbreeding
can be modelledby
applying
mixed-modelmethodology.
The ramifications of such adevelopment
are(i)
Determination ofoptimum
anddynamic mating
structures whichcapitalize
on additive and dominance variance while
providing
forinbreeding.
Some of thesebreeding strategies
can be studiedby
the use ofmoment-generating
matrices forregular mating
systems. Cockerham( 1971)
hasgiven
anexample
of such a transition matrix for full-sibmating.
Possibleapplication
areas are:(a)
mate selection(Jansen
andWilton,
1985;
Smith andAllaire,
1985);
(b)
group selection(Jansen,
1985;
Smith andHammond,
1987)
when theselection of a random
mating
genepool
is createdby
a finite number of parents.The
objective
of group selection is toimprove
both additive merit and the averagespecific
combining ability
of genes in thepool;
(c)
crossbreeding plans
to utilize between breed additive and heterotic effects(Kinghorn,
1987);
(d)
selection of clones and animals createdby
futuristictechniques.
(ii)
To allow dominance variance to bepartitioned,
andthus,
remove some ofthe
confounding
that would otherwise corrupt statisticalanalyses.
(iii)
Toimprove
ourunderstanding
of traits which arelikely
to show considerableamounts of non-additive
genetic
variation. Thisunderstanding
can be advancedby
simulation
studies,
eg,concerning
selection bias and finite loci. Other studies mayinvolve simulation of infinitesimal models with dominance via
sampling
from normaldistributions.
The
theory
wepresent
here is still veryunderdeveloped.
New methods areneeded for the estimation of
genetic
parameters, as pastapproaches
have metwith very limited success
(eg,
Gallais,
1977).
It is our belief that an extension ofthe derivative-free
algorithm
of Graser et al.(1987)
ispossible,
andconsequently,
genetic
parameters could be estimatedby
restricted maximum likelihood. Researchis needed in the area of
computing strategies
(eg,
compression,
inversion).
Further,
models which allow different parameters for different
populations
would be useful incrossbreeding
studies. A multivariate extension of our work would also be desirable.The
forgotten
papers of Harris(1964)
and Gillois(1964)
have been resurrected. While the methods arearguably
complicated, they
are,however,
feasible.Whereas,
classical
quantitative
genetics
is unable tofully
utilize information on dominanceand
inbreeding
inpredictions,
theappropriate
mixed model under a wide range ofassumptions,
including
selection and environmentalnoise,
does that. ACKNOWLEDGEMENTSThe authors thank Keith Hammond for his encouragement. This work was