HAL Id: hal-00893816
https://hal.archives-ouvertes.fr/hal-00893816
Submitted on 1 Jan 1989
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Use of sparse matrix absorption in animal breeding
B. Tier, S.P. Smith
To cite this version:
Original
article
Use of
sparse matrix
absorption
in
animal
breeding
B.
Tier
S.P. Smith
University of New England,
Anirreal Genetics andBreeding Unit, Ar!nidale,
NSW 2351,Australia
(received
1 March1988; accepted
1 June1989)
Summary -
Although
thecapacity
of modern computers isincreasing dramatically
sotoo are the
complexity
of models that animal breedersemploy,
with the result that westill find computers
limiting.
This paper demonstrates theemployment
of linked lists forsparse matrix
manipulations
and their use in a number of relevantapplications.
animal breeding - prediction of genetic merits - numerical methods - sparse matrix Résumé - Utilisation en génétique animale de l’absorption dans des matrices creuses.
Malgré
l’accroissement decapacité
desordinateurs,
lacomplexité également
croissante des modèlesemployés
engénétique
animale nécessite leraffinement
des méthodesnumériques.
Cet articleexplique
l’utilisation de listes liées pourmanipuler
de trèsgrandes
matri-ces creuses, et illustre leur usage dans
différents
typesd’application
dans le cadre de l’évaluation desreproducteurs: absorption d’effets fixes,
inversion dematrices,
estimations de composantes de la variance par le maximum de vraisemblance.génétique animale - évaluation des reproducteurs - analyse numérique - matrices creuses
INTRODUCTION
As the
capacity
of moderncomputers
increases so does thequantity
of data andthe
complexity
of models that animalgeneticists
wish to use in theiranalyses.
Inthe
early
years ofcomputing
when main memory was amajor limitation,
avariety
of
techniques
weredeveloped
to utiliseefficiently
that memory which was available(Bunch
andRose,
1976).
This paper illustrates the use of one of thesetechniques
-linked lists - to store sparse matrices and eliminate
(absorb)
equations. Examples
are
given
of how thistechnique
can be useful.TYPICAL MODELS
Linear models that are
commonly
usedby
animalgeneticists
havequalities
that lend themselves to efficient methods ofstorage.
Consider the model:The mixed model
equations
(Henderson, 1974)
arewhere G is the covariance matrix of u, and R is a covariance matrix of residuals.
For a univariate
analysis
G =7
A
forsome, where A is the numerator
relationship
matrix.
be the mixed model array. Because
Q
issymmetric
it isonly
necessary to storethe upper
(or
lower) triangle.
This means that moreequations
can be stored in thememory. When an animal model
(or
reduced animalmodel)
isemployed,
thenQ
is very sparse.
LINKED LISTS
A linked list consists of a list of elements linked
together by
pointers
to theirphysical
locations. The
physical
location of the first element in the row is stored and everyelement has associated with it a
pointer
to the location in the memory of the nextelement in the sequence. The
pointer
associated with the last element in the list iszero. Knuth
(1968)
provided
a detailedexplanation
of linked lists.When
using
FORTRAN 3 vectors arerequired
to store a matrix in this way -one for the element(a
u
),
one for the column(J)
and one for thepointer
to thenext element. A scalar
(NUSED)
is used topoint
to the lastoccupied
location inthese vectors. As the list is
being
built,
new elements are stored in the next availablelocation in these vectors but the order of the row is maintained
by adjusting
thepointers
(illustrated
in TableI).
Because matrices such as
Q
andG-
1
are sparse,they
lend themselves to thisvectors are reserved for the first element in each row. Each row forms its own linked list. Because
they
aresymmetrical,
it ispossible
to store the upper(or
lower)
triangle only -
thus the first(last)
element in any row is thediagonal.
Consider the
simple
linear modelwhere A and B are
systematic
effects with 2 classes each.Assign
the effectsA1, A2,
B1 and B2 to
equations
(1)
to(4)
respectively
and theright-hand
side toequation
(5).
Reserve the first 5 elements in the vectors for the first(diagonal)
element ineach row. Store the address of the last
occupied
location(5)
in the scalar NUSED. The mechanics ofusing
a linked list are illustratedby
3 records shown in Table II.Each record
generates
6 contributions to the uppertriangle.
Each of these is eitheran addition to an
existing
element or a new element. In both cases, it is necessaryto follow the sequence of
pointers
along
theparticular
row until either the elementis
found,
or an element which lies to theright
of the current contribution isfound,
or the end of the row is found. If the element is found in the list then the current
contribution is added to it. If the element is new then it is stored in the next
available location and the
pointers
(in
the row and to the lastoccupied
location)
are
adjusted accordingly.
Asimple
algorithm
to do this is shown in theAppendix.
The matrix
Q
derived from the 3 records in Table II isTable III illustrates the status of the linked list after each record has been
processed.
If iteration(e.g. Gauss-Seidel)
is to be theonly manipulation involving
Q,
then the vectorscontaining
the coefficients and column identities can be sorted afterbuilding
Q
and thepointers
associated with the first N elements can be usedto store the number of elements in each row.
However,
toimplement absorption,
ABSORPTION OF
EQUATIONS
Absorption
orgaussian
elimination is described in Smith and Graser(1986).
If thesparsity
of the matrix is to bepreserved,
as is desirable for a linked list to beuseful,
then it isimportant
to choosepivots
so that new elements do notproliferate.
Gilland
Murray
(1974)
suggest
choosing
rows with the least number ofoff-diagonal
elements first.
As each row is
absorbed,
the space itoccupied
is released and can be made available to new elements that are created in other rows. Beforeabsorbing
anyequations,
it is useful to link theunoccupied
space in the vectors into aseparate
linked list. As space is
released,
it can be added to the list of free space for reuse.Because the elements in the row are
already
connectedby
pointers,
thecomplete
row can beplaced
at the start of the list of free spaceby
modifying
thepointers
at the end of the row and at the start of the free space. If backward substitution is to beimplemented
then the row should be written as an exterior file.and its linked list
representation
is shown in Table IV. There is no need to zerothe column and coefficient vectors from the row
being
absorbed;
when this space isreused
they
will beassigned
new values.During
elimination of each row ofQ,
it ispossible
todesign
thealgorithm
so thatsubsequent
rows and elements within rows are modifiedsequentially;
redundantsearching through Q
can and should be avoided. If the selectedpivot
is zero, thenthe row can be
regarded
ashaving
beenpreabsorbed.
Analgorithm
to absorbequations
in this manner is shown in theAppendix.
For
large problems,
it ispossible
and desirable to divideQ
into 2parts:
anexterior file and a linked list.
Absorption
of a row entails:reading
the row from theexterior
file; merging
theinput
with the linkedlist;
andabsorbing
the row in thelinked list. When the linked list is
full,
it can bemerged
with the exterior file so asto create a new exterior file. This clears the vectors for a new iteration.
APPLICATIONS
All the
following
examples
have the form ofmanipulating
amatrix
Row
by row
absorption
isequivalent
torepeated application
of the formula forW
*
,
where U is a scalar
(the pivot)
and T is a column vector.1)
Sparse
matrix inversionFor
example,
findE-
1
given
thepositive
definite andsymmetric
matrix of E.Set
U
=
Enxn
T =
Inxn
W
=
Onxn
then W* =
-
E
-
1
Sometimes
only
thediagonal
ofE-
1
may berequired,
in which case the calculationand
storage
ofoff-diagonal
elements ofE-
1
can beneglected.
2)
Estimation of(co)variance
components by
maximum likelihood(ML)
or restricted maximum likelihood(REML)
Many
of the arrays in this section can be found in Searle(1979).
a)
Evaluate in MLb)
Evaluate in REMLd)
Evaluate Z’PZ in REML : method 2e)
Evaluate thelog-likelihood
(L)
for REMLusing
the derivativefree search
ofGraser et al.
(1987).
To evaluate L we note that
where
I UI
is the determinant of one of thelargest non-singular
submatrices ofU;
and
I UI
is evaluatedby
theproduct
of the non-zeropivots.
To
implement
a derivative-free search sometimes we needrank(X)
which is the number of non-zeropivots
minus the order ofG-
1
.
3)
Calculating
the exactA-’
forsub-populations
When a sire model is used it is
possible
to buildA-’
for the fullpedigree
of the siresand then absorb all female relatives. Some sires may be absorbed as well if
they
arenot
part
of thesubpopulation.
PartitionA-’
into 2parts:
animals to be absorbed inU,
and animals that are to remain in W. The W* is the exact inverserelationship
matrix for theremaining
selected animals.Experience
has shown thatabsorption
seems to create many elements that are
essentially
zero. Linked-listabsorption
works well when these zero elements are released from
storage,
particularly
from a row before it is absorbed.4)
Conducting
secondary absorptions
Sometimes it is necessary to absorb 2 groups of factors out of the model. The model
used
by
Smith(1987)
included 2765 effectsrepresenting
contemporary
groups, 2611 1fixed sire effects and 539 random sires. Rows
representing
contemporary
groups wereabsorbed as the data were
read.
Then rowsrepresenting
fixed sires were absorbedin
a reasonable timeusing
sparse matrixtechniques.
Theabsorption
of the fixed-sireThe
order used for thesecondary absorption
was determinedby
the size of thediagonals
after theprimary
absorption.
Rows with smallerdiagonals
were absorbedfirst. This order is
opposite
to the usualpractice,
however,
it minimizes the creationof non-zero elements and hence preserves the efficient use of memory. Use of the
traditional
approach
would have been asimpossible
as matrix inversion.5)
Partialabsorption prior
to iterationSometimes it may be advisable to absorb some
equations
prior
toiteration,
such asis
implicitly
doneusing
the reduced animal model(Quaas
andPollak,
1980).
CONCLUSION
Some of the
applications
we have described may not bepractical.
Forexample,
evaluating
V-
1
,
P and Z’PZ may bebeyond
currentcomputing
capabilities
even with a linked list.However,
some of theapplications
(e.g.,
evaluating
L,
constructing
A-1
forsub-populations,
andsecondary
absorptions)
are realistic and have beentested on real data structures. Without linked lists these
applications
may not befeasible.
A common
misconception
is thatevaluating
Q-
1
is about as difficult asabsorbing
all rows of
Q.
For non-sparsematrices,
inversionrequires
3 times the work ofabsorption.
For sparse matrices thecomparison
istypically
much more extreme.Inversion can be
prohibitive
even with a linkedlist,
whileabsorption
of the samematrix may be a
relatively simple
operation.
REFERENCESBunch J.R. & Rose D.J.
(ed.)
(1976)
Sparse
MatrixComputations.
AcademicPress,
New YorkGill P.E. &
Murray
W.(1974)
Methods forlarger
scalelinearly
constrainedproblems.
In: Numericad Methodsfor
ConstrainedOptimisation.
(Gill
P.E. &Murray
W.(ed.),
AcademicPress, London,
pp. 93-147Graser
H.-U.,
Smith S.P. & Tier B.(1987)
A derivative freeapproach
forestimating
variance
components
in animal modelsby
REML. J. Anim.Sci.
64,
1362-1370 Henderson C.R.(1974)
Sire evaluation andgenetic
trends. In:Proceedings of
the AnimalBreeding
and GeneticsSymposium, Blacksburg, July
29, 1972,
Am. Soc. Anim. Sci. and Am.Dairy
Sci.Assoc.,
Champaign,
IL.,
p. 10Knuth D.E.
(1968)
The Artof Computer Programming,
vol. 1. FundamentadAlgorithms,
AddisonWesley, Reading,
MAQuaas
R.L. & Pollak E.J.(1980)
Mixed modelmethodology
for farm and ranch beef cattletesting
programs. J. Anim. Sci.51,
1277-1287 .Searle S.R.
(1979)
Notes on variancecomponents
estimation: a detailed account ofSmith S.P.
(1987)
Genetic parameters fortype
in Australian Holstein-Friesiandairy
cattle. In:Proceedings of
the SixthConference,
Australian Assoc.of
AniTrc.Breeding
Genet., Perth, Australia,
February 9-11, 1987,
Anim. Genet.Breeding
Unit,
University
NewEngland, Armidale,
pp; 55-58Smith S.P. & Graser H.-U.
(1986)
Estimating
variancecomponents
in a class of mixed modelsby
restricted maximum likelihood. J.Dairy
Sci.69,
1156-1165APPENDIX
Subroutines
LINKAIJ,
LINKFREE and ABSROWThe
following
storage
isrequired
for these subroutines:ELEMENT(LENGTH)
isa vector of elements.
POINTER(LENGTH)
is a vectorholding
pointers
to the next element in the row.JAY(LENGTH)
is a vectorholding
the column(J)
of theelement. NUSED is a
pointer
to the lastoccupied
location in these vectors. ITHISand NEXT are
pointed
to this and the next elementrespectively.
Subroutine LINKAIJ stores the contribution to the element a I j which are
passed
asparameters.
Subroutine LINKFREE links up the free space in POINTER and initialises
IHEAP before ABSROW is called. IHEAP
points
to the next available locationSubroutine ABSROW absorbs the
i
th
row. Theit!’
row is transferred to twowork vectors
(WORK
which contains the elements and JWORK which containsthe
columns).
Space
releasedby
this transfer isplaced
in the availableheap
andthen the row is absorbed. OPZERO is the