Use of sparse matrix absorption in animal breeding

(1)

HAL Id: hal-00893816

https://hal.archives-ouvertes.fr/hal-00893816

Submitted on 1 Jan 1989

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Use of sparse matrix absorption in animal breeding

B. Tier, S.P. Smith

To cite this version:

(2)

Original

article

Use of

_{sparse matrix}

_absorption

in

_animal

_breeding

B. Tier

S.P. Smith

University of New England,

Anirreal Genetics and

_{Breeding Unit, Ar!nidale,}

NSW 2351,

Australia

(received

1 March

_{1988; accepted}

1 June

₁₉₈₉₎

Summary -

Although

the

_capacity

of modern _computersis

_{increasing dramatically}

so

too are the

_complexity

of models that animal breeders

_employ,

with the result that we

still find _computers

_limiting.

This paper demonstrates the

employment

of linked lists for

sparse matrix

manipulations

and their use in a number of relevant

_{applications.}

animal _{breeding -}_predictionof _genetic_{merits - numerical methods - sparse matrix} Résumé - Utilisation en _génétiqueanimale de _{l’absorption}dans des matrices creuses.

Malgré

l’accroissement de

_capacité

des

ordinateurs,

la

_{complexité également}

croissante des modèles

_employés

en

_génétique

animale nécessite le

_raffinement

des méthodes

_numériques.

Cet article

_explique

l’utilisation de listes liées pour

manipuler

de très

grandes

matri-ces _{creuses, et}illustre leur _usagedans

_différents

_types

_{d’application}

dans le cadre de l’évaluation des

_{reproducteurs: absorption d’effets fixes,}

inversion de

_matrices,

estimations de composantes de la variance par le maximum de vraisemblance.

génétique animale - évaluation des _{reproducteurs - analyse numérique -}matrices creuses

INTRODUCTION

As the

_capacity

of modern

computers

increases so does the

_quantity

of data and

the

_complexity

of models that animal

_geneticists

wish to use in their

_analyses.

In

the

_early

_yearsof

_computing

_{when main memory}was a

_{major limitation,}

a

_variety

of

_techniques

were

_developed

to utilise

_efficiently

that memory which was available

(Bunch

and

_Rose,

_1976).

This _paperillustrates the use of one of these

_techniques

-linked lists - to store _sparsematrices and eliminate

_(absorb)

_{equations. Examples}

are

_given

of how this

_technique

can be useful.

TYPICAL MODELS

Linear models that are

_commonly

used

_by

animal

_geneticists

have

_qualities

that lend themselves to efficient methods of

_storage.

Consider the model:

(3)

The mixed model

_equations

_{(Henderson, 1974)}

are

where G is the covariance matrix of u, and R is a covariance matrix of residuals.

For a univariate

_analysis

G =

7 A

for

some, where A is the numerator

_relationship

matrix.

be the mixed model array. Because

Q

is

_symmetric

it is

_only

necessary to store

the upper

(or

lower) triangle.

This means that more

_equations

can be stored in the

memory. When an animal model

(or

reduced animal

model)

is

_employed,

then

Q

is very sparse.

LINKED LISTS

A linked list consists of a list of elements linked

_{together by}

_pointers

to their

_physical

locations. The

_physical

location of the first element in the row is stored and _every

element has associated with it a

_pointer

to the location in the memory of the next

element in the _sequence.The

_pointer

associated with the last element in the list is

zero. Knuth

₍₁₉₆₈₎

_provided

a detailed

_explanation

of linked lists.

When

_using

FORTRAN 3 vectors are

_required

to store a matrix in this _way -one for the element

(a

u

),

one for the column

_(J)

and one for the

_pointer

to the

next element. A scalar

_(NUSED)

is used to

_point

to the last

occupied

location in

these vectors. As the list is

_being

built,

new elements are stored in the next available

location in these vectors but the order of the row is maintained

_{by adjusting}

the

pointers

(illustrated

in Table

_I).

Because matrices such as

Q

and

G-

1

are sparse,

they

lend themselves to this

(4)

vectors are reserved for the first element in each row. Each row forms its own linked list. Because

_they

are

_symmetrical,

it is

_possible

to store the upper

(or

lower)

triangle only -

thus the first

_(last)

element in any row is the

_diagonal.

Consider the

_simple

linear model

where A and B are

_systematic

effects with 2 classes each.

_Assign

the effects

A1, A2,

B1 and B2 to

_equations

₍₁₎

to

₍₄₎

_respectively

and the

_right-hand

side to

_equation

(5).

Reserve the first 5 elements in the vectors for the first

_(diagonal)

element in

each row. Store the address of the last

_occupied

location

₍₅₎

in the scalar NUSED. The mechanics of

_using

a linked list are illustrated

_by

3 records shown in Table II.

Each record

_generates

6 contributions to the upper

triangle.

Each of these is either

an addition to an

_existing

element or a new element. In both cases, it is necessary

to follow the _sequenceof

_pointers

_along

the

particular

row until either the element

is

_found,

or an element which lies to the

_right

of the current contribution is

found,

or the end of the row is found. If the element is found in the list then the current

contribution is added to it. If the element is new then it is stored in the next

available location and the

_pointers

_(in

the row and to the last

_occupied

_location)

are

_{adjusted accordingly.}

A

_simple

_algorithm

to do this is shown in the

_Appendix.

The matrix

_Q

derived from the 3 records in Table II is

Table III illustrates the status of the linked list after each record has been

processed.

If iteration

_{(e.g. Gauss-Seidel)}

is to be the

_{only manipulation involving}

Q,

then the vectors

_containing

the coefficients and column identities can be sorted after

_building

_Q

and the

_pointers

associated with the first N elements can be used

to store the number of elements in each row.

_However,

to

_{implement absorption,}

(5)

ABSORPTION OF

_EQUATIONS

Absorption

or

_gaussian

elimination is described in Smith and Graser

(1986).

If the

sparsity

of the matrix is to be

_preserved,

as is desirable for a linked list to be

_useful,

then it is

important

to choose

_pivots

so that new elements do not

_proliferate.

Gill

and

Murray

(1974)

suggest

choosing

rows with the least number of

off-diagonal

elements first.

As each row is

_absorbed,

the _spaceit

occupied

is released and can be made available to new elements that are created in other rows. Before

_absorbing

any

equations,

it is useful to link the

_unoccupied

space in the vectors into a

_separate

linked list. As space is

released,

it can be added to the list of free space for reuse.

Because the elements in the row are

_already

connected

_by

_pointers,

the

complete

row can be

_placed

at the start of the list of free space

by

modifying

the

pointers

at the end of the row and at the start of the free _space.If backward substitution is to be

_implemented

then the row should be written as an exterior file.

(6)

and its linked list

_{representation}

is shown in Table IV. There is no need to zero

the column and coefficient vectors from the row

_being

_absorbed;

when this _spaceis

reused

they

will be

_assigned

new values.

During

elimination of each row of

_Q,

it is

_possible

to

_design

the

_algorithm

so that

subsequent

rows and elements within rows are modified

sequentially;

redundant

searching through Q

can and should be avoided. If the selected

_pivot

is _zero,then

the row can be

regarded

as

having

been

preabsorbed.

An

algorithm

to absorb

equations

in this manner is shown in the

_Appendix.

For

_{large problems,}

it is

_possible

and desirable to divide

_Q

into 2

_parts:

an

exterior file and a linked list.

_Absorption

of a row entails:

_reading

the row from the

exterior

_{file; merging}

the

_input

with the linked

_list;

and

_absorbing

the row in the

linked list. When the linked list is

_full,

it can be

_merged

with the exterior file so as

to create a new exterior file. This clears the vectors for a new iteration.

APPLICATIONS

All the

_following

_examples

have the form of

_manipulating

a

matrix

(7)

Row

_{by row}

_absorption

is

_equivalent

to

_{repeated application}

of the formula for

_W

_*

_,

where U is a scalar

(the pivot)

and T is a column vector.

1)

Sparse

matrix inversion

For

_example,

find

E-

1 _given

the

_positive

definite and

_symmetric

matrix of E.

Set

U

₌

_Enxn

T =

Inxn

W

₌

_Onxn

then W* =

-

E

-

1

Sometimes

_only

the

_diagonal

of

E-

1

may be

required,

in which case the calculation

and

_storage

of

_off-diagonal

elements of

E-

1

can be

_neglected.

2)

Estimation of

_(co)variance

_{components by}

maximum likelihood

_(ML)

or restricted maximum likelihood

(REML)

Many

of the _arraysin this section can be found in Searle

(1979).

a)

Evaluate in ML

b)

Evaluate in REML

(8)

d)

Evaluate Z’PZ in REML : method 2

e)

Evaluate the

_{log-likelihood}

_(L)

for REML

_using

the derivative

_{free search}

of

Graser et al.

_(1987).

To evaluate L we note that

where

I UI

is the determinant of one of the

largest non-singular

submatrices of

U;

and

I UI

is evaluated

_by

the

_product

of the non-zero

_pivots.

To

_implement

a derivative-free search sometimes we need

rank(X)

which is the number of non-zero

_pivots

minus the order of

G-

1 .

3)

Calculating

the exact

A-’

for

_{sub-populations}

When a sire model is used it is

_possible

to build

A-’

for the full

_pedigree

of the sires

and then absorb all female relatives. Some sires _maybe absorbed as well if

they

are

not

part

of the

_{subpopulation.}

Partition

A-’

into 2

_parts:

animals to be absorbed in

U,

and animals that are to remain in W. The W* is the exact inverse

_relationship

matrix for the

_remaining

selected animals.

_Experience

has shown that

absorption

seems to create _manyelements that are

_essentially

zero. Linked-list

_absorption

works well when these zero elements are released from

_storage,

particularly

from a row before it is absorbed.

4)

Conducting

secondary absorptions

Sometimes it is necessary to absorb _{2 groups}of factors out of the model. The model

used

_by

Smith

₍₁₉₈₇₎

included 2765 effects

_representing

_contemporary

_{groups, 2611}1

fixed sire effects and 539 random sires. Rows

_representing

_contemporary

groups were

absorbed as the data were

read.

Then rows

_representing

fixed sires were absorbed

in

a reasonable time

_using

sparse matrix

techniques.

The

absorption

of the fixed-sire

(9)

The

order used for the

_{secondary absorption}

was determined

_by

the size of the

diagonals

after the

_primary

_absorption.

Rows with smaller

_diagonals

were absorbed

first. This order is

_opposite

to the usual

_practice,

_however,

it minimizes the creation

of non-zero elements and hence preserves the efficient use of memory. Use of the

traditional

_approach

would have been as

_impossible

as matrix inversion.

5)

Partial

_{absorption prior}

to iteration

Sometimes _{it may}be advisable to absorb some

_equations

_prior

to

_iteration,

such as

is

_implicitly

done

_using

the reduced animal model

_(Quaas

and

Pollak,

1980).

CONCLUSION

Some of the

_applications

we have described _maynot be

_practical.

For

_example,

evaluating

V-

1 ,

P and Z’PZ _{may be}

_beyond

current

_computing

_capabilities

even with a linked list.

_However,

some of the

_applications

_(e.g.,

_evaluating

_L,

_constructing

A-1

for

_{sub-populations,}

and

_secondary

_absorptions)

are realistic and have been

tested on real data structures. Without linked lists these

_applications

may not be

feasible.

A common

_{misconception}

is that

_evaluating

_Q-

₁

is about as difficult as

_absorbing

all rows of

Q.

For _non-sparse

_matrices,

inversion

_requires

3 times the work of

absorption.

For _sparsematrices the

_comparison

is

_typically

much more extreme.

Inversion can be

_prohibitive

even with a linked

_list,

while

_absorption

of the same

matrix _maybe a

_{relatively simple}

_operation.

REFERENCES

Bunch J.R. & Rose D.J.

_(ed.)

₍₁₉₇₆₎

_Sparse

Matrix

_{Computations.}

Academic

_Press,

New York

Gill P.E. &

_Murray

W.

₍₁₉₇₄₎

Methods for

_larger

scale

_linearly

constrained

problems.

In: Numericad Methods

_for

Constrained

_{Optimisation.}

_(Gill

P.E. &

Murray

W.

_(ed.),

Academic

_{Press, London,}

_pp.93-147

Graser

_H.-U.,

Smith S.P. & Tier B.

₍₁₉₈₇₎

A derivative free

_approach

for

_estimating

variance

_components

in animal models

_by

REML. J. Anim.

Sci.

64,

1362-1370 Henderson C.R.

₍₁₉₇₄₎

Sire evaluation and

_genetic

trends. In:

_{Proceedings of}

the Animal

_Breeding

and Genetics

_{Symposium, Blacksburg, July}

_{29, 1972,}

Am. Soc. Anim. Sci. and Am.

_Dairy

Sci.

_Assoc.,

_Champaign,

_IL.,

_p.10

Knuth D.E.

₍₁₉₆₈₎

The Art

_{of Computer Programming,}

vol. 1. Fundamentad

Algorithms,

Addison

_{Wesley, Reading,}

MA

Quaas

R.L. & Pollak E.J.

₍₁₉₈₀₎

Mixed model

_methodology

for farm and ranch beef cattle

_testing

_programs.J. Anim. Sci.

_51,

1277-1287 .

Searle S.R.

₍₁₉₇₉₎

Notes on variance

_components

estimation: a detailed account of

(10)

Smith S.P.

₍₁₉₈₇₎

Genetic parameters for

_type

in Australian Holstein-Friesian

dairy

cattle. In:

_{Proceedings of}

the Sixth

_Conference,

Australian Assoc.

_of

AniTrc.

Breeding

Genet., Perth, Australia,

February 9-11, 1987,

Anim. Genet.

_Breeding

Unit,

University

New

_{England, Armidale,}

pp; 55-58

Smith S.P. & Graser H.-U.

₍₁₉₈₆₎

_Estimating

variance

_components

in a class of mixed models

_by

restricted maximum likelihood. J.

_Dairy

Sci.

_69,

1156-1165

APPENDIX

Subroutines

_LINKAIJ,

LINKFREE and ABSROW

The

_following

_storage

is

_required

for these subroutines:

_{ELEMENT(LENGTH)}

is

a vector of elements.

_{POINTER(LENGTH)}

is a vector

_holding

_pointers

to the next element in the row.

JAY(LENGTH)

is a vector

_holding

the column

_(J)

of the

element. NUSED is a

_pointer

to the last

_occupied

location in these vectors. ITHIS

and NEXT are

_pointed

to this and the next element

_{respectively.}

Subroutine LINKAIJ stores the contribution to the element a I j which are

_passed

as

_parameters.

Subroutine LINKFREE links _upthe free space in POINTER and initialises

IHEAP before ABSROW is called. IHEAP

_points

to the next available location

(11)

Subroutine ABSROW absorbs the

i

th

row. The

it!’

row is transferred to two

work vectors

_(WORK

which contains the elements and JWORK which contains

the

_columns).

_Space

released

_by

this transfer is

_placed

in the available

_heap

and

then the row is absorbed. OPZERO is the

_operational

zero

(if

the absolute value of

Use of sparse matrix absorption in animal breeding

HAL Id: hal-00893816

https://hal.archives-ouvertes.fr/hal-00893816

Submitted on 1 Jan 1989

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Use of sparse matrix absorption in animal breeding

B. Tier, S.P. Smith

To cite this version:

Original

article

Use of

sparse matrix

absorption

in

animal

breeding

B.

Tier

S.P. Smith

University of New England,

Breeding Unit, Ar!nidale,

(received

1988; accepted

1989)

Although

capacity

increasing dramatically

complexity

employ,

limiting.

employment

manipulations

applications.

Malgré

capacité

ordinateurs,

complexité également

employés

génétique

raffinement

numériques.

explique

manipuler

grandes

différents

d’application

reproducteurs: absorption d’effets fixes,

matrices,

capacity

computers

quantity

complexity

geneticists

analyses.

early

computing

major limitation,

variety

techniques

developed

efficiently

(Bunch

Rose,

1976).

techniques

(absorb)

equations. Examples

given

technique

_{sparse matrix}

_absorption

_animal

_breeding

_{Breeding Unit, Ar!nidale,}

_{1988; accepted}

₁₉₈₉₎

_capacity

_{increasing dramatically}

_complexity

_employ,

_limiting.

_{applications.}

_capacité

_{complexité également}

_employés

_génétique

_raffinement

_numériques.

_explique

_différents

_{d’application}

_{reproducteurs: absorption d’effets fixes,}

_matrices,

_capacity

_quantity

_complexity

_geneticists

_analyses.

_early

_computing

_{major limitation,}

_variety

_techniques

_developed

_efficiently

_Rose,

_1976).

_techniques

_(absorb)

_{equations. Examples}

_given

_technique

_commonly

_by

_geneticists

_qualities

_storage.

_equations

_{(Henderson, 1974)}

_analysis

_relationship

_symmetric

_only

_equations

_employed,

_{together by}

_pointers

_physical

_physical

_pointer

_pointer

₍₁₉₆₈₎

_provided

_explanation

_using

_required

_(J)

_pointer

_(NUSED)

_point

_being

_{by adjusting}

_I).

_they

_symmetrical,

_possible

_(last)

_diagonal.

_simple