Covariance between relatives for a marked quantitative trait locus

(1)

HAL Id: hal-00894093

https://hal.archives-ouvertes.fr/hal-00894093

Submitted on 1 Jan 1995

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Covariance between relatives for a marked quantitative

trait locus

T Wang, Rl Fernando, S van der Beek, M Grossman, Jam van Arendonk

To cite this version:

(2)

Original

article

Covariance between

relatives

for

a

marked

quantitative

trait

locus*

T Wang

1 RL Fernando

S

van

der

Beek

M

Grossman

JAM

van

Arendonk

1 University

of Illinois, Department of

Animal

_{Sciences, 1207,}

W

_{Gregory Drive, Urbana,}

IL

_{61801, USA;}

2 Wageningen

Agricultural University, Department of

Animal

Breeding,

PO Box

338,

6700 AH

_Wageningen,

The Netherlands

(Received

6

_{April 1994; accepted}

lst

_February

₁₉₉₅₎

Summary - Best linear unbiased

_prediction

_(BLUP)

can be

_applied

to marker-assisted selection. This

_{application requires computation}

of the inverse of the conditional covariance matrix

_(G

_v

₎

of additive effects for the

_quantitative

trait locus

_(QTL)

linked to the marker locus

_(ML),

_given

marker genotypes. This paper presents

theory

and

_algorithms

to construct

G

v

and to obtain its inverse

_efficiently.

These

_algorithms

are

suf&ciently

_general

to accommodate situations

₍₁₎

where

paternal

or maternal

_origin

of marker alleles cannot

be determined and

₍₂₎

where the marker genotypes of some individuals in the

_pedigree

are unknown.

genetic marker

_/

marker-assisted selection

_/

best linear unbiased prediction

/

co-variance between relatives

_/

gametic relationship

*

Supported

in _part

_by

the Illinois

_{Experiment Station,}

Hatch

_Projects

35-0345

_(RLF)

and 35-0367

_(MG).

**

Correspondence

and reprints.

Résumé - Covariance entre apparentés pour un locus de caractère quantitatif marqué. La meilleure

prédiction

linéaire sans biais

(BL UP)

s’applique

à la sélection assistée par marqueur. Cela demande d’inverser la matrice

(G

v

)

des covariances entre

apparentés

des

_{effets génétiques additifs}

du locus

_quantitatif

lié au locus _marqueur,

covariances conditionnelles aux

génotypes

du _marqueur.Cet article

présente

la théorie et les

algorithmes

pour établir

G

v

et pour obtenir son inverse d’une manière

ef!îcace.

Ces

algorithmes

sont assez

_généraux

_pour

_prendre

en _comptedes situations

_i)

où

_l’origine

paternelle

ou maternelle des allèles marqueurs ne _peut_pasêtre

déterminée,

ii)

où le

génotype

marqueur de certains individus dans le

pedigree

n’est pas connu.

marqueur génétique

/

sélection assistée par marqueur

/

meilleure _prédictionlinéaire

(3)

INTRODUCTION

Theory

for covariance between relatives

_provides

the basis for use of data from rel-atives in

_genetic

evaluation. At

_present,

_genetic

evaluations in animal

_populations

are

_primarily

obtained

_by

best linear unbiased

prediction

(BLUP;

Henderson

1973)

using

trait

_phenotypes

_(T-BLUP).

Due to advances in molecular

_{biology, genetic}

markers are

becoming increasingly

available for use in

_genetic

evaluation. Several

approaches

for use in

_genetic

evaluation

_using

marker

_genotypes

and trait

pheno-types

have been discussed

_(Geldermann,

_1975;

_Soller,

_1978;

Soller and

Beckmann,

1982;

Smith and

_Simpson,

_1986;

Kashi et

al,

1990).

In

_addition,

Fernando and Grossman

₍₁₉₈₉₎

described how BLUP can be used for

_genetic

evaluation

using

marker

_genotypes

and trait

_phenotypes

_(TM-BLUP).

Some

_strategies

have been

proposed

to make TM-BLUP

_{computationally}

efficient

_(Cantet

and

Smith, 1991;

Hoeschele,

1993;

van Arendonk et

_al,

_1994).

TM-BLUP has also been extended to accommodate

_multiple

markers

_(Goddard,

_1992;

van Arendonk et

_{al 1994).}

TM-BLUP

_requires

_computation

of the inverse of the conditional covariance matrix

_(G

_v

₎

of additive effects for the

_quantitative

trait locus linked to the marker

locus, given

marker

_genotypes.

To

_compute

this

_inverse,

Fernando and Grossman

(1989)

provided

an

algorithm

that

required

information on the

parental

(paternal

or

maternal)

origin

of marker

_alleles,

in addition to information on marker

_genotypes.

The

_parental

_origin

of marker alleles in an

individual, however,

is not

_always

known. For

example,

if 2

_parents

and their

offspring

each has

_genotype

A

l

A

2

at the same marker

_locus,

marker allele

A

1

in the

_offspring

could have descended from either of the

parents,

thus the

parental origin

of

A

1

in the

_offspring

is unknown.

The

_objective

of this paper is to

present

theory

and

_algorithms

to

_compute

the conditional covariance matrix and its inverse when

_{parental origin}

of the marker alleles may not be known.

Theory

and

_algorithms

are

developed

for

pedigrees

where the marker

_genotype

of each individual is known

_(complete

marker

_data).

Application

of this

_theory

is

_given

for

_pedigrees

where the marker

_genotype

of some

individuals is unknown

_(incomplete

marker

_data).

Wang

et al

₍₁₉₉₁₎

_presented,

without

_proof,

a recursive

_equation

to construct

G

v

and an efficient

algorithm

to

compute

its inverse. This recursive

_equation

has been used

_by

van Arenonk et al

₍₁₉₉₄₎

and Hoeschele

_(1993).

In the

_present

_paper, we _provethat the recursive

equation

holds when marker data are

complete,

but does not hold

_generally

when marker data are

_incomplete.

Chevalet et al

₍₁₉₈₄₎

have described a method to

_compute

_G

_v

_given

marker

phenotypes.

This method does not

_{require knowing}

the

parental origin

of marker alleles and can accommodate

_missing

marker

_phenotypes.

The

method,

_however,

is not

_{computationally}

feasible for the

_{large pedigrees typically}

encountered in animal

breeding. Computation

of the conditional covariance matrix and its inverse become feasible

_{by conditioning}

on marker

genotypes

instead of marker

phenotypes.

NOTATION AND ASSUMPTIONS

Consider a

_single

_polymorphic

marker locus

(ML)

_closely

linked to a

_quantitative

trait locus

_((aTL),

which will be referred to as the marked

QTL

(MQTL).

Assume

(4)

denote 2 alleles at the

_ML,

and let

_QI

and

_Q;

denote

MQTL

alleles linked to

_M/

and

_Ml

as shown below

If the 2 marker alleles for individual i are

_known,

then

_they

will be

_arbitrarily

labelled as

Mi

and

M

2 .

For

example,

_supposeindividual i has marker alleles

A

3

and

_A

_l

_,

then

A

3

can be labelled as

Mi

and

A

1

as

M?,

or

_A

₁

can be labelled as

M!

and

A

3

as

M?.

If the 2 marker alleles for individual

_i,

_however,

are

unknown,

Mi

can be _anyof the marker alleles

_segregating

in the

_population,

and

_M2

can also be any of the marker alleles. For

example,

suppose there are 3 marker alleles

(A

l

,

A

2 ,

and

₃

₎

_A

_segregating

in the

_population,

then

_M

₂

₁

can be

_,

_A

_l

A

2 ,

or

_A

₃

_,

and

M

2

can also be

_A

₁

_,

_A

₂

_,

or

_A

₃

_.

Further,

let

_vl

and

_v?

be the additive effects of

_Q!

_and

_Q2,

and let

_w

=

Var( vI) =

Var(v!)

be their

_variance,

for i =

1, ... , n.

Observed marker

_genotypes

are denoted

_by

Gobs.

COVARIANCE OF

_MQTL

EFFECTS GIVEN COMPLETE MARKER DATA

The conditional covariances of additive effects of

_MQTL

alleles will be derived

separately

for alleles between individuals and for alleles within an individual.

Covariance between individuals

Suppose

s and d are

_parents

of

i,

and j

is not a direct descendant of

_{i (fig 1).}

The conditional covariance of the additive effects of

_MQTL

alleles

_Qk

_i

and

Q

in individuals i and

_{j, given}

the observed marker

_genotypes

_(Gobs),

is

where k

i

and

_k

_j

can be 1 or

2,

and

Pr(Q7

i

==

Q)&dquo; )

Gobs)

is the conditional

probability

that

_Q7

_i

is identical

_by

descent to

Q/

_{given Gobs}

_(eg,

Fernando and

_Grossman,

1989).

Because individuals s and d are

_parents

of

_i,

Q7

i

can be identical

_by

descent to

Q

in 1 of 4 _ways:

1.

_Q7

_i

descended from

_Q!

and

_Q;

was identical

by

descent to

Q)! ,

denoted

by

(Q7i {= Q;, Q; == Q!j)

2.

_Q7

_i

descended from

_Q!

and

_Q;

was identical

by

descent to

Q!j,

denoted

_by

(Q7i {= Q;, Q; == Q!j)

3.

_Q7

_i

descended from

_Qà

and

_Qà

was identical

_by

descent to

Q!j,

denoted

_by

(5)

Fig

1. Chromosome

_{fragments containing}

the ML and the

MQTL

for individuals s,

d,

i

and

_j.

4.

_Q!i

descended from

_Q!

and

_Q§

was identical

by

descent to

Q!j,

denoted

_by

!!..<-

r)2 r)2 —

r)!-’’)

Therefore,

the

_probability

in

_[1]

can be written as

Because

_{individual j}

is not a direct descendant of individual

_i,

and marker

genotypes

of s and d are

known,

the conditional

sampling

of

Q7

i

from s or d is

independent

of alleles

_{in j}

_being

identical

_by

descent to alleles in s or d

(fig 1),

given Gobs. Thus,

the

_probability

in

_[1]

can be

computed

recursively

as

Equation

[3]

was first

_given

_by

_Wang

et al

_(1991).

It will be shown later that

_[3]

does not hold

_generally

_incomplete.

Generalizing

in

_(3!,

Pr(Q7

i

«

Q;

P

IG

obs

)

is the conditional

_probability

that allele

Q/°

in

_offspring

i descended from allele

Qp

in

_parent

_p= _{s or}d for

k

i

,

k

P

= 1 _or2.

This conditional

_probability

will be referred to as the

_probability

of descent for a

(6)

for

_k

_i

= 1 or 2 and p = s or

_d,

where p = r when

_k

_P

= 1 and

p = 1 - r when

_k

_P

=

_2,

and where _ris the recombination _ratebetween the ML and

MQTL.

Further,

_{Pr(Miki !}

M;

P

¡G

obs

)

is the conditional

probability

that marker allele

_Mk’

in

_offspring

i descended from marker allele

M!’

in

_parent

_p,

_given

the

_pedigree

and marker

_genotypes.

This conditional

_probability

will be referred to as the

_probability

of descent for a marker allele

(PDM).

There are 8 PDMs for each

individual,

and their

_computations

are

_explained

in

Appendix

A. Note that the PDMs and

PD(as

associated with the unknown

_parent(s)

are undefined.

Equation

[4]

explicitly

shows the

_relationship

between

_PDQs

and PDMs in scalar notation. For

_convenience,

it is rewritten in matrix notation as

where

Covariance within an individual

The conditional covariance between additive effects

_vi

_{t and v?}

_of

MQTL

alleles

_(!

_z and

_Q?

_in

individual i with

_parents

s and

_{d, given}

_Gobs,

can be written from

[1]

as

where

_fi

=

Pr(Ql

> Q/ )Gobs)

is the conditional

probability

that 2

homologous

alleles at the

_MQTL

in individual i are identical

_by

_{descent, given Gobs.}

_{Thus, f}

_i

is the conditional

_inbreeding

coefficient of individual i for the

_{MQTL, given}

_Gobs.

This is different from

_{Wright’s inbreeding coefficient,}

which is the conditional

_probability

that 2

_homologous

alleles at _anylocus in individual i are identical

_{by descent, given}

only

the

_pedigree.

The

pair

of 2

_homologous

alleles at the

_MQTL,

_Ql

and

_Q?,

in individual i descended from 1 of the

_following

parental pairs:

(Qs,

Qd),

(Q9,

Qd),

8 Q’)

or

s

Q§) .

Let

_T,!skd

denote the event that the

_pair

of alleles in i descended from the

(7)

Because

_{(QI = Q2!Tks!d> Gobs)}

_implies

_{(QSS - Qdd !Gobs),}

_[10]

becomes

The

_Pr(T

_k

_g

_kdI

_G

_obs

₎

can be

_expressed

in terms of

_PD(as

_(see

_Appendix

_G!

as

For

_example,

where

_B

_i

_{(l, k)}

are elements

_{of B}

_i

in

_(5!.

If 1 of the denominators in

_!12!

is zero, then the entire

_{corresponding}

term is set to zero.

Tabular method to construct covariance matrix

_G

_v

The conditional covariance matrix

_(G

_v

₎

between additive effects of

_MQTL

alleles can be

_written,

from

[1]

and

!9!,

as

where A is the matrix of conditional

_{probabilities}

that

the

2

_homologous

alleles

at

_MQTL

are identical

by descent, given Gobs.

The matrix A includes a row and column for each of the 2

_MQTL

alleles in each individual. Thus the order of A is

_2n,

where n is the number of individuals in the

_pedigree.

This matrix is the conditional

gametic

relationship

matrix

_(Smith

and

_Allaire,

_1985),

_{given Gobs.}

It follows that each

_diagonal

element of this matrix is

_unity.

The tabular method to construct A is

_explained

below.

Following

Henderson

_(1976),

individuals are ordered such that

_parents

_precede

their _progeny,and individuals 1

_through

b are considered to be unrelated and non-inbred.

_Thus,

the _upperleft submatrix of A is an

_identity

matrix of order

_2b,

which is

_{expanded sequentially by}

the 2 rows and 2 columns

_{corresponding}

to individual

i,

for i =

b + 1, ... , n,

as follows:

Let 81

=

2(i — 1)

+ 1 and

6f

=

2(i — 1)

+ 2 be the row indices of A

corresponding

to the 2

_MQTL

alleles

_Ql

_and

_Q2

_of

individual i. From

_!3!,

the elements of the 2

rows 6/

and

_{6i ,}

_{corresponding}

to the 2

_MQTL

alleles of individual i with

_parents

s

(8)

for j

=

61 -

1,

where

_B

_i

_(L,

_k)

were defined in

_!6!.

Element

_À

_p

₆₁

=

fi,

where

f

i

is

_given

in

_!11!.

Elements of columns

_6!

_and

_8;

_areobtained

_by

_symmetry.

If 1

parent

is

_unknown,

terms

_involving

the unknown

parent

are

_dropped

from

!14!.

For

_convenience,

the tabular

_algorithm

described above can be written in matrix notation. Let

_A

_i

_-

_i

be the upper left submatrix of A

expanded

up to i - 1. For individual

_i,

with

_parents

s and

_{d, A}

i

_

1

is

_expanded

to

Ai

as

where

and

In

_{(17!, q’}

is a 2 x

_2(i-1)

matrix with at most 8 non-zero

_elements,

which are from

B

i

and are located in columns

6s, 8;,

6d

_and

6d.

The above tabular

_algorithm

to construct A is similar to that used to construct the numerator

_relationship

matrix

_(Emik

and

Terrill, 1949; Henderson,

1976).

Further,

A

_plays

the same role in

_prediction

of

_MQTL

effects as the numerator

relationship matrix, A,

does in

_prediction

of

_breeding

values.

ALGORITHM TO INVERT COVARIANCE MATRIX OF

_MQTL

ALLELE EFFECTS

Theory

Tier and S61kner

_(personal

communication,

1994)

and van Arendonk et al

₍₁₉₉₄₎

used

_partitioned

matrix

_theory

to

_develop

rules to invert the numerator

_relationship

matrix

_efficiently

for

populations

with unusual

_{relationships.}

A similar

approach

is used here to invert A

_efficiently.

From

_[13],

_Gj!

=

_{A -1 /a!.}

In

_general,

the inverse of

_A

_i

_,

_partitioned

as in

!15!,

can be obtained as

where

_Di

=

Ci - q§Aj- i qi

is 2 x 2 matrix

(Searle, 1982).

From

!18!,

the contribution

of individual i to

_Ai

is

_{given by}

the second term on the

_right-hand

side of this

(9)

Because of the sparse structure of qi as shown in

(17!, qiA

i

_

l

qi

can be written as

B

ics

,

d

B§,

where

_d

_,

_s

_C

is the 4 x 4 conditional

_{gametic relationship}

matrix for

parents

of

_i,

s and

d,

the elements of which are in

₁

_,

_A

_{_}

_Z

and

_B

_i

is the matrix of

PDQs

defined in

_(6!.

Thus

If

_{fi, f}

_s

and

_f

_d

are

_nulle,

then

where

₁

₂

is a 2 x 2

_identity

matrix.

The submatrix

_qiDilq!

in

_[18]

is a _{square matrix}of order

2(i — 1)

that contains

only

16 non-zero

_elements,

which are

_given

_by

l

Bi.

i

Dz

B

The submatrix

Di

l

qi

is a matrix of order 2 x

_{2(i -}

₁₎

that contains

_only

8 non-zero

elements,

which are

given

by

DilB!.

Thus,

there is a total of 36 non-zero elements

_contributing

to

_Ail

i from individual i. For

_convenience,

these 36 non-zero elements are collected into a 6 x 6 matrix:

Because

_W

_i

contains all contributions to

_l

_i

_A

from

individual

_i,

we refer to it as the ’contribution matrix’. The

_position

of contribution element

_W

_i

_(l,

_k)

is

_given

_by

element

_Il

_i

_{(1, k),}

so we define the

_{corresponding}

_’position

matrix’ for

_W

_i

as

where 6b

=

2(a-1)+b

for a = _s,

d,

or i and b = 1 _or2. If both

_parents

of individual

i are

_known,

then all elements in

_Ii

_i

are defined. If at least 1

_parent

is

_unknown,

then elements in

_II

_i

_parent(s)

are not defined.

Because qi has at most 8 non-zero

_elements,

and the

positions

of these elements are

_simple

functions of s and

_d,

[18]

leads to an efficient

_algorithm

to invert

_A,

where the number of arithmetic

_operations

for

_inverting

is

_proportional

to

_2n,

the size of A. It is

_noteworthy

that any

symmetric positive

definite matrix can be inverted

using

!18!.

Unless _q_i is _sparseand the

_positions

of the non-zero elements can be determined

_easily,

this

approach

will not be efficient. Note that

_[19]

_requires

_C

_s

_,

_d

_,

which is from

_A

_i

_{_}

₁

_.

Thus for an inbred

_pedigree,

_C

_s

_,

_d

needs first to be

_computed,

similar to the situation where

_inbreeding

coefficients need first to be

computed

when Henderson’s

_{rapid algorithm}

_(Henderson,

₁₉₇₆₎

is used to invert a numerator

(10)

Algorithm

1. Set

A-

1 equal

to the null matrix. 2. For individual

_i,

i =

_{1, ... ,}

_n:

(a)

if both

_parents

are

_unknown,

then add Is to

_A

_6i

_l

_bi

and

_A

_a

_i

₁₆₂

(b)

if at least 1

_parent

is

_known,

then:

i)

compute B

i

according

to

_[5]

ii)

compute

D

i

according

to

_[19]

for

inbreeding

or

_[20]

for

non-inbreeding

iii)

compute W

i

according

to

_[21]

iv)

for each ’defined’ element in

_II

_i

_,

add element

_W

_i

_{(l, k)}

to

A-’

at the

_{position given by}

_{Hi (I, k)}

NUMERICAL EXAMPLE WITH COMPLETE MARKER DATA Consider the

pedigree

of 5 individuals in table I. These 5 individuals are numbered

sequentially

so that

_parents

_precede

their

_offspring,

and are assumed to be from a

population

with marker allele

frequencies

of

p(A

d

=

0.7,

p(A

2 )

=

0.1,

and

p(A

3 )

= 0.2. For

_convenience,

_weassumed that

J

fl =

1.0 and _r= 0.1. For this

example,

genotype

A

Z

A

2

is

_assigned

to individual

_2,

so that marker data are

complete.

Computing

PDMs

The PDMs are undefined for individuals 1 and

2,

because their

_parents

are unknown. Individual 3 has

_parents

1 and 2.

_Thus,

as shown in

_{Appendix A,}

the 8 PDMs for individual 3 can be

computed

as

for

_k

₃

_{, kp, p}

= 1 or

_2,

where

_,

_l

_G

₂

_,

and

_G

₃

_represent

marker

_genotypes

of individuals

1, 2,

and 3. The

_right-hand

side of

_[23]

can be

_computed

from Mendelian

(11)

For individual

_4,

the

_paternal

_parent

is unknown.

_Thus,

PDMs for individual 4 can be

_computed

as

for

k

4 , k

2

= 1 _or2 where

G

u

=

AiA!

is the ordered marker

genotype

for the unknown

_paternal

_parent.

The _upperlimit of the summation is the number of marker alleles

segregating

in the

_population.

The

_resulting

PDMs are

The first 2 columns in

_S

₄

are undefined because the

_paternal

_parent

is unknown. For individual

_5,

both

_parents

are known.

_{Thus, computation}

of PDMs for individual 5 is similar to that for individual

_3,

and the

_resulting

PDMs are

Constructing

A

Individuals 1 and 2 are unrelated and non-inbred

_{(table I),}

thus the _upperleft submatrix of the conditional

_gametic

_relationship

matrix A is an

_identity

matrix of

order 4. This submatrix will be

_expanded

_by

the tabular method for individuals

3,

4,

and

_5,

as shown below.

The matrix

_B

₃

of

_PD(as

for individual 3 with

_parents

1 and 2 is

_computed

_using

S

3 according

to

_[5]:

Now,

from

_[14],

elements

_A5

_,

_j

and

_,

_j

_A6

_{for j}

=

l, ... , 4,

which

correspond

to

individual

3,

are

_computed

as linear functions of elements in the first 4 rows, which

correspond

to the

_parents

1 and 2:

Diagonal

elements

_A

₅

_,

₅

and

₆

_,

₆

_A

for individual 3 are

_unity.

Off-diagonal

element

_.!6,5,

which is defined as the conditional

_inbreeding

coefficient in

_!10!,

is null because the

(12)

of elements

_A

₅

_,

_j

_{for j =}

_{1, ... ,}

5 and

_!6,!

_{for j}

₌

_{l, ... ,}

6 are

The

_{corresponding}

column elements are obtained

by

_symmetry.

The

_PD(as

_{computed using}

!5!:

For individual

_4,

numerical values of elements

_A

₇

_,

_j

_{for j}

₌

1, ... ,

7 and

_)..8,

_j

for

j = 1,...,8 are

The

PD(as

_computed

_using

!5!:

To

_compute

_f

₅

defined in

_!10!,

we need

-

3 Pr(Q3

Q!4IGobs)

and

Pr(T

kak4

IG

obs

)

for

k

3 , k

4

= 1 or 2.

_{Probabilities,}

Pr(Q3

3 -

Q!4IGobs),

have

_already

been

_computed

as

Probabilities,

Pr(r!![Go6s);

can be obtained

_according

to

_[12]

as

Similarly,

Pr(Tl2!Gobs)

=

41/100,

Pr(T2l!Gobs)

=

41/100,

and

Pr(T

22I

G

obs

)

(13)

For individual

_5,

numerical values of elements

_>’9,

_j

_{for j =}

_{1, ... ,}

9 and

_A

_l

_o,j

for

j = 1,...,10 are

The conditional

_{gametic relationship}

matrix

_(A)

is

Inverting

A

Set

A-

1

to the null matrix. For each of the 5

_individuals,

the contribution matrix

W

i

and

_{corresponding}

_position

matrix

_H

_i

are

computed

as described below. The inverse of A is obtained

_by

_adding

elements

_W

_i

_(l,

_k)

to

A-

1

at

_positions

indicated

by

elements

_{IIi(l, k).}

For the first 2

_individuals,

the

_parents

are unknown.

_Thus,

add Is to

_{AIL A2!,}

A3!

and

_A4!.

For individual

_3,

_PD(as

_(B

₃

₎

can be obtained as shown earlier. Because individual 3 is not

_{inbred, D}

₃

=

I

2 -

B3B!,

from

(20!.

Matrix

W

3

is _in

table II and

_II

₃

is in table III.

Similarly,

for individual

_4,

matrices

_W

₄

and

₁₁

₄

are in tables II and III. Note that 1

_parent

of individual 4 is unknown. Those elements in

_W

₄

and

₁₁

₄

_parent

are undefined.

From the

_{previous section,}

individual 5 is inbred

_{( f}

₅

=

0.045).

Thus,

[19]

is used

to obtain

_D

₅

=

C5 - B5C3,4B!,

where

C

5

and

C

3 ,

4

_were

_computed

_inthe

_previous

section:

(14)

(15)

The

A-

1

matrix is

COVARIANCE OF

_MQTL

EFFECTS GIVEN INCOMPLETE MARKER DATA

Algorithms

to construct and invert the conditional

_{gametic relationship}

matrix

_(A),

given complete

marker

_data,

are based on the recursive

_equation

!3).

In

_deriving

[3]

from

_!2!,

it was

_{assumed, given complete}

marker

_data,

that events

Q7; {::::

_Q§

and

Qs -

_Q

_i

_kj

for example,

are

_{independent. They}

_maynot

_always

be

_independent,

however,

when marker

_genotypes

of the

_parents

are unknown.

_Thus,

_although

[2]

holds for

_complete

and

incomplete

marker

data,

[3]

may not hold for

_incomplete

marker data.

Therefore, algorithms developed

for

_complete

marker data cannot be

_{directly applied,}

in

_general,

to

_pedigrees

with

_incomplete

marker data. In this

section,

we first demonstrate that

_[3]

may not hold when marker

_genotypes

of

parents

are unknown. A

_strategy

to accommodate

_pedigrees

with

_incomplete

marker data is then

_presented.

The

_pedigree

in table I is used to demonstrate that

_[3]

_{may not}hold when marker

_genotypes

of the

_parents

are unknown. In this

_pedigree,

marker

_genotype

of individual

2,

the maternal

parent

of individuals 3 and

4,

is unkown.

_Thus,

as shown

_below,

_{Pr(Q4 -}

_Q2)

cannot be

_{computed using}

_!3!.

From

_!2!,

.

The last 2 terms in

_[26]

are null because the

_QTL

alleles in the unknown

_parent

of individual 4 cannot be identical

_by

descent to

_QTL

alleles in individual 3. In

deriving

[3]

from

_!2!,

it was

_{assumed, given Gobs,}

that

Q4 ! Q’

and

Qz = Q2,

for

example,

are

_independent,

ie

Because the marker

_genotype

for the maternal

_parent

of individual 4 is

_unknown,

(16)

Given the

parents’

genotypes,

the

_genotypes

of

_offspring

are

_independent.

Therefore,

Pr(Qi « Q’, Q’

=

!w3I!’robs)

can be

_computed

_{by conditioning}

on the

genotype

of individual 2

_(parent

of individuals 3 and

₄₎

as

The

_{probabilities required}

in the above

_computation

are

From the above

table,

Pr(Q4 ! Q)]G

obs

)

and

_Pr(Q!

=

Q3IG

obs

)

can also be

(17)

The values of

_{Pr(<! 4=}

_Q2!Gobs)

=

1/24,

Pr(Q2

=

Q3IGobs)

=

1/2,

and

Pr

(Q’ 4 - <- Q2, Q2

=

Q3!Gobs) =

3/400

illustrate that

Pr(! 4= Q

,

Ql

-

l

Q2

i

G

ob,,) =

A

Pr(‘‘!4 ! Q

(

w2 =

‘

2l

Pr

obs)

G

Q

2

3 ob.,)

Because

_[3]

_maynot hold when marker

_genotypes

of

parents

s and d are

unknown,

the tabular

_algorithm

for

_complete

marker data cannot be

_{applied directly}

to construct

_A,

_{given incomplete}

marker data. The tabular

_algorithm

can be

used,

however,

to construct A

_{given incomplete}

marker

_data,

as described below..

Let S2 be the set of all

_possible

marker

_genotype

_{configurations}

for individuals with unknown

_genotypes,

and let

Gobs

be the observed marker

_genotypes

for individuals with known

_genotypes.

The conditional

_{gametic relationship}

matrix

given

incomplete

marker

data,

_A

_IGobs’

can then be

computed

as

where

_A

_lw

_,G

_Ob8

is the conditional

_gametic

relationship

matrix

_given

marker

geno-types

w for individuals with unknown

genotypes

and

Gobs

genotypes,

and

_Pr(w

_I

_G

_obs

₎

is the conditional

_probability

of individuals with unknown

_genotypes

_having

marker

_genotypes

w,

given

marker

genotypes

Gobs

_genotypes.

The matrix

_A

_lw

_,

_GOb8

can be constructed

using

the tabular method

_given

_complete

marker

_data,

and the

_probability

_Pr!Go!)

can be

computed

as

where

_{Pr(w, G}

_obs

₎

can be

computed efficiently

(Elston

and

Stewart, 1971;

Bonney,

1984).

The

conditional

_{gametic relationship}

matrix

_(A)

for the

_pedigree

in table

I,

computed

using

!27!,

is

(18)

unknown

_genotypes.

_Further,

an efficient

algorithm

to invert

_AI

_Gobs

has not been found.

_Therefore,

2

_approximate

methods to

_compute

_A¡

_Gobs

and its inverse are

presented:

1)

We have

_already

shown that

_[3]

may not hold for

incomplete

marker data

because,

given

Gobs,

Q7i !

Q!

and

_{Q9 =}

_Q

_ki

in

_[2],

for

_example,

_maynot be

_independent.

If we

_ignore

this

_dependency,

then

[15]

and

(18!,

which are based on

(3!,

can be used to

_approximate

A and its inverse. This

_{approximation}

will

_require

PDMs for individuals with

_incomplete

marker data. For individual

_i,

with unknown marker

genotypes

for

_parents

s and

d,

PDMs can be

_computed

as

where each summation is over all

_possible

_genotypes

at the ML. If

_G

_s

_,

_G

_d

_,

or

G

i

is not

_missing,

then the

_{corresponding}

summation should be

dropped

from

!28!.

The

_computation

of

_{Pr(G,,Gd,GilG!b,)}

can be _very

_{time-consuming}

when a

_large

number of individuals have unknown marker

_genotypes.

An

_{approximation}

for

_Pr(G

_s

_,

_G

_d

_,

_{Gi ] Gobs)}

can be

_obtained,

however,

by conditioning only

on marker

information of ’close’ relatives of

_i,

s and

_{d, where,}

for

_example,

a set of ’close’ relatives for an individual could be its

parents,

sibs and

_offspring.

The conditional

gametic relationship

matrix

_(A),

for the

_pedigree

in table

_{I, using}

this

approxima-tion is

The consequence of this

approximation

is that the summation in

[27]

has been

brought

into inside of A and

_performed

on

_B

_i

_(or

_S

_i

_,

see

[5]).

2)

Let wmax be the

genotype

configuration

in S2 with the

_{largest probability.}

Given w

max and

Gobs,

[15]

and

[18]

can be used to

_approximate

A and its inverse. Sheehan et al

₍₁₉₉₃₎

_proposed

a

_sampling

_scheme

to

compute

the

_probability

of

genotype

configurations.

For

the

_pedigree

in table

_{I, given}

_Gobs,

_G

_z

=

A

i

A

2 has

the

_largest

.conditional

_probability

_(2/3)

among all

possible

genotypes

for

G

2 , ie

w

max =

(Gz

=

AiA

2 ).

Thus,

[15]

_canbe used _{to construct}A with

G

Z

=

_A

_i

_A

₂

_.

_The

(19)

The consequence of this

approximation

is that the

_resulting

A is conditional on wmax ·

A measure of how well an

_{approximation}

_comparesto the exact method is the correlations

coefficient,

r’exact,a.pprox;between upper

off-diagonal

elements of

_AI

_Gob

_{s ,}

computed

exactly

by

!27!,

and

_{corresponding}

elements

_{computed by}

_approximate

methods. For the

_pedigree

in table

I,

Texact,approxl = 0.9877 for

approximation

1 and _?’exact_,_a_pp_rox2 = 0.8735 for

_{approximation}

2.

To further examine these

_{approximations}

_{rexa!t,apProxi and rexa!c,appTOx2}were

computed

for a

_pedigree

of 99 individuals with 3

_generations.

The first

_generation

consisted of 3

_grandsires,

each mated with 12

_granddams.

The second

_generation

consisted of 2 sires and 10 dams from each

_grandsire

for a total of 6 sires and 30 dams. Each sire was

_randomly

mated with 4

_{dams, avoiding}

full-sib and halfsib

matings.

The third

_generation

consisted of 2

_grandsons

and 2

_{granddaughters}

from each sire for a total of 12

grandsons

and 12

_{granddaughters.}

Marker

_genotypes

were assumed

_missing

for the 30 maternal

_granddams.

Thus covariances were

_only

computed

for the

_remaining

69 individuals in the

_pedigree.

Marker

_genotypes

for these 69 individuals were

_{generated randomly. Granddaughters}

and dams without progeny were

_assigned

_missing

marker

genotypes

with

_probability

0.6.

Exact and

_approximate

covariances were

_computed

for 20

_{randomly generated}

marker

_genotype

_{configurations.}

_{The average for}_r_exact_,_a_pp_rox_iwas 0.8923 and for

!’exact,approx2 was 0.8939. The effect of these

_{approximations}

on marker-assisted

genetic

evaluation needs to be studied.

DISCUSSION

Theory

and

_algorithms

are

_presented

here to construct the conditional covariance matrix between relatives for a marked

_quantitative

trait locus

_(G

_v

=

Au v 2)

and

(20)

construct

_A!Gobs

for

_incomplete

marker data may not be efficient for

large pedigree.

Therefore,

we

presented

2 alternative

_strategies

to

_approximate

_A!Gobs

and its inverse. Simulation results indicate that the 2

_{approximations}

are similar because

they

have similar correlations with the exact method

₍

_?’exact

_,

_a

_pp

_rox

_{i =}

_0.8923,

rexact,approx2 !

0.8939).

Approximation

(1)

is

preferred, however,

because it may be difficult to search for wmax when a

_large

number of individuals have unknown marker

genotypes.

We also

_presented

an

_algorithm

to

_compute

the conditional

_inbreeding

coefficient

( fi)

for a

QTL

_{given Gobs,}

which is different from

_Wright’s

_inbreeding

coefficient. This conditional

_inbreeding

coefficient is the

_probability

that the 2

_homologous

alleles at the

_MQTL

in an individual are identical

_by

descent

_given

the

_pedigree

and marker

information,

whereas

_{Wright’s inbreeding}

coefficient is the conditional

probability

that the 2

_homologous

alleles at _anylocus in an individual are identical

by

descent

_given

_only

the

_pedigree.

A numerical

_example

is used to show that

equation

!3!,

which is the basis of tabular method to construct

_G

_v

_,

does not hold

generally

_incomplete.

In most

_{practical situations,}

marker information will not be available on distant ancestors.

_Thus,

TM-BLUP cannot be

computed.

One of the 2

_{approximations}

presented

in this paper,

however,

can be

_employed

to

_compute

_A

_IG

_o

_bs’

Thus available marker information can be used to obtain

_improved

_genetic

evaluations

by approximate

TM-BLUP.

_Further,

in

_general,

information on distant ancestors has little

_impact

on

_genetic

evaluations.

If the ML and

_MQTL

are in

_{linkage disequilibrium,}

marker data

_provide

information on the first moment of

_MQTL

effects. In this

_{situation, regression}

techniques

can be used for

_genetic

evaluation

_using

marker and trait information

(Lande

and

_Thompson,

_1990;

_Zhang

and

Smith,

1992).

If the ML and

_MQTL

are in

_linkage

_equilibrium,

marker data do not

_provide

information on the first moment of the

_MQTL

effects. Even with

equilibrium, however,

marker data do

_provide

information on covariances of

_MQTL

effects. In this

_situation,

TM-BLUP can be used for

_genetic

evaluation

_{by fitting MQTL}

effects as random effects within animal

(Fernando

and

_Grossman,

_1989;

Cantet and

_Smith,

_1991;

_Goddard,

_{1992; Hoeschele,}

1993).

Genetic evaluation

by

TM-BLUP

_requires

_knowledge

of

_genetic

_parameters,

such as r and

o, v 2.

This is also true for

_T-BLUP,

which

_requires

_knowledge

of

genetic

variances and covariances. In

_practice,

true values of

_genetic

_parameters

are unknown and estimates are used in their

_places.

Both restricted maximum likelihood and maximum likelihood

approaches

can be used to estimate

_parameters

required

for TM-BLUP

_(Weller

and

_Fernando,

_1991).

Ideally,

marker-assisted selection will be based on

multiple

marker loci. When the

_linkage

phase

between

_flanking

marker loci is known in addition to the

_parental

origin

of marker

_alleles,

the method

_presented

_by

Goddard

₍₁₉₉₂₎

for

multiple

markers can be used for TM-BLUP. Further research is needed for TM-BLUP

_using

multiple

markers when both the

_linkage

_phase

between

_flanking

marker loci and the

(21)

ACKNOWLEDGMENT

The authors would like to thank an _anonymousreviewer for

_pointing

out an error in the

manuscript.

REFERENCES

Bonney

GE

₍₁₉₈₄₎

On the statistical determination of

_major

_genemechanisms in contin-uous human traits: _regressivemodels. Am J Med Genet 18, 731-749

Cantet

_RJC,

Smith C

₍₁₉₉₁₎

Reduced animal model for marker-assisted selection

_using

_prediction.

Genet Sel

_{Evol 23,}

221-233

Chevalet

_C,

Gillois

_{M, Khang}

JVT

₍₁₉₈₄₎

Conditional

_{probabilities}

of

_identity

of _genes

at a locus linked to a marker. Genet Sel

Evol 16,

431-444

Elston

_RC,

Stewart J

₍₁₉₇₁₎

A

general

model for the

_{genetic analysis}

of

_pedigree

data. Hum Hered _21,523-542

Emik

LO,

Terrill CE

₍₁₉₄₉₎

_{Systematic procedures}

for

_{calculating inbreeding}

coefficients. J Hered

_40,

51-55

Fernando

_RL,

Grossman M

₍₁₉₈₉₎

Marker-assisted selection

_using

prediction.

Genet Sel

Evol 21,

467-477

Geldermann H

₍₁₉₇₅₎

_{Investigations}

on inheritance of

_quantitative

characters in animals

by

gene markers. I. Methods. Theor

A

P

pI

Genet

_46,

319-330

Goddard ME

₍₁₉₉₂₎

A mixed model for

_analyses

of data on

_{multiple genetic}

markers. Theor

Appl

Genet 83, 878-886

Henderson CR

₍₁₉₇₃₎

Sire evaluation and

_genetic

trend. In: Anim Breed Genet

_Symp

in Honor

_of

Dr J L

_Lush,

Am Soc Anim Sci and Am

_Dairy

Sci

_{Assoc, Champaign, IL,}

USA,

10-41

Henderson CR

₍₁₉₇₆₎

A

_simple

method for

_computing

the inverse of a numerator

relationship

matrix used in

_prediction

of

_breeding

values. Biometrics

_32,

69-83 Hoeschele I

₍₁₉₉₃₎

Elimination of

_quantitative

trait loci

_equations

in an animal model

incorporating genetic

marker data. J

_Dairy

Sci

76,

1693-1713

Kashi

Y,

Hallerman EM, Soller M

₍₁₉₉₀₎

Marker-assisted selection of candidates sires for progeny

testing

programs. Anim Prod

51, 63-74

Lande

R, Thompson

R

₍₁₉₉₀₎

_Efficiency

of marker-assisted selection in the

improvement

of

_quantitative

traits. Genetics

_124,

743-756

Searle SR

₍₁₉₈₂₎

Matrix

_{Algebra Useful for}

Statistics. John

_Wiley

&

_Sons,

New

_York,

USA Sheehan

_N,

Thomas A

₍₁₉₉₃₎

On the

_{irreducibility}

of a Markov chain defined on a _space

of _genotype

_{configurations by}

a

_sample

scheme. Biometrics

_49,

163-175

Smith

_{C, Simpson}

SP

₍₁₉₈₆₎

The use of

_{genetic polymorphisms}

in livestock

_improvement.

J Anim Breed Genet _103,205-217

Smith

SP,

Allaire FR

₍₁₉₈₅₎

Efficient selection rules to increase non-linear merit:

applica-tion in mate selection. Genet Sel

_{Evol 17,}

387-406

Soller M

₍₁₉₇₈₎

The use of loci associated with _quantitativetraits in

_dairy

cattle

improvement.

Anim Prod 27, 133-139

Soller

_M,

Beckmann JS

₍₁₉₈₂₎

Restriction

_{fragment length polymorphisms}

and

_genetic

improvement.

In: 2nd World

_Congress

Genet

_Appl

Livest

Prod, Madrid,

Editorial

_Garsi,

Madrid, Spain,

vol 6, 396-404

van Arendonk

JAM,

Tier

_{B, Kinghorn}

BP

₍₁₉₉₄₎

Use of

multiple genetic

markers in

prediction

of

_breeding

values. Genetics

_137,

319-329

(22)

Weller

JI,

Fernando RL

₍₁₉₉₁₎

Strategies

for the

improvement

of animal

_{production using}

marker-assisted selection. In: Gene

_{Mapping: Strategies, Techniques}

and

_Applications

(LB

Schook,

HA

_Lewin,

DG

_McLaren,

_eds),

Marcel

_Dekker,

New

_{York, USA,}

305-328

Zhang W,

Smith C

₍₁₉₉₂₎

_Computer

simulation of marker-assisted selection

utilizing

linkage disequilibrium.

Theor

_Appl

Genet 83, 813-820

APPENDIX A

Theory

for

_computation

of PDMs

Let

_G

_s

=

M; M;,

G

d

=

MIM2

and

G

i

=

Mi M2

be the marker

_genotypes

of 2

parents

s and d and their

_offspring

i. Given

_C

_s

_,

G

d

and

_G

_i

_,

the

_probability

that

Mik

i

descended from

Mp!

does not

_depend

on other information in the

pedigree.

Thus,

_{Pr(Mf° «}

MpP!Go6s)

=

Pr(Miki !

M;

P

ICs, C

d

, Ci),

which _canbe obtained

as

The numerator and denominator of

_[All

are

easily computed

from Mendelian

principles.

For

_example,

if 2

_parents

and their

_offspring

each has marker

_genotype

A

l

A

2 ,

ie

_G

_s

=

Ms Ms

=

A

2 ,

I

G

d

=

MlM2 -

2 A

A

l

and

G

i

=

Mi M

2

=

A

l

A

2 ,

then

Thus,

Pr(M1 ! M

;

IC

s

,

G

d

,

G

i

)

=

1/2.

Other

examples

are listed below.

Eight

PDMs for each individual i are collected into matrix

_S

_i

_,

which is defined in

_!7!.

APPENDIX B

Theory

for

_computation

_{of PDQs}

(23)

Because

Qf°

and

_{M ki}

are on the same chromosome of individual

_i,

each must have descended from the same

_parent.

_Thus,

Pr M,&dquo; 4--

Mp&dquo;54,, Qk° ! QPP !Gobs)

is null.

Now,

There are 2

_{probabilities}

on the

_right-hand

side in

!B2!.

The first

_probability

is a PDM for individual i

_(see

_[All

for its

_{computation).}

The second

_probability

can be

_expressed

in terms of PDMs and of the recombination rate r between the ML and the

MQTL

as

_explained

below.

Given

_Mf°

_«

M:!,

the

_probability

that

_Qf°

descended from

QP

P

does not

depend

on other information in the

pedigree.

Thus,

If

_k

_§

₌

_kp,

then recombination has not taken

place,

so that

If

_{k’ p 54}

_kp,

then recombination has taken

place,

so that

(24)

The

_PD(as,

_{Pr( Q7}

_i

_{=

Q

k,

_!Gobs),

for

_ki,

_k

_P

=

_{1, 2,}

_canbe obtained

by using

the

above in

_!B2!:

where p = s or d.

In _summary,

for

k

i

= 1 or 2 and p = s or

_d,

_{where p =}r when

_kp

= 1 and

p = 1- r when

kp

= 2. Note that

_PDC!s

are now

expressed

in terms of PDMs and r.

APPENDIX C

Theory

for

_computation

_{ofPr(TkskdIGobs)}

The event that the

_pair

of alleles

_(<!,<3!)

in individual i descended from

_parental

pair

(Qs

s

,

Qd

d

)

(fig 1),

is denoted

_by

_kskd

_T

for

_k

_s

_k

_d

= 1 _or2. This _eventcan occur in 1 of _{2 ways:}

1.

_Q¡

descended from

_Q!s

and

_Q/

from

_Qd

_d

_,

denoted

_{(Q! 4= Q!8, Q}

₇

_.;=

Q!d)

2.

_Ql

descended from

_Qk

_d

and

_Q2

from

_Qk

_s

denoted

_{(Q$ «}

_{0!,Q! !=}

_Qs

_s

₎

Given the

_pedigree

and marker

genotypes,

the

_probability

of

_T

_kskd’

which is denoted

_by

_{Pr(T!!!!Go!),}

can be written as

Consider the first

_probability

on the

right-hand

side in

[Cl],

which can be

expressed

as

where

_{Pr(Q2 !}

Q!!G!)

is a

PDQ

and

Pr(Q/ « Q!Q? !

Qa

d

, G

obs

)

can be

expressed

in terms of

PDQs

for individual

_i,

as

explained

below.

(25)

Observe that event

_{Q/ «}

s is

implied by

Q} {= Qss;

therefore,

Further,

Thus,

[C3]

can be rewritten in terms of

PD(as

as

After

_substituting

_[C4]

in

_!C2!,

the first

_probability

on the

right-hand

side in

[Cl]

can be written in terms of

PD(as.

The same

_approach

is

_applied

to the second

probability

in

_!C1!.

_Then,

_obs

₎

_G

_kskJ

_Pr(T

can be

_expressed

in terms of

_PD(as

as

If 1 of the denominators in

_[C5]

is zero

(indicating

the event in 1 of the terms in

Covariance between relatives for a marked quantitative trait locus

HAL Id: hal-00894093

https://hal.archives-ouvertes.fr/hal-00894093

Submitted on 1 Jan 1995

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Covariance between relatives for a marked quantitative

trait locus

T Wang, Rl Fernando, S van der Beek, M Grossman, Jam van Arendonk

To cite this version:

Original

article

Covariance between

relatives

for

a

marked

quantitative

trait

locus*

T Wang

1

RL Fernando

S

der

Beek

M

Grossman

JAM

Arendonk

1

University

of Illinois, Department of

Sciences, 1207,

Gregory Drive, Urbana,

61801, USA;

2

Wageningen

Agricultural University, Department of

Breeding,

338,

Wageningen,

(Received

April 1994; accepted

February

1995)

prediction

(BLUP)

applied

application requires computation

(G

v

)

quantitative

(QTL)

(ML),

given

theory

algorithms

G

v

efficiently.

algorithms

suf&ciently

general

(1)

paternal

origin

(2)

pedigree

_{Sciences, 1207,}

_{Gregory Drive, Urbana,}

_{61801, USA;}

_Wageningen,

_{April 1994; accepted}

_February

₁₉₉₅₎

_prediction

_(BLUP)

_applied

_{application requires computation}

_(G

_v

₎

_quantitative

_(QTL)

_(ML),

_given

_algorithms

_efficiently.

_algorithms

_general

₍₁₎

_origin

₍₂₎

_pedigree

_/

_/

_/

_by

_{Experiment Station,}

_Projects

_(RLF)

_(MG).

_{effets génétiques additifs}

_quantitatif

_généraux

_prendre

_i)

_l’origine

_provides

_genetic

_present,

_genetic

_populations

_primarily

_by

_phenotypes

_(T-BLUP).

_{biology, genetic}

_genetic

_genetic

_using

_genotypes

_(Geldermann,

_1975;

_Soller,

_1978;

_Simpson,

_1986;