Bayesian estimation of dispersion parameters with a reduced animal model including polygenic and QTL effects

(1)

HAL Id: hal-00894199

https://hal.archives-ouvertes.fr/hal-00894199

Submitted on 1 Jan 1998

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Bayesian estimation of dispersion parameters with a

reduced animal model including polygenic and QTL

effects

Marco C.A.M. Bink, Richard L. Quaas, Johan A.M. van Arendonk

To cite this version:

(2)

Original

article

Bayesian

estimation of

_dispersion

parameters

with

a

reduced

animal

model

_{including polygenic}

and

_QTL

effects

Marco

C.A.M.

Bink

Richard

L. _Quaas

Johan A.M. Van Arendonk

a

Animal

_Breeding

and Genetics

_{Group, Wageningen}

Institute of Animal

_Sciences,

Wageningen Agricultural University,

PO Box

_338,

6700 AH

Wageningen,

the Netherlands

b

Department

of Animal Science Cornell

University, Ithaca,

NY

_14853,

USA

(Received

21

_April

_1997;

_accepted

29 December

₁₉₉₇₎

Abstract - In animal

_breeding,

Markov chain Monte Carlo

_algorithms

are

_increasingly

used to draw statistical inferences about

_marginal

_posteriordistributions of _parameters in

_genetic

models. The Gibbs

_{sampling algorithm}

is most

_commonly

used and

_requires

full conditional densities to be of a standard form. In this

study,

we describe a

_Bayesian

method for the statistical

_mapping

of

_quantitative

trait loci

_((aTL),

where the

_application

of a reduced animal model leads to non-standard densities for

_dispersion

_parameters. The

_{Metropolis Hastings algorithm}

is used to obtain

_samples

from these non-standard densities. The

_flexibility

of the

_{Metropolis Hastings algorithm}

also allows us

_change

the

parameterization

of the

_genetic

model.

_{Alternatively}

to the usual variance _components,

we use one variance _component

_{(= residual)}

and two ratios of variance _components,i.e.

heritability

and

proportion

of

_genetic

variance due to the

_(aTL,

to

_parameterize

the

genetic

model. Prior

_knowledge

on ratios can more

_easily

be

_{implemented, partly by}

absence of

scale effects. Three sets of simulated data are used to

_{study performance}

of the reduced animal

_model,

parameterization of the

genetic model,

and

_testing

the presence of the

QTL

at a fixed

_{position. ©}

Inra/Elsevier,

Paris

reduced animal _{model / dispersion}_parameters

_/

quantitative trait loci

*

Correspondence

and

_reprints

E-mail:

_{[email protected]}

Résumé - Estimation _Bayésiennedes paramètres de _dispersiondans un modèle animal réduit _comprenantun effet polygénique et l’effet d’un QTL. En

_génétique

animale,

les

_algorithmes

de Monte-Carlo par chaînes de Markov sont utilisés de

_plus

en

(3)

du modèle

_{génétique. L’algorithme d’échantillonnage}

de Gibbs est utilisé

_largement

et

demande la connaissance des densités

conditionnelles,

dans une forme standard. Dans

cette

_étude,

on décrit une méthode

Bayésienne

_{pour la}

cartographie statistique

d’un locus

à effet

_quantitatif

_((aTL),

où

l’application

d’un modèle animal réduit conduit à des densités de

_paramètres

de

_{dispersion, qui}

n’ont _pasde forme standard. On utilise

_{l’algorithme}

de

Metropolis-Hastings

pour

l’échantillonnage

de ces densités non standard. La

_souplesse

de

_{l’algorithme}

de

_{Metropolis-Hastings}

permet

également

de

_changer

la

_{paramétrisation}

du modèle

_{génétique :}

au lieu des _composantesde variances

habituelles,

on _peututiliser

une _composantede variance

(résiduelle)

et deux _rapportsde _composantesde variance : l’héritabilité et la

_proportion

de la variance

génétique

dûe au

QTL.

Il est

plus

facile de

spécifier

l’information a

priori

sur des

proportions,

en

_partie

_parce

_qu’elle

ne

_dépend

_pas

de l’échelle. Trois fichiers de données simulées sont utilisés pour étudier la

performance

du

modèle animal

réduit,

par rapport au modèle animal strict, l’effet de

_{paramétrisation}

du modèle

_génétique

et la

_qualité

du test de la

_présence

d’un

_QTL

à une _positiondonnée.

©

Inra/Elsevier,

Paris

modèle animal réduit

_/

paramètres de dispersion

/

méthode de Monte-Carlo par chaînes de Markov

_/

locus quantitatif

1. INTRODUCTION

The wide

availability

of

_high-speed

_computing

and the advent of methods based

on Monte Carlo

_simulation,

_particularly

those

_using

Markov chain

_algorithms,

have

opened powerful pathways

to tackle

complicated

tasks in

_(Bayesian)

statistics

_[9,

10].

_(MCMC)

methods

provide

means for

obtaining

marginal

distributions from a

_complex

non-standard

_{joint density}

of all unknown

parameters

(which

is not feasible

_{analytically).}

There are a

_variety

of

_techniques

for

implementation

[9]

of which Gibbs

_sampling

_[11]

is most

_commonly

used in animal

breeding.

The

_applications

include univariate

_models,

threshold

_models,

multi-trait

analysis,

segregation

analysis

and

_{QTL mapping}

_[15,

17, 29,

31,

33].

Because Gibbs

_{sampling requires}

direct

_sampling

from full conditional

distribu-tions,

data

_augmentation

_[22]

is often used so that ’standard’

_sampling

densities are

obtained.

_Often,

_however,

this is at the _expenseof a substantial increase in

num-ber of

parameters

to be

sampled.

For

example,

the full conditional

_density

for a

genetic

variance

_component

becomes standard

_(inverted

gamma

distribution)

when

a

_genetic

effect is

_sampled

for each animal in the

_pedigree,

as in a

(full)

animal

model

_(FAM).

The

dimensionality

increases even more

rapidly

when the FAM is

applied

to the

_analysis

of

_{granddaughter designs}

_[34]

in

_{QTL mapping experiments,}

i.e. marker

_genotypes

on

_{granddaughters}

are not known and need to be

_sampled

as

well. In

addition,

absence of marker data

_hampers

accurate estimation of

_genetic

effects within

_{granddaughters,}

which form the

_majority

in a

_{granddaughter design.}

This

_might

lead to very slow

mixing properties

of the

dispersion

parameters

(see

also Sorensen et al.

_!21!).

The reduced animal model

_(RAM,

_Quaas

and

_Pollak,

_[19])

is

_equivalent

to the

FAM,

but can

_greatly

reduce the

_{dimensionality}

of a

_problem

_{by eliminating}

effects

(4)

[14, 18!,

however,

a standard

_density

is not

_required,

in

fact,

the

_sampling

_density

needs to be known

only

up to

proportionality.

Another alternative for the FAM is the

_application

of a sire model which

implies

that

only

sires are evaluated based on _progenyrecords. With a sire

model,

the

_genetic

merit of the dam of progeny is

not accounted for and

_only

the

_phenotypic

information on

_offspring

is used. The RAM offers the

_opportunity

to include maternal

_{relationships, offspring}

with known marker

_genotypes

and information on

_{grandoffspring.}

As a result the RAM is better

suited for the

_analysis

of data with a

_{complex pedigree}

structure.

The

_flexibility

of the MH

_algorithm

also allows for a

_greater

choice of the

param-eterization

_(variance

_components

or ratios

thereof)

of the

genetic

model. If Gibbs

sampling

is to be

_employed,

the

_{parameterization}

is often dictated

_by

mathemat-ical

_tractability

to obtain the

_{simple sampling density.}

The MH

_{algorithm readily}

admits much

_flexibility

in

_modelling

_prior

belief

_regarding

dispersion

parameters,

which is an

_advantageous

_property

in

_Bayesian

_analysis

!16!.

In this paper, we

_present

MCMC

algorithms

that allow

_Bayesian

linkage analysis

with a RAM. We

_study

two alternative

_{parameterizations}

of the

_genetic

model and

use a test statistic to

postulate

presence of a

QTL

at a fixed

_position

relative

to an informative marker bracket. Three sets of simulation data

_using

a

_typical

granddaughter design

are used.

2. METHOD

2.1. Genetic model

The additive

_genetic

variance

_(o,2)

_underlying

a

_quantitative

trait is assumed to be due to two

_independent

random

_effects,

due to a

putative

QTL

and residual

independent polygenes.

The

_QTL

effects

_(v)

are assumed to have a

N(0,

GO,2)

prior

distribution where G is the

_gametic

_relationship

matrix

_{[2, 8],}

and

_ui

is

the variance due to a

single

allelic effect at the

_QTL.

Matrix G

depends

upon one

unknown

_parameter,

the _map

_position

of the

_QTL

relative to the

_(known)

_positions

of

_bracketing

_{(informative)}

markers. Here we consider the location of the

_QTL

to be known. The

_polygenic

effects

_(u)

have a

N(0,

Au u 2)

_prior

distribution,

where A

is the numerator

_relationship

matrix. The

_genetic

model

_underlying

the

_phenotype

of an animal is

where b is the vector with fixed

effects,

vi

and

_v?

are the two

_(allelic)

_QTL

effects for

animal i,

and ei !

N(0,

lo,2). e

(QTL

effects within individual _are

assigned

according

to marker

alleles,

as

_{proposed by Wang}

et al.

_[32]).

The sum of the three

_genetic

effects is the animal’s

_breeding

value

_(a).

In addition to

_{genetic effects,}

location

parameters

comprise

fixed effects that are, a

_priori,

assumed to follow the proper uniform distribution:

_{f (b) -}

_U[b

_n

_,;

_n

_,

_{bmax! !}

where

_n

_b

_i

_m

and

_b

_max

are the minimum

and maximum values for elements in b. 2.2. Reduced animal model

_(RAM)

(5)

with neither descendants nor marker

_genotypes,

i.e.

ungenotyped

_non-parents.

The

phenotypic

information on these animals can

_easily

be absorbed into their

_parents

without loss of information.

_Absorption

of

non-parents

that have marker

genotypes

becomes more

_complex

when

_position

of

QTL

is

unknown;

it is therefore better to include them

_explicitly

in the

_analysis.

In the remainder of the _paper,it is assumed that marker

genotypes

on

_non-parents

are not available. The

genetic

effects of

non-parents

can be

_expressed

as linear functions of the

_{parental genetic}

effects

by

the

following

equations

[4],

and

where each row in P contains at most two non-zero elements

(= 0.5),

and each row in

Q

has at most four non-zero elements

[32],

the terms _wnon_-p_arcn_t_s_and

§non-parents

pertain

to

_{remaining genetic}

variance due to Mendelian

_segregation

of alleles. In a

granddaughter design,

the P and

Q

for

granddaughters,

not

_having

marker

_genotypes

observed nor

augmented,

have similar

_structures,

where Q9 denotes the Kronecker

product,

and J is a

_unity

matrix

_[20].

This

equality

does not hold if marker

_genotypes

are

augmented,

since

phenotypes

contain

information that can alter the marker

_genotype

_{probabilities}

for

_ungenotyped

non-parents

[2].

The

_phenotypes

for a

_quantitative

trait can now be

expressed

as,

for row vectors

_P

_i

and

Q

i

(possibly null),

and

where u)

i

reflects the amount of total additive

_genetic

variance that is

_present

in

E

.

2

Based on the

_pedigree,

four

_categories

of animals are

_{distinguished}

in the

R

A

M

_{(table 1).}

The vectors

P

i

and

Q

i

contain

_{partial regression}

coefficients. For

parents,

the

_only

non-zero coefficients

_pertain

to the individual’s own

_genetic

effects

(ones);

for

_non-parents,

the individual’s

_{parents’ genetic}

effects

_(halves).

Note that

P

i

and

_GZ

_i

are null for a

_non-parent

with unknown

_parents,

and that

non-parents’ phenotypes

in this

_category

contribute to the estimation of fixed effects and

_phenotypic

_(residual)

variance

_only.

2.3. Parameterization

Let B denote the set of location

parameters

(b,

u and

_v)

and

dispersion

(6)

We consider the

_following

two

_{parameterizations}

for the

dispersion

parameters

where

and

In the

_first,

0 v

c ,

the

parameters

are the variance

_components

(VC).

This is the

usual

_{parameterization.}

A

_difficulty

with this is that it is

_problematic

for an animal

breeder to elicit a reasonable

_prior

of the

_genetic

VC. Animal

breeders,

it seems

to _us,are much more

likely

to

_have,

and be able to

_state,

_{prior opinions}

about such

_things

as heritabilities.

_{Consequently,}

in

_O

_RT

_,

_{parameter h}

2

is the

_heritability

of a

trait,

and

_{parameter &dquo;(}

is the

_proportion

of additive

_genetic

variance due to the

_{putative QTL.}

This

_{parameterization}

allows more flexible

modelling

of

_prior

knowledge

because

h

2

and _-ydo not

_depend

on scale. Theobald et al.

_[23]

used

a variance

_ratio,

a u/Ue 2 2,

_{parameterization}

but noted that the animal breeder _may

prefer

to think in terms of

_{heritability.}

We

_prefer

the

_part-whole

ratios

h

2 and

_y.

The

_components

_or2

_and

_{<7! can}

be

_expressed

in terms of

_Q

_e,

h

2 _{2 and}

and

2.4. Priors

(7)

used to describe

_uncertainty

on VC. The inverted _gamma

(IG)

_{distribution,}

or its

special

case the inverted

_{chi-square distribution,}

is common because it is often the

conjugate prior

for the VC if the FAM

_(or

sire

_model)

is

_applied.

_Hence,

the full conditional distribution for VC will then be a

_posterior

_updating

of a standard

_prior

!9!.

This

_simplifies

Gibbs

_sampling.

We will use the IG as the

_prior

for

0 ’; - though

with a RAM it is not

_conjugate,

where x =

e, u, or v. The rhs of

(10)

constitutes the kernel of the

distribu-tion. The mean

_(p)

of an

3)

(

IG(o:,

is

((a -

1)(3)

-1

,

and the variance

equals

((a- 1)!-2)/!)’B

Van Tassell et al.

_[29]

suggest

setting

a = 2.000001 and

/3

!

(!)-1

for an ’almost flat’

_prior

with a mean

_{corresponding}

to

_{prior expectation}

(p,).

The IG distributions for three different

prior expectations

are

_given

in

_figure

1. When the

_{prior expectation}

is close to zero

(p,

=

5.0),

the distribution is _more

peaked

and has less variance because mass accumulates near zero. When the

_prior

expectation

is

_{relatively high}

_(p,

=

60),

the

probability

of

or2

being

equal

_tozero

is very

small,

which

_might

be undesirable

_and/or

unrealistic for

_ui.

An alternative

prior

distribution for

_or2

_is

which is a _proper

_prior

for

ufl

with a uniform

_density

over a

_{pre-defined large,}

finite

interval,

for

example

from zero to 200

_{(figure 1).}

These

_prior

distributions for VC

are used

_mainly

to

_represent

_{prior uncertainty}

_!21,

_30,

_31!.

Corresponding

to

_{(10) (11)}

there is an

_equivalent

_prior

distribution for

_A(!y).

However,

because neither

₍₁₀₎

nor

(11)

were chosen for _anyintrinsic

_{’rightness’}

we

prefer

a

simpler

alternative of

_using

Beta distributions for the ratio

_parameters

A

_{and -y}

to

_represent

_prior

_knowledge,

where x =

h

2

_or

-y. When

prior

distribution

parameters

ax and

/

3 x

are both

set

_equal

to

_1,

the

_prior

is a uniform

density

between 0 and 1

(figure 2),

i.e.

flat

_prior.

_{Alternatively,}

ax and

!3!

can be

_specified

to

_represent

_prior

expecta-tions for

_parameters

of interest

_{(figure 2).}

For

_example,

one can centre the

den-sity

for

_heritability

of a

yield

trait in

dairy

cattle around the

_{prior expectation}

(= 0.40),

with a

_relatively

flat

(Beta

(2.5, 3.75))

or

_peaked

(Beta

(30.0, 45.0))

distribution when

prior certainty

is moderate or

_strong,

_{respectively. Furthermore,}

(8)

2.5. Joint

posterior

density

(9)

animals of

_category

i

_{(table 1),}

the total number of observations

being given

as

N,

and let _qdenote the number animals with

_offspring,

i.e.

_parents.

_{Then, 2q}

is the number of

_QTL

effects

_(two

allelic effects _per

_animal).

With

_O

_v

_c,

Under

_B

_RT

_,

dispersion

parameters,

and

priors thereof,

are different from

₀

_vc

the

(10)

2.6. Full conditional densities

From the

_{joint posterior}

densities

₍₁₃₎

and

_(14),

_density

for each element in B can be derived

_by

treating

all other elements in 0 as constants and

selecting

the terms

_involving

the

_parameter

of interest. When this leads to the kernel of a standard

density,

_e.g.Normal for location

_parameters

or an IG

_{distribution,}

e.g. variance

components

with

FAM,

Gibbs

sampling

is

_applied

to draw

samples

for that element in 0.

_Otherwise,

_density

is non-standard and

sampling

needs to be done

_by

other

_techniques.

_(All

full conditional densities are

given

in the

_Appendix).

2.7.

_Sampling

non-standard densities

_{by Metropolis-Hastings}

algorithm

Sampling

a non-standard

_density

can be carried out a

_variety

of _ways,

_including

various

_rejection

_{sampling techniques}

_[6,

_{7, 12,}

_13!,

and

_{Metropolis-Hastings}

sam-pling

within Gibbs

_sampling

_!6!.

We use the

_{Metropolis-Hastings algorithm}

(MH).

Let

_!r(x)

denote the

_target

_density,

the non-standard

_density

of a

_particular

element in

_0,

and let

_{q(x, y)}

be the candidate

_generating

_density.

_Then,

the

_probability

of

move from current value x to candidate value _y

for O

i

is,

When _{y is}not

_accepted,

the value for

0 z

remains

_equal

to x, at least until the next

_update

for

_{0z .}

Chib and

Greenberg

[6]

described several candidate

generating

densities for MH. We use the random walk

approach

in which

candidate y

is drawn

from a distribution centred around the current value x. To ensure that all

_sampled

parameters

are within the

parameter

space the

sampling

distribution,

q(x, y),

was

U(B

L

,

B

u

)

with

where t is a

_positive

constant determined

_empirically

for each

_parameter

to

_give

acceptance

rates between 25 and 50

%

_{[6, 24].}

For each of the non-standard

_densities,

a univariate MH was used. We

perform

univariate MH iterations

(ten

times)

within a MCMC

_cycle

to enhance

_mixing

in the MCMC

_chain,

as

_suggested

_by

Uimari et al.

_[26].

2.8.

_Comparison

to a full animal model

(FAM)

From the conditional densities

presented,

two

_hybrid

MCMC chains can be used

to obtain

_samples

of all unknown

parameters

(O

vc

or

B

RT

)

_using

a RAM. For

(11)

and

_B

_RT

_).

The conditional densities for the FAM are a

special

case of RAM

(see

table

_{I ):}

all animals are in

_category

4 and wz = 0. In _caseof

O

vc

the conditional

densities for

_{o, 2, o}

_l

_2,

and

_<7!

are now

_recognizable

IG distributions and Gibbs

sampling

can be used to draw

_samples

from these densities

_directly.

In the case

of

O

RT

the conditional densities for

h

2 and

q remain non-standard and MH is used to draw

_samples.

Table II

_gives

the four constructed MCMC

_sampling

schemes.

2.9. Post MCMC

_analysis

Depending

on the

dispersion parameterization

(O

vc

or

_q

_RT

_),

three of five

parameters

were

sampled

(table II).

In each MCMC

cycle, however,

the

_remaining

two were

_{computed, using}

(6)

and

(7)

or

(8)

and

(9),

to allow

_comparison

of results of different

_{parameterizations.}

For

_parameter

_X,

the auto-correlation of a _sequence

m-l

of

_samples

was calculated

as

-

1 :

_{[(a! — /!)(.E,+i — j}

_i

_{z )]}

_/s!

where m = number

m i= l

’

x

of

_{samples, ji}

_z

and

A

x

are

_posterior

mean and standard

deviation,

respectively.

The correlation among

samples

for

_parameters

x and _z,within MCMC

_cycles,

was

I rn

computed

as - E

[(x

i

-

)]

[Lz

-

Zi

)(

Lc

[

1[(!xg,].

For each

_parameter

an effective m

z= i i

sample

size

_(ESS)

was

_computed

which estimates the number of

independent

samples

with information content

_equal

to that of the

_dependent

_samples

_!21!.

The null

_{hypothesis that -y}

= 0 - the

QTL explains

_no

genetic

variance - _was

.. mode

{p(r)}

}

.

tested via an odds ratio

p(! - 0) >

20

_following

Janss et al.

_(17!.

_They

_suggest

P(-! = 0)

(12)

3. SIMULATION

In this

_{study, granddaughter designs}

were

generated by

Monte Carlo simulation.

The unrelated

_grandsire

families each contained 40 sires that were half sibs. The

number of families was 20

_except

in simulation III where

_designs

with 50 families

were simulated as well

_(table

_IIB.

_Polygenic

and

_QTL

effects for

_grandsires

were

sampled

from

_N(0,

_Q

_u)

and

_N(0,

_er!),

_{respectively.}

The

_polygenic

effect for sires

was simulated as

US

=

!(UGs) 2

+

4lz ,

where UGS is the

grandsire’s polygenic

effect,

and

_*!t,

Mendelian

_sampling,

is distributed

_{independently}

as

N(0, Var (4l

j

))

with

Var(4lz)

= 0.75 x

Q

u

_{(no inbreeding).}

Each sire inherited one

QTL

at random from its

_(grand)

sire. The

_maternally

inherited

_QTL

effect for a sire was drawn

from

_N(0,

_er!).

Each sire had 100

_daughters

with

_phenotypes

_observed,

that were

generated

as

where p is a

0/1

variable. In all simulations the

phenotypic

variance and the

heritability

of the trait were 100 and

_0.40,

_{respectively.}

The

_proportion

of

_genetic

variance due to the

QTL

(= 1’

)

was

_by

default

_0.25,

or 0.10 in simulation III

(table

777).

Two

_genetic

markers

_bracketing

the

_QTL

position

at lOcM

_(Haldane

mapping

function)

were simulated with five alleles at each

_marker,

with

_equal

frequencies

over alleles per marker. For

grandsires,

the marker

genotypes

were

fully

informative,

i.e.

_{heterozygous,}

and the

_{linkage phase}

between marker alleles is assumed to be known a

_priori.

The

_uncertainty

on

_{linkage phase}

in sires can

(13)

4. RESULTS AND DISCUSSION

4.1. Simulation I:

_comparison

RAM versus FAM

For each of the four MCMC

_algorithms

that are

_given

in table

_II,

a

_single

MCMC chain was run and 2 000 thinned

samples

were used for

post-MCMC analysis

(table III).

In the case of

₀

_v

_{c , prior}

distributions for

Q

e,

Q

u

and

a

2 were

’flat’ IGs

(figure 1)

with

_expected

means

_equal

to

_60,

30 and 5

_(values

used for

_simulation),

respectively.

In the case of

B

RT

,

the

_prior

for

a

was

_again

an IG and

_priors

for

h

2

and -y

were Beta

(2.5, 3.75)

and Beta

(0.9, 2.7),

_{respectively. Figure}

_{3 presents}

the

mixing properties

for

parameter

ui

within

the chains for the

RAM-0

vc

and

RAM-0vc

alternatives and

_points

to slower

_mixing

when

_using

the FAM. This slow

_mixing

is also indicated

_{by high}

auto-correlation

_{(! 1)}

_among

_samples

for

_parameters

_ui

2

and

when

the FAM was used

(table IV).

With the same

_thinning,

the auto-correlation among

samples

in the RAM is

_x

0.70. The estimates for

posterior

mean and coefficient of

_variation,

derived from

samples

of the four

chains,

are

given

in table V. These estimates are _verysimilar over models

(RAM

and

FAM)

and

parameterizations

(0

v

c

and

_B

_RT

_).

The coefficients of variation for

_{ui and}

’Y

are

relatively large

and indicate that a

_posteriori

knowledge

on these

_parameters

remains

_small,

while estimates for

_{oe and h2}

are

accurate. The

_magnitude

of the

sampling

correlation among

parameters

within MCMC

cycles

was _verysimilar for both models and

_{parameterizations.}

The

_samples

for

_o,

and

_ufl

showed a

moderately

high

negative

correlation

_(-0.7),

while the

_sampling

correlation between

h

2 _and

₇

was

relatively

low and

_positive

(0.2).

The correlation _among

samples

for

u

£

and

h

2

was _very

_high

but

_apparently

did not

_adversely

affect the auto-correlation of these

parameters.

Taking

100 ESS as a minimum

_[26]

the MCMC chain was rather short

for statistical inferences

_{for -y}

in

_FAM-B

_RT

_.

_{However, running}

a

longer

MCMC chain was not

_practical

since the

FAM-0

vc

MCMC chain needed 68 593 min CPU

₍₄₇

days)

on a HP 9000-735

_(125Hz)

workstation. This was almost 100 times the 11 h that were needed to run the RAM with similar chain

_length.

The slow

_mixing

of

_parameters

for a FAM was

_likely

to be due to the lack of marker data on

granddaughters.

Distinction between

polygenic

and

QTL

effects

within these animals is

_{hardly possible. Consequently, they provide}

little informa-tion

_{regarding dispersion}

but because

_they

are so numerous

_they

dominate the

distribution from which the next

_sample

for the

_dispersion

_parameter

is drawn.

Heuristically,

one first

_generates

u and v with variances

_reflecting

current

a

2 .

Sub-sequently

one

samples

a new

o

2 from

a

peaked

distribution with a mean near the

sample

variance of the u and v. Not

_surprisingly

one

_gets

back a

a

2 very

similar to the

_slowly

_mixing.

The data from simulation I were also used to examine the effect of

_priors

on

posterior

inferences on the

_proportion

of

_QTL

when

_e

_RT

was used. Four different

priors

for

7

were

_{used, ranging}

from a ’flat’

(but

not a

’non-informative’)

uniform

prior

to a

density peaked

at zero. The latter reflects the

_{prior expectation}

that the

genetic

variance due to the

_QTL

is small or

_equal

to zero.

_Figure

_{4 presents}

both

prior

and

posterior

densities. The uniform and the

_{’peaked-at-zero’}

prior

resulted in the

_highest

_(0.20)

and lowest

_posterior

mean estimate

(0.10),

_{respectively.}

For this

(14)

(15)

(16)

All

_priors

_{studied, however,}

showed

_consistency

for the

_posterior

_probability

of y =

_0,

i.e. the data

supported

the

presence of a

QTL

at the studied

_position

of the chromosome.

4.2. Simulation II:

_{parameterization}

of the

_genetic

model

In simulation

_II,

five

_replicates

of data were used to

_study

the effects of alternative

_{parameterizations}

of the

_{genetic model,}

for the RAM

_only.

Genetic and

_population

parameters

were similar to those in simulation I

_{(table III).}

Based

on the results for ESS from the initial MCMC chains

(table I!,

the MCMC

chains were run for 250 000

_cycles

and every 250th was

_sample

used for

analysis

(m

=

1000).

Now,

uniform

_priors

for all

dispersion

_parameters

_wereused. The

sampling

correlations were

_averaged

over the five

_replicates

and are

_presented

in

table VI. These correlations are consistent with those from the initial MCMC chains

(table IV);

i.e. auto-correlations were

_highest

_among

_samples

for

0 &dquo;

(in

Ovc)

and

q

(in O

RT

),

i.e. around 0.68. These

parameters

also had lowest and similar ESS

(! 230).

These results indicate that

_{sampling efficiency}

is similar for the two studied

_{parameterizations}

_(0vc

and

_O

_RT

₎

of the

_genetic

model and shorter chains may suffice. The

posterior

mean

_estimates,

_averaged

over five

_replicates,

for all

dispersion

parameters

were in close

_agreement

with the values used for simulation

(17)

4.3. Simulation III: presence of the

QTL

In simulation

III,

two different

_designs

₍₂₀

or 50

_grandsire

_families)

were studied

in combination with two different sizes of the

QTL

(explaining

either 10 or 25

%

of the

_genetic

_variance).

Two different

_priors

_for

_werestudied with the

O

RT

parameterization.

For each combination of

design

and

_-y,

test runs

_preceding

the 25

replicates

were used to

_empirically

determine values for t in the MH

_algorithm,

in order to achieve the desired

_acceptance

rates. From the

_marginal

_{posterior density}

an odds ratio was

_computed

and the presence of the

QTL

was

_{accepted only}

if this ratio exceeded a critical value of 20.

_Using

this test statistic we

_postulated

the _power

of

detecting

the

QTL

for

specific

designs

and

using

different

priors

(table V77).

The small

_design

₍₂₀

x

₄₀₎

has low power of

QTL detection,

i.e.

only

25

%,

for a

QTL

that

_explains

10

%

of the

_genetic

variance. Power increased when either the

_QTL

explained

more

_genetic

variance or when a

_{large design}

(50

x

40)

was used. For

the

_large

_design

with a

_{relatively large}

QTL,

_powerof detection is 100

%,

for both

priors

considered. The use of the

_{’peaked-at-zero’ prior}

reduced _powerin the two intermediate cases but increased power in the small

design

with the small

QTL.

Estimates for

_posterior

_mode,

mean and HPD90 were

_averaged

over the 25

_replicates

and these averages are

presented

in

figure

5. When the

’peaked-at-zero’

_prior

was

used, point

estimates were lower

compared

to

_using

the uniform

_prior.

This

_prior

also led to shorter - and closer to zero - HPD90

_region

in all combinations of

_design

and

(18)

5. CONCLUSIONS

We

_presented

MCMC

_algorithms,

_using

the Gibbs

_sampler

and the MH

(19)

parameters

and also better

_mixing

of the

_dispersion

parameters.

Information on

individual

_phenotypes

led to accurate estimation of both residual variance and

heritability,

as was similar to Van Arendonk et al.

_!27!.

On the

_contrary,

_daughter

yield

deviations

_[28]

may result in poor estimation of

polygenic

and residual variances

_[25].

The use of

_B

_RT

allows a better

representation

of

_prior

belief about

dispersion

parameters

while

_sampling

efficiency

was similar to the usual

0 vc

parameterization.

Considering

ratios of variance

_components

rather than variance

_components

themselves in

_sampling

_procedures

has been

_previously

_proposed

_!23!.

_However,

our

ratios can be

interpreted

directly

and have

implicit

boundaries

(zero

and

one),

where

Theobald et al.

_[23]

needed a

specific

restriction on their ratio. The

_{representations}

of

_prior

_knowledge

in the two

_{parameterizations}

were not

_equivalent

and differences in

_posterior

estimates can be

_expected.

_However,

the use of _vague

_priors

(absence

of

_prior

_knowledge)

in the two

_{parameterizations}

lead to _verysimilar results. In this

_{study, position}

of the

_QTL

was assumed known. Extension of the MCMC

algorithm

to allow estimation of

_QTL

_position

has been studied and

_implemented

(3!.

Currently,

the method of Bink et al.

_[2]

to

_sample

_genotypes

for a

_single

marker

is

_being

extended to

_multiple

markers linked to a

_normally

distributed

_{QTL. Then,}

a

robust MCMC method becomes available for

_{linkage analysis}

in

_multiple

_generation

pedigrees allowing incomplete

information on both trait

_phenotypes

and marker

genotypes.

ACKNOWLEDGEMENT

The authors wish to thank Luc Janss and

_George

Casella for

_stimulating

discussions and

suggestions.

Comments from _{anonymous reviewers}and the editor

_{considerably improved}

the paper. The first author

acknowledges

financial _supportfrom NWO while on research

leave at Cornell

_{University, Ithaca,}

NY. The financial support of Holland Genetics is

gratefully acknowledged.

REFERENCES

[1]

Berger J.O.,

Statistical Decision

_Theory

and

Bayesian Analysis,

2nd

ed.,

Springer-Verlag,

New

_{York, NY,}

1985.

[2]

Bink

_M.C.A.M.,

Van Arendonk

_{J.A.M, Quaas R.L., Breeding}

value estimation with

incomplete

marker

_data,

Genet. Sel. Evol. 30

₍₁₉₉₈₎

45-48.

[3]

Bink

_M.C.A.M.,

Janss

_{L.L.G., Quaas R.L., Mapping}

a

_poly-allelic

_quantitativetrait

locus

using

simulated

tempering,

Proc. 6th World

Congr.

Genet.

Appl.

Livest.

Prod.,

Armidale, University

of New

_England,

Armidale NSW

_2351,

vol.

_26,

Australia, 1998,

pp. 277-280.

[4]

Cantet

R.J.C.,

Smith

C.,

Reduced animal model for marker assisted selection

_using

best linear unbiased

_prediction,

₍₁₉₉₁₎

221-233.

[5]

Casella

_{G., Berger R.L.,}

Statistical

_{Inferences, Duxbury}

_press,

_{Belmont, CA,}

1990.

[6]

Chib

_{S, Greenberg}

E.,

Understanding

the

Metropolis Hastings algorithm,

Am. Stat. 49

₍₁₉₉₅₎

327-335.

[7]

Devroye L.,

Non-uniform Random Variate

_Generation,

_{Springer-Verlag}

_Inc.,

New

(20)

[8]

Fernando

R.L.,

Grossman

M.,

Marker-assisted selection

using

best linear unbiased

prediction,

₍₁₉₈₉₎

467-477.

[9]

Gelfand

A.E.,

Gibbs

_sampling

_(a

contribution to the

_encyclopedia

of Statistical

Sciences),

Technical

_{Report, Department}

of

_{Statistics, University}

of

Connecticut,

1994.

[10]

Gelfand

A.E.,

Smith

_{A.F.M., Sampling-based approaches}

to

calculating marginal

densities,

J. Am. Statist. Assoc. 85

₍₁₉₉₀₎

398-409.

[11]

Geman

S.,

Geman

D.,

Stochastic

relaxation,

Gibbs distributions and the

Bayesian

restoration of

_images,

IEEE Trans. Pattern. Anal. Machine

_Intelligence

6

₍₁₉₈₄₎

721-741

[12]

Gilks

_W.R.,

Wild

_{P., Adaptive rejection sampling}

for Gibbs

_{sampling, Appl.}

Stat. 41

₍₁₉₉₂₎

337-348.

[13]

Gilks

_W.R.,

Best

_N.G.,

Tan

_{K.K.C., Adaptive rejection Metropolis sampling}

within Gibbs

_{sampling, Appl.}

Stat. 44

₍₁₉₉₅₎

455-472.

[14]

Hastings W.K.,

Monte Carlo

sampling

methods

_using

Markov chains and their

applications,

Biometrika 57

₍₁₉₇₀₎

97-109.

[15]

Hoeschele I.,

Bayesian QTL mapping

via the Gibbs

_Sampler,

Proc. 5th World

Congr.

Genet.

_Appl.

Livest. Prod.

_{Guelph, University}

of

_{Guelph, Canada,}

vol.

_{21, 1994,}

pp. 241-244.

[16]

Hoeschele

_I.,

Van Raden

_{P.M., Bayesian analysis}

of

_linkage

between

_genetic

markers and quantitative trait

_loci,

I Prior

_knowledge,

Theor.

_Appl.

Genet. 85

₍₁₉₉₃₎

953-960.

[17]

Janss

L.L.G., Thompson R.,

Van Arendonk

J.A.M., Application

of Gibbs

_sampling

for inference in a mixed

major gene-polygenic

inheritance model in animal

populations,

Theor.

_Appl.

Genet. 91

₍₁₉₉₅₎

1137-1147.

[18]

Metropolis, N..,

Rosenbluth

A.W.,

Rosenbluth

_M.N.,

Teller

_H.,

Teller

_{E., Equations}

of state calculations

_by

fast

_{computing machines,}

J. Chem.

_Physics

21

₍₁₉₅₃₎

1087-1091.

[19]

Quaas R.L.,

Pollak

_E.J.,

Mixed model

_methodology

for farm and ranch beef cattle

testing

programs, J. Anim. Sci. 51

₍₁₉₈₀₎

1277-1287.

[20]

Searle

S.R.,

Matrix

_Algebra

Useful for

_Statistics,

John

_Wiley

&

Sons,

New

_York,

NY, 1982.

[21]

Sorensen

D.A.,

Andersen

S.,

Gianola

D.,

Korsgaard I., Bayesian

inference in threshold models

using

Gibbs

sampling.,

₍₁₉₉₅₎

229-249.

[22]

Tanner

_{M.A., Wong W.H.,}

The calculation of

_posterior

distributions

_by

data

augmentation,

J. Am. Stat. Assoc. 82

₍₁₉₈₇₎

528-540.

[23]

Theobald

_C.M.,

Firat

_{M.Z., Thompson R.,}

Gibbs

_{sampling, adaptive rejection}

sampling

and robustness to

prior specification

for a mixed linear

model,

Genet. Sel. Evol.

29

₍₁₉₉₇₎

57-72.

[24]

Tierney L.,

Markov chains for

_exploring

posterior distributions

_{(with discussion),}

Ann. Stat. 22

₍₁₉₉₄₎

1701-1762.

[25]

Uimari

P.,

Hoeschele

I., Mapping

linked

_quantitative

trait loci

_{using Bayesian}

analysis

and Markov chain Monte Carlo

algorithms,

Genetics 146

₍₁₉₉₇₎

735-743.

[26]

Uimari

P.,

Thaller

G.,

Hoeschele I., The use of

_multiple

markers in a

_Bayesian

method for

_{mapping quantitative}

trait

_loci,

Genetics 143

₍₁₉₉₆₎

1831-1842.

[27]

VanArendonk

J.A.M.,

Tier

_B.,

Bink

M.C.A.M.,

Bovehuis

_H.,

Restricted maximum likelihood

_analysis

between

_genetic

markers and

_quantitative

trait loci for a

_{granddaughter}

design,

J.

Dairy.

Sci.

_(1997).

(21)

[29]

Van Tassell

C.P.,

Casella

_G.,

Pollak

_E.J.,

Effects of selection on estimates of

variance components

using

Gibbs

_sampling

and restricted maximum

likelihood,

J.

_Dairy

Sci. 78

₍₁₉₉₅₎

678-692.

[30]

Van Tassell

C.P.,

VanVleck

L.D., Multiple-trait

Gibbs

_sampler

for animal models: flexible programs for

bayesian

and likelihood-based

_(Co)variance

component

Inference,

J. Anim. Sci. 74

₍₁₉₉₆₎

2586-2597.

[31]

Wang C.S., Rutledge J.J.,

Gianola

D., Marginal

inferences about variance

compo-nents in a mixed linear model

_using

Gibbs

_{sampling, Genet,}

Sel. Evol. 25

₍₁₉₉₃₎

41-62.

[32]

Wang C.S., Quaas R.L.,

Pollak

E.J., Bayesian analysis

of

_calving

ease scores and birth

_weights,

₍₁₉₉₇₎

117-143.

[33]

Wang T.,

Fernando

_R.L.,

VanderBeek

_S.,

Grossman

_M.,

Van Arendonk

_J.A.M.,

Covariance between relatives for a marked

_quantitative

trait

_locus,

(1995)

251-272.

[34]

Weller

J.I.,

Kashi

Y.,

Soller

M.,

Power of

daughter

and

granddaughter designs

for

_{determining linkage}

between marker loci and

quantitative

trait loci in

_{dairy cattle,}

J.

Dairy

Sci. 73

₍₁₉₉₀₎

2525-2537.

A1. APPENDIX: !11 conditional densities

Al.l. Location

_parameters

The conditional densities for location

_parameters

are the same with either set of

dispersion

parameters

(0

v

c

or

O

RT

).

When

sampling genetic effects,

the ratios

a2

of VC needed can be

_computed

from either

_{parameterization,}

i.e.

a

u

_u1 =

2

=

or2 e

(

h2

a2

1 h2

ae

C (

1 -

y

)

x

I

-h

2

and

_a

_{-1 v}

1 =

- 92 e

=

C 2 !

x

1 -h2

In this

study

we considered

B

1 -

h )

_o!

B!2

2 1 -

n//

only

one fixed

_effect,

an overall mean _m,for which the conditional

_density

becomes

where

_ji

_r

_equals

yk corrected for

genetic effects, following

the

categorization

in table L The conditional variance of this overall mean is a

weighted

_averageover

_categories.

Again,

for

_phenotypes

on animals in

_categories

1 to

_3,

the residual

_variance,

₀

_’2

_e

_i

₇

contains

_parts

of the

_genetic

variances. The conditional

density

for the

polygenic

(22)

where

_y

₂

is the ith

_phenotype

for animal

_j,

corrected for all

_effects,

other than

polygenic,

y

l

.

is the _averageof

phenotypes

on

_non-parent

l,

also corrected for all

effects other than

_polygenic,

_op(j)

represents

the

_offspring

of animal

_j,

which are

parents

themselves,

_o

_n

_p(j)

represents

the

_offspring

of animal

_j,

which are

non-parents.

Furthermore,

UM,k is the

polygenic

effect of the other

(if known)

parent

(mate

of animal

_j)

of

_{offspring k,}

_n_j is the number of

_phenotypes

for animal

_j,

6 j

,

=

_1,

4/3,

2 when

_{0, 1,}

_or2

_parents

of j

_areidentified

(with

_no

inbreeding).

(8;

1

is the fraction

_Q

_u

_term

_4>j.)

_Finally,

_{wi is}the

reciprocal

of the amount of variance

present

in the residuals of

_phenotypes

on

animal l,

and can be calculated as

where n, is the number of observations on

animal l,

and

D¡

=

I

2 -

Q

l

_x

QT

(with

no

_inbreeding,

_[2]).

The conditional

_density

for the xth

_QTL

effect of

_{animal j}

_can

be

_given

as

where

(23)

y

z

is the ith

_phenotype

for

_{animal j,}

corrected for all effects other than

_QTL,

_W

_l

_.

is the average of

phenotypes

on

_non-parent

l,

also corrected for all effects other than

QTL,

dq!’1

is the first element of the xth row of

D7’QT

for

animal j,

and corrects for the covariance at the

_QTL

between

_parent

and

_{offspring. Similarly,}

_dqd!’1

is the first element of the xth row of

f!!D! 1Q!

for animal

_j,

and corrects for the covariance between

_parent

and the mate

_belonging

to a

_{particular offspring}

of that

parent j.

A1.2.

_Dispersion

_parameters

In the

_RAM,

the residuals

_(e)

have different variances over the

_categories

of

animals

_{(table 1).}

Hence,

conditional densities for VC in

_0vc

are not standard densities. For

_example,

when

_deriving

_density

for

_o,,2,

the term

W

i (a2+

2a

2 )

is known in the likelihood

_part

of the

_{joint posterior}

_density

_(13).

It

can thus be treated as a

constant,

but it does not

_drop

out of the

equation.

With

B

RT

,

the conditional

_density

of

_u2

_is

_standard,

but those for

h

2 _{2 and -y}

are not. With

0vc ,

the conditional

_density

of variance

_component

x, for x = _e,_{u or}_v,is

where

and

(24)