Behaviour of the additive finite locus model

(1)

HAL Id: hal-00894262

https://hal.archives-ouvertes.fr/hal-00894262

Submitted on 1 Jan 1999

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Behaviour of the additive finite locus model

Ricardo Pong-Wong, Chris S. Haley, John A. Woolliams

To cite this version:

(2)

Original

article

Behaviour of the

additive

finite

locus model

Ricardo

_Pong-Wong*

Chris S.

_Haley,

John A.

Woolliams

Roslin Institute

_(Edinburgh),

_Roslin,

Midlothian EH25

_{9PS, Scotland,}

UK

(Received

7

_{September 1998; accepted}

2

_April

₁₉₉₉₎

Abstract - A finite locus model to estimate additive variance and the

_breeding

values

was

implemented using

Gibbs

_sampling.

Four different distributions for the size of the

gene effects across the loci were considered:

i)

uniform with loci of different

_effects,

_ii)

uniform with all loci

having equal effects,

iii)

exponential,

and

_iv)

normal. Stochastic simulation was used to

study

the influence of the number of loci and the distribution of their effect assumed in the model

_analysis.

The

_assumption

of loci with different and

_uniformly

distributed effects resulted in an increase in the estimate of the additive

variance

_according

to the number of loci assumed in the model of

_{analysis, causing}

biases in the estimated

_breeding

values. When the gene effects were assumed to be

exponentially distributed,

the estimate of the additive variance was still

_dependent

on the number of loci assumed in the model of

analysis,

but this influence was much

less. When

_assuming

that all the loci have the same _geneeffects or when

they

were

normally distributed,

the additive variance estimate was the same

_regardless

of the number of loci assumed in the model of

_analysis.

The estimates were not

_{significantly}

different from either the true simulated values or from those obtained when

_using

the standard mixed model

_approach

where an infinitesimal model is assumed. The results indicate that if the number of loci has to be assumed a

_priori,

the most useful finite locus models are those

assuming

loci with

equal

effects or

normally

distributed

effects.

_©

_{Inra/Elsevier,}

Paris

’

finite locus

_{model / gene}

effect distribution

_/

Gibbs _{sampling /}infinitesimal model

Résumé - _Comportementdes modèles additifs à nombre fini de loci. On a

utilisé,

via la méthode de

_{l’échantillonnage}

de

_Gibbs,

des modèles à nombre fini de loci pour estimer les variances

génétiques

additives et les valeurs

_{génétiques.}

On a

considéré quatre distributions différentes des effets de

_gènes

sur l’ensemble des loci :

i)

distribution uniforme avec loci à effets

_variables,

_ii)

distribution uniforme avec

loci à effets

_égaux,

_iii)

distribution

exponentielle,

et

_iv)

distribution normale. La

simulation

_stochastique

a été utilisée pour étudier l’influence du nombre de loci et de

*

Correspondence

and

_reprints

(3)

la distribution

_supposée

de leurs effets.

_{L’hypothèse}

d’effets différents et uniformément distribués a entraîné le fait que la variance

génétique augmentait quand

le nombre

supposé

de loci

_augmentait,

ce

qui

a causé des biais dans l’estimation des valeurs

génétiques. Quand

les effets de

gènes

ont été distribués

exponentiellement,

l’estimée de la variance

_génétique

additive a été encore

_dépendante

du nombre de loci

supposé,

quoiqu’à

un moindre

_{degré. Quand}

on a

_supposé

_{que tous}les loci avaient les mêmes

effets de

_gènes

ou

_quand

ils ont été normalement

_distribués,

l’estimée de la variance

génétique

additive a été la

_{même, quel}

_quesoit le nombre de loci

supposé

dans

l’analyse.

Les résultats

_indiquent

que si le nombre de loci est

_{supposé d’après}

des considérations a

_priori,

les modèles à nombre fini de loci les

_plus

utiles sont ceux

_qui

supposent des loci à effets

_égaux

ou à distribution normale.

©

Inra/Elsevier,

Paris modèle fini

_/

distribution d’effets

_/

échantillonnage de Gibbs

_/

modèle in-finitésimal

1. INTRODUCTION

Genetic evaluation in livestock has

_{traditionally}

been carried out

_using

an

infinitesimal

genetic model,

where the trait is assumed to be influenced

_by

an

infinite number of _genes,each with an

_{infinitesimally}

small effect.

Although

such a model is

biologically

incorrect,

its use has been

justified

because it

allows the

_handling

of the total additive

_genetic

effect as a

_normally

distributed

variable so that standard statistical mixed model

_techniques

can be

_applied.

Indeed,

solutions from the normal

_{approximation}

_{appear to}be robust

_enough

for

_practical

selection _purposes,

_provided

the trait is not controlled

_by

a small

number of

_loci,

few

_generations

are considered

(so

that there are no substantial

changes

in the alleles

_frequencies

due to selection or

_drift)

and the additive

genetic

effect alone is considered

_!17!.

The

_arguments

_justifying

the use of the infinitesimal model _are,

_however,

being

weakened

_by

the

_increasing

_knowledge

about the

_genetic

architecture of

quantitative

traits.

_Single

_genesthat have a

_{relatively large}

effect on

quantitative

traits

_(e.g.

Booroola _gene,double muscle _gene,

_Callipyge

_gene)

are

_expected

to have a

rapid change

in allele

frequency

due to selection. Under these

_{circumstances,}

the infinitesimal model would

_{wrongly predict}

the evolution of the

_genetic

variance even when the selected trait is also affected

by

a

large

number of loci with small effects

[8].

Moreover,

the

_assumptions

required

to describe dominance with the infinitesimal model are unclear

[25].

Thus,

alternative

_approaches

to

_{incorporating}

the extra

_knowledge

about the

genetic make-up

of

_quantitative

traits should be considered.

In this _paper,an additive finite locus model is defined and

_implemented

using

Gibbs

_sampling.

The effects of the

_assumptions

about the number of loci and the distribution of the size of their effects are

_{studied, extending}

the results

previously reported by Pong-Wong

et al.

_!24!.

The results obtained with the finite locus model are

compared

with those obtained

_using

the mixed model

(4)

2. MATERIALS AND METHODS

2.1. Finite-locus

_genetic

model

A

_quantitative

trait is assumed to be

_genetically

controlled

_by

L unlinked biallelic loci.

_Following

the same notation as Falconer

_[4],

each locus

_l,

has

an additive

(a,)

effect with a

frequency

of the favourable allele in the base

population of pi .

The additive variance

_explained

_by

locus l is then

_2P

_I

_(1-

_PI

_)af.

Since the loci are assumed to be unlinked and in

_linkage

_equilibrium

the total additive variance

_{(or a 2)}

is the sum over all the loci. The trait is also assumed to

be affected

_by

an environmental deviation which is

_normally

distributed with

mean zero and variance

o,

2 .

Other environmental fixed and random effects _may

also be included in the model

_but,

for

_{simplicity, they}

are not considered here. In matrix

_algebra

the linear model is

_expressed

as:

where y is the

(n

x 1)

vector of

phenotypic records,

p the overall mean,

a the

(L

x

1)

vector of additive

_(a)

effects for each

_locus,

e the

(n

x

₁₎

vector of environmental

deviation,

and

_W

_a

is the

_{(n x L)}

matrix of additive effects associated to the individual’s

_genotype.

_Assuming

that the

_genotypes

are denoted as

_AA,

AB and BB

_(BB

the least favourable

genotype),

the value

in column l of

_W

_a

would be

_1,

0 or

_-1,

for a

phenotypic

observation from

an individual with

_genotype

(at

the

l locus)

AA,

AB or

_BB,

respectively.

The

vector a-, is defined the same as a but

excluding

the effect at the locus 1. 2.1.1. Distribution of the size of gene effects

Since the size of the effects across the different loci are assumed to be

different,

an

_assumption

about how the _geneeffects are distributed is

_required.

Here,

three

_possible

distributions to model the gene effects are examined:

i)

uniform,

ii)

exponential,

and

_{iii) (folded-over)}

normal.

The

_probability

_density

functions for the distribution of the size of the additive effects

₍

₀

_(a))

when

_assuming

the

_{uniform, exponential}

and the

(folded-over)

normal

_{distributions,}

_{respectively,}

are:

where

_Aa

is the scale

_parameter

for the

_exponential

and the normal distribu-tion. The

density

function

_0(a)

is defined

_only

for the range of the

positive

numbers

_(including

_zero)

since a

_is,

_{by definition,}

the effect of the favourable

(5)

or

exponentially

distributed is consistent with the

general

belief that most of

the loci

_affecting

a

_given

quantitative

trait would have a small

effect,

while

_only

a few genes have a

_major

effect on the trait in

_question.

2.2.

_{Implementation}

of the finite locus model

using

Markov chain Monte Carlo

Genetic

_analyses

assuming

the

_proposed

finite locus model involve the esti-mation of the gene effect at each

locus,

the

_parameter

_defining

the distribution

of _{the gene}

_effects,

the

_genotype

_probability

for each individual at all the loci and their allele

frequencies.

In the model of

analysis,

the number of loci

affect-ing

the trait in

_question

as well as the distribution of their effects are assumed

known. The total additive variance is estimated as a linear function of the effect

and allele

_frequency

across all the loci

_(i.e.

_er!

=

2 2!(1 -p!a!).

A

graphical

i

representation

of the finite locus model is

_presented

in

_figure

1.

The main

_problem

in

_implementing

a finite locus

_genetic

model

_using

a

standard likelihood

approach

is the calculation of the

_genotype

_probability

for all the loci. In

practice

this task is

_{computationally}

very difficult because of the

large

number of

_possible

_genotype

combinations that need to be

_considered,

a

number which

_rapidly

increases with the number of individuals. This

problem

becomes further exacerbated with

complex

pedigree

structures

_{involving loops}

(6)

In order to avoid this

_problem,

the finite locus model

proposed

is

imple-mented

_using

a Markov chain Monte Carlo

(MCMC)

_approach

based _upon

Gibbs

_{sampling algorithms previously suggested}

for

_segregation

studies of

un-typed single

genes in

complex pedigree

structures

(e.g.

[16,

18]).

These

algo-rithms are

simply

extended to include L loci

accounting

for the entire

genetic

effects. Because all loci are assumed to be unlinked the

sampling

of the

genotype

at each locus is

_performed

_{independently.}

A

_sampling

_protocol

for

_updating

the relevant

parameters

(conditional

on

the

_others)

of a finite locus model in the Markov chain would then be as follow:

1)

sample

overall _mean;

2)

sample

the

_genotype

_{configurations}

locus

_by

_locus;

3)

sample

the gene effects locus

by

locus;

4)

sample

the scale

_parameter

of the assumed distribution of gene effects

(not

needed when

_assuming

a uniform

distribution);

5)

sample

all other environmental fixed and random effects

_(not

included

here);

6)

sample

non-permanent

environmental variance and variance for other

random effects.

The

_sampling

of the allele

_frequencies

for each locus may also be added in the

_sampling

scheme. In this

_{study, however, they}

were not estimated but

_they

were fixed to be 0.5.

The full conditional distributions for the gene effects and the scale parame-ter for the distribution of _gene

_effects,

needed

_during

the

_sampling

_process,are

presented

below. The conditional distributions of other

_parameters

_(e.g.

geno-type

configuration,

environmental

_variance,

other random and fixed

_effects)

are

not shown here since

_they

have been described in

_previous

studies

_reported

in the literature. For the

_description

of the

_algorithms

used to

_sample

_genotypes

see Guo and

Thompson

[16]

and Janss et al.

[18] (the

latter

algorithm

was used

here,

since it allows a better

_mixing

in

_pedigrees

with

_large

family

sizes).

For

the use of Gibbs

_sampling

in more

_general

_genetic

evaluations and the

condi-tional distributions of other environmental

effects,

see Firat

[7]

and

Wang

et al.

[29, 30].

2.2.1. Joint

_{posterior density}

(conditional

on the

_genotype

_structure)

The full conditional

_density

for the effect at each locus as well as the scale

parameter

of the distribution of _geneeffects are obtained from their

_joint

posterior

density by

extracting

the terms

_containing

the variable in

_question.

The

_{joint posterior}

_density

_{of 0’; , a}

and

_A

_a

conditional on the

_genotype

structure

(considered

as known to

_simplify

the

_expression)

is of the form:

(7)

and

_P(a§)

are the

_prior

distributions of

A

a

and

0’

;,

respectively.

The respec-tive

_{conjugate prior}

distribution for

A

a

when

assuming

the _geneeffects

_being

exponentially

and

normally

distributed is

_proportional

to

_(A

_a

_)-

_v

_-’exp(-vs/

_Aa

₎

and

_-

_),

_(A

_exp(-0.5vs/A

_l

_a

₂

_/

_,

_{) -}

_a

where v is the

degree

of belief and s the

_prior

value of

_A

_a

_.

_Assuming

that v is

_equal

to zero

(i.e.

there is no belief _{in any}

particular

value of

_s)

_gives

the ’naive’

_prior,

which is

_proportional

to

1/Aa-This

_prior

denotes a lack of

_prior

knowledge

about the

_parameter

and it has

been used as a

_prior

for variance

_components

_including

some animal

breed-ing implementations

[9, 29!.

In this

_study

’naive’

_priors

were used for both

A

a

and

_{a 2}

2.2.2. Conditional distributions for the

_(size

of

_the)

_geneeffects The conditional distribution of the gene effects

depends

on the

_assumption

of how

_they

are distributed.

!.!.!.1.

_Uniform

and

_independent

When the additive effects are assumed to be

_uniformly

_distributed,

the

conditional

_density

depends only

on the first term of

equation

(5)

(i.e.

the

second term is a

constant).

_Thus,

the conditional distribution for the effect of

the locus l is

_proportional

to:

which is

_equivalent

to a truncated normal distribution with mean

ii,

and

variance

or

evaluated

in the _rangeof

_positive

values. The value for

a

l

is the solution from the linear model

equal

to

(2:

YAA

-

2: Y

BB

)

_/(n

_AA

+

_n

_BB

_),

and

Q

Z

its

error variance

_equal

to

_AA

_{0,2 e /(n}

+

_n

_BB

_),

where _ygis the

adjusted

phenotype

of individuals with

updated

genotype

g, and ng is the number of

records from individuals with such a

_genotype.

The solution of the linear model

â

l

,

is

_equivalent

to the coefficient from the

_regression

_(passing

_through

the

origin)

of the

_phenotype

_(adjusted

for the effect of other loci and any other environmental

_effects)

on the

_genotype

value

(i.e.

_1,

0 or -1 for the record

from an individual

sampled

to have

_genotype

AA,

AB or

BB,

respectively).

The conditional distribution

_resulting

from

assuming

a uniform distribution

has been

_generally

used to

_sample

the

_major

gene effect in mixed inheritance models

_(e.g.

_[18]).

2.2.2.2.

_Uniform

and constant

(8)

regression

is on the number of loci

sampled

as AA minus the number of loci

sampled

as BB for the individual

_contributing

to the

_record).

2.2.2.3.

_E!ponential

The full conditional distribution of the effect of locus l is

_proportional

to:

where

_a

_l

and

Q2

are defined as in

_equation

(6).

_Rearranging

the

_previous

equation

results in the

_following:

where the first term is

_proportional

to a normal distribution with mean

a,

l

-

U2.!a

and

variance

_Q2

_,

and the second term is a constant.

Substitut-ing

the values

a,

and

a

as

defined in

_equation

_(6),

the full conditional dis-tribution is a truncated normal defined for the

_positive

values with mean

(! yAA -

YBB

-

£

0 ’;À-

1 )/(n

AA

+

)

n

BB

and variance

_{oe 2 / (n}

_AA

+

_n

_BB

₎

2.2.2.l!.

Folded-over normal

Extracting

the terms

_containing

a, in

equation

(5),

its conditional distribu-tion is

_proportional

to:

and when

substituting

the values of

_at

and

_!2,

the

_{previous expression}

can be

rearranged

as

which is

_proportional

to a truncated normal with mean

(2:

_y_AA_-

2 :

YBB)

(n

AA

+ nBB+

0’; À;;:-l)

1 and

variance

_AA

_(n

+ nBB +

_0’

_{; À;;:-l )-}

₁

₀

_’

_;.

2.2.3. Conditional distribution of the scale

_parameter

of the _gene effect distribution

(9)

effects is

_being

assumed. The estimation of this

_parameter

is not

_required

when

assuming

that the gene effects are

_uniformly

distributed.

The conditional

_density

of

_A

_a

under the

_assumption

that the _geneeffects are

exponentially

distributed and with ’naive’

_prior

is:

which is

_equivalent

to:

where _’Y₍₁_,_L₎is a _gammadistribution with scale and

_shape

_parameters

_equal

to

1 and

_L,

_{respectively.}

Similarly,

when the _geneeffects are

_normally

_distributed,

the conditional

distribution of

Aa

assuming

a ’naive’

_prior

is:

which is a scaled inverted

chi-squared

of the form:

2.3. Simulated

_population

2.3.1.

_Population

structure

The structure of the simulated

_population

consisted of a base

_population

of 80 unrelated individuals

₍₄₀

males and 40

_females)

_plus

five other discrete

generations.

At each

_generation

five males and 20 females were chosen and

randomly

mated to

_produce

four

_offspring

_(two

males and two

_females)

_per

female. Selection of

_parents

was at random unless otherwise noted in the results. All individuals had one

_phenotypic

record.

2.3.2. Genetic model

The total

_genetic

effects were accounted for

by

20

independent

and diallelic

(10)

frequency

was 0.5. The

_genotype

at each locus of the base individuals was

sampled

from the

expected

genotype

frequency

of a locus in

Hardy-Weinberg

equilibrium.

The

_genotype

of individuals from further

_generations

were

_sampled

assuming

Mendelian inheritance. The total

genetic

effects of an individual are

the sum of all the

_genotype

effects over all loci.

2.3.3. Parameters used

For all the cases the environmental variance was assumed to be

_80,

the

additive

_genetic

variance 20. In order to account for the total

_{genetic variance,}

the effect of each locus was simulated in two _ways:

_i)

_assuming

that all the 20 loci have the same effect

(i.e.

a =

J

2);

_or

ii)

that each effect was

_sampled

from an

exponential

distribution with scale

_parameter

equal

to 1

(which

is

expected

to

_yield

the correct total

_genetic

_variance).

2.4. Situations

compared

Data sets simulated

_using

the

population

structure

_explained

above were

used to

_study

the behaviour of the finite locus model

_(FIN)

in

_genetic

eval-uations. Each data set

_(replicate)

was

analysed

with several FIN

approaches

varying

in the

_assumptions

about the distribution _{of gene}effects and the

num-ber of loci taken in the model of

_analysis.

These variations in

_assumptions

were the

following.

i)

The distribution of the gene effects: effects of loci

uniformly

and

inde-pendently

(FIN-UNI),

uniformly

but constant

_(i.e.

_equal

effects;

FIN-CON),

exponentially

(FIN-EXP)

or

_normally

(FIN-NOR)

distributed.

ii)

The number of loci:

5, 10,

20 or 30.

As

_previously

_stated,

the allele

_frequencies

in the base

population

for each locus were not estimated in the

_analysis.

Instead

_they

were fixed at 0.5.

The case when all loci have the same effects

(FIN-CON)

is similar to the finite locus model

_proposed

_by

Fernando et al.

_!6!.

The same data sets were also

analysed

using

the standard mixed model

approach

(MM)

where an infinitesimal

_genetic

model is assumed. In order

to make the results

comparable

with those obtained with the FIN

_analyses,

the MM was also

performed

_using

a Gibbs

sampling approach

to obtain

the

_marginal

_{posterior density}

of each variance

_component.

From a

Bayesian

perspective,

the variance estimates from MM

using

a restricted maximum

likelihood

_(REML)

_approach

are the mode of their

_{joint posterior distribution,}

which are not

_expected

to coincide with the mode of their

_marginal

distributions

[11].

The

_{implementation}

of the mixed model

using

Gibbs

_sampling

and

its differences from REML

_approaches

have been much studied

_(e.g.

Wang

et al.

_!30!).

2.4.1. Criteria of

_comparison

The criteria of

_comparison

were the estimates of the variance

_components

(11)

3. RESULTS

3.1. Gibbs

_{sampling implementation}

The results

_presented

below are the summaries of 50

_replicates.

The variance

estimates of each evaluation within a

_replicate

is the mean of a Markov chain of

1 000 realisations

sampled

every 50

cycles

after a

_burning

_period

of 5 000

_cycles

(i.e.

total

_length

of the chain = 55 000

_cycles).

This

_{sampling protocol}

ensured that the autocorrelation between consecutive realisations was less than 0.1 for

all the

parameters studied

here.

3.2. True model: the same _geneeffects across all loci

(random

selection)

3.2.1. FIN- UNI

The estimates of the variance

_components

_assuming

that all loci have different effects and are

_uniformly

distributed are shown in table 7. These results were

highly

_dependent

on the number of loci assumed in the model of

_analysis.

The estimate of the additive variance increased when more loci were assumed

in the model of

_analysis.

This trend was

_consistently

observed across all the

replicates.

The additive variance estimate closest to the true simulated value

was

_produced

when

_only

five loci were assumed in the model of

_analysis,

which

is

_{substantially}

less than the true number used to simulate the data.

The increase in the estimated additive variance when

assuming

more loci in the model of

_analysis

was also

accompanied by

a decrease in the estimated

en-vironmental variance.

_However,

this reduction did not

_completely

compensate

for the extra estimated additive

_variance,

thus

_resulting

in an overestimate in

the total

_phenotypic

variance. The estimated total variance increased from 105 when

_assuming

five loci to 129 when the

_analysis

was carried out

_assuming

30 loci

_(the

simulated value was

_100).

The excess of additive variance which

_appeared

when

_increasing

the number of loci had

_{repercussions}

on the estimated

_breeding

values. As

_expected,

the

(12)

individuals with extreme EBV became even more extreme when more loci were

assumed in the model of

_{analysis. Additionally,}

the

_prediction

error variance

associated with the EBV also tended to increase with the number of loci: The

mean

prediction

error variance of the EBV when

_using

five loci was 16

compared

with 25 when the EBV were obtained

_assuming

30 loci

(for

the MM the mean

prediction

error variance was

_12.5).

_{Nevertheless,}

it is

_important

to note that

although

the EBV were _verysensitive to the number of

loci,

the correlation

between the different estimates was

_always

_greater

that 0.9

(table II).

_Thus,

the

_ranking

of individuals was little affected.

The variance estimates when

assuming

all loci had

the

same effects is summarised in table III. Under this

_assumption

the estimates of the additive variance were the same

_regardless

of the number of loci assumed in the model

of

_analysis.

The results from FIN-CON were not

_{significantly}

different from the simulated values or from those obtained with the MM. The EBV and their

(13)

3.2.3. FIN-EXP

Table IV shows the _summaryof the variance

_components

estimated

_assuming

the gene effects

being exponentially

distributed.

Increasing

the number of loci used in the model of

_{analysis yielded}

a

_slight

increase in the estimated additive variance.

_However,

this trend was _verysmall

compared

with the

results from FIN-UNI. In contrast to the case of

_FIN-UNI,

the estimate of

the total

_phenotypic

variance remained constant. The correlation among the EBV obtained with the FIN-EXP

_analyses

with different numbers of loci was

always higher

that 0.95

_(results

not

_shown).

3.2.4. FIN NOR

The results when the _geneeffects were assumed to be

_normally

distributed

appeared

not to be affected

by

the number of loci used in the model of

_analysis

(table V).

The EBV were also the same

_regardless

of the number of loci used

in the model of

_analysis.

The results obtained with FIN-NOR were similar to

(14)

3.3. _{True model: gene}effects simulated as

_{exponentially}

distributed

The main _{purpose of}

_using

simulated data

_assuming

the _geneeffects to be

exponentially

distributed was to test whether the observed behaviour of FIN-EXP is the same even when it

_corresponds

to the true model.

3.3.1.

_Population

under random selection

Table VI summarises the results when the

population

was under random

selection. The estimated additive variance showed the same trend to increase when more loci were assumed in the model of

_analysis.

The best estimates were

obtained when

_using

20

_loci,

which

_corresponds

to the true

_genetic

model used

to simulate the data. For this case, the variance

component

estimates were the same as the values used to simulate the data. The correlations between EBV obtained when

_using

different numbers of loci have a correlation

greater

than 0.99

_(results

not

_shown).

3.3.2.

_Population

under truncation selection

Table VII shows the results of FIN-EXP when the

_population

was

(15)

surprisingly,

the

_magnitude

was smaller than that observed with random

selection. The correlation between EBV calculated

_assuming

different numbers of loci was

_always

_greater

than 0.97

_(results

not

_shown).

4. DISCUSSION

In this paper a

_genetic

model

_assuming

a finite number of loci

affecting

a

quantitative

trait was

_implemented

_using

Gibbs

_sampling.

The

behaviour

of the

results when

_changing

the number of loci and the distribution of the gene effects assumed on the model of

analysis

were studied

_using

stochastic simulation.

The use of

_genetic

models

_assuming

a finite number of loci has so far been

hardly

studied. Chevalet

_[3]

_proposed

a

_genetic

model which allows the

estima-tion of the effective number of loci

_affecting

a

_{quantitative trait,}

but this model is still based _uponthe same Gaussian

_assumptions

made with the infinitesimal

model. A model which does not

_depend

on normal

_theory

was

_{proposed by}

Fernando et al.

_[6]

and is known as the

hypergeometric

model

[20].

In this

model the calculation of the multilocus

_genotype

_probability

is

_{simplified by}

not

_treating

the

genotype

at each locus

_{independently,}

but

_{by identifying}

their combined

_genotypes

as the total number of favourable alleles

_present

across all

loci. This

_{simplification,}

_however,

forces the

_assumption

that all loci must have the same

effect,

and the model is not

strictly

consistent with Mendelian trans-mission

_{[6, 20!.}

_However,

the _{main purpose}of the

_{hypergeometric}

model has been to mimic the results of the infinitesimal model but with a lower

complex-ity

when

_calculating

the

likelihood,

thereby greatly reducing

the

_{computational}

difficulties of

_segregation

and

_{linkage analyses}

of

_single

_major

_genes

_(28!.

More

recently,

Goddard

_[14]

and

_Pong-Wong

et al.

_[24]

studied the

_feasibility

of

es-timating

dominance

_using

a finite locus model

_assuming

that the _geneeffects were

uniformly

distributed.

The results from this

_study

show a remarkable interaction between the

distribution of gene effects and the number of loci assumed in the model of

analysis.

When the _geneeffects were assumed to follow a uniform distribution

(FIN-UNI),

the estimate of the additive variance

_sharply

increased when

_adding

more loci to the model of

_analysis.

A less

marked

trend was also observed

when

_assuming

that the _geneeffects were

_{exponentially}

distributed

(FIN-EXP).

When the model of

_analysis

assumed the allelic effects to be

_normally

distributed

_(FIN-NOR)

or constant over all loci

(FIN-CON),

the results were

the same

_regardless

of the number of loci assumed in the model.

_However,

despite

the

_similarity

in the trend of the additive

_variance,

the results from

FIN-UNI and FIN-EXP are

_{qualitatively}

different.

The

_slight

increase in the

additive variance observed with FIN-EXP was

only

due to differences in the

partition

of the total

_variance,

whereas with FIN-UNI there was also an increase in the total

_phenotypic

variance observed in the

_system.

From this

_point

of

_view,

the behaviour of FIN-EXP is more similar to FIN-NOR than to FIN-UNI.

This difference in the behaviour of FIN-UNI

compared

with FIN-EXP or

FIN-NOR

_is,

_perhaps,

not

_surprising

when

_examining

the statistical

_meaning

of these models. From a

strictly

statistical

_point

of

_view,

it can be seen that

the _geneeffects in FIN-UNI are treated as fixed effects while with FIN-EXP

(16)

the gene effects

using

FIN-UNI is

_expected

to

_yield

different answers to those obtained with FIN-NOR and

_FIN-EXP,

and

_thereby,

the total additive variance which is calculated as a linear combination of the gene effect estimates.

The consequences of

adding

more loci to the model of

_analysis

when

_treating

their effects as fixed _appearsto create an

_{overparameterised}

model. The extra

gene effects

(fixed effects)

that need to be estimated in the model result in

some of them

_being

confounded and

_explaining

_spurious

effects.

_Thus,

the more

loci fitted in the

_model,

the more

_spurious

effects are

_{estimated, increasing}

both the

_genetic

and the total variance.

_Hence,

the reduction in the number of

parameters

to be estimated

_by

_assuming

that all the loci have the same effects

(only

one effect is estimated

compared

with L effects estimated when

assuming

all loci

_having

different

_effects)

_mayavoid this

_{overparameterisation,}

which would

_{explain why}

the results of FIN-CON are insensitive to the number of loci. On the other

hand,

the

_possibility

of

_spurious

effects

_arising

when

_increasing

the number of loci in FIN-EXP or FIN-NOR is better controlled since the estimates

of the gene effects

(random

effects)

are

regressed

towards _zero,

restricting

their

dispersion

accordingly

to their scale

_parameter

_(A

_a

_).

The difference between

treating

a variable as fixed or as random is well known in animal

breeding.

For

example,

the variance of the estimated sire effects obtained after

_treating

them

as fixed would be

_greater

than the estimated inter-sire variance when

_assuming

them to be random normal variables.

The trend of

_0’

_;

_when

_using

FIN-EXP was also observed when the true model also assumed the gene effects to be

_{exponentially}

distributed.

_{Surprisingly,}

this trend was smaller when the

_population

was

_undergoing

selection. This

consistency

of results across simulated data sets

_assuming

different

_genetic

models

suggests

that the overall trend observed with FIN-EXP is more

likely

to be a true characteristic of this model of

_analysis

rather than

_being

a Monte

Carlo error due to a small number of

_replicates.

Another

_interesting

result is the

fact that the mean estimate of the

_genetic

variance when

_assuming

ten loci was

marginally

closer to the true simulated value than when 20 loci were assumed

(table V7).

Intuitively,

the latter would be

expected

to

_yield

better answers as

it

_corresponds

_exactly

to the model used to simulate the data.

_However,

the

difference in results is too small to

_firmly

conclude which is the better model of

_analysis,

so their

_rating

should not be based

_only

on the _averageestimate

(across

the

_replicates)

relative to the true simulated value. The estimation of the

_Bayes

factor to assess the

_goodness

of fit of these models should also be

considered before

_concluding

which one better describes the data.

The difference in the results between FIN-EXP and FIN-NOR

prompts

the need for further studies to evaluate the behaviour of finite locus models

assum-ing

other distributions of gene effects. The

assumption

of a normal distribution appears to

_yield

_{robust/consistent}

_results,

but

_ideally

the distribution to be assumed should be one

closely reflecting

the

reality

of the trait in

question.

Although

the characterisation of the distribution of gene effects for

economi-cally

important

traits in farm animals is still

_incomplete,

some

knowledge

in

this area _{may be}obtained from studies of mutation effects in

_Drosophila.

For

instance,

Keightley

[19]

suggested

that the _gammadistribution _maybe suitable for

_modelling

the _geneeffects since it

_depends

on few

_parameters

(note

that

(17)

be

_{parameterised}

to

_{display leptokurtosis.}

Other alternatives have also been

proposed by

Caballero and

_Keightley

_!2!.

An alternative _wayto avoid the

_problem

of

_uncertainty

on the true distri-bution of _geneeffects may be to select the distribution

_during

the

_analysis.

Using

the Markov chain

_framework,

Green

_[15]

_proposed

a

_technique,

called the reversible

_jump,

which allows for model choice

_during

the

_analysis.

For this

particular situation,

a set of distributions may be

predefined

and a Markov

process built

allowing

the chain to move _amongthese distributions

_according

to their

_{probabilities. Using}

the same

_principle,

one _mayalso be able to

_sample

the number of loci

_!27!.

_Obviously,

the

_complexity

of a Markov chain

implemen-tation

_allowing

for model choice in several of the

_parameters

would be

_higher,

and

_greater

care should be taken when

_assessing

the _convergenceof the chain as well as when

_interpreting

the results. Another alternative means to conclude which set of

_parameters

_(e.g.

distribution of gene

effects,

number of

loci)

fit best the data would be the estimation of

_Bayes

factors

_!9!.

One of the consequences of

assuming

other distributions of _gene

_effects,

such

as gamma, is that the

resulting

full conditional distribution may be of unknown form with no standard

sampling

routine available. The full conditional

_density

of the gene effects

resulting

from the three distributions examined in this

_study

are

_proportional

to a truncated

_normal,

for which standard

_sampling

routines

are available. The use of

_techniques

such as

_{adaptive rejection}

_sampling

_{[12, 13]}

and

_{’slicing-the-density’}

_[23]

allow

_sampling

from non-standard

_{distributions,}

but the

_{computational}

cost is also

_expected

to increase.

Because of the

_{computational}

demand of Gibbs

_{sampling implementations,}

the

_study

of the

_properties

of finite locus models should also be

_complemented

with the

_proposal

of efficient

_algorithms

to

_improve

the

_mixing

and convergence of the Markov chain. Several

_approaches

to

_improving

the

efficiency

of

_sampling

the

_genotype

structure in

_{complex pedigree}

are now available

(e.g.

[10,

_21,

22]),

and their use _{may prove}beneficial in

_reducing

the

_{computational}

demand of a

finite locus model.

In this

_study,

allele

_frequencies

were not estimated but were assumed to be 0.5.

_However,

the

_frequencies

are

only

fixed in the base

_population,

so the model

is able to account for

_changes

in the

_genetic

level due to drift or directional selection.

_Although

the estimation of the allele

_frequencies

does not add much extra

_complexity

to the

_{model, practical problems}

in the Gibbs

_{implementation}

were encountered

(unpublished).

_Allowing

variable allele

frequencies

can result in slow

_mixing

with

_{problems arising}

from loci

_{becoming temporally}

fixed. Since inferences from MCMC are valid

_only

when _convergencehas been

_obtained,

poor

mixing requires

the

length

of the chain to be

_considerably

_increased,

with _{consequences in}

_computing

time. A

_{preliminary analysis}

of a non-selected

population

where the allele

_frequencies

were estimated but restricted to between 0.2 and 0.8

_(to

avoid the Gibbs

_sampling

_problem

due to

_fixation)

_yielded

_analysis

was

_{performed assuming}

the

_frequencies

to be 0.5.

_Obviously,

this restriction on the _gene

_frequency

_{estimation may}need to

be relaxed when

_{considering populations}

with

_{deeper pedigree}

structure and

undergoing

selection.

A

_positive

characteristic of the finite locus model

_proposed

here is its

_ability

to account for the

_{linkage disequilibrium}

between loci built _up

_during

selection

(18)

generation

which results in a reduction of the total observed

genetic

variance

(i.e.

the

_genetic

variance in the

_offspring

_generation

estimated

_using

their total

genetic

effects across all loci is smaller than the sum of the variances

_explained

by

each individual

_locus).

For the case of selection

_presented

in this

study,

the loss in variance due to

_{disequilibrium}

for the first three

_generations

from selected

parents

was

_6.3,

7.2 and 9.5

%,

respectively.

Conversely,

it is also desirable that the

genotypes

of individuals from the base

_{population being sampled}

are in

linkage equilibrium

to avoid

_potential

bias in the estimation of

₀

_{’; (as}

the formula used to estimate it assumes no

correlation between

_loci).

_Considering

the

_impact

on the results if

linkage

disequilibrium

in the base

_population

_being

built _updue to

_sampling,

_Q

_a

_was

also estimated

_using

the total

_breeding

value of each individual reconstructed

given

their

_genotypes

and the gene effects across all loci

(thus

_accounting

for

any correlation

appearing

due to

sampling).

With the

exception

of the cases of

FIN-UNI

_assuming

20 and 30

_loci,

the estimate of

₀

_{’; was}

the same as when not

accounting

for _any

_potential

correlation. For

_instance,

_{or estimated}

from the total

_breeding

values

_using

FIN-EXP

_{assuming 10,}

20 and 30 loci were

21.58,

23.36 and

_25.12,

_{respectively,}

which are _verysimilar to the values

_reported

in table IV. When the

_analysis

was

_performed

_using

FIN-UNI with 20 or 30

loci,

the estimation of

_{0’; from}

the total

_breeding

values

yielded

smaller results than when

_assuming

the loci

_{being independent}

_(i.e.

53.34 and

_67.4,

_{respectively,}

compared

with results from table

_7),

but these differences

explain

less than 10

%

of the total overestimation of

_0’

_;.

_Thus,

the conclusion that the estimates of both the

_genetic

and the total variance when

_using

FIN-UNI are

_largely

dependent

on the assumed number of loci still remains valid

_(i.e.

FIN-UNI is still a

_questionable

model to be

_used).

Here we considered

_only

the case of a

_{complete-additive}

_genetic

model where

the results from the mixed model are

_expected

to be

_robust,

and a finite locus

model would add little

_improvement

to

_practical

_genetic

evaluations.

_However,

there are other situations where

_departing

from the infinitesimal model _may

prove to be beneficial. For

_instance,

a finite locus model _may

_provide

a more

natural

_approach

to

_extending

marker-assisted selection

_(MAS)

_accounting

for

multiple

quantitative

trait loci

_(QTL).

Genetic maps in most farm animals are

becoming

very dense so that the use of MAS to

_exploit

information of

_only

one

_QTL

at a time seems to be a waste of resources and time.

_Approaches

to

studying linkage

between a

QTL

and a

_genetic

marker

_using

Gibbs

sampling

have been

_suggested

in the literature

_(e.g.

_{[16, 18!),}

and their

_{implementation}

in a finite locus model seems to be

_straight

forward. From the mixed model

approach, multiple QTL

may also be accounted for

by extending

the MAS

method

_using

a BLUP framework

[5].

This

_{method, however,}

_presents

the

problem

that it does not account for

_changes

in gene

frequency

due to selection.

Additionally,

since each

_QTL

is modelled with two normal

_variables,

the method becomes

computationally complex

as the rank of the

_resulting

linear model increases

_by

twice the number of individuals per each

QTL

included in the model.

Another

potential

use of a finite locus model is the estimation of dominance.

Although

the mixed model has been used to estimate dominance

_deviance,

the

assumptions

justifying

this

_approach

are not well understood

_[25].

_Despite

the

(19)

the results of a mixed model

_analysis

_maybe difficult to

_interpret

_(for

an

example,

see

!26!).

Preliminary

results have shown that inclusion of dominance

in a finite locus model adds _verylittle extra

_complexity

to the model whilst

maintaining

a

_relationship

between both the dominance variance and the

inbreeding

depression

(24!.

Finally,

another benefit of finite locus models is that

_they

offer a more

’biologically

appropriate’ genetic

model which would

provide

a

_greater

under-standing

of

_quantitative

traits. For

_example,

it would be

_possible

to examine

characteristics of these traits

_(e.g.

the distribution of the _gene

_effects,

effective number of

_loci).

_Moreover,

unlike the infinitesimal

model,

some of the

newly

gained knowledge

about the architecture of these traits could

_easily

be included in a finite locus model to

_improve

the

prediction

in

_genetic

evaluations.

ACKNOWLEDGEMENTS

The authors

_acknowledge

financial support from the

_Ministry

of

_Agriculture,

Fisheries and

Food,

the

Biotechnology

and

Biological

Research Council and the

Pig

Improvement

Company.

We thank S.C.

Bishop

and _{two anonymous}readers for useful

comments on the

_manuscript.

REFERENCES

!1!

Bulmer

_M.G.,

The effect of selection on

_{genetic variability,}

Am. Nat. 105

₍₁₉₇₁₎

201-211.

[2]

Caballero

_{A., Keightley P.D.,}

A

_pleiotropic

nonadditive model of variation in

quantitative traits, Genetics 138

₍₁₉₉₄₎

883-900.

[3]

Chevalet

C.,

An

approximate theory

of selection assuming a finite number of quantitative trait

_loci,

Genet. Sel. Evol. 26

₍₁₉₉₄₎

379-400.

[4]

Falconer

D.S.,

Introduction to

Quantitative Genetics,

3th

ed., Longman

Sci-entific and

Technical, Essex,

1989.

[5]

Fernando

_R.L.,

Grossman M., Marker-assisted selection

_using

best linear

un-biased

prediction,

₍₁₉₈₉₎

467-477.

[6]

Fernando R.L. Stricker

_C.,

Elston

_R.C.,

The finite

_polygenic

mixed model: an

alternative formulation for the mixed model of

_inheritance,

Theor.

_Appl.

Genet. 88

(1994)

573-580.

[7]

Firat

_{M.Z., Bayesian}

methods in selection of farm animals for

_breeding,

PhD

thesis, University

of

_Edinburgh,

1995.

[8]

Fournet-Hanocq F.,

Elsen M., On the relevance of three

genetic

models for

the

_description

of

_genetic

variance in small

_{population undergoing selection,}

₍₁₉₉₈₎

59-70.

[9]

Gelman

A.,

Carlin

J.B.,

Stern

H.S.,

Rubin

D.B., Bayesian

Data

_Analysis,

Chapman

and Hall.

_{London, UK,}

1995.

[10]

Geyer C.J.,

Thomson

_{E.A., Annealing}

Markov chain Monte Carlo with

ap-plication

to ancestral

_inference,

J. Am. Stat. Assoc. 90

₍₁₉₉₅₎

909-920.

[11]

Gianola

_{D., Foulley J.L.,}

Variance estimation from

_integrated

likelihood

(VEIL),

₍₁₉₉₀₎

403-417.

[12]

Gilk

_W.R.,

Wild P.,

Adaptive rejection sampling

in Gibbs

_{sampling, Appl.}

Stat. 41

₍₁₉₉₂₎

337- 348.

[13]

Gilk W.R., Best

_N.G.,

Tan

_{K.K.C., Adaptive rejection Metropolis sampling,}

(20)

[14]

Goddard

_M.E.,

Gene based model for

_genetic

evaluation - An alternative to

BLUP?,

Proc. 6th World

Cong.

Genet.

_Appl.

Livest. Prod. 26

₍₁₉₉₈₎

33-36.

[15]

Green

_P.J.,

Reversible

_jumping

Markov chain Monte Carlo

_computation

and

Bayesian

model

determination,

Biometrika 82

₍₁₉₉₅₎

711-732.

[16]

Guo

S.W., Thompson E.A.,

A Monte Carlo method for combined

_segregation

and

_{linkage analysis,}

Am. J. Hum. Genet. 51

₍₁₉₉₂₎

111-1126.

[17]

Hill

_{W.G., Quantitative genetic theory:}

Chairman’s

_{introduction,}

Proc. 5th World

_Cong.

Genet.

_Appl.

Livest. Prod. 19

₍₁₉₉₄₎

125-126.

[18]

Janss

_{L.L.G., Thompson R.,}

Van Arendonk

J.A.M., Application

of Gibbs

sampling

for inference in a mixed

_{major gene-polygenic}

inheritance model in animal

populations,

Theor.

Appl.

Genet. 91

₍₁₉₉₅₎

1137-1147.

[19]

Keightley P.D.,

The distribution of mutation effects on

_viability

in

_Drosophila

meLa!,ogaster,

Genetics 138

₍₁₉₉₄₎

1315-1322.

[20]

Lange K.,

An

_approximate

model of

_{polygenic inheritance,}

Genetics 147

(1997)

1423-1430.

[21]

Lin

_{S., Thompson E.A., Wijsman E.,}

An

_algorithm

for Monte Carlo estima-tion of genotype probabilities on

_{complex pedigrees,}

Ann. Hum. Genet. 58

₍₁₉₉₄₎

343-357.

[22]

Lund

M.,

Jensen

C.S.,

Multivariate

updating

of genotypes in a Gibbs

sam-pling algorithm

in the mixed inheritance

model,

Proc. 6th World

Cong.

Genet.

Appl.

Livest. Prod. 25

₍₁₉₉₈₎

521- 524.

[23]

Neal

_R.M.,

Markov chain Monte Carlo methods based on

_{’slicing’}

the

density

function,

Technical report 9722, Dept. Stat, Univ., Toronto, 1997.

[24]

Pong-Wong R., Shaw F., Woolliams J.A.,

Estimation of dominance variation

using

a finite-locus

_model,

Proc. 6th World

_Cong.

Genet.

_Appl.

Livest. Prod. 26

(1998)

41-44.

[25]

Robertson

_A.,

Hill

_{W.G., Population}

and

_{quantitative genetics}

of _many linked loci in finite

_populations,

Proc. R. Soc. Lond. B 219

₍₁₉₈₃₎

253-264.

[26]

Shaw F., Woolliams

J.A.,

Variance _{component analysis}of skin and

weight

data for

_{sheep subjected}

to

rapid inbreeding,

₍₁₉₉₉₎

43-59.

[27]

Stephens D.A.,

Fish

_{R.D., Bayesian}

_analysis

of

_quantitative

trait locus data

using

reversible

_jump

Markov chain Monte

_Carlo,

Biometrics 54

₍₁₉₉₈₎

1334-1347.

[28]

Stricker

_{C., Fernando R.L., Elston R.C., Linkage analysis}

with an alternative

formulation of the mixed model of inheritance: the finite

_polygenic

mixed

_model,

Genetics 141

₍₁₉₉₅₎

1651- 1656.

[29]

Wang C.S., Rutledge J.J.,

Gianola

_{D., Marginal}

inferences about variance

components in mixed linear model

_using

Gibbs

_Sampling,

₍₁₉₉₃₎

41-62.

Behaviour of the additive finite locus model

HAL Id: hal-00894262

https://hal.archives-ouvertes.fr/hal-00894262

Submitted on 1 Jan 1999

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Behaviour of the additive finite locus model

Ricardo Pong-Wong, Chris S. Haley, John A. Woolliams

To cite this version:

Original

article

Behaviour of the

additive

finite

locus model

Ricardo

Pong-Wong*

Chris S.

Haley,

John A.

Woolliams

(Edinburgh),

Roslin,

9PS, Scotland,

(Received

September 1998; accepted

April

1999)

breeding

implemented using

sampling.

i)

effects,

ii)

having equal effects,

iii)

exponential,

iv)

study

analysis.

assumption

uniformly

according

analysis, causing

breeding

exponentially distributed,

dependent

analysis,

assuming

they

normally distributed,

regardless

analysis.

significantly

using

approach

priori,

assuming

equal

normally

&copy;

Inra/Elsevier,

model / gene

/

utilisé,

l’échantillonnage

Gibbs,

génétiques

génétiques.

gènes

_Pong-Wong*

_Haley,

_(Edinburgh),

_Roslin,

_{9PS, Scotland,}

_{September 1998; accepted}

_April

₁₉₉₉₎

_breeding

_sampling.

_effects,

_ii)

_iv)

_analysis.

_assumption

_uniformly

_according

_{analysis, causing}

_breeding

_dependent

_assuming

_regardless

_analysis.

_{significantly}

_using

_approach

_priori,

_©

_{Inra/Elsevier,}

_{model / gene}

_/

_{l’échantillonnage}

_Gibbs,

_{génétiques.}

_gènes

_variables,

_ii)

_égaux,

_iii)

_iv)

_stochastique

_reprints

_supposée

_{L’hypothèse}

_augmentait,

_génétique

_dépendante

_{degré. Quand}

_supposé

_gènes

_quand

_distribués,

_{même, quel}

_indiquent

_{supposé d’après}

_priori,

_plus

_qui

_égaux

©

_/

_/

_/

_{traditionally}

_using

_by

_{infinitesimally}

_handling

_genetic

_normally

_techniques

_applied.

_{approximation}

_enough

_practical

_provided

_by

_loci,

_generations

_frequencies