Genetic evaluation for a quantitative trait controlled by polygenes and a major locus with genotypes not or only partly known

(1)

HAL Id: hal-00894013

https://hal.archives-ouvertes.fr/hal-00894013

Submitted on 1 Jan 1993

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Genetic evaluation for a quantitative trait controlled by

polygenes and a major locus with genotypes not or only

partly known

A Hofer, Bw Kennedy

To cite this version:

(2)

Original

article

Genetic

evaluation

for

a

quantitative

trait controlled

by polygenes

and

a

major

locus

with

_genotypes

not

or

only partly

known

A Hofer

BW

_Kennedy

₂

1 Department

of

Animal

_Sciences,

Federal Institute

_{of Technology}

_(ETH),

CH-8092

_{Zvrich, Switzerland;}

2 Centre

for

Genetic

_{Improvment of Livestock, University of Guelph,}

Guelph, Ontario,

N1 G 2W1, Canada

(Received

4 March 1992;

accepted

5

_August

₁₉₉₃₎

Summary - For a _quantitativetrait controlled

by polygenes

and a

_major

locus with 2

alleles,

equations for the maximum likelihood estimation of

major

locus genotype effects and

polygenic breeding values,

as well as _majorallele

frequency

and _majorlocus _genotype

probabilities,

were derived. Because the

resulting

expressions are

computationally

un-tractable for

_{practical application, possible approximations}

were

_compared

with 2 other

procedures suggested

in the literature

_using

stochastic _computersimulation.

_Although

the

frequency

of the favourable allele was

seriously

underestimated when _majorlocus

geno-types were

_{entirely unknown,}

the

proposed

method compares

favourably

with the 2 other

procedures

under certain conditions. None of the

_{procedures compared}

can

_{satisfactorily}

separate

major genotypic

effects from

_polygenic

effects.

_However,

the

_proposed

method has some

potential

for

_improvement.

major locus

_/

_geneticevaluation / segregation analysis

Résumé - _{Évaluation génétique}pour un caractère quantitatif contrôlé par des

polygènes et un locus majeur à _génotypesinconnus ou seulement partiellement

connus. Pour un caractère contrôlé par des

polygènes

et un locus

_majeur

à 2

allèles,

les

équations

pour l’estimation du maximum de vraisemblance des

effects génotypiques

au locus

majeur

et des valeurs

génétiques polygéniques

ont été

dérivées,

permettant aussi d’estimer la

_fréquence

de l’allèle majeur et les

_{probabilités}

des

_génotypes

à ce locus. Les _expressions

obtenues étant incalculables en

_pratique,

des

_{approximations possibles}

ont été

_comparées

par simulation

stochastique

à 2 autres

_{procédures proposées}

dans la littérature. Bien _que la

_fréquence

de l’allèle

_favorable

soit sérieusement sous-estimée

_lorsque

les

_génotypes

au

locus

_majeur

sont entièrement

_inconnus,

la méthode

_proposée

a

_quelques

_avantagessur

(3)

satisfaisante

pour

séparer

l’efJet

des

génotypes majeurs

des

_{effets polygéniques. Cependant,}

la méthode

_proposée

est

_susceptible

d’être améliorée.

locus _majeur

_/

évaluation _génétique

_/

_analysede _{ségrégation}

INTRODUCTION

Statistical methods based on the infinitesimal

_model,

the

_assumption

of _many

un-linked loci all with small effects

_controlling

_quantitative

traits,

have been

success-fully applied

in

animal

_breeding.

An

_increasing

number of

studies, however,

have

reported

single

loci

_{having large}

effects on

_quantitative

traits. Such loci are referred

to as

_major

loci.

_Examples

are the

_prolactin

_(Cowan

et

_al,

₁₉₉₀₎

and the weaver

loci

_(Hoeschele

and

_Meinert,

₁₉₉₀₎

in

_dairy

_cattle,

and the halothane

_sensitivity

locus

_(Eikelenboom

et

_al,

₁₉₈₀₎

and a locus

_acting

on

&dquo;Napole&dquo;

_yield

(Le

_Roy

et

_al,

1990),

a

pork quality

_trait,

in

_pigs.

Only

in the case of the halothane locus has the

responsible

gene been identified and

procedures

for its

_genotyping

become available

(l!TacLennan

and

Phillips,

1992).

There is no

_difficulty

with

_genetic

evaluation for traits controlled

_by

a

_major

locus and

_polygenes

when

_major

locus

_genotypes

are known. A fixed

_major

locus effect has to be added to the linear model and

_major

locus effects and

_polygenic

breeding

values can be estimated

_by

the usual mixed model

_equations

_(Kennedy

et

al,

1992).

When

genotypes

are

unknown, however,

_satisfactory

statistical methods

are still

lacking.

Selection decisions could

_possibly

be based on animal models that

include the

_major

locus effects in the

_polygenic

part of the model. In cases where

the

allele

has some

_positive

effect on 1 trait but

_negative

effects on

_others,

it would

be desirable to have

_separate

estimates of the

_major

locus and

_polygenic

effects available. The 2 estimates would then be combined

_according

to the

_breeding

objective.

Because

_genotyping

of all the animals of a

_population

is

_likely

to be too

_expensive

if at all

possible,

statistical methods are

_required

that estimate

major

locus

_genotype

effects as well as

_polygenic

effects and

_major

locus

_genotype

probabilities

for each candidate.

Such a method was first

_proposed

in human

_genetics

_by

Elston and Stewart

(1971).

The unknown parameters of the model are estimated

_by

_maximizing

the

likelihood of the data. For models with both

_major

locus and

_polygenic

effects exact calculations are _very

_expensive

and become unfeasible for

_pedigrees

with more than

! 15 individuals. Several studies

compared

the

power of different

approximations

of the likelihood function to detect a

_major

locus in half-sib

_family

structures in animal

_breeding

data

_(Le

_Roy

et

_{al, 1989;}

Elsen and Le

_Roy,

_1989;

Knott et

al,

1992a).

Hoeschele

₍₁₉₈₈₎

_developed

an iterative

procedure

to estimate

_major

locus

_genotype

_{probabilities}

and effects as well as

_{polygenic breeding}

values. The

equations produced

for the estimation of

_genotype

_{probabilities}

were derived for

simple population

structures and were based on an

_{approximation}

of the likelihood

function.

_Kinghorn

et al

₍₁₉₉₃₎

used the iterative

_algorithm

of van Arendonk

(4)

regression

on

_genotype

probabilities.

A method was

proposed

to correct for the bias inherent in such

_analyses.

The

_objectives

of this

_study

were:

i)

to derive exact maximum likelihood equa-tions to estimate

_major

locus

_genotype

_{probabilities}

and effects for a

_quantitative

trait with mixed

_major

locus and

_polygenic

inheritance without _any

restrictions

on

population

structure;

ii)

to examine

_possible

_{approximations;}

and

_iii)

to _compare these

_{approximations}

with the methods of Hoeschele

₍₁₉₈₈₎

and

_Kinghorn

et al

(1993)

by

stochastic

_computer

simulation.

METHODS

Model

Consider a

_quantitative

trait which is controlled

by

1 autosomal

_major

locus with

2

_alleles,

A and a, and many other unlinked loci with alleles of small effects. Mendelian

_segregation

is assumed for all alleles at all loci. The allele with the

major effect, A,

has a

frequency

of _{p in}the base

_population,

which is assumed to be

unselected,

not inbred and in

_{Hardy-Weinberg}

and

_{gametic equilibria.}

In the base

population

the 3

possible

genotypes

at the

_major

locus

_(AA,

Aa and

_aa),

which will be denoted as

_1,

2 and 3

throughout

this _paper,are therefore

expected

to occur in

_frequencies

of p

2 ,

2p(1-p)

and

(1-p)

2 ,

_{respectively.}

Because

_genotyping

of animals

_might

be

_impossible

or too

_expensive,

we assume for the moment that the

_genotypes

at the

_major

locus are not known. With 1 observation per animal the

_following

mixed linear model can be formulated:

where y = observation vector

b = vector of

_non-genetic

fixed effects

g = vector of fixed

_major

locus

_genotype

effects

_[g

₁

₉₂

_g3!!

a = vector of random

_polygenic

_breeding

values

e = _vectorof random _errors

X,Z =

known incidence matrices

T = unknown incidence matrix

_indicating

true

_major

locus

_genotypes

of all the animals in the

population

The

expectation

and variance of the random variables are assumed to be:

(5)

action of the

_polygenes

is assumed but dominance is allowed for at the

_major

locus. In order to

_keep

the model

_simple,

it is further assumed that the variance

components

Q

a

and Q

e

are

known. This

_assumption

_implies

that the

_genetic

variance caused

_{by polygenes}

is known but not the

_genetic

variation caused

_by

the

_segregating

_{major allele,}

which is determined

_by

the

_major

_genotype

effects and

_frequencies.

This critical

_assumption

has to be

_kept

in mind when

_discussing

tlte simulation results.

Likelihood function

The likelihood for mixed model

_[1]

was first discussed

by

Elston and Stewart

(1971).

The likelihood can be written as:

is a normal

density

and

Pr(Tlp)

is the

probability

of T

given

the allele

frequency

p and the

pedigree

information. Because variance

components

are assumed to be

known,

cl =

(27r)&dquo;°’!&dquo; -

!V !

.ol e 21-1.1,

with _noas the number of

observations,

is a constant.

_Following

Elston and Stewart

_{(1971), Pr(Tlp)}

can be

_computed

as a

product

of

_{probabilities:}

,,

where N

is

the total number of animals in the

_population

and

_{Pr(! !s!d)}

is the

probability

of animal

_{i having}

_genotype

indicated

_by

_t

_i

_,

the ith row of

_{T, given}

the

_genotypes

of its

_parents

s and

d,

and is assumed to be known. Elston and Stewart

₍₁₉₇₁₎

_give

_Pr(ti!t9,td)

for autosomal and sex-linked loci. When the

parents

are unknown

Pr(tz!ts,td)

is

replaced

by

the

frequency

of the

genotype t

i

in the

base

population.

Known

_major

locus

_genotypes

can be accomodated

_{by setting}

Pr(! !,!)

to zero whenever ti conflicts with the known

_genotype

of animal i. With the base

_population

_(animals

with unknown

_parents)

in

_{Hardy-Weinberg}

equilibrium,

Pr(Tlp)

can be written as:

where nl, n2 and n3 are the number of base animals of

_genotype

AA,

Aa and aa,

respectively,

and nb = _n_l ₊_n₂ ₊_n₃ is the total number of base animals.

With 3

_possible

_genotypes

the sum in

_[2]

is over

3 N

elements. For 20 animals the sum is

_already

over 3.5 x

10

9 _possible

incidence matrices T. Whenever T conflicts with the

_pedigree

information

_Pr(Tlp)

is zero.

Therefore,

depending

on the

_pedigree

structure,

a

large

number of the elements to sum are _zero,but there remains a

(6)

As

_pointed

out

_by

Elston and Stewart

₍₁₉₇₁₎

the 3 likelihoods conditional on an

animal’s

_{genotype t}

_i

are

_proportional

to the

_{probabilities}

of animal

_{i having}

1 of the 3

_possible

genotypes.

The conditional likelihoods can be obtained

_{by skipping}

animal i in the summation over all

_possible

incidence matrices T. Maximum likelihood estimation

In order to maximize

_L(y),

we need the first derivatives with

_respect

to

_b,

_{g and}_p:

The

_probability

of T

_given

the data and the

_parameters

of

the

model will be denoted _w_T and can be

_computed

as

where c2 is the

product

of cl and a

_scaling

factor such that

_E

_WT

= 1. Note that

T

without

_scaling

this sum is

_equal

to the likelihood

_L(y).

After

_setting

to zero and

rearranging

we

_get

the 2

_following

_equations:

Solving

for

_p

in the last

_equation

leads to:

This

_equation

can be rewritten

_by

_replacing

_2n

₁

+ n2

by

v!.

T.

_{[2 1 0!’,}

with

_v’

a row vector of

_length

N with ones for base animals and zeros for the other animals.

Because _m_T

_depends

on

_b,

_gand _p,

_equations

[3]

and

_[4]

have to be solved

(7)

true values and

_{Q’ =}

_L

_wTT.

Note that the ikth element of

_Q!

at convergence is

T

an estimate of the

_{probability that}

animal i is of

genotype

k

_given

the data and the estimates for the fixed

effects b,

the

_major

locus effects

_g

and the allele

_{frequency p.}

As mentioned

_above,

the same estimate can be obtained

_{by calculating}

likelihoods

conditional on an animal’s 3

_genotypes.

Using

these

definitions,

_equations

[3]

and

[4]

can be written as:

The solutions for

b

T

,

_i’

and

_p

_r

_convergeto maximum likelihood

_(VIL)

estimates. Local maxima in

_L(y)

could _posea

problem

and will be discussed later. Hoeschele

(1988)

estimated the allele

_frequency

from the

_genotype

_{probabilities}

of all animals with records whereas

_[6]

considers

_only

base

_animals,

which is in

_agreement

with Ott

_(1979).

Because

_genotype

_{probabilities}

of base animals take information from their descendants into account, all information on the allele

frequency

in the base

populations

is

_properly

used

_by

_!6J.

Animal breeders are not

_only

interested in

_{estimating major}

locus effects _gand allele

_frequency

p but also in

predicting polygenic breeding

values a. This is

_usually

done

_by

_regressing

_phenotypic

observations corrected for fixed effects:

where

_Q

is

_Q!

at _convergence.

_Using

V-

1

=

1 [ZAZ

>.-

,

₊

1]!!

= I -

ZMZ’,

where

M =

[Z’Z

₊

A-

I

>.]-

1 (Henderson, 1984),

a _canalso be

computed

_as:

The same solutions for

_b,

g

and a are obtained

by

_iterating

on the

following

equations

together

with

_[6]

instead of

_using

_{(5!, [6]}

and

_!7!:

Note that

_{2.:: wTT’Z’ZT}

=

diag(v§ .

q[)

=

D

r

,

where

vb

is a row vector

T

(8)

difficulty

with this

_approach

is that it is not feasible to

_compute

_Q’

and

_!

_{tUy -}

*

T

T’Z’ZMZ’ZT

for

_{large populations.}

Approximations

Above

_Q

_r

was defined as:

There are 2

_problems

associated with the

computation

of

C!’’. Firstly,

the

summation is over all

_possible

incidence matrices T

_and,

_secondly,

a

quadratic

form

_involving

V-’

has to be

_computed

for each element in this sum. It can be

shown that the

_following

is an

equivalent

expression

not

involving

V-

1 :

where

_£11

=

MZ’(y -

Xbr -

)

r

ZTg

(Le

Roy

et

al,

1989).

Because

aT

depends

on

T,

we would have to

_compute

fill

for every

possible

T,

which is not feasible. In order to

_simplify

the

_{computations,}

we could

replace

*11

by

M which does not

depend

on T. Note that âr =

L wT’

âT.

This

_{approximation}

was also considered

T

by

Hoeschele

_(1988).

The

_approximated

Q!

is then:

Instead of

_using

a

_single

estimate of the

_polygenic

breeding

value for each animal

irrespective

of its

_genotype,

we could use 3 values for each animal

depending

on

its

_genotype

but

_independent

of the

genotypes

of all the other animals. A similar

approximation

was considered

_by

Elsen and Le

Roy

(1989)

and Knott et al

(1992a,

1992b)

for a sire model and was found to be

_superior

to

_[9].

We considered the

following approximation:

(9)

where xi and

t

ik

are the ith rows of X and

ZT,

a

?3

is the

_ijth

element of

A-

1 ,

and _c_ii is the

_diagonal

element of the coefficient matrix in

_[8]

_pertaining

to the ith animal

_equation.

The summation over all

_possible

incidence matrices T in

_[9]

or

[10]

can be avoided

by

using

algorithms developed

to estimate

_genotype

_{probabilities.}

_Here,

the iterative

algorithm

of van Arendonk et al

₍₁₉₈₉₎

was

applied.

This

procedure

will be

briefly

described in the next section.

As with

_Q!

the

_difficulty

with

_expression

_{E w’ -}

T’Z’ZMZ’ZT

is

_two-fold;

the sum is over all

possible

T,

and the

_computation

of each element in that sum is

expensive.

Let _m2!be the

_ijth

element of

_Z’ZMZ’Z,

and

_t

_ik

_(t

_jl

₎

be the elements of T for animal

_i(j)

and

_genotype

_/c(l).

_Now,

the klth element of

_L

_{wTT’Z’ZMZ’ZT}

can be calculated as:

Note that at _convergence

W’ -

tik .

_{<_,;}

is an estimate of the

_probability

that

T

animal i is of

_genotype

k and

_{animal j}

_of

_genotype

_L,

_given

the data. For

_independent

animals this

_quantity

is

_equal

to

_q’

_ik

_qj’l

the

_product

of the

_{corresponding}

elements in

Q’’

and, therefore,

the contributions of

_{L wTT’Z’ZMZ’ZT}

and

Q&dquo; Z’ZMZ’ZQ’

T

to B’’ cancel out. For

_dependent

animals the contributions to the klth element of B’ are:

Now if we

_neglect

the

_dependencies

between animals for the

computation

of

L

w2..

tik .

_t

_jl

we

_get:

T

and

_[8]

becomes identical to the mixed model

_equations

_given

_by

Hoeschele

_(1988).

Another _wayto

_approximate

B’’ is to assume that A = I. We then

_get:

(10)

Estimation of

_genotype

_{probabilities}

Van Arendonk et al

₍₁₉₈₉₎

_developed

an iterative

algorithm

to estimate

_genotype

probabilities

for discrete

_{phenotypes. Kinghorn}

et al

₍₁₉₉₃₎

_applied

this

_algorithm

to continuous traits. The

_comparison

of this

_algorithm

with non-iterative methods revealed some errors in the formulae

_given

in the

_original

_paper

_(LLG

Janss and JAM van

Arendonk, 1991;

C

Stricker,

_1992;

_personal

communications).

We

_applied

a corrected version of this

_algorithm.

For each

_animal,

_genotype

_{probabilities}

from 3 different sources of information are

_{computed using approximation}

_[9]

or

[10].

One round of iteration involves 3

steps.

First

_genotype

_{probabilities}

are

_computed

_using

information from

_parents

and

collateral relatives

_proceeding

from the oldest to the

_youngest

animal. In the second

step, genotype

probabilities

are calculated

_using

information from the _progeny

proceeding

from the

_youngest

to the oldest animal.

_Finally,

_genotype

_{probabilities}

using

information from each individual

_performance

are calculated and the 3 sources

of information combined. The _{iteration process is}

_stopped

when the solutions for

genotype

probabilities

reach a

_given

_{convergence criterion.}

The

_algorithm

works for

_{simpler pedigree}

structures as simulated in this

_study

but does not allow for

loops

in the

_pedigree,

also known as

_cycles

(Lange

and

Elston,

1975).

Loops

in a

pedigree

occur

_through

_{genetic paths}

_{(inbreeding loops),}

_mating

paths,

or a combination of the 2

_{(marriage loops),}

eg, a sire mated to 2

_genetically

related dams. Both

_inbreeding

and

_marriage

_loops

are common in animal

breeding

data. A non-iterative

_algorithm

for

_pedigrees

without

_loops

was

recently proposed,

which should be more efficient than the one used in this

_study

_(Fernando

et

al,

1993).

Method of Hoeschele

₍₁₉₈₈₎

Hoeschele

₍₁₉₈₈₎

used a

_{Bayesian approach}

to derive an iterative

_procedure

to estimate

_genotype

_{probabilities Q,}

allele

_frequency

_pand

_major

locus effects g for

simple

pedigree

structures. The

_genotype

_{probabilities}

were estimated

_by

formulae that were

_developed

for the

_{specific pedigree}

structures considered

_using

approximation

[9].

In contrast to

_[6],

Hoeschele

₍₁₉₈₈₎

estimated p from the

genotype

probabilities

of all animals with records:

where no is the number of animals with records and

vo

is a row vector with ones for

animals with records and zeros otherwise. The

_equations

that estimate the effects

of model

_[1]

are the same as

[8]

_approximated

with

[11].

We

_applied

this method

in the simulation

_study

_using

the iterative

_algorithm

described above but with

approximation

[9]

to estimate

_genotype

_{probabilities}

instead of the formulae

_given

by

Hoeschele.

Method

_{of Kinghorn}

et al

₍₁₉₉₃₎

(11)

the

_{least-squares}

estimates are biased

_(see,

for

_example,

_{Johnston, 1984,}

_p

_428).

Kinghorn

et al

₍₁₉₉₃₎

treated the unknown incidence matrix T as the unknown

true

_independent

variable and the

_genotype

_{probabilities}

_Q

as an estimate for T associated with some errors.

_Using

_Q

instead of T in the model leads to biased estimates of

_g

_*

_.

_Kinghorn

et al

₍₁₉₉₃₎

derived a correction matrix

_W,

such that

g

=

W!!§* .

Given _certain

assumptions,

they

showed that W =

V!V(,

where

V

t

is a 3 x 3 covariance matrix of elements in the 3 columns of T and

_V

₉

is the

_{corresponding}

covariance matrix of elements in the 3 columns of

_Q.

Because

(co)variances

in

_V

_Q

are

generally

smaller than

_{(co)variances}

in

_V

_t

_,

_major

locus effects are overestimated in absolute terms when

_using

_Q

instead of T. The

(co)variances

in

_V

₉

were calculated from the actual solutions for estimates of

genotype

probabilities

of all animals with records. Covariances in

_V

_t

were

_computed

as:

where

_q

_.k

is the _average

_genotype

_probability

for

_genotype

k of all animals with records and can be

_regarded

as an estimate of the

_frequency

of that

_genotype

in the

_population.

_Genotype

_{probabilities}

were estimated with the

_algorithm

of

van Arendonk et al

_(1989).

This

_algorithm

_requires

the allele

_frequency

_pas an

input

parameter.

Kinghorn

et al

₍₁₉₉₃₎

_kept

the initial value for _pconstant over all

iterations,

ie

_regarded

the initial _pas the true value. But if _pwas

known,

Cov(t

k

,t¡)

could also be derived from the

_expected

_frequencies

of the 3

_genotypes.

In our

implementation

Cov(t!,tl)

was

_computed

with

[14]

and the allele

_frequency

_pwas

estimated with

_(13!,

which is a natural deduction from

_!14!.

The linear model can be written in matrix notation as:

Kinghorn

et al

₍₁₉₉₃₎

assumed that

_Var(a

_*

₎

=

Var(a)

= A -

Q

a

and

Var(e

*

) =

Var(e)

= I -

Q

e.

The matrices

Q

and W are _notknown and have

to

be estimated

from the data as described above.

Therefore,

the

_following

_system

of

_equations

has to be solved

_iteratively:

Estimates for g should be unbiased but estimates for b and a are still biased. We

attempted

to correct for

_{the bias}

in

b by

adding

(X’X)-

l

X’ZQ(W -

I)g’’

+1

,

the

expected

difference between

b

r+1

and

b

*r+1

under the

_assumptions

_E(T)

=

E(Q),

(12)

Simulation

The methods of Hoeschele

₍₁₉₈₈₎

and

_Kinghorn

et al

₍₁₉₉₃₎

were

compared

with

the method

_developed

in this

_{study applying}

approximations

[10]

and

_[12]

_using

stochastic

computer

simulation.

_Phenotypic

observations were

generated by

using

the

_following

mixed model:

where

_hys

_i

is the fixed effect of herd x _yearx sex

_i,

_g!is the fixed effect of

major

locus

_genotype

_j,

_a2!!is the

_polygenic

breeding

value _{and e2!!}is the random residual effect. The effects in the model were

_sampled

as follows:

f hys

i

N(0,I

J

fI)

fa

ijk

} -

N(0,A

J§

)

and

_{{e2!! } N}

_N(0,I

_J§

_{) .}

_Major

locus

_genotypes

were simulated

with 2

_segregating

alleles.

Genotypes

of base animals were

generated

by sampling

2 alleles from a uniform distribution between 0.0 and 1.0 with threshold p, the

frequency

of allele A.

_Genotypes

of _progenywere determined

according

to mendelian

segregation.

The effect of

_genotype

3 was set to zero as there is a

dependency

between fixed herd x _yearx sex and

_major

locus effects.

Three different sets of

_parameters

were used

(table I).

Only

additive effects of

the

_major

locus were

considered,

although

all of the methods

compared

allow for

dominance. In the first set of

_{parameters, 50%}

of the

_phenotypic

variance

_(variance

due to

_major

locus +

polygenic

variance + residual

_variance)

is due to

_genetic

effects,

75%

of the

_genetic

variance is due to the

_{major locus,}

and

25%

is due to the

_polygenes.

The

frequency

of allele A with

_major

effect is

25%

in the base

population,

which results in an allele substitution effect a of

_1.0,

ie

_genotype

effects of 2.0

_(AA),

1.0

_(Aa)

and 0

_(aa).

In

_parameter

set

_2,

the allele

frequency

p is

0.5,

but the

_genotype

effects and all the other

parameters

are the same as in set 1. Thus the variance due to the

_major

locus is increased from 0.375 to

_0.5,

and the

phenotypic

variance

_changes

from 1.0 to 1.125. In

_parameter

set

3,

the allele

_frequency

_{p is}

0.25 and

50%

of the

_phenotypic

variance is due to

_{genetic effects,}

as in

_parameter

set

_1,

but the

_proportion

of

genetic

variance due to the

polygenes

is increased from 25 to

40%,

which results in an allele substitution effect a of

0.8.

(13)

simple.

In each of 10

_herds,

20 base dams each had a record in year 1. A _groupof 20 base sires each with their own record in a common herd x _year

_(eg

test

_station)

was mated with these base dams. Each sire was

_randomly

mated with 1 dam in each herd. Each

_mating

_produced

5 progeny in year 2. The sex of each _progeny was determined

_{by sampling}

from a uniform distribution between 0.0 and 1.0 with

threshold 0.5. The

_population

size was

1220,

made _upof 220 base animals and

1 000 progeny.

In each of the

alternatives,

the same _sequenceof random numbers was used.

Therefore,

identical data sets were

_analysed

with each of the 3 methods considered. Each alternative was

replicated

25 times.

With each of the 3

_methods,

final solutions are obtained

_by

_repeatedly

_computing

genotype

probabilities

and

_solving

a

_system

of

_equations

to

_get

new solutions for

major

genotype

effects and

_{polygenic breeding}

values. A

_stopping

criterion of the form:

was used for

_major

_genotype

effects _{g and}the allele

frequency

_p.

RESULTS

When the

_genotypes

of all animals with records are

known,

the estimates for

_major

locus effects _gare identical for all 3 methods considered

(table II).

Estimates for the

allele

_frequency

p,

however,

differed

slightly.

Using

formula

[13]

(Hoeschele,

1988;

Kinghorn

et

at,

1993)

the standard deviations

_(SD)

of estimated p were

_larger

than estimates

_by

_[6].

The estimates for _gand p agree well with the true values. Estimates of _gacross

_parameter

sets are

_{consistently slightly larger}

than the true

values,

which can be

explained by sampling

effects and the fact that for each of the

25

_replicates,

data for the 3

parameter

sets were

_generated

with the same set of random numbers. As

_expected

from the

_{heritabilities,}

the correlations between true and

_{predicted breeding}

values were the same for

_parameter

sets 1 and 2 and

_slightly

higher

for

_parameter

set 3. The correlations between

_{predicted breeding}

values and estimated

_major

locus effects were close to _zero,

_showing

that the 2 effects were

well

_separated

in all cases.

Table III shows the simulation results for the 3

_parameter

sets

_using

all 3

procedures

when

_major

locus

_genotypes

were unknown. For

_parameter

sets 1 and

2,

estimates of

_major

locus effects g were close to the true values or

slightly

underestimated with

_approximated

maximum likelihood

_(AML),

underestimated

by

about

20%

with the method of Hoeschele

₍₁₉₈₈₎

and overstimated

_by

25 to

30%

with the method of

_Kinghorn

et at

_(1993).

For

parameter

set

_3,

estimates of

_major

locus effects _gwere zero for 2

_replicates

_using

AML and for 21

_replicates

using

the method of Hoeschele

_(1988).

Non-zero estimates

_{of g}

were biased

upwards

by

14%

with A1VIL and

_by

47%

with the method of

_Kinghorn

et at

_(1993).

Both ANIL and the method of Hoeschele

₍₁₉₈₈₎

showed a

_{large variability}

of the non-zero estimates of

_major

locus effects for

_parameter

set 3. When the true allele

(14)

AML,

but estimated

_quite

well with the other 2 methods. Correlations between true and

predicted

breeding

values were similar for AML and the method of Hoeschele

(1988),

but zero for the method

of Kinghorn

et al

(199_3).

For

_parameter

sets 1 and

2,

the correlations between true

_(Tg)

and estimated

(Qg)

_major

locus effects were

similar for all 3 methods. When

_major

locus effects were smaller

(parameter

set

₃₎

these correlations were

_largest

with the method

of Kinghorn

et al

(1993).

_Predicted

breeding

values were

positively

correlated to estimated

_major

locus effects

Qg

with AML and to a

larger

extent with the method of Hoeschele

_(1988).

Using

the method of

_Kinghorn

et al

₍₁₉₉₃₎

these correlations were

strongly

_negative.

Because _{poor estimation}of p also affects all the other

estimates,

additional simulations were done with the allele

_frequency

fixed at the true

_(expected)

value. Results are

_reported

in table IV for A1VIL and the method of Hoeschele

₍₁₉₈₈₎

for

parameter

sets 1 and 3. All other results were close to those of table III and are

therefore not shown.

_Major

locus effects g were underestimated less with AML and

the correlations were similar for both

methods.

For

_parameter

set

_3,

the number of

replicates

with estimates of zero for

_major

locus effects was

_again

much

_larger

with

_(1988).

Table V _comparesthe 3 methods for the case where all sires and

50%

of the

dams are

gendtyped

at the

_major

locus. There was still a

_tendency

for AML to underestimate the allele

frequency

p when the true

_frequency

was 0.25. The method of Hoeschele

₍₁₉₈₈₎

underestimated

_major

locus effects

considerably

more than

AML

₍₉

to

31%

ver.sus 1 to

_11%),

whereas these effects were overestimated

by

22 to

43%

with the method of

_Kinghorn

et al

_(1993).

The accuracies of

_predicted

breeding

values were

_again

similar for AML and the method of Hoeschele

(1988)

but

much lower for the method of

_Kinghorn

et al

_(1993).

The accuracies of estimated

genetic

values at the

_major

locus were similar for all 3 methods with a

tendency

of

lower accuracies for the method of

_Kinghorn

et al

_(1993).

When all the sires but

(15)

intermediate between the 2 cases of no animals and all sires

_plus

50%

of the dams

genotyped.

So

_far,

final solutions have been

_reported

for iterations where

_starting

values

were

equal

to true

_(expected)

values. Table VI shows the number of

replicates

that

converged

to the same solutions

_using

different

_starting

values. Low

_starting

values

were half the true values and

_high

_starting

values were 1.5 times the true values of

_major

locus effects _gand allele

_frequency

p. When

major

locus

genotypes

were

not

_known,

none to a few

_replicates

_converged

to a

_single

set of solutions with all 3 different

_starting

values. For the method of Hoeschele

₍₁₉₈₈₎

with

parameter

set

_3,

most of the

_replicates

that

_converged

to the same solutions

_converged

to an estimate of zero for

_major

locus effects _{g. For AML}and the method of Hoeschele

(1988),

all

(16)

the sires

_(but

none of the

dams)

were known. The

_largest

number of

_replicates

with all 3 solutions different was found with the method

of Kinghorn

et al

_(1993).

DISCUSSION

The method

_proposed

here

_(AML)

_generally

_slightly

underestimates

_major

locus effects _gand

_seriously

underestimates allele

_frequency

p when the true

_frequency

is 0.25. The underestimation

_{of p}

leads to increased estimates

_{of g, although}

not to the extent that the variance

_{explained by}

the

_major

locus

_stays

constant

_(tables

III and

_IV).

This variance is

_higher

when the allele

_frequency

is fixed at the true value. The allele

_frequency

was still

_considerably

underestimated for

_parameter

set 1 when the

_pppulation

size was 10 times

larger

than considered here

(results

not

shown).

The allele

_frequency

was estimated

_by

(6!,

which was derived

by

_maximizing

the likelihood of the

data,

whereas the other 2 methods used

_[13].

Additional simulation runs with

_parameter

sets 1 and 3 and

_{approximations}

_[9]

and

_[11]

together

with

_[6]

showed

_considerably

lower estimates of p and

higher

estimates of _gthan results for the same 2

_{approximations}

_{applied together}

with

_[13],

_{(1988) (results}

not

_shown).

There seems to be a

problem

in

applying

[6]

together

with

_{approximations [10]}

and

_[12]

or, to a lesser

_extent,

with

[9]

and

_(11!.

Nevertheless

_[6]

is the correct

_equation

for the estimation of the allele

(17)

The method of Hoeschele

₍₁₉₈₈₎

_consistently

underestimated

_major

locus effects g which is in

_agreement

with the simulation results of the same author. For smaller

allele effects

_(parameter

set

_3),

_although

still

_quite

_large,

most of the estimates of _gwere zero,

_indicating

that the

genotype

effects have to be

_large

in order to be

_recognized.

The same is true for

_A1VIL,

but to a lesser extent. There was a

tendency

for the

_accuracies

of

_{predicted polygenic breeding}

values

_(a)

and estimated

major

locus effects

_(6g)

to be

_slightly

_higher

with AML than with the method of Hoeschele

_(1988).

In an unselected

population

as simulated here the

expected

correlation between true

_polygenic

and

_major

locus effects is zero. The correlations

(18)

(gametic disequilibrium)

which will make

_separation

of the 2 effects more difficult.

For AML and the method of Hoeschele

_(1988),

the mean correlations _ra,_a were

lower and

r- o-

were

_higher

when the allele

_frequency

was 0.5

_(parameter

set

₂₎

than when the same allele had a

frequency

of 0.25

(parameter

set

_{1) (tables}

III and

_V).

_Although

the

_proportion

of variance

_{explained by}

the

_major

locus is

_higher

with

_parameter

set 2 it seems to be more difficult to _separate

_polygenic

and

_major

locus effects with intermediate allele

_frequencies.

This was also found

by

Knott et

al

_(1992a)

for similar

_{approximations.}

For

_parameter

sets 1 and

2,

both methods showed a

large

reduction of 35 to

40%

_{for ra,}_a and 25 to

32%

for

7 -

T

g,

Q

g

when

genotypes

were unknown rather than known

_(tables

II and

_III).

With the method

_{of Kinghorn}

et al

_(1993),

estimates of the allele

_frequency

_{p were}

generally

closer to the true values than with the other 2

_{procedures. However, major}

locus effects were overestimated and the correlations between true and

_predicted

breeding

values were close to zero which is in

_agreement

with their simulation results. The method

_attempts

to correct for the bias inherent in

_major

locus estimates

_by

_regression

on the

independent

variable

ZQ!,

an estimate from the

data,

which is associated with some error. The term

_ZQ!

is

_{postmultiplied}

_by

the correction matrix W!.

_ZQ’’W

_r

is then used the same _wayas a usual incidence

matrix in the mixed model

_equations.

_{Multiplication by}

wr increases the variance of the

_independent

variable to the variance

_expected

for the unknown term ZT. Because wr is calculated over all animals with

records,

the new variance is correct

only

on the _average.For an animal with known

_genotype,

the elements in

_Q!

are

(19)

closer to an

identity

matrix in

_comparison

to dams. In

_{addition, breeding}

values

estimated

_by

_[15]

are still biased. These 2

_problems

are

probably responsible

for

the overestimation of g and very poor

prediction

of

_{polygenic breeding}

values. The

_performance

of the method was,

however,

less affected

by

smaller allele effects

(parameter

set

₃₎

than the other 2

_procedures.

For all 3

_procedures

there was a

problem

of different solutions with different

starting

values when

_genotypes

were unknown. For AML and the method of

Hoeschele

₍₁₉₈₈₎

the cause could be the

multimodality

of the likelihood function.

It seems to be necessary to

_compute

_approximated

likelihoods which then can be

used to select the solutions with the

_highest

likelihood. This could of course also

be done with the method of

_Kinghorn

et al

₍₁₉₉₃₎

but this method has no direct

relationship

with maximum likelihood.

In this

_study

variance

_components

were assumed to be known but in

_practice

have to be estimated.

_Using

incorrect values could lead to biased estimates of

_major

genotype

effects and

_frequencies.

For

_{example, using}

an underestimated

_genetic

variance

_might

result in an overestimation of the

_major

_genotype

effects. If a

_major

allele is known to be

_segregating

variance _{components free of}

_major

_genotype

effects would have to be estimated with model

_!1!.

This could be _verydifficult because even

when the true variance

_components

were

used,

all 3 methods

performed poorly

when

no animals were

_genotyped.

Clearly,

none of the methods is

satisfactory

for a

_separate

_genetic

evaluation for

the

_major

locus and the

_polygenes.

In this

_{study only}

_large

effects were considered.

AML

and,

especially,

₍₁₉₈₈₎

were unable to detect smaller effects than used with

_parameter

set 3. For

_example,

the effects estimated for the

prolactin

locus in a Holstein sire

_family

_(Cowan

et

_al,

₁₉₉₀₎

were much smaller

than considered here. The method

proposed

has some

potential

for

_improvement.

Future research should focus on the

_development

of

_algorithms

to estimate

_genotype

probabilities

without _{any restriction}on

_pedigree

structures. The estimation of

joint

genotype

probabilities

for _{any 2}

_pairs

of animals

_together

with sparse matrix

techniques

to

_compute

the elements of M could avoid the need for some of the

approximations

made in this

_study.

ACKNOWLEDGMENT

This research was conducted while AH was a

_visiting

scientist at the

_University

of

Guelph.

Financial _supportfrom the Schweizerischer

_{Nationalfonds,}

_Switzerland,

is

_gratefully

acknowledged.

REFERENCES

Cowan

CM,

Dentine

_1VIR,

Ax

_RL,

Schuler LA

₍₁₉₉₀₎

Structural variation around

prolactin

gene linked to

_quantitative

traits in an elite Holstein sire

_family.

Theor

Appl

Genet

_79,

577-582

Eikelenboom

G,

Minkema

D,

van Eldik

P,

Sybesma

W

₍₁₉₈₀₎

Performance of Dutch Landrace

_pigs

with different

_genotypes

for the halothane-induced

_malignant

(20)

Elsen

JVI,

Le

_Roy

P

₍₁₉₈₉₎

_Simplified

versions of

_segregation

_analysis

for detection of

_major

_{genes in}animal

_breeding

data.

_40th

Annual

_{Meeting of}

_{EAAP, Dublin,}

27-31

_August,

1989

Elston

RC,

Stewart J

₍₁₉₇₁₎

A

_general

model for the

_genetic

_analysis

of

_pedigree

data. Hum Hered

_21,

523-542

Fernando

_RL,

Stricker

_C,

Elston RC

₍₁₉₉₃₎

Scheme to

_compute

the likelihood of a

pedigree

without

_loops

and the

_{posterior genotypic}

distribution for _everymember of the

_pedigree.

Theor

_Appl

_Genet,

_{in press}

Henderson CR

₍₁₉₈₄₎

_{Applications of Linear}

Models in. Animal

_{Breeding. University}

of

_{Guelph, Guelph,}

Canada

Hoeschele I

₍₁₉₈₈₎

Genetic evaluation with data

_presenting

evidence of mixed

_major

gene and

polygenic

inheritance. Theor

Appl

Genet

76,

81-92

Hoeschele

_I,

Meinert TR

₍₁₉₉₀₎

Association of

_genetic

defects with

_yield

and

_type

traits: The weaver locus effect on

_yield.

J

_Dairy

Sci

_73,

2503-2515

Johnston J

₍₁₉₈₄₎

Econometric Methods.

McGraw-HiIl,

New York

Kennedy

BW,

Quinton M,

van Arendonk JAM

(1992)

Estimation of effects of

single

genes on

_quantitative

traits. J Anim Sci

_70,

2000-2012

Kinghorn

BP, Kennedy BW,

Smith C

₍₁₉₉₃₎

A method of

_screening

for genes of

major

effect. Genetics

134,

351-360

Knott

_{SA, Haley CS, Thompson}

R

_(1992a)

Methods of

_segregation

_analysis

for animal

_breeding

data: a

_comparison

of _power.

_Heredity

_68,

299-312

Knott

_SA,

_Haley

_CS,

_Thompson

R

_(1992b)

Methods of

_segregation

_analysis

for animal

_breeding

data: parameter estimates.

_Heredity

_68,

313-320

Lange K,

Elston RC

₍₁₉₇₅₎

Extensions to

_pedigree

_analysis.

I. Likelihood calcula-tions for

_simple

and

_{complex pedigrees.}

Hum Hered

_{25, 95-105}

Le

Roy P,

Elsen

JM,

Knott S

₍₁₉₈₉₎

_Comparison

of four statistical methods for detection of a

_major

_{gene in}a _progenytest

_design.

Genet Sel

_{Evol 21,}

341-357 Le

_Roy

P,

Naveau

J,

Elsen

JNI,

Sellier P

₍₁₉₉₀₎

Evidence for a new

_major

_gene

influencing

meat

_quality

in

_pigs.

Genet Res Camb

_55,

33-40

VIacLennan

DH, Phillips

MS

₍₁₉₉₂₎

_{Malignant hyperthermia.}

Science

_{256, 789-794}

Morton

NC,

MacLean CJ

₍₁₉₇₄₎

_Analysis

of

_family

resemblance. III.

_Complex

segregation

of

_quantitative

traits. Am J Hum Genet

_26,

489-503

Ott J

₍₁₉₇₉₎

Maximum likelihood estimation

_by

_counting

methods under

_polygenic

and mixed models in human

pedigrees.

Am J Hum Genet

31, 161-175

Van Arendonk

_JAM,

Smith

_{C, Kennedy}

BW

₍₁₉₈₉₎

Method to estimate