Random model approach for QTL mapping in half-sib families

(1)

HAL Id: hal-00894267

https://hal.archives-ouvertes.fr/hal-00894267

Submitted on 1 Jan 1999

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Random model approach for QTL mapping in half-sib

families

Mario L. Martinez, Natascha Vukasinovic, Freeman Gene (a.E.)

To cite this version:

(2)

Original

article

Random model

_approach

for

_QTL

mapping

in half-sib families

Mario L.

_Martinez,

Natascha Vukasinovic*

Gene

_(A.E.)

Freeman

Department

of Animal

Science,

Iowa State

_{University, Ames,}

IA _50011,USA

(Received

7

_April

_1998;

_accepted

9 June

₁₉₉₉₎

Abstract - An interval

_{mapping procedure}

based on the random model

approach

was

_applied

to investigate its

_{appropriateness}

and robustness for

QTL

mapping

in

_populations

with

_prevailing

half-sib

family

structures. Under a random

_model,

QTL

location and variance components were estimated

using

maximum likelihood

techniques.

The estimation of _parameterswas based on the

_{sib-pair approach.}

The

proportion

of genes

identical-by-descent

(IBD)

at the

QTL

was estimated from

the IBD at two

_flanking

marker loci. Estimates for

_QTL

_parameters

_(location

and

variance

_components)

and power were obtained

_using

simulated

data,

and

varying

the

number of

families, heritability

of the trait,

proportion

of

_QTL

variance, number of marker alleles and number of alleles at

_QTL.

The most

_important

factors

influencing

the estimates of

_QTL

_parametersand _powerwere

heritability

of the trait and the

proportion

of

genetic

variance due to

QTL.

The number of

QTL

alleles neither influenced the estimates of

_QTL

parameters nor _{the power}of

QTL

detection. With

a

_{higher heritability, confounding}

between

QTL

and the

polygenic

component was

observed. Given a sufficient number of families and informative

polyallelic markers,

the random model

approach

can detect a

_QTL

that

_explains

at least 15 % of the

genetic

variance with

high

power and

provides

accurate estimates of the

QTL position.

For fine

_QTL

_mappingand proper estimation of

QTL

variance, more

sophisticated

methods are, however,

required. ©

Inra/Elsevier,

Paris QTL

/

random

_{model / interval}

_mapping

_/

sib-pair method

Résumé - Approche en modèle aléatoire pour la détection de QTL des familles de

demi-frères

_(soeurs).

Une

_procédure

de

cartographie

basée sur

_l’approche

en modèle

aléatoire a été

_appliquée

de manière à examiner sa

_pertinence

et sa robustesse _pour

la détection de

_(aTLs

dans les

populations

où

_prévaut

la structure en familles de

demi-frères. Dans un modèle aléatoire, la _positiondu

QTL

et les composantes de

variance ont été estimées en utilisant les techniques de maximum de vraisemblance.

*

Correspondence

and

_reprints:

Animal

Breeding Group,

Swiss Federal Institute of

Technology,

Clausiusstr. 50, 8092

_Zurich,

Switzerland

(3)

L’estimation des

_paramètres

a été basée sur

l’approche

_parles _paires

d’apparentés.

La

_proportion

de

gènes identiques

par descendance

(IBD)

au

QTL

a été estimée à

partir de l’IBD à deux loci de _marqueurs

_flanquants.

Les estimées des

paramètres

pour le

QTL

(position

et composante de

_variance)

et la puissance ont été obtenus en

utilisant des données simulées et en faisant varier le nombre de

_familles,

l’héritabilité du

_caractère,

la

_proportion

de variance au

_QTL,

le nombre d’allèles au _marqueur

et le nombre d’allèles au

_QTL.

Les facteurs les

_plus

importants

influençant

les estimées de

paramètres

au

QTL

et la puissance ont été l’héritabilité du caractère

et la

_proportion

de variance

_génétique

due au

QTL.

Le nombre d’allèles au

QTL

n’a

influencé ni les estimées des

paramètres

au

_QTL

ni la puissance de détection du

_QTL.

À

une héritabilité _élevée,on a observé une confusion entre la _composante

_QTL

et la

composante

polygénique.

S’il y a un nombre suffisant de familles et de _marqueurs

polyallèliques informatifs, l’approche

du modèle aléatoire _permetde détecter avec une _puissanceélevée un

_QTL

_qui

_explique

au moins 15 % de la variance

_génétique

et d’estimer

_{précisément}

la _positionde ce

_QTL.

Pour une détection

_précise

et une

estimation correcte de la variance au

QTL,

des méthodes

_{plus sophistiquées}

sont

cependant

nécessaires.

_©

_{Inra/Elsevier,}

Paris

QTL

/

modèle aléatoire

_/

_cartographiepar intervalle

/

méthode des paires d’apparentés

1. INTRODUCTION

The

_development

of

_linkage

_mapswith

_large

numbers of molecular markers has stimulated the search for methods to _{map genes involved in}

_quantitative

traits. The search for

QTL

has been most successful in

_plants

and

_laboratory

animals for which data are available for backcross and

_F

₂

_generation

from

inbred lines. With such

data,

the

parental

genotypes,

the

linkage phases

of the

_loci,

and the number of alleles at the

_putative

_QTL

are known

_precisely.

Additionally,

data from

_designed

_experiments

can be considered as one

_large

family,

because all individuals share the same

_parental

_genotypes.

As a

_result,

the effect of

_QTL

substitution and dominance can be

_directly

estimated

[14,

18, 24! .

In most livestock

_species,

_especially

in

_{dairy cattle,}

data from inbred lines and their crosses are not available. An outbred

_population

is assumed to be

in

_{linkage equilibrium.}

In the absence of

_{linkage disequilibrium,}

the

_linkage

phase

between the

_QTL

and the markers will differ from

_family

to

_{family, and,}

therefore,

the

_analysis

of the

_{marker-(aTL linkage}

has to be made within a

family

[17].

The

_family

_size,

_however,

is

_usually

not

_{large enough}

to enable

accurate

_analysis

within a

_{single pedigree.}

_{Additionally,}

the number of

_(aTLs

affecting

traits of interest is

_uncertain,

as well as the number of alleles at each

QTL.

With the _presenceof a biallelic

QTL

with codominant

inheritance,

the

distribution of

_genotypic

values is a mixture of three normal distributions.

But,

with more alleles at the

_QTL,

the number of

_possible

_genotypes

increases

and the

_analysis

becomes

_complicated

and tedious. With an unknown number

of

_QTL

alleles it is

_impossible

to determine the exact number of

_genotypes,

i.e. the number of normal distributions that build up the overall distribution

of

_genotypic

values. In such

_situations,

the detection of

_{linkage relationships}

between a

_{putative QTL}

and the marker loci can

_only

be based on robust

model-free

_{(non-parametric)}

and

_{computationally rapid linkage methods,}

such

(4)

The random model

_approach

is based on the

_{phenotypic similarity}

(or

covariance)

between

_genetically

related individuals. The covariance between

two relatives

_comprises

a

polygenic

and a

QTL

_component.

The

polygenic

component

depends

on the

_{genetic relationship}

between

animals,

whereas the

QTL

component

depends

on the

_proportion

of alleles

_{identical-by-descent}

(IBD)

that two individuals share at the

QTL.

The

polygenic

component

consists of _{many genes}with small effects.

_Thus,

it is assumed that the

average

proportion

of alleles IBD shared

by

two individuals

equals

the

genetic

relationship

coefficient between the

relatives,

i.e.

_1/2

for full-sibs and

_1/4

for half-sibs. For the same kind of

_{relationship, however,}

the IBD

_proportion

at the

QTL

differs from one

_pair

of relatives to another. Because the actual

_proportion

of alleles IBD at the

QTL

is not

_observable,

the

proportion

QTL

shared

_by

two relatives

₍

₇

_rq)

must be inferred from the observed

_genotypes

at linked marker loci.

Haseman and Elston

_[16]

_proposed

a robust

_{sib-pair approach}

based on

simple

linear

_regression

of

_{squared phenotypic}

differences between two sibs within a

_family

on the

_proportion

of alleles IBD shared

_by

the two sibs at

the

_QTL.

The Haseman-Elston

sib-pair

method has been

_proved

to be robust

against

a

_variety

of distributions of data and

_independent

of the actual

_genetic

model of the

_{QTL. However,}

this method is

_limited,

because the

_genetic

effect of the

_QTL

and the recombination fraction between the

QTL

and a marker locus are confounded. It can

_only

detect

_linkage

between a marker and a

QTL,

but cannot estimate whether this is due to a

QTL

with a

large

effect at a

_large

distance,

or to a

QTL

with a small effect

closely

linked to the marker.

Fulker and Cardon

_[8]

_developed

a

_sib-pair

interval

_mapping

_{procedure using}

two markers to

_separate

the location of a

_QTL

from its effect and to estimate

the

specific position

of a

QTL

on a chromosome. This results in a

higher

statistical _power,but it is still a

least-square-based

method

and, therefore,

does not

_optimally

utilize all information that could be extracted from the distribution of the

specific data,

as a maximum likelihood

(ML)

method would

do.

Goldgar

[10]

developed

a

_multipoint

IBD method based on the ML

_approach

to estimate the

_genetic

variance

_{explained by}

a

_particular

chromosomal

_region.

This method has been extended

_by

Schork

_[19]

to

_{simultaneously}

estimate variances of several chromosomal

_regions

and the common environmental effect

shared

_by

all sibs. Both methods take

_advantage

of the distributional

_properties

of the data

_{and, therefore,}

are more

_powerful

than the Haseman-Elston method.

However, they only

estimate variance of

_QTL

and not the exact

_{QTL position.}

Xu and

_Atchley

_[22]

extended the

_Goldgar’s

ML method to interval

_mapping.

They

developed

an efficient

general

QTL mapping procedure, assuming

a

single

normal distribution of

_{QTL genotypic}

values and

_fitting

a

_QTL

as a random

effect

_along

with a

polygenic

_component.

They

showed

that,

using

the random

model

_approach,

a

QTL

can be

successfully

mapped

and its variance estimated

in full-sib families.

The ML-based random model

_approach

for

_{QTL mapping using}

the

_sib-pair

method has been well established for

linkage analysis

in humans

_{[3, 22]}

and

(5)

a)

to extend the random model

_approach

for

_{QTL mapping}

based on a

sib-pair

method to half-sib

families;

b)

to test the

_{appropriateness}

and robustness of a random model

approach

for

_{QTL mapping}

in half-sib families with different

_sample

_sizes,

heritabilities of the

_trait,

_QTL

_variances,

number of alleles at marker loci and number of alleles at the

_{QTL using}

stochastic simulation.

2. THEORY

2.1.

_Estimating

the

_proportion

of IBD in half-sib families

If the markers are

_{fully informative,}

the

proportion

of alleles IBD

₍

₇

_i)

shared

by

two sibs at a locus can be

_0,

1/2

or 1 if

they

share zero, one or two

_parental

alleles,

respectively.

For

_half-sibs,

the

_proportion

of alleles IBD at a locus can

be either 0 or

_1/2,

since

_they

_only

have one common

_parent

and

_therefore,

assuming

unrelated

_{dams, they}

can share either zero or one

_parental

allele.

If the markers are not

_{fully informative,}

the _!risat the markers cannot

be observed and must be

_{replaced by}

their

_expected

values conditional on

marker information available on sibs and their

parents.

Haseman and Elston

[16]

proposed

a

_simple

method to calculate !r.l as

where

_f

_i2

and

_f,,

_l

are the

_{probabilities}

that the sibs share two or one allele at

a

locus, respectively,

conditional on observed

_genotypes

of the sibs and their

parents.

Analogously,

7r,

for two half-sibs can be estimated as

The

_proportions

of alleles IBD at marker loci are used to calculate the

proportion

_QTL,

because two

_offspring

that receive the

same marker allele are

_likely

to receive the same allele at a linked

_QTL.

Haseman and Elston

_[16]

showed that the

_{expected proportion}

of IBD at one

locus is a linear function of the

_proportion

of IBD at another locus. Fulker and Cardon

_[8]

used the

_proportions

of IBD at two

_flanking

markers to calculate the conditional mean of the

_proportion

of IBD at the

_QTL

₍

₇

_q),

which is also

a linear function of %s at two

_flanking

markers:

where 7rl and !r2 are IBD values for two

_flanking

markers. The

_/

₃

_weights

are

_{given by}

the normal

_equation:

(6)

2 and the

_{putative QTL, respectively, replacing}

all 7rS with

1/4,

all variances

(V(

7 r

;

))

with

_1/16,

and all covariances

_(Cov(!ri,

_!r!))

with

_{(1 —}

_2!)!/16,

and

solving

(4),

the estimates of

(

3

values can be obtained as follows

[2,

7,

8!:

2.2.

_{Mapping procedure}

under the random model

A

_general

form of the random model has been defined

_by

_Goldgar

_[10]

as

where _y_ij is the

_phenotypic

value of the trait in the

_jth

_offspring

of the ith half-sib

_family;

_pis the

_population

mean; _g_ij_{is the random additive}

genetic

effect of the

_QTL

with mean = 0 and variance =

or2;

aij is the random additive

polygenic

effect with mean = 0 and variance =

er!;

e2! is the random environmental deviation with mean = 0 and variance =

u!.

All random effects in the model are assumed to be

_normally

distributed.

However,

if

_Q

_a

_and

_af

are

_{large enough}

to make the distribution of the data

normal,

the normal distribution of the

QTL

effects is not

_{absolutely required.}

In a half-sib

family,

the variance of y2!

_assuming

a

linkage equilibrium

is:

and a covariance between two non-inbred

_{half sibs j}

_and

_j’

is:

with _!rq= the

proportion

putative QTL

shared

by

two

half-sibs.

The coefficient of the

_polygenic

variance is

_1/4

_because,

_by

_expectation,

two

non-inbred half-sibs share

_1/4

alleles IBD. The

_proportion

of IBD at the

_QTL

(!rq)

will be different for each half-sib

_pair.

₇_rqis a variable that _rangesfrom 0

to

_1/2

in half-sib families.

For the estimation of variance

_components,

₇_rqin

_equation

₍₉₎

is

_{replaced by}

its estimated value

_trq

from

equation

(3).

(7)

With k sibs in each

_family,

_C

_i

is a k x k matrix.

We define

_h9

=

u.!/ U2

as the

_heritability

of a

_putative

_QTL,

h’

=

u;/ u2

as

the

_heritability

of a

_polygenic

_component,

and

_{ht =}

_(!9

₊

_{u;) / u}

₂

as the total

heritability. Assuming

a multivariate normal distribution of the data

₍

_Yij

_),

we

have a

_joint

_density

function of the observations within a half-sib

_family:

where _y₂ =

[

Y

i

l

Yi2 yZ3...

yZ!;!!

is a k x 1 vector of observed

phenotypic

values

for k half-sibs within the ith

_family,

and 1 = k x 1 vector with all entries

_equal

to 1.

The overall

log

likelihood for n

_independent

families is

The likelihood function relates to the

_position

of the

_QTL

flanked

_by

two

markers

_through

ri. The unknown

parameters

that have to be estimated are

p,

Q

z

,

h9,

ha

and

_q.

₁

₀

In

maximizing

L,

the common

_practice

in the interval

mapping procedure

is to treat the recombination fraction between the first marker and a

_putative

_QTL

₍₀

₁

_,)

first as a known

_constant,

then

_gradually

increase

₀

₁

_,

and decrease the distance between the

_QTL

and the

_right

marker

(0q2 )

throughout

the entire interval between the

_flanking

_markers,

and

_repeat

the

_procedure

_{in every}interval

_{until, eventually,}

the whole genome is screened.

The maximum likelihood estimate of the

_QTL

_position

is determined

_by

the value of

₀

₁

_,

in the

_appropriate

interval that maximizes L

_through

the entire

chromosome.

The null

_hypothesis

is that

_{h! =}

_0,

i.e. that no

_QTL

is

_present

in the tested interval. The ML under null

_hypothesis

is denoted

_by

_L

_o

_.

The likelihood ratio

(LR)

test statistics is

The LR statistics under

_H

_o

follows the

_x2

distribution

with a number of

degrees

of freedom

_(df)

between 1 and 2. With a

single QTL,

one df is due to

fitting

_{h9 and}

the

_remaining

df for

_fitting

the

_{QTL position.}

The

_remaining

df

depends

on the distance between two markers and is less than one because

we search for the

_{QTL only}

within an

_interval,

rather than in the entire genome

(chromosome).

If the

_H

_o

is that no

_QTL

is present in the whole _genome

(chromosome)

covered

_by

the

_markers,

the df under

_H

_o

is =N 2

!22!.

3. SIMULATION AND ANALYSES

The Monte Carlo simulation

_technique

was used to

_generate

_genotypic

and

phenotypic

data.

_{Mapping QTL}

were considered in a 100 cM

_long

chromosomal

segment

covered

_by

six

_{markers, equally}

distributed

_along

the chromosome

(8)

same

_frequency.

A

_{single QTL}

with several codominant alleles with the same

frequency

and additive effects was simulated in the middle of the chromosomal

segment

(i.e.

at 50

_cM).

Parents were

_generated

_by

the random allocation of

_genotypes

at each

locus

_assuming

a

Hardy-Weinberg equilibrium.

Parental

linkage phases

were

assumed unknown.

_Offspring

were

generated

assuming

no

_{interference,}

so that a recombination event in one interval does not affect the occurrence of a

recombination event in an

_adjacent

interval. Recombination fractions for each

locus were calculated

_using

the Haldane _mapfunction

!13!.

Normally

distributed

phenotypic

data with mean = 0 and variance = 1 were

generated

according

to the

following

model:

where _y2!is the

_phenotypic

value of the

_{individual j in}

the half-sib

_family

_i;

p

is the

population

mean; qi! is the effect of the

QTL

genotype

of individual

j;

s

i is the sire’s contribution to the

_{polygenic value;}

_d

_ij

the dam’s contribution

to the

_polygenic

_value;

_4>ij

is the effect of Mendelian

_sampling

on the

_polygenic

value;

and _e_ij the residual error.

Phenotypic

values were assumed

pre-corrected

for fixed environmental

ef-fects.

Family

structure was chosen to accommodate a

typical

situation in a

commercial

dairy population.

For

_simplicity,

sires were assumed to be

unre-lated. Each sire was mated to 25

_randomly

chosen unrelated dams to

_produce

one

offspring

_per

mating.

The values of the simulated

_parameters

varied

_depending

on the

_major

purpose of the simulation.

To test the behavior of the random model

approach

under different heri-tabilities of the trait and different

_proportions

of variance

_explained

_by

the

QTL

(i.e.

different size of the

_(aTL),

seven different values of

_heritability

were

assumed: the

_heritability

of the trait was varied from 0.10 to 0.70 in

_steps

of 0.10. The total

_genetic

variance consisted of a

_QTL

_component

and an unlinked

polygenic

component.

The additive allelic effect of the

QTL

was set so that the

QTL

variance accounted for

_10,

50 and

100 %

of the total

genetic

variance.

The number of alleles at the

_QTL

was 5. All of the six markers had six alleles with the same

_frequency.

To test the influence of marker

_polymorphism

on the

performance

of the

random model

_approach,

each of six marker loci was assumed to have _two,

four,

six or ten alleles with an

_{equal frequency.}

Two different heritabilities of

the trait were considered: 0.10 and 0.50. The number of alleles at the

QTL

was

five. The total

_genetic

variance was accounted for

by

the

QTL,

i.e. no

_polygenic

component

was simulated.

To test the robustness of the random model

approach against

the number of alleles at the

_QTL,

the

_QTL

was simulated with

two,

five or nine

_equally

frequent

alleles with additive effects.

_Again,

the

_phenotypic

trait was simulated

assuming

two different heritabilities: 0.10 and

_0.50,

with the

_{complete genetic}

variance due to the

_QTL.

Each of six marker loci had six

_equally

_frequent

alleles.

In each simulation two different

_sample

sizes were considered: 50 and 100

(9)

The ML interval

_{mapping procedure}

was

applied

to the simulated data. The chromosome was searched in

_steps

of 2 cM from the left to the

_right

end. Unknown

_parameters

_h!,

_h!

and

u

2

were estimated

_{simultaneously.}

The likelihood function was maximized with

_respect

to these

_parameters

_using

the

_{simplex algorithm provided by}

Xu

_{(pers. comm.).}

The test

_position

with the

_highest

LR was

_accepted

as the most

_likely

_position

of the

_QTL.

For each

_parameter

combination the simulation and

_analysis

were

repeated

100

times. The accuracy of estimation was

_judged

according

to an

empirical

95 %

symmetric

confidence

_interval,

estimated from the observed

_{between-replicate}

variation and calculated as

_2t,

_/2

_,

₉₉

times the

_empirical

standard error.

The

_empirical

distribution of the LR test statistics was

_generated

in the same manner for each

_parameter

combination under the null

hypothesis,

i.e.

_assuming

no

_QTL

in the entire

_segment.

A

_significance

level of 0.95 was chosen for all

analyses.

The

_empirical

threshold value was defined as the 95th

percentile

of the

empirical

distribution of the LR test statistics under

_H

_o

_.

The _powerwas defined as a

_percentage

of

_replications

in which the null

_hypothesis

was

_rejected

at the

5 %

significance

level. The distribution of the maximum LR values obtained under

_H

_o

for

_heritability

of the trait 0.10 and 0.50 is illustrated in

_figure

1.

4. NUMERICAL RESULTS

4.1.

_Heritability

of the trait and

_proportion

of

_QTL

variance

Estimates for the

_QTL

_location,

_averaged

over 100

_replicates,

with

(10)

When the

_{QTL explained}

the entire

_genetic

_variance,

the estimates for the

QTL

position

were close to the true

_parameter

value of 50 cM. When the

_QTL

explained

50 %

of the

_genetic

variance,

the estimates were close to the true

QTL position

when the

_heritability

of the trait was 0.30. When the

_QTL

explained only

10 %

of the

_variance,

the _{average estimates}were biased and close to the true value

_only

with a _very

high heritability

of the trait and a

sample

of 100 families.

When the

_genetic

variance is

_completely

due to the

_QTL,

the _accuracyof the

_{QTL position}

_estimates,

_given

as a width of the 95 %

_empirical

confidence

interval,

was

strongly

influenced

by

the

heritability

number of families. When

_heritability

increased from 0.10 to

_0.20,

the _accuracy of the estimates increased

_by

_{approximately}

40 %

_(the

confidence interval decreased from 10.9 to 6.3 cM and from 7.9 to 4.9 cM for 50 and 100

_families,

respectively).

With a further increase in

_heritability

to

_0.70,

the confidence interval decreased to 1.8 and 0.6 cM for 50 and 100

families, respectively.

Relative

_improvement

_{in accuracy}was smaller when the

_{QTL explained}

a

smaller

_proportion

of the

_genetic

variance. When

50 %

of the

_genetic

variance

was

explained by

the

QTL,

the increase in

heritability

of the trait from 0.10 to

0.20 resulted in a reduction of the confidence interval

_by

20

%.

With a

_QTL

explaining

only

10 %

of the

genetic

variance,

the

_improvement

_{in accuracy}with increased

_heritability

of the trait was _very

small,

regardless

of the

sample

size.

However,

generally,

more accurate estimates of the

_{QTL position}

were obtained

(11)

Estimates for

_QTL

_{(h 2),}

_polygenic

_(h’)

and total

_(hn

_heritability

are

_given

in table Il. Estimates for total

_{heritability,}

which

_represents

a sum of

_QTL

and

polygenic heritability,

were

equal

or _veryclose to the true

_parameter

values. When the

_{QTL explained}

10

%

of the total

_{genetic variance,}

the estimated

h2

was

relatively

close to the true value or

_only

_slightly

overestimated for the

heritability

of the trait = 0.10. With an increase in

heritability

from 0.10 to

0.40,

_h9

_was

overestimated. With further increase in

_heritability

_{(over 0.40),}

the bias became

_smaller,

so that the estimated

_{hy was}

close to the true value. This

_pattern

is visible in

_figure

2a. When 50

%

of the

_genetic

variance was

explained by QTL,

the estimates of

_{h9 followed}

a different

_pattern

_{(figure 2b).}

For low

_heritability

of the

_trait,

0.10 and

_0.20,

the estimates were close to the

true values of the

_parameter.

With further increase in

_{heritability,}

the estimates

became

_biased,

and

_finally

_considerably

underestimated when the

_heritability

of the trait reached 0.70. Even more severe downward bias was encountered

in the

_parameter

combinations in which

_QTL

accounted for the entire

_genetic

variance

_{(figure 2c).}

As the

_heritability

of the trait

_increased,

the estimated values of

_h9

_became

more and more biased. This

inability

of the random model

to

_{’pick up’}

a

_larger

_QTL

variance was observed

_{independently}

of the

_sample

size.

The

_empirical

_powerof

_{QTL detection,}

defined as the

_percentage

of

(12)

obtained

_by

data simulation under

_H

_o

_,

is

_given

in table 111. The _powerto detect

QTL

was

_{highly dependent}

on the

_heritability

of the trait. With a

_heritability

of

_0.10,

the maximum power was 32

%

(with

100 families and the

complete

genetic

variance accounted for

_by

the

_QTL).

With

_increasing

_heritability

of the

_trait,

the power increased

rapidly.

A further factor with a

_strong

influence

on _powerwas the

_proportion

of

_genetic

variance due to

_QTL.

When the

QTL

explained

only

10

%

of the total

genetic variance,

the power increased from 6 to 27

%

and from 6 to 34

%

for

_samples

of 50 and 100

families,

respectively,

(13)

explained

50 %

of the total

_{genetic variance,}

the _powerincreased much faster and reached over 90

%

_already

with a

heritability

of the trait of 0.40-0.50. Even

faster increase in power could be observed in

_parameter

combinations in which the

_{QTL explained}

the entire

_genetic

variance.

Figure 3

shows the LR

_{profiles averaged}

over 100

_replicates

for different

proportions

of

_genetic

variance due to

_QTL,

_heritability

of the trait = 0.10 and

sample

size = 50 families. The LR

_profiles

for different

_QTL

effects with the

same _parametercombination and

_heritability

of the trait = 0.50 are shown in

figure

4.

Both

_figures

show a flat

_profile

when

_QTL

accounts for

_only

10 %

genetic

variance,

regardless

of the

heritability

of the trait. With a

_higher

heritability

and

_greater

_proportion

of

_genetic

variance due to the

_QTL,

the LR

_profile

indicates the

_QTL

location _very

_precisely.

With the

_heritability

of the trait =

0.10,

the location of the

QTL

_is

_clearly

indicated

only

when the

_QTL

accounts for the

_complete

_genetic

variation.

_But,

the _averageLR in

this situation did not exceed a value of

_2.3,

which is far below our

_empirical

threshold value of 5.47.

4.2. Number of alleles at marker loci

The effect of the number of alleles at marker loci on the estimates of the

(14)

sizes and heritabilities of the

trait,

assuming

the

_{complete genetic}

variance due

to

_QTL,

is shown in table IV. The mean estimates for

QTL

location were

consistent for all

_parameter

combinations and close to the true

_parameter

value

₍₅₀

_cM),

_regardless

of the number of marker alleles. The confidence intervals _were,

_however,

narrower for

polyallelic

than for biallelic

markers,

which

indicated more accurate estimates when markers were

_polymorphic.

_Increasing

the number of alleles from four to six and ten did not affect the confidence interval. The heritabilities of the trait showed a

significant

influence on the

(15)

was

considerably

wider with the low

heritability

of the trait.

_Increasing

the

number of families also resulted in narrower confidence intervals and thus more

accurate estimates for the

_QTL

location.

Estimates for

QTL,

polygenic

and total

_heritability

for different numbers of marker

_alleles,

_heritability

of the trait and

_sample

size are

_given

in table V.

Estimates for total

_heritability

were close to simulated values for almost all

parameter

combinations,

except

for the situations with biallelic markers in

which ht was

overestimated. For

_heritability

of the trait =

_0.10,

estimates for both

_QTL

and

_{polygenic heritability}

were

_relatively

close to the true

_values,

regardless

of the number of marker alleles and other

_parameters.

For

_heritability

of the trait =

0.50, QTL

heritability

was

_again

_severely

biased downwards. The

estimated

_polygenic

component,

although

not

_simulated,

accounted for almost

(16)

The

_empirical

power for the same

_parameter

combinations is

_given

in

table VI. As

expected,

the _powerto detect

_{QTL strongly depended}

on the

heritability

of the trait. For a

_heritability

of the trait =

_0.50,

power was close

to 100 for all

_parameter

combinations.

_Therefore,

differences in _powerto detect

QTL

caused

_by

_parameters

other than

_heritability

could be observed

_only

for

parameter

combinations with

_heritability

of the trait = _0.10.The

power

mostly

increased when the number of marker alleles increased from two to four. With

a further increase in the number of marker

_alleles,

the power did not

_change

considerably.

Power was also

_{significantly}

increased with increased

_sample

size.

With 100

_families,

the _powerwas almost twice that with 50 families for all

parameter

combinations. A

_drop

in power from 42 to 32

%

when the number of marker alleles increased from four to six

_might

be due to the

_higher

threshold value obtained for this

parameter

combination.

Figures

5 and 6 show the LR

_profile

_averaged

over 100

replicates

for

_two,

four and ten marker

_{alleles, sample}

size of 50 families and

_heritability

of the

trait = 0.10 and

_0.50,

respectively. Figure

5 shows that the

QTL

location _was

not

_clearly

indicated with a low

_heritability

and a low number of marker alleles.

Increasing

the number of marker alleles to ten

improved

the estimate of the

QTL

location. With the

heritability

of 0.50

_(figure

_6),

the estimates of the

QTL

position

were

_{significantly improved.}

LR also increased with

_increasing

marker

polymorphism, especially

when the number of marker alleles increased from

two to four.

4.3. Number of alleles at

QTL

(17)

trait and

_{sample sizes, assuming}

the

_complete

_genetic

variance as due to the

QTL,

are shown in table VII. For all

parameter

combinations, regardless

of

any

parameter,

the estimates for the

QTL

position

were close to the simulated value of 50 cM.

_Empirical

confidence intervals

_depended

on the

heritability

of

the trait and

_sample

size. The confidence interval was

_considerably

decreased

by increasing

heritability

of the trait from 0.10 to 0.50.

_{Increasing sample}

size from 50 to 100 families also had a certain

_positive

influence on the _accuracy

of estimation. The number of alleles at the

QTL

does not seem to have _any

(18)

Table VIII shows estimates for

_h9,

_ha

_and

_{h; for}

different numbers of

QTL

alleles,

different heritabilities of the trait

and

different

_sample

sizes. As in

the

_previous

_analyses,

a severe downward bias in the estimates for

h9 and

a

_{corresponding upward}

bias in the estimates for

_ha

_{were encountered with}

a

_heritability

of the trait = 0.50.

Obviously,

this bias was not caused

_by

QTL,

because it was found in all

_parameter

combinations in which the simulated true

_heritability

of the trait was 0.50.

Power of

_QTL

detection for different numbers of

_QTL

_alleles,

different heritabilities of the trait and different

sample

sizes is

_given

in table IX. The

power was 100

%

with the

_heritability

of the trait =

_0.50,

_regardless

of _any

(19)

5. DISCUSSION

In the first

_part

of this

_study,

we

_investigated

the effects of the

_proportion

of

_genetic

variance due to

_{QTL, heritability}

of the trait and

_sample

size on

the estimates of

_QTL

_{parameters -}

_QTL

location and variance

_components,

and _power.The results of the simulation

_study

showed

_significant

effects of

proportion

of

_genetic

variance due to

_QTL

on the estimates for

_QTL

_location,

heritabilites and power. A

QTL

with a small

_effect,

which accounts for

_only

1

%

of the total

_{phenotypic variance,}

is _very

_unlikely

to be

_{precisely located,}

especially

when the

_sample

_comprises

_only

50 families. The location of the small

_QTL

cannot be

_{clearly indicated,}

as the estimates for

_QTL

location are

distributed

_along

the

_chromosome,

and the _{average estimate}over the

_replicates

takes almost a random value. On the

_contrary,

a

_QTL

with a

_{large effect,}

accounting

for 10

%

of the total

_{phenotypic variance,}

can be

_accurately

located

with

_only

50 families.

Estimation of

_QTL

_position

_yields

better results with a

_larger

_proportion

of

genetic

variance

_{explained by QTL}

and a

_{higher heritability}

of the trait. The

empirical

confidence interval for

_QTL

location shows that _accuracydecreased

significantly

when the

_proportion

of

_genetic

variance due to

_QTL

decreased from 100 to 10

_%,

_especially

with

_{high heritability}

of the trait.

_Sample

size has little influence on the _average

_estimates,

but

_{larger samples}

enable somewhat

more accurate estimates. These results are consistent with those obtained

_by

Xu and

_Atchley

_!22!,

who used the same

_approach

to estimate

_QTL

location and

_genetic

_parameters

in full-sib families.

The estimates of variance

_components,

herein

_given

as heritabilities

(ht ,

hg

2

and

_ha)

_{highly depend}

on the

_heritability

_proportion

of

genetic

variance

_explained

_by

the

_{QTL. Although}

the estimates of the total

genetic

variance

_(expressed

as

h t

2 )

are _veryclose to the true

_parameter

values

in all

_parameter

_{combinations,}

the _proper

_partition

of the

_QTL

and

_polygenic

(20)

a

_larger

bias

_accompanying

a

larger QTL.

However,

an underestimated

QTL

variance is

_always

_{accompanied by}

an overestimated

polygenic

variance,

so that

the sum of

h9 +

ha is

conserved at a value _veryclose to a simulated true value of total

_{heritability, indicating}

a successful

partitioning

of

_genetic

and residual

variance.

Confounding

between

_{h) and}

_{ha has}

been observed

_by

Gessler and Xu

_!9!,

who

_explained

this

_{phenomenon by}

differences in the models used for data simulation and estimation.

_They

simulated data

_using

a

_{monogenic model,}

and,

because the simulated

_h2

_was_zero,a

_partitioning

into

ha and

h9 under

the conserved

_{sum h9 +}

_{ha tended}

to reduce

_{h 9 2,}

and thus the estimates for

h2

were

biased downwards. In another

study,

Xu and Gessler

[23]

found an

overestimation of the

QTL

component

under a model

including

a non-zero

polygenic

component.

Their

_finding

_is,

_therefore,

_opposite

of what we found in our

study. Nevertheless, confounding

between variance

_components

has been

considered to be a

_general

_difficulty

of the

sib-pair

approach

!1,

4,

9!.

Recently,

Xu

_[21]

_proposed

a method to correct the bias in the estimates of the

_QTL

variance

_using

a

quadratic approximation

of the LR test statistic. This

problem,

however, requires

further research.

The power of

QTL

detection,

in

general, depends mostly

on the

_heritability

_proportion

of

_genetic

variance

_{explained by QTL.}

A small

QTL

in a small

_sample

_{is very}difficult to detect with

_certainty.

A

_{large QTL}

can,

however,

be detected with a

_high

_power,even in a small

_{sample. Increasing}

the number of families does not

_{significantly improve}

the _powerwhen the

_QTL

is small.

_Generally,

it can be concluded that a

_QTL

that

_explains

at least 30

%

of the

_phenotypic

variance can be detected with 100

%

_{power in any}

experimental

design.

To reach a

_satisfactory

_powerof 70-80

%

in a

_sufficiently

large sample,

a

QTL

must account for at least 15

%

of the

_phenotypic

variance. The second

_part

of this

_study

focused on the influence of marker

polymor-phism

on

_QTL

_parameter

estimates and _power,

_assuming

low and

high

her-itabilities of the trait

_(h

₂

= 0.10

and h

2

=

0.50),

and

using

small and

large

samples

(50

and 100 families with 25 half-sibs

_each).

The results showed that the mean estimates of the

_QTL

location were not

affected

_by

any of the

parameters

in the

_study

_{(heritability}

of the

_trait,

_sample

size and number of alleles at each of six marker

_loci).

_However,

the _accuracy of

_estimation,

_given

as a 95

%

empirical

confidence

interval,

was

_markedly

influenced

_by

the

_heritability

of the trait and the number of

_families,

and also

partially by

the number of marker alleles.

Several

_previous

studies found that the _accuracyof

_QTL

location is

_mostly

influenced

_by

the size of the

_QTL

effect and

_sample

size. Other

_parameters,

such as marker _map

resolution,

have little effect

!6!.

The results from our

_study

also showed

_positive

effects of

_{larger samples}

on the confidence interval in all

parameter

combinations.

Furthermore,

the accuracy of estimates for

QTL

location

improves

with an

increased number of marker loci. For markers with

_four,

six or ten

_equally

frequent

alleles and low

_heritability

of the trait

_(h

₂

=

0.10),

50 half-sib families

give

the same _accuracyas for markers with two alleles and double the number of families. An increased accuracy of the estimates for the

QTL

location with

polyallelic

markers was also

_{reported by}

Knott and

Haley

(17!.

This indicates

(21)

polymorphic

markers are used in the

_analysis.

The reduction of the number of animals to be

_genotyped

would

_{significantly}

reduce the costs of

_{QTL analysis,}

one of the

_major

limitations in

_mapping

and

_{utilizing QTL}

!5!.

In

_general,

estimated values for heritabilities are similar to those from the first

_part

of the

_{study. Only}

for biallelic

_markers,

the value of

_h

_is

biased

_upwards,

which indicates that biallelic markers do not

_provide

_enough

information to _{infer 7}_r

_properly.

_{For markers with > four}

_alleles,

the estimated heritabilities

_h9,

_h2

and

_ht

are

close to the simulated values in all

_parameter

combinations when the

_heritability

of the trait = 0.10. With the

heritability

of

the trait =

0.50,

hv and

h2 are

strongly

confounded,

and the sum of

h! +

ha is

relatively conserved,

for all

_parameter

combinations.

’

Apart

from the

_heritability

of the trait and the number of

_families,

another factor that influences power,

especially

when the

heritability

of the trait is

low,

is the

_{heterozygosity}

of marker loci. With an

_increasing

number of alleles at

marker

_loci,

one can

_expect

a

_higher

_powerof

_QTL

detection

_!11,

_17,

_20!.

The results of this

_study

indicate that _{power increases}

_{approximately by}

20

%

when the number of marker alleles increases from two to four. This is consistent

with the

expectation

that a linked

QTL

can be detected

_only

if the

parent

is

_heterozygous

for the marker locus. With biallelic

_{markers, only}

_1/2

of the parents is

_expected

to be

_{heterozygous.}

On the other

_hand,

with four marker

alleles,

the

proportion

of

parents

heterozygous

for individual marker loci will be

0.75,

which results in an increased

_proportion

of informative

_sib-pairs.

A further

increase in marker

_{heterozygosity}

_(from

four to six to ten

_alleles)

does not

result in a

significant

increase in power, because the

proportion

of

_heterozygous

parents

and informative half-sib

_pairs

does not

_{change drastically.}

Variations in power with four marker alleles found in our

_study

can be

_regarded

as random.

The third

_part

of the

_study

focused on the influence of the number of

_QTL

alleles on estimates for

QTL

_position,

variance

_components

and _power.The

results of the simulations

proved

the

_{insensitivity}

of the random model

_approach

against

_QTL.

The estimates of the

_{QTL position}

are _verysimilar for biallelic and for multiallelic

QTL. Also,

the _accuracyof the

estimates is affected

_{only by}

the

_heritability

of the

_trait,

the

_proportion

of the

genetic

variance

_explained

_by

_QTL

and

_sample

_size,

but not

_by

the number of

QTL

alleles.

Other authors who

compared performance

of the random model

_approach

in

_analyses

of biallelic and multiallelic

QTL

in full-sib families

_[22]

and

multigenerational pedigrees

[12]

reported comparable

results. This underlines the main

_advantage

of the random model

_approach

over other

_parametric

methods: its

_{flexibility regarding}

the actual number of alleles at the

_QTL.

The estimates of the variance

_components,

_expressed

as

hy,

ha and h t , 2

are

very similar to those from the

previous analyses.

With a

_{higher heritability}

of the

_trait,

_{hy is}

_severely

biased

downwards,

and

_{ha is,}

_accordingly,

biased

upwards.

The same bias can be observed for

_QTL

with _two,five and nine

alleles. This shows that the bias in estimates of the

_QTL

variance is not caused

by

deviation of the distribution of

_QTL

effects from

_normality,

as in the case of

a

biallelic, and,

partly,

five-allelic

QTL.

Even with nine

_{QTL alleles,}

when the

assumption

of the normal distribution of the

_QTL

effect

_fully

holds

_(with

nine

codominant alleles there are 45 different

_genotypes),

the bias in the estimates

(22)

is

_obviously

due to a

_general

_frailty

of a random model based on the

sib-pair

approach. Grignola

et al.

_[12]

who used a residual maximum likelihood method based on a

multigenerational pedigree

did not obtain biased estimates of

_QTL

and

_polygenic

variances.

Also,

the _powerto detect a

_QTL

shows little differences _among

designs

with

a

_QTL

with

_two,

five or nine alleles and

_{depends only}

on the

_heritability

of the

trait,

proportion

of

_QTL

variance and

_sample

size.

6. CONCLUSIONS

In this

_study

we showed that the interval

_{mapping procedure}

based on

the random model

approach,

initially designed

for

_{QTL mapping}

in human

populations

(22!,

can be

applied

to

_dairy

cattle

_populations

with

large

half-sib families.

_QTL

with

_{relatively large}

effects can be detected with

_high

power and

accurately located, especially

if a

_larger

number of families and

_polymorphic

markers are used.

The random model based on a

sib-pair approach requires

marker data

only

on _progenyand their

_parents,

which can be seen as an

advantage

when marker

data on older ancestors are not available.

However,

the method can be

_easily

extended to make use of available data from

general pedigrees.

This would

provide

better estimates of !rs because information from all relatives would be

jointly

used rather than

_{just using}

data from a

_pair

of individuals and their

parents.

The

_{relationships}

_amonganimals and

_inbreeding

would be taken into

account.

_Furthermore,

in the case of

_{missing parental}

_genotypes,

it would be

possible

to infer 7rS from the information available on other relatives.

Because of its robustness and

_simplicity,

the random model

_approach

is

recommended for

_{rapid screening}

of the whole genome, followed

by

a refined

analysis applied

to those chromosomal

_segments

that show some

_signals

of

QTL

presence,

using

more

_{sophisticated}

methods.

Also,

more

sophisticated

methods

should be used to estimate

_QTL

_variance,

because the random model

_approach

cannot

_{partition QTL}

and

_polygenic

variance

_properly.

_Furthermore,

certain

recently

developed

methods based on residual maximum likelihood

[12]

_maybe

considered as a

_possible

alternative to

_sib-pair

based methods.

ACKNOWLEDGMENTS

The authors want to thank the EMBRAPA and

CNPq,

Brazil

_(M.L.M.)

and the

Swiss National

_Foundation,

Switzerland

_(N.V.)

for financial _support,the ISU

Com-putational

Center for

_providing

resources, and Dr S. Xu for

providing

programs and invaluable

_suggestions.

This is Journal

Paper

no. J-17132 of the Iowa

_Agriculture

and Home Economic

_{Experiment Station, Ames, Iowa, Project}

no. _3146,and

_supported

by

the Hatch Act and State of Iowa funds.

REFERENCES

[1]

Amos

_C.I.,

Robust variance-components

approach

for

_{assessing genetic linkage}

in

_pedigrees,

Am. J. Hum. Genet. 54

₍₁₉₉₄₎

535-543.

[2]

Amos

C.I.,

Elston

R.C.,

Robust methods for the detection of

Random model approach for QTL mapping in half-sib families

HAL Id: hal-00894267

https://hal.archives-ouvertes.fr/hal-00894267

Submitted on 1 Jan 1999

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Random model approach for QTL mapping in half-sib

families

Mario L. Martinez, Natascha Vukasinovic, Freeman Gene (a.E.)

To cite this version:

Original

article

Random model

approach

for

QTL

mapping

in half-sib families

Mario L.

Martinez,

Natascha Vukasinovic*

Gene

(A.E.)

Freeman

Department

Science,

University, Ames,

(Received

April

1998;

accepted

1999)

mapping procedure

approach

applied

appropriateness

QTL

populations

prevailing

family

model,

QTL

using

techniques.

sib-pair approach.

proportion

identical-by-descent

(IBD)

QTL

flanking

QTL

(location

components)

using

data,

varying

families, heritability

proportion

QTL

QTL.

important

influencing

QTL

heritability

proportion

genetic

QTL.

QTL

QTL

QTL

_approach

_QTL

_Martinez,

_(A.E.)

_{University, Ames,}

_April

_1998;

_accepted

₁₉₉₉₎

_{mapping procedure}

_applied

_{appropriateness}

_populations

_prevailing

_model,

_{sib-pair approach.}

_flanking

_QTL

_(location

_components)

_using

_QTL

_QTL.

_important

_QTL

_QTL

_{higher heritability, confounding}

_QTL

_explains

_QTL

required. ©

_{model / interval}

_/

_(soeurs).

_procédure

_l’approche

_appliquée

_pertinence

_(aTLs

_prévaut

_reprints:

_Zurich,

_paramètres

_proportion

_flanquants.

_variance)

_familles,

_caractère,

_proportion

_QTL,

_QTL.

_plus

_proportion

_génétique

_QTL

_QTL.

_QTL

_QTL

_explique

_génétique

_{précisément}