Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study

(1)

HAL Id: hal-00894228

https://hal.archives-ouvertes.fr/hal-00894228

Submitted on 1 Jan 1998

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Attenuating effects of preferential treatment with

Student-t mixed linear models: a simulation study

Ismo Strandén, Daniel Gianola

To cite this version:

(2)

Original

article

Attenuating

effects

of

_preferential

treatment

with

Student-t

mixed

linear

models:

a

simulation

study

Ismo

Strandén

a

Daniel Gianola

a

Department

of Animal

_{Sciences, University}

of

_Wisconsin,

Madison,

WI 53706, USA

b

Animal Production

_{Research, Agricultural}

Research Centre -

_MTT,

31600

_Jokioinen,

Finland

(Received

14

_May

_1998;

_accepted

16 October

₁₉₉₈₎

Abstract - Preferential treatment of cows in four herds of a

_multiple

ovulation and

embryo

transfer scheme under selection was simulated. Prevalence and amount of

preferential

treatment

depended

on a function correlated with true

_breeding

value. Three mixed effect linear models were

_compared

in terms of their

_ability

to handle

preferential

treatment: the classical Gaussian

_model,

a model with multivariate

t-distributed errors clustered

_{by herd,}

and a model with

_independent

t-distributed

errors. In the models with t-distributed _errors,both the scale _parametersand the

degrees

of freedom were considered unknown. A

_{Bayesian analysis}

was carried out

for all three models via the Gibbs

_sampler,

and posterior means were used to infer

about _genetic

_{variance, herd-year effects, breeding}

values and realised response to

selection. Performance over _repeated

_sampling

was assessed via Monte Carlo mean

squared

error. In the absence of

preferential

treatment, the three models had a

similar

_performance.

When

_preferential

treatment was

_prevalent

and _strong,the

univariate t-model was the

best; hence,

the Gaussian

_assumption

for the errors was

_{clearly inappropriate.}

It _appearsthat some robust linear models can handle

preferential

treatment of animals better than the standard mixed effect linear model with Gaussian

_assumptions.

_©

_{Inra/Elsevier,}

Paris

dairy cattle

_{/ preferential}

treatment

_{/ simulation / Bayesian}

statistics

_{/ Student-t}

distribution

_{/ Gibbs}

_sampling

*

Correspondence

and reprints

E-mail: [email protected]

Résumé - Atténuation des effets de traitement _{préférentiel}dans un modèle

linéaire mixte à distribution de Student

_(t).

Étude

de simulation. On a simulé le

traitement

_{préférentiel}

de certaines vaches dans _{quatre troupeaux}de sélection utilisant

(3)

ont

_dépendu

d’une fonction corrélée à la valeur

_génétique

vraie. On a

comparé

trois modèles linéaires mixtes _pourleur

aptitude

à

_prendre

en _comptele traitement

préférentiel :

le modèle

classique Gaussien,

un modèle avec des erreurs t-multivariates

groupées

par troupeau et un modèle avec des erreurs t-distribuées

indépendantes.

Dans le modèle où les erreurs suivaient une distribution _t,les

paramètres

d’échelle et les

_degrés

de liberté ont été considérés inconnus. Une

analyse

bayésienne

a

été effectuée pour les trois modèles à partir de

l’échantillonnage

de Gibbs et les moyennes a _posterioriont été utilisées _pouren inférer au

_sujet

de la variance

génétique,

des effets

troupeau-année,

des valeurs

_génétiques

et des

_réponses

réalisées à la sélection. La

_performance

des modèles a été évaluée au travers des erreurs

quadratiques

moyennes. En l’absence de traitement

préférentiel,

les trois modèles

ont eu une

_performance

similaire.

_Quand

le traitement

préférentiel

a été

_fréquent

et

d’effet important, le modèle t-univariate a été le meilleur et le modèle Gaussien a été

clairement

_inadapté.

Il

_apparaît

_quedes modèles linéaires robustes _peuvent

_prendre

en

compte les traitements

_{préférentiels}

mieux _queles modèles linéaires mixtes Gaussiens

classiques. @

Inra/Elsevier,

Paris

bovins laitiers

_/

traitement _{préférentiel}

_/

simulation / statistique bayésienne /

distribution de Student

1. INTRODUCTION

Preferential treatment is _any

_management

_practice

that is

_applied

non-randomly

to animals within a

_contemporary

_group

[9].

For

_example,

better

housing

and

_feeding,

hormonal

_treatment,

_{longer milking}

intervals on test

_day

and

_{feeding according}

to

_production

are known to be

_{applied selectively}

in

dairy production.

Preferential treatment occurs in

_{dairy cattle, presumably}

to

increase the economic value of a cow or the

_probability

that it will be chosen as

a bull-dam. Several studies

(e.g. [17,

20!)

have found that

_genetic

evaluations

for milk

_yield

are inconsistent with

expectations

based on

_theory.

This _may

be due to

_inadequate

statistical

assumptions

or failure to account

_properly

for selection or

_preferential

treatment of cows.

Preferential treatment is often

_suspected

when no

_apparent

reasons exist for

such

_{discrepancies.}

Kuhn et al.

_[9]

simulated effects of

preferential

treatment

on ’animal model’

_genetic

evaluations. Mean

squared

error of

prediction

of

breeding

values increased as the extent of

_preferential

treatment increased. Kuhn and Freeman

_[10]

found that when the dam of a sire was treated

preferentially,

more than 30

_daughters

with untreated records were needed to

offset the bias in

_prediction

of

_breeding

value caused

_by

the dam’s information. Bias increased as the

_proportion

and number of

_{daughters receiving}

preferential

treatment increased. Bias decreased when all

_{daughters given preferential}

treatment were in the same

_herd;

this is so because the

_{’herd-year’}

effect in the model

_{captures part}

of the

preferential

treatment administered in a

_particular

herd-year.

In order to account for

preferential

treatment,

Harbers et al.

_[7]

included an

environmental correlation between related females in a

_genetic

evaluation model

for a MOET

(multiple

ovulation and

embryo

transfer)

scheme. This

improved

accuracy of cow evaluations when

preferential

treatment was mild.

_Weigel

et al.

[29]

simulated different

_strategies

of

_preferential

treatment and found that it

(4)

within a herd is treated

preferentially.

Burnside and

_Meyer

[3]

simulated effects

of bovine

_somatotropin

_(bST).

Sire evaluations were least accurate when bST administration was

targeted

to the best

producing

cows.

In the context of

prediction

(e.g. !8]),

a bias takes

place

when the

_expected

values of the

_predictand

and of the

_predi!tor

differ. Evaluation of bias

_requires

knowledge

of the true model

_but,

in

_practice,

this is not

_available,

so ad hoc

assessments of bias have been

_suggested.

Several studies

_[15,

_{16, 27,}

_28]

found

upward

’biases’ of cow’s

_pedigree

indexes for

_protein

or milk

yield

in Finnish

Ayrshire.

It is unclear if this

_discrepancy

is due to

_chance,

but

_preferential

treatment of dams of cows _maybe a

_culprit.

On the other

_hand,

Powell

and Norman

_[19]

found that

_pedigree

indexes understated the first estimated

breeding

values of

_daughters

of _{proven sires}mated to lower

_producing

dams. Little work has been undertaken on how _{to cope}with

_preferential

treatment in

_practice,

at least from a statistical

_point

of view. Kuhn and Freeman

[11]

studied power transformations of records but this was, at

best, slightly

effective

in

_reducing

bias due to

_preferential

treatment. An alternative

_approach

is to

consider an error distribution with thicker tails than the

normal,

to allow for

more variation. A

_commonly

used one is the

_{t-distribution,}

which is

_symmetric

and

_leptokurtic.

It has been advocated because of its

_simplicity

_[12],

and because

_only

one

_parameter

(the

degrees

of

freedom)

is needed to describe robustness. A suitable robust distribution _maybe

_capable

of

_attenuating

the

impact

of outliers on data

_{analysis. Many}

authors have

_employed

statistical

models with t-distributed residuals

_[4,

12,

13, 25,

31]

in linear and non-linear

regression models,

with

_varying

_degrees

of success. Use of the t-distribution in

the context of mixed effects or hierarchical models is

_relatively

recent

_!1,

_2,

_5,

6, 22-24, 26,

30].

Our

objective

was to assess

frequentist properties

of

Bayesian point

estima-tors obtained from mixed linear models where residuals were assumed to be either Gaussian or t-distributed. Milk

_production

records obtained in herds in

which some

preferential

treatment was

practised

were simulated. The

analy-sis focused on mean

_squared

error of estimation of

_{genetic variance,}

_herd-year

effects,

breeding

values and

_genetic

_responseto selection.

2. STRUCTURE OF THE SIMULATION

2.1.

_{Conceptual population}

Milk

_production

records in a

_hypothetical

’adult’ MOET nucleus scheme

[18]

were simulated. The scheme extended the

_simple

hierarchical

_mating

structure

of Stranden et al.

_!21].

Our modification allowed bulls of the

_{previous generation}

to mate current

_generation

females. The nucleus consisted of 32 cows and

eight

bulls _{in every}

_generation.

In each

_generation,

_everynucleus cow

_produced

(by

multiple

ovulation and

_embryo

transfer to

_recipients)

_eight

_offspring,

four females and four males. An animal could be selected

only

once into the nucleus

as a

_parent

and unselected animals were culled. The females were selected

among those

offspring

to the nucleus that had

completed

a first lactation. Males

were selected within those that had been born in the

_{preceding generation.}

In

(5)

Thus,

males within a full-sib

_family

had the same estimated

_breeding

value and

three such males were

randomly

discarded. Each selected male was mated to

four cows, chosen

randomly

from those that had been selected as

replacements.

. 8 1 32 1 .

Selection pressure in males and females

_was

₃₂

₄

and

2g

= -,

respectively,

32 4 128 4

per

generation.

With this scheme carried out for four

_generations,

the data included 544 cows with records

(32

in the base

_plus

32 x 4 x 4 = 512 female

progeny)

and 32 sires with

_daughters

in

_production,

i.e. a total of 576 animals.

A

_diagram

of the simulated

population

is shown in

_figure

1.

Base

_generation

cows were

_assigned

to four herds in

_{equal numbers,}

i.e.

eight

cows _perherd. Female

_offspring

of a cow remained in the same herd as her

dam,

whereas sires were used across herds.

_Breeding

values of base

animals were drawn at random from

_N(0,0.25)

distribution. Records of the base animals were

_generated

_{by adding}

a

_herd-year

effect

_{(independently,}

normally

distributed)

to a

_breeding

value and to an

_{independently}

drawn

residual from

_N(0,0.75)

distribution. Records in

_{subsequent generations}

were

simulated

_similarly,

_except

that the

_breeding

value of an individual was formed

(6)

segregation

residual,

where

_a

₂

_is

the additive

_genetic

variance and

F

is

the _average

_inbreeding

coefficient of the

parents.

The selection criterion in

the

_breeding

scheme was BLUP of

_breeding

value with the true variance

components.

The statistical model included

_{the herd-year}

as a fixed effect and animal as a random effect

(but

ignored

preferential

treatment,

as discussed

later)

using

all

_{genetic relationships}

available up to the time of selection.

2.2. Preferential treatment

In

_practice,

_preferential

treatment takes

_place

in the course of a selection

programme so this is the _waythat the

present

simulation

_proceeded.

None of the base

population

cows were treated

preferentially,

so there were 512 cows

eligible

to receive

_preferential

treatment. A scheme in which the

_preferential

treatment

_{assigned depended}

on the

’perceived’

breeding

value of an animal

(e.g.

based on a

_genetic

evaluation available before the animal

_produces

the

record)

was

_adopted.

The records were

_generated

as

where yij is the record of

animal j made

in

herd-year

i, h

i

is a

herd-year

effect,

Uj is the

breeding

value of animal

j,

and eij is an

independent

residual. The

preferential

treatment

_02!

was a stochastic effect

_taking

the values:

where

_!(.)

is the standard normal cumulative distribution

function,

pmin is a

constant smaller than the

_herd-year

effect

_h

_i

_,

_{and u/j =}

_!+(u!+v!)

/

_afl

+

_w

is a ’value’ function such that _Ui_rv

A!(0,er!),

_-_j_v

N(0,

Q

v),

_{Cov(u_,,f}

_j

₎

=

_0,

so _Wj_rv

N(À, 1).

In the

_preceding,

0’

;

is the variance of

_breeding

values and

w

j2

is an

’uncertainty’

variance. The ratio

O’! 2 describes

the

_uncertainty

the herd

!u

manager has about the true

_breeding

value of animal

_j.

For

_example,

if the breeder _{is very}uncertain about the

_breeding

value of the

_animal,

this ratio of variances should be

_high.

Three values of the

_uncertainty

were

considered,

0’2

1

1 =

——,

1,

100. The correlation between Wj and the

breeding

value Uj is

/ 100

j2

!l/2

C1

1 + ;! 2)-1/2

giving 0.995, 0.71

and 0.10 at values of the

_{uncertainty equal}

B

!/

to 100’ 1 and

₁₀₀

100,

respectively.

The

_preferential

treatment scheme in

_equation

₍₂₎

induces a correlation

be-tween related animals

_{Corr(u/j , Wjl)}

= a JJ ’10’2

u +

2 Cov( 2 v

J’ VI) J

_where

_ajj,is the

tween related animals

_{Corr(iu,,w,’) =}

—&dquo;—!—!——&dquo; v

, where a,,’ is the

a! + a!

additive

_relationship

between

_{animals j and}

_j’.

If the _Vj deviates are

inde-pendent,

then

_{Corr(u/j ,}

_u

_{/j, )}

=

a!!!!!!(!u

₊

o!)’

For

example, if j and

j’are

full-sibs and

_Q

_z

=

0’

;,

(7)

the

_breeding

value the

_higher

the amount

_preferential

treatment and the chance of

_receiving

it.

The constant pmin in

equation

(2)

controls the range of

production

associated

with

_preferential

treatment. It was set

_equal

to

-5

Qh

where ahis the standard

deviation of

_herd-year

effects. These were drawn from a normal distribution

with mean zero and variance

a

2

and two different values of the

herd-year

variance were considered:

afl

=

au

and

2 U

2

The _constantA controls the

proportion

of cows to be

_{preferentially}

treated. Normal distribution

_theory

can

be

used

to find a value of A such that a desired

_proportion

of cows receives

preferential

treatment. The

_proportion

of

preferentially

treated cows increases

with

.!,

because

_Pr(u/j

>

₀₎

increases

_{concomitantly.}

Three different

prevalences

of

_preferential

treatment were considered: 1 out of

10,

1 out of

32,

and 1 out

of 64 cows. These

_correspond

to A values of

_-1.2816,

-1.8627 and

-2.1539,

respectively.

It was intended to

_keep

the

_proportion

of

preferentially

treated animals

roughly

constant from

_generation

to

_generation.

To do _so,it must be noted that selection is

_expected

to increase mean

_breeding

value and to reduce

genetic

variance over time. In order to account for these

_effects,

the formula for w was

changed

to:

where u is the mean

_breeding

value of animals available for

preferential

treatment in the

_generation

to which

_{animal j belongs,}

and

_Su

_is

the additive

genetic

variance for individuals born in that

_generation.

The

_probability

distribution of the amount of

_preferential

treatment

_(Di!)

depends

on the values of _Qh and A as shown in the

Appendix.

The _average

amount of

preferential

treatment

_actually

_applied

was assessed via a

simula-tion of 1000

_replicates

of the MOET scheme. Mean increase

_(mean

of

₀₎

in

production

due to

_preferential

treatment under

_{varying prevalence}

of

prefer-ential treatment and amount of

_herd-year

variance is in table I. As

_intended,

production

increased with

_prevalence

of

_preferential

_treatment,

and with

_{or h* 2}

Average

value of

_preferential

treatment was not affected

_by

level of

_uncertainty

j

2 2 .

This is not shown in table

_1,

but it was

_expected

because the distribution

Q!

of

_A

_zj

does not

_depend

on this ratio.

Table I.

_Average

increase in simulated lactation

production

due to

preferential

treatment as a function of

herd-year

variance

(a

h

)

and of

prevalence

of

preferential

treatment

_(values

in

_parenthesis

are Monte Carlo standard errors from 1 000

_replicates

(8)

2.3. Statistical models and

_computations

Three linear statistical models were

_compared,

both with and without

pref-erential treatment

_incorporated

in the simulation. The

_objective

was to assess

the relative

_ability

of these models to handle

_{perturbations}

caused

_by

unknown

preferential

treatment. In all three

models,

the linear structure for the records included an unknown

_herd-year

effect

(treated

as fixed

_{computationally),}

the unknown

_breeding

value of the cow and a

_residual,

distributed

_according

to an

appropriate

error

_{distribution,}

as noted below. In the three

_models,

a

multivari-ate normal distribution

_N(0,

_Aa!),

where A is a 576 x 576

_{relationship matrix,}

was used for the

_{genetic effects,}

so there was no difference in this

_respect.

The

three

_{models, differing only}

in the error distribution were the

_following.

1)

G: a

_purely

Gaussian model with errors

Niid(0,

Q

e ).

2)

t-l: errors were

independently

and

_identically

distributed as

_{univariate-t,}

t

i

(0,

0’ e 2,

v,).

Here,

the variance of the distribution is

_U2V

_,I(V,

_e

_-

_2),

where

_Q

_e

is

a scale

_parameter

and v, are the unknown

_degrees

of freedom.

3)

t-H: within herd

_{i (i}

=

1,

2, 3,

4),

the error _{vector e}_ihad the

multivariate-t distribution

_t

_ni

_(0,

_I

_ni

_U2

_,

₎

_e

_v

where ni is the number of records in herd i.

Here,

Var(e

i

)

=

I!o!fe/(fe - 2).

Although

the _{errors are}

_{uncorrelated,}

they

are not

independent,

this

_being

a

_property

of the multivariate t-distribution. Error

vectors in different herds were

_{mutually independent, however,}

but with the

same

or and

_v_e

_parameters.

We refer to this model as a ’herd-clustered’ one.

The G model is the usual _one;model t-1 _{discounts outliers y}_zj on a ’case’

by

’case’

basis,

and model t-H discounts

_outlying

vectors _y_z for the entire herd i. Because the ’value function’ _w!used to

_generate

_preferential

treatment

does not

_depend

on the

herd,

there is no

_apparent

reason

_why

model t-H should

outperform

model t-1. It should be noted that as Ve -j _oo,the two

t-distributions tend towards the Gaussian one.

A

_Bayesian

structure was

_adopted

for inference. Prior distributions were the same for all three models. Herd effects were

assigned

a uniform

_{prior and,}

as

noted,

a multivariate normal _processwas used as a

_prior

distribution for the

breeding

values. The

_dispersion

_components

_Q

_u

_{and Q}

_e

_were

_assigned

inde-pendent

scaled inverted

_chi-square

distributions with four

_degrees

of freedom and mean

_equal

to the true variance

_component,

i.e. 0.75 for the residual

vari-ance and 0.25 for the

_genetic

variance. In the

_t-models,

the

_prior

for

_a

_is

for the scale of the distribution and not for the residual

_variance,

which is

a e

2v

,l(v, -

2)

as noted before. In the two models

_involving

the

_{t-distribution,}

the residual

_degrees

of freedom

_parameter

v, was considered unknown.

_Degrees

of freedom values allowed in the herd-clustered t-model were

_{4, 10,}

100 or 1

000,

all

_{equally likely,}

a

_priori.

In the univariate t-distribution

_model,

the _space

of v

e

was

_{4, 6, 8, 10,}

12 or

_14,

all

_{receiving equal prior}

_probability.

These values

were chosen

_arbitrarily.

It is

_possible

to use a continuous

_prior

for _v,

[23]

but

the discrete distribution

_employed

here facilitated

_{implementation.}

A Gibbs

sampler

was used to _{carry out}the

_{Bayesian computations employing}

the full conditional distributions described in Strandén

_[22].

Tests made in several sim-ulations with

_{varying starting}

values indicated that a burn-in

period

of 7 000

iterates with 70 000 Gibbs iterates thereafter

_(all

_samples

_kept)

was

_enough

(9)

About 60 min of CPU time were

required

to

_perform

70 000

_iterations,

for _any of the

models,

in an HP

_{9 000(3)}

_computer.

2.4.

_Frequentist

_comparison

Each

_replicate

of the simulation consisted of a data set

_generated

as _perthe

scheme in

_figure

1 under the

appropriate assumptions

of

_preferential

treatment.

A

_{Bayesian analysis}

of the data set

according

to each of the three models was

carried out in each

_replicate.

Mean

_squared

errors of

_posterior

mean estimates

were

_computed,

over

replicates,

for:

a)

genetic

variance,

b)

herd-year effects,

and

_c)

breeding

values. Mean

squared

errors were also

_computed

for three

classes of

_breeding

values:

_sires,

cows who had been

preferentially

treated and cows without

preferential

treatment.

d)

An additional

end-point

of interest was mean

_squared

error of estimated _responseto

_selection,

assessed

_by

_predicting

breeding

values

_{using posterior}

means from the three models contrasted. ’True’

response was the mean difference in true

breeding

value

(due

to selection

using

BLUP)

between animals born in the last

_generation

and those born in the first

generation.

Differences in mean

_squared

errors between models should reflect

the relative accuracy of estimation of

genetic

trend.

A

_’pilot

run’

_[14]

was conducted to assess the number of

replicates

needed to

attain

_enough

_precision

for a

_parameter

of interest. The

approximate

number

of

_{replications required}

to achieve an absolute

_precision

r for the confidence

interval

_given

a

_pilot

run of n

replicates

was found

_using:

where

_t

_i

_-

₁

_,

₁

_-

_a/2

is the value of a t-distribution with i -1

degrees

of freedom at

the

_{100(1 -}

_a)

_percentile

_{(’confidence’).}

Our

pilot study

consisted of

_carrying

n = 20

_replicates

for each of the three models. The number of

replications

re-quired

to achieve 0.05

_precision

with 95

%

confidence for the

genetic

variance

was less than 60 for most cases.

_Hence,

it was decided that all cases would

be

_replicated

60 times. Absolute

_precision

was recalculated after

60 replicates,

and a further 40

_replicates

were made for the schemes

involving

1/10

preva-lence of

preferential

treatment. One scheme

92

2 =

3, -—

2 = 100 )

required

an

(

a2 a2

u u

1)

100

/

additional 40

_replicates

to achieve the

_required

_precision.

Table II indicates the schemes and number of

replicates

performed..

Because of its

_{heavy computing requirements,}

the

_analysis

was

performed

using

a network of machines administered

by

Professor Miron

Livny

of the

Department

of

_Computer

_{Science, University}

of Wisconsin at Madison. This cluster was accessed

using

the Condor

_system,

which allows

running jobs

simultaneously

at _many

_computers

while the data and _programreside in one

computer.

Each

_replicate

of each model was a _processto be executed in this network of

_computers.

There were between 10 and 15

computers

available at

(10)

3. RESULTS AND DISCUSSION

3.1. Absence of

_preferential

treatment

The

_objective

here was to examine

_possible

losses in

_efficiency

due to

_using

the two t-distribution models when there is no

preferential

treatment and the

Gaussian

_assumption

holds

_{throughout. Averages}

and mean

squared

errors of

estimates of additive

_genetic

variance are

_given

in table III. The

_posterior

means

of

_afl

_for

each of the three models were

practically unbiased,

in

light

of the

Monte Carlo variation.

_However,

the mean

_squared

error was

_larger

for the two

t-models than for the Gaussian one.

_Hence,

if the Gaussian

_assumption

holds,

posterior

means of additive

_genetic

variance for the t-models are less accurate

than those from the G-model. The increase in mean

_squared

error over the

Gaussian model was about 5-6

%

for the t-H

model,

and 7-18

%

for the t-1

model.

Tables IV and

_Vgive

the

posterior

distributions of the

_degrees

of freedom for the two t-models in the absence of

_preferential

treatment. The

analysis

carried

out with the herd-clustered t-model

clearly

favoured a model with Gaussian

errors, as indicated

_by

a

posterior probability

of about 90

%

for the

degrees

of freedom

being

larger

than 10.

_Also,

the univariate t-model

_assigned

the

highest posterior probability,

about 40

_%,

to the

_largest

value of the

_degrees

of freedom

_(v

_e

=

14)

considered. The

posterior

distributions were not

_sharp,

this

_being

a function of the low informational content the data have about

v e

.

However,

both

analyses

favoured the

larger

values

of v

e

or,

equivalently,

the

Gaussian

assumption

for the errors. For

_example,

in the herd-clustered

t-model,

the

_posterior

odds ratio of v, = 1000 relative to v, = 4 was 17.7 and 29:7 for

2 =

1 and

ah

2 =

3,

respectively.

In the univariate

_t-model,

the odds ratio

(

(11)

of v, = 14 relative _{to v}_e = 4 was 384 and 404 for the two values of the ratio

between herd and additive

_genetic

variances.

Mean

_squared

errors of estimates of location

_parameters

were similar in

all models

_{(table VI! ,}

_although

_slightly

smaller for the G-model. As

_expected,

mean

_squared

errors were

larger

for

breeding

values of cows

(smallest

amount of

(12)

In summary, in the absence of

_preferential

treatment and with the Gaussian

assumption

holding throughout,

the t-models were less accurate for estimation

of

_Q!,

but were as

_competitive

as the Gaussian model for estimation of

breeding

values and of

_genetic

trend.

3.2.

_{Preferentially}

treated data

3.2.1. Additive

_genetic

variance

Mean

_squared

error of estimates of additive

_genetic

variance are in

_figure

2.

Differences between models were clearest when

_preferential

treatment was more

_prevalent

(1/10)

and when the

herd-year

variance was

high

(this

affects

the distribution of

_0).

_Also,

differences between models were

_largest

when

uncertainty

about true

_breeding

values was

_low,

so the value function is a

high

correlate of

_breeding

value. There was little difference between the G and

the t-H

models,

but the univariate t-model had the best

_performance

when

prevalence

of

preferential

treatment was medium

(1 out

of 32

cows)

or

_high

(1 out

of 10

_cows).

The univariate t-model was robust to variation in the

uncertainty

parameter;

this was not the case for the G and the t-H

_models,

whose

performance

was

hampered

under severe forms of

preferential

treatment.

3.2.2. Posterior distribution of the

degrees

of freedom

Posterior

_{probabilities}

of the

_degrees

of freedom under the herd-clustered t-model were often

_higher

for the

_larger

values of _v_e_,thus

_supporting

the

Gaussian

_model,

_especially

when

_preferential

treatment was uncommon, or

uncertainty

was

high. Only

under medium

(1/32)

or

_high

(1/10)

_prevalence

of

_preferential

treatment and a

_{high herd-year}

variance the

largest

values of

(13)

this

_depended

on the level of

_uncertainty

and on the amount of

_herd-year

variance. For

_example,

when

_prevalence

of

_preferential

treatment was

_1/10

and with

₍

_T2

=

3a’,

low values of the

degrees

of freedom had

_higher

_posterior

probabilities

when

_uncertainty

was

low; however,

as

_{uncertainty increased,}

the

posterior

distribution tended to favour

_larger

values of the

_degrees

of freedom. Posterior

_{probabilities}

of v, for the univariate t-model are

_given

in

_figure

3.

Here,

posterior

distributions tended to be flat.

_{Higher probabilities}

were

as-signed

to the

_largest

values of the

_degrees

of freedom

_only

when

_preferential

treatment was rare and the

_herd-year

variance low. As in the herd-clustered

t-model,

high

uncertainty

often resulted in

_{higher probabilities}

_assigned

to the

(14)

of freedom values also received

_relatively

high

probabilities.

When

preferen-tial treatment was

prevalent

(1/10)

and the

herd-year

variance was

large,

the

posterior

distribution was

_sharp,

with a modal value of ve = 4 at all levels of

uncertainty.

This

_points

_awayfrom a Gaussian distribution of the residuals.

With a small data set such as the one in this simulated MOET

scheme,

one

should not

_expect

the

_posterior

distribution of the

degrees

of freedom

param-eter to be

_{highly peaked. Nevertheless,}

the univariate t-model

recognised

the

non-Gaussian situation even when

prevalence

was rare

(1/64),

provided

that

the variance between herds was

_{relatively large.}

This is because the

expected

value of the

_preferential

treatment,

E(Di!),

increased with

_{or2 h,}

as illustrated in

(15)

3.2.3. Estimates of

_{herd-year effects, breeding}

values and

_genetic

response to selection

Average

of mean

_squared

error of estimates of

_herd-year

effects was similar for the three models

_except

when

_preferential

treatment was

prevalent

or

herd-year variance was

_high,

but it was

always

smallest for the univariate t-model

(figure l!).

When

preferential

treatment was common

(1/10),

the univariate

(16)

The _averageof mean

_squared

error of estimates of all

_breeding

values is

shown in

_figure

5. This criterion was about the same with all models

_except

when

_preferential

treatment was common and the

_herd-year

variance

_high.

Here,

when

_uncertainty

was

_high,

there were no differences between the

_models,

but at low levels of

_uncertainty,

the univariate t-model was

markedly

_superior.

The

_picture

for mean

_squared

errors of estimates of sire and cow

_breeding

values and for

_genetic

_responsewas similar to that for of all

_{breeding values,}

so the

_figures

are not

_presented.

In all _cases,differences between models were

clear, favouring

the univariate t-model when

_preferential

treatment was more

prevalent

(1/10).

The same was true for

_{preferentially}

treated cows

_{(figure 6),}

but mean

_squared

errors were

_larger

than for

_breeding

values of cows that were

(17)

performance

than the Gaussian or herd-clustered models when

preferential

treatment was rare or

_{mildly prevalent,}

but it was

_superior

when such treatment

was common. In

_particular,

at the lowest level of

_uncertainty

and at the

_highest

herd-year variance,

the univariate t-model gave

predictions

of

_breeding

value of

_{preferentially}

treated cows that had a mean

_squared

error of about a third of that observed with the Gaussian model. In this

_situation,

the herd-clustered

.

(18)

4. CONCLUSIONS

In the absence of

_preferential

_treatment,

the t-models were as

good

as the

Gaussian model for

_estimating

_breeding

values and _responseto selection. When

preferential

treatment was

_{mildly prevalent}

(1/32)

the models

performed

simi-larly.

However,

when

_preferential

treatment was common

(1/10)

and

especially

when the

_herd-year

variance was

large

relative to the additive

genetic

variance,

the univariate t-model was

_clearly

the

_best,

at least in terms of mean

_squared

error. Under

preferential

_treatment,

the

posterior

distribution of the

degrees

of freedom in the univariate t-model

_pointed

_awayfrom the correctness of the Gaussian

assumption.

The univariate t-model was

_quite

robust to variation in

the simulation

parameters,

but it is unknown whether this robustness holds

across different forms of

_preferential

treatment.

This simulation could not differentiate

clearly

between the Gaussian and the herd-clustered

_t-models,

_although

the latter was

always slightly

better under

preferential

treatment. A reason for the lack of difference between these two

models may be the low number of herds in the simulation. With a few clusters

(herds)

the statistical information about the

degrees

of freedom is

_low,

so the

posterior

distribution of this

_parameter

cannot be estimated

_accurately.

In

_conclusion,

it appears that the univariate t-model can attenuate adverse effects of

_preferential

treatment as

applied

here. It leads to better inferences

about

breeding

values and

_genetic

trends than those obtained with the Gaussian

model,

especially

when

_preferential

treatment is

_prevalent,

at least under the conditions of the

_{study. If,}

on the other

hand, preferential

treatment is

non-existent,

or the

assumption

of a Gaussian distribution of the residuals seems

to be

_true,

there is little loss in

_efficiency

from

using

a robust

_model,

such as the univariate t. It is

encouraging

that a

_symmetric

error

distribution,

such as Student

_t,

improved

_uponthe Gaussian one under a

single-tailed

form of

_preferential

treatment as in

_equation

(2).

This

_suggests

that a robust

asymmetric

distribution _maydo even

_better,

but

_perhaps

at the _expenseof

conceptual

and

_{computational simplicity.}

ACKNOWLEDGEMENT

We wish to thank W.G.

_{Hill, University}

of

Edinburgh,

for some useful comments.

REFERENCES

[1]

Albert

J.H., Chib S., Bayesian analysis

of

_binary

and

_{polychotomous}

response

data,

J. Am. Stat. Assoc. 88

₍₁₉₉₃₎

669-679.

[2]

Besag

J., Green

P.,

Higdon D., Mengersen K., Bayesian computation

and stochastic _systems,Stat. Sci. 10

₍₁₉₉₅₎

3-66.

[3]

Burnside

E.B., Meyer

K., Potential

_impact

of bovine _somatotropinon

_dairy

sire evaluation, J.

_Dairy

Sci. 71

₍₁₉₈₈₎

2210-2219.

[4]

Geweke J.,

Bayesian

treatment of the

_independent

Student-t linear model,

J.

_Appl.

Econometrics 8

₍₁₉₉₃₎

S19-S40.

[5]

Gianola D., Stranden I.,

Foulley

J.L., Modelos lineales con distribuciones t:

potencial

en

_{genetica quantitativa, Actas,}

5ta Conferencia

Espanola

de Biometria,

(19)

[6]

Gianola D., Sorensen D., A mixed effects threshold model with a t

distri-bution,

47th Annual

_Meeting

of the

_European

Association for Animal Production,

Lillehammer, Norway,

1996, 15 _p.

[7]

Harbers

_A.G.F.,

Lohuis

_M.M.,

Dekkers

_J.C.M.,

Correction for

_preferential

treatment of MOET families

_{by including}

an environmental correlation in

_genetic

evaluation,

in:

_Proceedings

of the 5th World

_Congress

on Genetics

Applied

to

Livestock

_{Production, Guelph, Canada,}

_1994,vol. 17, pp. 11-14.

[8]

Henderson

C.R., Applications

of Linear Models in Animal

_{Breeding, University}

of

_{Guelph, Guelph,}

Ontario,

Canada,

1984.

[9]

Kuhn

_M.T.,

Boettcher P.J., Freeman

_A.E.,

Potential biases in

_predicted

transmitting

abilities of females from

preferential

treatment, J.

_Dairy

Sci. 77

₍₁₉₉₄₎

2428-2437.

[10]

Kuhn M.T., Freeman

_A.E.,

Biases in

_{predicted transmitting}

abilities of sires

when

_daughters

receive

_preferential

treatment, J.

_Dairy

Sci. 78

₍₁₉₉₅₎

2067-2072.

[11]

Kuhn

_M.T.,

Freeman _A.E.,Power transformations for

_reducing

bias in ge-netic evaluation caused

by preferential

treatment, J.

_Dairy

Sci.

_{(Abstr.) (1996)}

suppl.

1, 143.

[12]

Lange K.L.,

Little

R.J.A., Taylor J.M.G.,

Robust statistical

_{modeling using}

the t

_{distribution,}

J. Am. Stat. Assoc. 84

₍₁₉₈₉₎

881-896.

[13]

Lange K.,

Sinsheimer

J.S.,

Normal/Independent

distributions and their

ap-plications

in robust

_regression,

J.

_{Comp. Graph.}

Stat. 2

₍₁₉₉₃₎

175-198.

[14]

Law

_A.M.,

Kelton

W.D.,

Simulation

Modeling

and

Analysis, McGraw-Hill,

New

_York,

1982.

[15]

Lidauer

M., Mdntysaari E.A.,

Detection of bias in animal model

_pedigree

indices of

_{heifers, Agric.}

Food Sci. Finland 5

₍₁₉₉₆₎

387-397.

[16]

Mdntysaari E.A., Sillanpaa

M., Bias in

_pedigree

indices of

dairy

bulls: Should

the _management_{group effects}be fixed and should we use smaller

_{heritability?,}

44 th Annual

Meeting

of the

European

Association for Animal

_{Production, Aarhus,}

Denmark,

1993, Abstracts

_I,

_pp.236-237.

[17]

Murphy P.A., Everett

R.W., Van Vleck

L.D., Comparison

of first lactations

and all lactations of dams to

_predict

sons’ milk

_evaluation,

J.

Dairy

Sci. 65

₍₁₉₈₂₎

1999-2005.

[18]

Nicholas

_F.W.,

Smith

_C.,

Increased rates of

_{genetic change}

in

_dairy

cattle

_by

embryo

transfer and

splitting,

Anim. Prod. 36

₍₁₉₈₃₎

341-353.

[19]

Powell R.L., Norman H.D.,

Accuracy

of cow indexes

_according

to

repeata-bility,

evaluation, herd

_yield,

and registry status, J.

_Dairy

Sci. 71

₍₁₉₈₈₎

2232-2240.

[20]

Rothschild M.F.,

Douglass L.W.,

Powell R.L., Prediction of son’s modified

contemporary

comparison

from

_{pedigree information,}

J.

_Dairy

Sci. 64

₍₁₉₈₁₎

331-341.

[21]

Stranden L, Mdki-Tanila

_{A., Mdntysaari}

E.A., Genetic _{progress and}rate of

inbreeding

in a closed adult MOET nucleus under different _mating

_strategies

and

heritabilities,

J. Anim. Breed. Genetics 108

₍₁₉₉₁₎

401-411.

[22]

Stranden

_L,

Robust mixed effects linear models with t distributions and

application

to

_dairy

cattle

_breeding,

Ph.D. thesis,

University

of Wisconsin, Madison,

1996.

[23]

Stranden I., Gianola

D., Gaussian

versus Student-t mixed effects linear

mod-els for milk

_yield

in

_{Ayrshire cattle,}

48th Annual

_Meeting

of the

_European

Association for Animal

Production, Vienna, 1997,

Abstracts 1, pp. 262-263.

[24]

Stranden I, Gianola D., Inferences about variance _componentsin the

univari-ate mixed linear t model

_{using Laplacian-t approximations,}

in:

_Proceedings

of the 6th World

Congress

on Genetics

Applied

to Livestock Production,

Armidale, Australia,

1998, vol.

25, pp. 537-540.

[25]

Sutradhar

B.C.,

Ali M.M., Estimation of the parameters of a

_regression

model with a multivariate t error

_variable,

Comm. Stat.

Theory

Meth. 15

₍₁₉₈₆₎

(20)

[26]

Tempelman R.J.,

Firat _M.Z.,

_Beyond

the linear mixed model:

perceived

versus real

benefits, Proceedings

of the 6th World

Congress

on Genetics

Applied

to Livestock _{Production, Armidale,}

_Australia,

1998, vol. _25,pp. 605-612.

[27]

Uimari

_{P., Mantysaari E.A., Repeatability}

and bias of estimated

breeding

values for

_dairy

bulls and bull dams calculated from animal model evaluations, Anim. Prod. 57

₍₁₉₉₃₎

175-182.

[28]

Uimari

P., Mantysaari E.A., Relationship

between bull dam herd character-istics and bias in estimated

_breeding

values of _bull,

_Agric.

Sci. Finland 4

₍₁₉₉₅₎

463-472.

[29]

Weigel D.J.,

Pearson

_R.E.,

Hoeschele

_{I., Impact}

of different

_strategies

and

amounts of

_preferential

treatment on various methods of bull-dam

selection,

J.

Dairy

Sci. 77

₍₁₉₉₄₎

3163-3173.

[30]

West

_M.,

Outlier models and prior distributions in

_Bayesian

linear

_regression,

J.

_Roy.

Statist. Soc. B 46

₍₁₉₈₄₎

431-439.

[31]

Zellner

_{A., Bayesian}

and

_{non-Bayesian analysis}

of the

regression

model with

multivariate Student-t error terms, J. Am. Stat. Assoc. 71

₍₁₉₇₆₎

400-405.

APPENDIX: Distribution of the

preferential

treatment variable When _w!is

_positive

and very

large,

A

ij

tends

to hi - pm,n,

so in this case

equation

(1)

becomes:

When _w!is

_negative,

_6.

_ij

=

_0,

_asindicated in

(1)

so _y_ij

_{= h}

₂

+ _u_j + _e!.

Hence,

given h

i

,

the _{range in}

_production

records due to

_preferential

treatment

is

_expected

to be

_{hi -}

_p_m_in

_{= hi}

+

Qh

5

=

(

Z

i

₊

5)&OElig;

h

where _ziN

N(0, 1).

Unconditionally,

the

_expected

_{range is}then

_5o

_h

_.

For

_6.

_ij

defined in

_equation

(2),

the average

preferential

treatment

applied, conditionally

on

h

i

would be

where

₀

_,

_B

_(.)

is normal

_density

with mean A and variance 1. For A =

_0,

OA

(-)

is the standard normal

_density

_0(.).

Because

I

f

z

00 4)(w)o(w)dw

=

24)’(z) I

and

!(0)

=

1,

it follows that 2 so

E(A

jj

)

=

E(E(02!!hi)) _ -!Jmin·

Likewise,

_E(!7jlhi)

=

24 (h -Pmin)

2

for A = 0.

Thus,

!

With pm;&dquo; _

-5(]’

h

,

we have

E(Di!)

=

1.875(]’

h

and

Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study

HAL Id: hal-00894228

https://hal.archives-ouvertes.fr/hal-00894228

Submitted on 1 Jan 1998

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Attenuating effects of preferential treatment with

Student-t mixed linear models: a simulation study

Ismo Strandén, Daniel Gianola

To cite this version:

Original

article

Attenuating

effects

of

preferential

treatment

with

Student-t

mixed

linear

models:

a

simulation

study

Ismo

Strandén

a

Daniel Gianola

Department

Sciences, University

Wisconsin,

Madison,

Research, Agricultural

MTT,

Jokioinen,

(Received

May

accepted

1998)

multiple

embryo

preferential

depended

breeding

compared

ability

preferential

model,

by herd,

independent

degrees

Bayesian analysis

sampler,

variance, herd-year effects, breeding

sampling

squared

preferential

performance.

preferential

prevalent

best; hence,

assumption

clearly inappropriate.

preferential

assumptions.

&copy;

Inra/Elsevier,

/ preferential

/ simulation / Bayesian

_preferential

_{Sciences, University}

_Wisconsin,

_{Research, Agricultural}

_MTT,

_Jokioinen,

_May

_accepted

₁₉₉₈₎

_multiple

_breeding

_compared

_ability

_model,

_{by herd,}

_independent

_{Bayesian analysis}

_sampler,

_{variance, herd-year effects, breeding}

_sampling

_performance.

_preferential

_prevalent

_assumption

_{clearly inappropriate.}

_assumptions.

_©

_{Inra/Elsevier,}

_{/ preferential}

_{/ simulation / Bayesian}

_{/ Student-t}

_{/ Gibbs}

_(t).

_{préférentiel}

_dépendu

_génétique

_prendre

_degrés

_sujet

_génétiques

_réponses

_performance

_performance

_Quand

_fréquent

_inadapté.

_apparaît

_prendre

_{préférentiels}

_/

_/

_management

_practice

_applied

_contemporary

_example,

_feeding,

_treatment,

_{longer milking}

_day

_{feeding according}

_production

_{applied selectively}

_{dairy cattle, presumably}

_probability

_genetic

_yield

_theory.

_inadequate

_properly

_preferential

_suspected

_apparent