Bayesian inference in the semiparametric log normal frailty model using Gibbs sampling

(1)

HAL Id: hal-00894208

https://hal.archives-ouvertes.fr/hal-00894208

Submitted on 1 Jan 1998

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Bayesian inference in the semiparametric log normal

frailty model using Gibbs sampling

Inge Riis Korsgaard, Per Madsen, Just Jensen

To cite this version:

(2)

Original

article

Bayesian

inference

in the

_{semiparametric}

_log

normal

frailty

model

_using

Gibbs

_sampling

Inge

Riis

_Korsgaard*

Per

Madsen Just Jensen

Department

of Animal

Breeding

and

Genetics,

Research Centre

_Foulum,

Danish Institute of

_{Agricultural Sciences,}

P.O. Box 50, DK-8830

Tjele,

Denmark

(Received

16 October

1997; accepted

23

_April

₁₉₉₈₎

Abstract - In this paper, a full

_{Bayesian analysis}

is carried out in a

_{semiparametric}

log

normal

_frailty

model for survival data

_using

Gibbs

sampling.

The full conditional

posterior

distributions

describing

the Gibbs

sampler

are either known distributions or

shown to be

_log

concave, so that

adaptive rejection sampling

can be used.

Using

data

augmentation, marginal posterior

distributions of

_breeding

values of animals with and without records are obtained. As an

_example,

disease data on future AI-bulls from the Danish

performance testing

programme were

_analysed.

The trait considered was ’time from

_entering

test until first time a

respiratory

disease occurred’. Bulls

without a

_respiratory

disease

_during

the test and those tested without disease at

date of

_analysing

data had

right

censored records. The results showed that the

hazard decreased with

_increasing

age at

entering

test and with

_increasing

degree

of

_{heterozygosity}

due to

crossbreeding.

Additive effects of _gene

_importation

had no

influence. There was

_genetic

variation in

_{log frailty}

as well as variation due to herd of

origin by

period

and year

by

season.

_©

Inra/Elsevier,

Paris

survival _analysis

_/

semiparametric log normal frailty

model / Gibbs

sampling

/

animal

_{model / disease}

data on _performancetested bulls

*

Correspondence

and

_reprints

E-mail:

_{[email protected]}

or

_{[email protected]}

Résumé - Inférence Bayésienne dans un modèle de survie semiparamétrique

log-normal à partir de l’échantillonnage de Gibbs. Une

_{analyse complètement}

Bayésienne

utilisant

_{l’échantillonnage}

de Gibbs a été effectuée dans un modèle de survie

_{semiparamétrique log-normal.}

Les distributions conditionnelles a

_posteriori

mises à

profit

par

l’échantillonnage

de Gibbs ont

été,

soit des distributions connues, soit des distributions

log-concaves

de telle _{sorte que}

_{l’échantillonnage}

avec

_rejet

adaptatif

a _puêtre utilisé. En utilisant la simulation des données _manquantes,on

(3)

avec ou sans données. Un

_{exemple analysé}

a concerné les données de santé des futurs

taureaux d’insémination dans les stations danoises de contrôle de

performance.

Les

taureaux sans maladie

respiratoire

ou n’en _ayant_pasencore eu à la date de

l’analyse

ont été considérés comme _porteursd’une information censurée à droite. Les résultats

ont montré que le

risque

instantané décroissait quant

l’âge

à l’entrée en station ou le

degré d’hétérozygotie

lié au croisement croissaient. Les effets additifs des différentes sources de

gènes importés

n’ont pas eu d’influence. Le

_risque

instantané de maladie a été trouvé soumis à des influences

_génétiques

et non

_génétiques

(troupeau

_d’origine

et

_{année-saison).}

_©

_{Inra/Elsevier,}

Paris

analyse de survie

_/

modèle semi-paramétrique

/

échantillonnage de Gibbs

_/

modèle animal

_/

résistance aux maladies

1. INTRODUCTION

When survival

data,

the time until a certain event

_happens,

is

_analysed,

_very often the hazard function is modelled. The hazard

function,

A

i

(t),

of an animal

i,

denotes the instantaneous

_probability

of

_failing

at time

_t,

if risk exists.

In Cox’s

_proportional

hazards model

_[5]

it is assumed that

_A

_i

_{(t) = A}

_o

_(t)

exp{x!,6},

where,

in

_{semiparametric models,}

_A

_O

_(t)

is _any

_arbitrary

baseline hazard function common to all animals. Covariates of animal

_i,

xi, are

_supposed

to act

_{multiplicatively}

on the hazard function

_by

_exp{x!,6},

where

_,Q

is a

vector of

_regression

parameters.

In

_fully

_parametric

models the baseline hazard function is also

_{parameterized.}

The

_proportional

hazard model assumes that

conditional on

_covariates,

the event times are

_independent

and attention is

focused on the effects of the

explanatory

variables. The baseline hazard function

is then

_regarded

as a nuisance factor.

Frailty

models are mixed models for survival data. In

frailty

models it is

assumed that there is an unobserved random

_variable,

a

_frailty

_variable,

which is assumed to act

_{multiplicatively}

on the hazard function. Sometimes a

frailty

variable is introduced to make correct inference on

_regression

_parameters.

In

other situations the

_parameters

of the

_frailty

distribution are of

_major

interest. In shared

_frailty

_models,

introduced

_by

_Vaupel

et al.

_(32),

_groupsof individ-uals

_(or

several survival times on the same

individual)

share the same

_frailty

variable. Frailties of two individuals have a correlation

equal

to 1 if

_they

come

from the same _groupand

_equal

to 0 if

_they

come from different _groups.

_Mainly

for reasons of mathematical

convenience,

the

frailty

variable is often assumed

to follow a _gammadistribution. In the animal

breeding

_literature,

this method

has been used to fit sire models for survival data

_using

_fully

_parametric

models

(e.g.

[8, 10]).

Several _papersdeal with correlated _gamma

_frailty

models

_(e.g.

_[22,

_{26, 30,}

31!).

In these models individual frailties are linear combinations of

independent

gamma distributed random variables constructed to

_give

the desired variance covariance matrix _amongfrailties. From a mathematical

point

of view these

models are convenient because the EM

_algorithm

[7]

can be used to estimate the

parameters.

Because of the infinitesimal model often assumed in

_quantitative

genetics,

frailties may be

log

normally distributed;

thereby

conditional random

(4)

immediate to use the EM

algorithm

in

log normally

distributed

frailty

models as stated

_by

several authors and shown in

Korsgaard

!21!.

In this paper we show how a full

_{Bayesian analysis}

can be carried out in

a

_{semiparametric}

log

normal

frailty

model

using

Gibbs

sampling

and

adap-tive

_rejection

_sampling.

It is shown that

_{by using}

data

_{augmentation, marginal}

posterior

distributions of

_breeding

values of animals without records can be

ob-tained. The work is very much

_inspired

_by

the works of Kalbfleisch

_!19!,

_Clayton

!4!,

Gauderman and Thomas

_!11!

and

_Dellaportas

and Smith

_!6!.

Kalbfleisch

_[19]

presented

a

_{Bayesian analysis}

of the

semiparametric regression

model. Gibbs

sampling

was used

_by

_Clayton

[4]

for

_Bayesian

inference in the

semiparamet-ric gamma

frailty

model and

_by

Gauderman and Thomas

_[11]

for inference in

a related

_{semiparametric}

log

normal

frailty

model with

emphasis

on

applica-tions in

_genetic

_{epidemiology. Finally}

_Dellaportas

and Smith

_[6]

demonstrated that Gibbs

sampling

in

_conjunction

with

_{adaptive rejection sampling gives}

a

straightforward computational procedure

for

_Bayesian

inferences in the Weibull

proportional

hazards model.

The

_{semiparametric log}

normal

frailty

model is defined in section 2 of this

paper. In this

part

we show how a full

_{Bayesian analysis}

is carried out in the

special

case of the

log

normal

frailty

model,

where the model of

log

frailty

is a variance

_component

model. The full conditional

posterior

distributions

_required

for

_using

Gibbs

_sampling

are derived for a

_given

set of

_prior

distributions. In

section

_3,

we

_analyse

disease data on

performance

tested bulls as an

example

and section 4 contains a discussion.

2. BAYESIAN INFERENCE IN THE SEMIPARAMETRIC LOG NORMAL FRAILTY MODEL - USING GIBBS SAMPLING

Let

_T

_i

and

C

i

be the random variables

representing

the survival time and the

_censoring

time of animal

_i,

respectively.

Then data on animal i are

(y

2 ,

6 i

),

where y

2

is the observed value

of Y

=

min{T

i

,

C

d

and

6 i

is an indicator random

variable,

equal

to 1 if

T

i

<

,

i

C

and 0 if

C

i

<

_T

_i

_.

In the

_{semiparametric frailty}

model,

it is assumed

_that,

conditional on

frailty

Z

i

= _z_i_,the hazard

function,

Ài(t),

of

Ti;

i =

_{1, ... ,}

_n,is

_{given by}

where

_A

_h

_(t)

is the common baseline hazard function of animals that

belong

to

the hth

_stratum,

h =

1, ... , H,

where H is the number of strata.

x

i

(t)

is a vector of

possible time-dependent

covariates of animal i

_and

is the

corresponding

vector of

_regression

_{parameters. Z}

_i

is the

frailty

variable of animal i. This is

an unobserved random variable assumed to act

_{multiplicatively}

on the hazard

function. A

large value,

zi, of

Z

i

increases the hazard of animal i

throughout

the whole time

_period.

Definition:

let w =

(wl, ... , w

n

)’;

if

w I E - N

n

(0,

E)

and the

frailty

variable

Zi

in

_equation

₍₁₎

be

_given

_by

_Z

_i

=

},

exp f w

2

i.e.

Z

is

log normally

distributed;

i =

1, ... ,

n. Then the model

_given

_by

_equation

(1)

is called a

_{semiparametric}

(5)

This is the definition of a

_{semiparametric}

_log

normal

frailty

model in broad

generality.

However, special

attention is

_given

to a subclass of models where

the distribution of

_{log frailty}

is

_{given by}

a variance

_component

model:

or in scalar

_form,

_w_i =

Uj+ ai+ ei

where j

is

the class of the random

effect,

u, that animal

i belongs

to; j

E

_{{1, ... , q}.}

ai is the random additive

genetic

value and ei the random value of environmental effect not

already

taken into account. It is assumed that

_ula

_’

_-

_Nq(O,

_Iq

_U

_{’), a[a§ -}

_N

_(0,

_Aa!)

and

e!er! !!(0,In.cr!).

Q

u

and

_Q

_a

_.

_Q!

and

_Q

_a

are known

_design

matrices of

dimension n x _qand n x

_N,

_{respectively,}

where N is the total number of

animals

_defining

the additive

_genetic

_relationship

_matrix,

_A,

and n is the number of animals with records.

_Here,

_(u,

_a’),

_(a,

_or’)

and

_(e,

_U2

₎

are assumed

to be

_{mutually independent.}

Generalizations will be discussed later. From

equation

(2),

the hazard of

_T

_i

is:

assuming

that the covariates are time

_independent

and that there is no

stratification. The vector of

parameters

and

hyperparameters

of the model

is aJ =

(AoO,;3, u,a!,a,a!,e,a!),

where

_A

_o

_(t)

= It

A

o

(u)du

is the

_integrated

hazard function.

Note that

_{log frailty,}

_w_i_,of animal

_i,

is an unobserved

_quantity

which

is modelled. This is

_analogous

to the threshold model

_(e.g.

_[28]),

where an

unobserved

_quantity,

the

_liability,

is modelled. In the threshold

_model,

a

categorical

trait is

_considered,

but

_heritability

is defined for the

_liability

of the trait. In the

_{semiparametric}

_log

normal

_frailty

model the trait is a survival

time,

but

_heritability

is defined for

_{log frailty}

of the trait. The

_{semiparametric}

log

normal

_frailty

model is not a

log

linear model for the survival times

_T

_i

_,

i =

1, ... , n.

The

only log

linear models that are also

_proportional

hazards

models are the Weibull

_regression

models

_(including

_{exponential regression}

models),

where the error term is

_e/p,

with _p

_being

a

_parameter

of the Weibull

distribution and

_having

the extreme value distribution

_!20!.

Without restriction

on the baseline

hazard,

the

proportional

hazard model

postulates

no direct

relationship

between covariates

_(and

_frailty)

and time itself. This is unlike the threshold

_model,

where the observed value is determined

_by

a

_grouping

on the

underlying

scale.

2.1. Prior distributions

In order to _carryout a full

Bayesian analysis,

the

_prior

distributions of all

parameters

and

_{hyperparameters}

in the model must be

specified.

A

_priori,

it is assumed

_(by

definition of the

_log

normal

_frailty

_model)

that _u,

_given

the

hyper-parameter

(

7 u 2,

follows a multivariate normal distribution:

U

I

U2

u -

Nq(O,I9Qu).

(6)

priori

elements in

/3

are assumed to be

_independent

and each is assumed to fol-low an

_improper

uniform distribution over the real

numbers;

i.e.

p({3

b

)

oc

_1;

b =

1,...,.B,

where B is the dimension of

!3.

The

_{hyperparameters}

_{a£, a}

_§

_a and

_Q

_e

_are

assumed to follow

_independent

inverse gamma

distributions;

i.e.

a! ’&dquo; IG(¡.¿u, lIu), a! ’&dquo;

’

¿

a

,

IG(¡

v

a

)

and

_or2

_-

_e

_,

_¿

_’

_IG(¡

_v

_e

_),

where , u¿’¡ lIu, pa, va,

and,a,,

ve, are values

assigned according

to

prior

belief. The convention used for

inverse _gammadistributions is

_given

in the

_Appendix.

The baseline hazard

func-tion

_{>’0 (t)}

will be

_{approximated by}

a

_step

function on a set of intervals defined

by

the different ordered survival

times,

0 <

_t(

₁

₎

< ... <

t(

M

)

< oo:

>’o(t) =

Aom

for

_{t(,!_1)}

< t:=:;

_t(!,);

m =

_1,...,

_M,

with

t<

o

>

= 0 and M the number of dif

ferent uncensored survival times. The

_integrated

hazard function is then

con-tinuous and

_piecewise

linear. A

priori

it is assumed that

_{!oi, ... , A}

OM

are

in-dependent

and that the

_prior

distribution of

A

om

is

_given

_by

_p(A

_om

₎

oc

>’

0 ’;’;

m = 1, ... , M.

The

prior

distribution of

Ao

m

=

Ao (t<

m

> ) -

Ao(t(.,))

-M

Aom(t(m) - t(m-,))

is then

_p(A

_om

₎

a

,)-’

o

(A

and

p(Aoi, ... ,

AoM)

oc

II

A

o

m,

m=1 1

by

having

assumed

_independence

of

_{!ol, ... , >’O}

M

a

_priori.

Based on these

as-sumptions and, assuming

furthermore that a

_priori

(A

ol

, ... ,

Ao,!,l),

!3,

(u,

u u 2),

(a, a’)

and

_(e,

_Q

_e)

are

_{mutually independent,}

the

_prior

distribution

of V)

can

be written

2.2. Likelihood and

_{joint posterior}

distribution

The usual convention that survival times tied to

_{censoring times,}

pre-cede the

censoring

times is

_adopted.

_Furthermore,

as in Breslow

_[3],

it is

as-sumed that

censoring occurring

in the interval

_[t(

_m-

₁

₎

_t(m))

occurs at

_t(,,,-

₁

_);

m =

1,...,

M ₊

1,

with

t(

M+i

)

= _oo.

Under the

assumption, where,

conditional on _u,a and _e,

censoring

is

independent

(e.g. [1, 2]),

the

_partial

conditional

_{(censoring omitted)}

likelihood

(7)

(e.g.

(15!).

Under the

_{assumptions given above, equation}

₍₅₎

becomes

_ _ r _ _ i

where

_D(t(m»)

is the set of animals that failed at time

_t!&dquo;,!,

_d(t(

_m»

₎

is the number of animals that failed at time

_t!&dquo;,!,

and

_{R(t!&dquo;,!)}

is the set of animals at risk of

_failing

at time

_t(

_m

₎

_’

Furthermore

_{assuming that,}

conditional on _u,a

and _e,

_censoring

is non-informative for

_!,

then the

_{joint posterior}

distribution

of o

is,

using

Bayes’

theorem,

obtained up to

_{proportionality by}

_multiplying

the conditional likelihood and the

_prior

distribution

_{of 0}

where

_{p((y, 8}

_{) 11/i)}

is the conditional likelihood

_given

_by

_equation

₍₆₎

and

_p(qp)

is

the

_prior

distribution of

parameters

and

_{hyperparameters given by equation}

_(4).

2.3.

_{Marginal posterior}

distributions and Gibbs

_sampling

If _cpis a

_parameter

or a subset of

_parameters

of interest from

_1/i,

the

marginal

posterior

distribution of cp is obtained

by integrating

out the

remaining

param-eters from the

_{joint posterior}

distribution. If this can not be

_performed

ana-lytically

for one or more

_parameters

of

_interest,

Gibbs

_sampling

[12, 14]

can

be used to obtain

samples

from the

joint posterior distribution,

and

_thereby

also from any

marginal

posterior

distribution of interest. Gibbs

sampling

is

an iterative method for

_generation

of

samples

from a multivariate distribution

which has its roots in the

_{Metropolis-Hastings}

_algorithm

_{[17, 24!.}

The Gibbs

sampler produces

realizations from a

_{joint posterior}

distribution

_{by sampling}

repeatedly

from the full conditional

posterior

distributions of the

_parameters

in the model. Geman and Geman

_[14]

showed

that,

under mild

_conditions,

and after a

_large

number of

_iterations,

_samples

obtained are from the

_{joint posterior}

distribution.

2.4. Full conditional

_posterior

distributions

In order to

_implement

the Gibbs

_sampler,

the full conditional

_posterior

distributions of all the

parameters

in 1/i

must be derived. The

_following

notation is used: that

_1/i

_B

_<p

_{denotes 1/i}

_except

_{cp; e.g.}if _cp=

{3,

then

1/i

V3

is

(A

01’

... ,

A

om

,

u,

o’!,

a,

<r!,

e,

o, e 2)

posterior

distribution of cp

given

data and all the

remaining

parameters,

1/iB<p’

is

proportional

to the

joint

posterior

distribution

_{of 1/i given by}

_equation

_(7).

(8)

where

_Of

=

!!

exp{ai+ei+x!,8}Aom

and

_d(u

_j

₎

is the number of animals !n,a!.&dquo;,,! < y;,

that failed from the

_jth

class of u and

S( Uj)

is the set of animals

_belonging

to the

_jth

class of u. For

_i,

an animal with

records,

posterior

distribution of ai is

given by

where

_{Of =}

_L

_{exp{uj+et+x!}Aomand{!4’’-’}aretheelementsofA !.}

m:t!m! Yi

For an

_{animal, i,}

without

_record,

_posterior

distribution of

a

i follows a normal distribution

_according

to

/ B

_posterior

distribution of

ei,

i =

_{1, ... , n,}

_is,

_upto

propor-tionality,

given

by

where

_Of

=

L

exp{ Uj

_{+ a}_i

-f- xi/3!Ao!&dquo;,

and the full conditional

poste-ma!&dquo;,! < y;

rior distribution of each

_regression

_parameter

_{,!6, b =}

_{1, ... ,}

B is

_{given by}

_posterior

distribution of each of the

_{hyperparameters}

<7!, <r!

and

afl is

inverse gamma,

according

to:

and

(9)

Sampling

from _{gamma, inverse gamma}and

_normalely

distributed random variables is

_{straightforward.}

_posterior

distribution of _u!, of ai, for

i,

an animal with

_records,

of _e_i and of

_regression

_parameters,

_given

by equations

(8), (9), (11)

and

_(12),

_{respectively,}

can all be shown

_[21]

to be

_log

_concave,and therefore

_{adaptive rejection}

_sampling

_[16]

can be used

to

_sample

from these distributions.

_Adaptive

_rejection

_sampling

is useful in

order to

_{sample efficiently}

from densities of

complicated

algebraic

form. It is

a method for

_{rejection sampling}

from _anyunivariate

_log-concave

_probability

density

function,

which need

_only

be

_specified

_upto

_{proportionality.}

3. AN EXAMPLE

3.1. Data

As an

_example,

disease data on future AI-bulls from the Danish

_performance

testing

programme for beef traits of

dairy

and dual purpose breeds were

analysed.

The trait considered was ’time from

_entering

test until first time

a

_respiratory

disease occurred’. The bulls of the Danish Red breed were all

performance

tested in the

_15-year

_period

1982-1996 and entered the

_Aalestrup

test station between 23 and 74

_days

of _age.Bulls which did not

_experience

a

respiratory

disease

_during

the test

_period

or which were still

undergoing testing,

on the date of data

analysis

have

right

censored records. For these

_animals,

it

is

_only

known that the time at first occurrence of a

_{respiratory disease, T}

_i

_,

will

be

greater

than the time at

_{censoring, C}

_i

_,

that

_is,

either the time at the end of the test

₍₃₃₆

_days

of

_age)

or the time at the date of data

_analysis

or the

time at

_being

culled before end of test

_(a

_veryrare

_event).

Data on animal

_i;

i =

_{1, ... , n}

is

(y; ,

6 i

),

where y

i

is the observed value of

_Y

=

min{T

i

,

C

i

}

and

6 i

is a random indicator

variable, equal

to 1 if a

_respiratory

disease occurred

during

test,

and 0 otherwise. Data on all animals is

(y, 6).

3.2. Model

It is assumed that the hazard

_function,

_A

_i

_(t),

of

_T

_i

_,

is

_given

_by

where t is time

_{(in days)}

from

_entering

test. In

_{(17), A}

_o

_(t)

is the baseline hazard

function;

x’

=

(

X

i

l

,

X i 2

, Xi3,

!i4)

is a vector of covariates of animal

_i;

xilranges

between 23 and 74

_days

of age in the data and is the animal’s _ageat

_entering

test;

xiz ranges between 0.0 and 1.0 and xi3 ranges between 0.0 and 0.78125

and are

_proportions

of _genesfrom

foreign

_populations

(American

Brown Swiss

and Red Holstein

_cattle)

and xi4

(which

ranges between 0.0 and

1.0)

is the

degree

of

_{heterozygosity}

due to

_{crossbreeding. x}

_ii

is included in order to take into account that bulls are

_entering

test at different _ages;Xi2 and x23 in order to

take additive effects of gene

importation

into account and xi4 in order to take

account of heterosis due to dominance.

_{

_3’

_

(0

1 , ... , Q4)

is the

corresponding

vector of

_regression

_{parameters. Z}

_i

=

exp{h!

₊_k_s₊_a_i₊

e

i

is

the

log normally

distributed

_frailty

variable of animal i.

_h

_j

is the effect of the

_jth

herd of

origin

(10)

number of herd of

_origin

_by

_{period combination,}

and sk is the effect of

entering

test in the kth yearseason

(one

season is 1

_month),

k =

1,...,

K,

where K is

the number of yearseasons. ai is an additive

_genetic

effect of animal i and ei is an effect of environment not

_already

taken into

_account;

i =

_{1, ... ,}

_n,where

n is the number of animals with records. In this

example

J is

_540,

K is 170 and n is 1 635. The

_relationship

among the test bulls was traced back as far as

possible, leading

to a total of N = 5 083 animals

defining

the additive

genetic

relationship

matrix.

3.3

_{Implementation}

of the Gibbs

_sampler

and results

The Gibbs

_sampler

was

_implemented

with

_prior

distributions

_according

to

the

prior

distributions of the

_{hyperparameters}

_{a 2,}

_{as, or}

2 and

_or2

_were

_given

_by

inverse _gammadistributions with

_parameters

and

That

_is,

the

_prior

means were of

afl and

Q

a

_{were 0.1 and the}

_prior

means of

0 ’;

s 2

and

_Q

_e

were

0.8. The

prior

variance of all the

hyperparameters

is 10 000. The

following

starting

values were

_assigned

to the

_parameters

h!°! _

_{(0, ... , 0)’,}

2 (

0 )

=

0.1,

s!°>

=

(0, ... , 0)’,

as !°!

=

0.8,

a(°) =

0 )’,

0 , ... ,

(

2 (

0 )-

0 .

1 ,

e!°! _

_{(0, ... , 0)’,}

u

2 (0)

=

0.8,

!3!°>

=

(0,0,0,0)’.

Sampling

was carried out from

the

respective

full conditional

posterior

distributions in the

_following

order,

describing

one round of the Gibbs

_sampler:

1)

sample

1

1 °r&dquo;,;

m =

_1,...,

M from the

gamma distribution

given

by

equation

(16);

2)

sample

_{h!; j}

=

_{1, ... ,}

J from

_equation

(8)

with u

j

= h

j

and

using

adaptive rejection sampling;

3)

sample

_afl

from the _{inverse gamma}distribution

_given

_by

_equation

₍₁₃₎

with,

a2

=

Oh,

q =

J,

u = h and

(pu, 1

/u)

=

(p

h

,

Vh

);

4)

sample

ai from the normal distribution

given by

equation

(10)

if i is

an animal without

records;

if i is an animal with

records,

ai is

sampled

from

equation

(9)

with

_h

_j

+ s,! substituted for uj in

Of

and

using

adaptive

rejection

sampling;

5)

sample

Q

a

from

the _{inverse gamma}distribution

_{given by equation}

_(14);

6)

sample

ei; i =

_{1, ... ,}

n from

equation

(11) with h

j

+ sk substituted for

Uj in

Of

using

adaptive rejection sampling;

7)

sample

Q

e

from

the inverse gamma distribution

given

by

equation

(15);

8)

sample

(3

b

;

b =

1, 2, 3,

from

equation

(12)

with

h

j

₊_Sk substituted for

(11)

9)

sample

sk k =

_{1, ... ,}

K from

_equation

(8)

with _u_j = _s_k and

using adaptive rejection

sampling;

10)

sample

u2

from

the inverse gamma distribution

given

by

(13)

with

a£ = 0’;,

q =

K,

u = s and

u,

M

(

v

u

) =

(u

s

,

v

s

).

After 40 000 rounds of the Gibbs

sampler,

8 000

_samples

of model

_parameters

were saved with a

_sampling

interval of

_20;

i.e. a total chain

_length

of 200 000.

After each round of the Gibbs

_sampler,

the

_following

standardized

_parameters,

of

log frailty,

were

_computed

where

Q

z

=

cr!

₊

a/

₊

a§

₊

ae

is the variance of

log frailty

(not

of survival

time).

Summary

statistics of selected

_parameters

are shown in table 1.

The rate of

_mixing

of the Gibbs

_sampler

was

_{investigated by}

_estimating

lag-correlations in a standard time series

_analysis.

_Lag

1 and

_lag

10 correlations

(lag

1

_corresponds

to 20 rounds of the Gibbs

_sampler)

are

_given

in table I.

_N

_e

is the effective

_{sample size,}

derived from the method of

_batching

_{(e.g. !13!).}

The chain of

_samples

from the

_marginal

_posterior

distribution of

_Q

_a

has _very slow

_{mixing properties.}

This is reflected in the standardized

_parameters

as

well,

(12)

The

_marginal

_{posterior density}

of

_a

_and

the standardized

_parameters

_h

₂

_,

c2,

c2

and

e

2 _(of

log

frailty)

are shown in

figure

1. The densities were estimated

by

the

_methodology

of Scott

_!27!.

The mean of the

_{marginal posterior density}

of

h

2

is 0.14 where

h

2 is

heritability

of

_log

_frailty.

If herd of

_origin

_{by period}

and _year

_by

season had been considered as fixed effects rather than random

effects,

then

_heritability

of

_log

frailty

would have been

_higher.

The

_marginal

_posterior

densities of

{3

1 ,

,

2 /?

Q3

and are shown in

figure

2.

Because the

_marginal

_posterior

mean of

₍₃

₁

_,

the effect of _ageat

_entering

_test,

is

negative

(-0.0061),

the hazard is decreased

by increasing

age at

_entering

test. That

_is,

for a bull

_entering

test at 23

_days

of _age,the conditional hazard is

always

exp{-0.0061

x

23}/exp{-0.0061

x

74}

=

0.87/0.64

= 1.36

higher,

than

that of a bull

_entering

test at 74

_days

of age; conditional on

frailty

and other

covariates

_being

the same for the two bulls.

_Similarly,

because the

_marginal

posterior

mean of

(

3

4 ,

the effect of

heterozygosity,

is

_negative

_(-0.22),

the hazard is decreased

_{by increasing}

the

_degree

of

_{heterozygosity.}

The

_marginal

posterior

means of additive effects of _gene

_immigration

from American Brown

Swiss and Red Holstein cattle are close to 0.0. The

_marginal

_posterior

mean of

h

2 ,

the

_heritability

of

_{log frailty,}

is

_0.1406;

of

_c!,

the

_proportion

of variation in

log frailty

due to herd of

_origin

_by

period

combination is 0.0913 and of

_c!,

the

proportion

of variation in

_log

_frailty

caused

_by

a _year

_by

season effect is 0.2766.

4. DISCUSSION

This _paperillustrates how a

_Bayesian

analysis

can be carried out in the

semiparametric log

normal

_frailty

model

_using

Gibbs

_sampling.

In the version

of the Gibbs

_sampler

_implemented

_here,

_samples

were

_repeatedly

taken from univariate full conditional

_posterior

distributions. This is

_only

one

_possible

implementation.

With

_highly

correlated univariate

_components

it could be

preferable

to

_sample

from the

_joint

conditional

_posterior

distribution of these

components.

The

_advantage

of this method is its

_greater

_speed

of convergence to

the

_joint

_posterior

distribution

_[23).

The

_methodology

is

_quite

_general

and could

obviously

be used in full

_parametric

models and in models with stratification

and/or

time-dependent

covariates. It is time

_consuming

to

_sample

thousands of observations from thousands of

_{distributions; but,}

_analysing

the

_relatively

small dataset in section 3 left us

_optimistic

about

_{possibilities}

for

_analysing

larger

datasets.

Another

_possibility

is to

_perform

a

Bayesian

analysis

using Laplace

integra-tion. This was carried out

_{by Ducrocq}

and Casella

_[9]

with

_{special emphasis}

on

_obtaining

the

_marginal

_posterior

distribution of

_parameters,

_T_,of the

dis-tribution of

_frailty

terms. Their

_prior

distributions of

_frailty

terms were either gamma or

log

normal.

_They

_point

out that the

_computations

_may

_quickly

be-come too

_heavy

and propose to summarize the

approximate

marginal

posterior

of T

through

its first 3 moments

_using

the Gram-Charlier series

_expansion

of a function. The moments are found

_using

numerical

_integration

based on

Gauss-Hermite

quadrature

followed

_by

an iterative

_strategy

to obtain

precise

estimates. When

_frailty

terms are _gamma

distributed,

these can be

integrated

out

_{algebraically}

to obtain the

_marginal

posterior

distribution of the

remaining

parameters.

From a likelihood

_point

of

_view,

_gammadistributed frailties can

(13)

(14)

This is so in correlated _gamma

_frailty

models as well

_(e.g.

!22!)

but not with

_log

normally

distributed

_frailty

terms. If T is a vector of variance

_components

of

log frailty

terms from a

_log

normal

_frailty

model such as

_equation

(2),

Laplace

integration

may be considered too

_costly

_(9!.

These authors

_{suggested letting}

some

_frailty

terms _{be gamma}distributed and others

_{log normally}

distributed in

order to

_integrate

out

_{algebraically}

some

_frailty

terms.

_{Implementation}

of the Gibbs

_sampler

in a

_log

normal

_frailty

model avoids

_making

the

_{mathematically}

tractable but somewhat artificial choice of a _gammadistribution of

_frailty

terms.

In this _papersome of the

_parameters

are modelled with

_{improper prior}

(15)

as

_pointed

out

by

Hobert and Casella

_!18!,

the mathematical

intractability

that necessitates use of the Gibbs

sampler

also makes

demonstrating

the

_propriety

of the

_posterior

distribution a difficult task. The fact that the full conditional

posterior

distributions

_defining

the Gibbs

_sampler

are all _properdistributions

does not

_guarantee

that the

_{joint posterior}

distribution will be _proper.The

question

of which

_{improper prior}

distributions will

yield

proper

joint posterior

distributions in hierarchical linear mixed models was addressed

_by

Hobert and

Casella

_(18!.

The

_{important question}

of

propriety

of the

posterior

distribution

using improper prior

distributions must be dealt with in a

_{Bayesian analysis}

using

a

_{Laplace integration}

as well. One _wayto avoid

_{improper posteriors}

is to

use _proper

_prior

distributions.

The

_analysis

could have been carried out without

_augmenting

_[29]

_by

additive

_genetic

values of animals without records. The number of animals

defining

the additive

_genetic

_relationship

matrix A was

_{5 083,}

but

_only

1635

had records. Let a be the

_vector

of additive

_genetic

values of animals with

records,

then

_taking

the

part

A

of A

_relating

to animals with

_records,

the

analysis

could have been carried out

_using

the

_prior

_£[a

_§

_-

_{N.!(0, AQa).}

The number of distributions

_needing

_sampling

from would have been 5 083-1635 lower at the _expenseof

_{obtaining marginal}

_posterior

distributions of

_breeding

values on animals without records. However

A-

1 ,

is much denser than

_A-

₁

_,

and is more difficult to

_compute.

The model of

_{log frailty, specified}

_by

_equation

_(2),

can

easily

be

generalized

to include more

independent

variance

_components

and/or

the

_assumption

of

independence

can be

_relaxed;

for

_example

in

_equation

_(2),

u could

_represent

a

maternal effect.

_Assuming

that

_{(u’, a’)’1 G rv N}

_2N

_{(0, G}

Q9

_A),

where G is the

2 x 2 matrix of additive

_genetic

_covariances,

and that the

_prior

distribution of G is inverse Wishart

_distributed,

then the full conditional

_posterior

distribution of

_G,

_required

for Gibbs

_sampling,

will also be inverse Wishart distributed.

Furthermore,

it should be

_possible

to carry out a multivariate

_analysis

of a

survival

_trait,

a

_quantitative

trait and a

_{categorical trait,}

by

_assuming

that

the

_{log frailty}

of the survival

_trait,

the

_quantitative

trait and the

_liability

of the

_categorical

trait follow a multivariate normal distribution. It should also

be

_possible

to

_generalize

to an

_arbitrary

number of survival

_traits,

_quantitative

traits and

_categorical

traits.

By definition,

the trait considered in the

_example:

’time until first occurrence

of a

_respiratory

disease

during

test’ can occur at most once for each animal. These data _are,

however,

only

a subset of the data

_being

collected on bulls

during

the test

_period.

Each

repeated

occurrence of a

respiratory

disease is

recorded,

as well as other

_categories

of diseases and several

_quantitative

traits

and other traits of interest. Oakes

_[25]

_gives

a _surveyof

frailty

models for

multiple

event time

_data,

and it would be

_interesting

to extend the

log

normal

frailty

models to allow for

_multiple

survival times for each animal as well as a

multivariate

_analysis

of censored survival time data and other traits of interest.

ACKNOWLEDGEMENTS

We wish to thank Anders Holst

_{Andersen, Department}

of Theoretical

_Statistics,

(16)

REFERENCES

[1]

Andersen

_{P.K., Borgan 0.,}

Gill

_{R.D., Keiding N.,}

Statistical Models Based

on

Counting Processes, Springer,

New

York,

1992.

[2]

Arjas

E.,

Haara

P.,

A marked

_point

process

approach

to censored failure data with

_complicated

covariates, Scand. J. Stat. 11

₍₁₉₈₄₎

193-209.

[3]

Breslow

_N.E.,

Covariance

_analysis

of censored survival data, Biometrics 30

(1974)

89-99.

[4]

Clayton D.G.,

A Monte Carlo method for

_Bayesian

inference in

_{frailty models,}

Biometrics 47

₍₁₉₉₁₎

467-485.

[5]

Cox

_{D.R., Regression}

models and life tables

_{(with discussion),}

J. R. Stat. Soc.

B 34

₍₁₉₇₂₎

187-220.

[6]

Dellaportas P.,

Smith

_{A.F.M., Bayesian}

inference for

_generalized

linear and

proportional

hazards models via Gibbs

_{sampling, Appl.}

Statist. 42

₍₁₉₉₃₎

443-459.

[7]

Dempster A.P.,

Laird

_N.M.,

Rubin

_D.B.,

Maximum likelihood from

incom-plete

data via the EM

_algorithm

_{(with discussion),}

J. R. Stat. Soc. Ser. B 39

₍₁₉₇₇₎

1-38.

[8]

Ducrocq V.,

Statistical

_analysis

of

_length

of

_productive

life for

_dairy

cows of the Normande

breed,

J.

_Dairy

Sci. 77

₍₁₉₉₄₎

855-866.

[9]

Ducrocq V.,

Casella

G.,

A

_{Bayesian analysis}

of mixed survival

_models,

Genet. Sel. Evol. 28

₍₁₉₉₆₎

505-529.

[10]

Ducrocq V., Quaas R.L.,

Pollak

E.J.,

Casella

_{G., Length}

of

productive

life of

_dairy

cows. 2. Variance component estimation and sire evaluation, J.

_Dairy

Sci. 71

(1988)

3071-3079.

[11]

Gauderman

_W.J.,

Thomas

_D.C.,

Censored survival models for

_genetic

epi-demiology:

A Gibbs

_{sampling approach,}

Genet.

_Epidemiol.

11

₍₁₉₉₄₎

171-188.

(12!

Gelfand

A.E.,

Smith

A.F.M., Sampling-based approaches

to

_{calculating marginal}

densities,

J. Am. Stat. Assoc. 85

₍₁₉₉₀₎

398-409.

[13]

Gelman

_A.,

Carlin

J.B.,

Stern

H.S.,

Rubin

D.B., Bayesian

Data

_Analysis,

Chapman

and

Hall, London,

1995.

[14]

Geman

_S.,

Geman

D.,

Stochastic

relaxation,

Gibbs

_{distributions,}

and the

Bayesian

restoration of

_images,

IEEE Trans. Pattern

_Analysis

and Machine

Intelli-gence 6

₍₁₉₈₄₎

721-741.

[15]

Gill

_{R.D., Johansen S.,}

A _surveyof

product-integration

with a view towards

application

in survival

_analysis,

Ann. Stat. 18

₍₁₉₉₀₎

1501-1555.

[16]

Gilks

_W.R.,

Wild

_{P., Adaptive rejection sampling}

for Gibbs

_{sampling, Appl.}

Stat. 41

₍₁₉₉₂₎

337-348.

[17]

Hastings W.K.,

Monte Carlo

_sampling

methods

_using

Markov chains and their

applications,

Biometrika 57

₍₁₉₇₀₎

97-109.

[18]

Hobert

J.P.,

Casella

G.,

The effect of

_{improper priors}

on Gibbs

sampling

in

hierarchical linear mixed

models,

₍₁₉₉₆₎

1461-1473.

[19]

Kalbfleisch

_{J.D., Nonparametric Bayesian analysis}

of survival time

_data,

J. R. Stat. Soc. B 40

₍₁₉₇₈₎

214-221.

[20]

Kalbfleisch

_J.D.,

Prentice

_R.L.,

The Statistical

_Analysis

of Failure Time

Data,

John

_Wiley,

New

_York,

1980.

[21]

Korsgaard LR.,

Genetic

_analysis

of survival

_data,

Ph.D. thesis,

University

of

Aarhus, Denmark,

1997.

[22]

Korsgaard I.R.,

Andersen

A.H.,

The additive

genetic

gamma

frailty model,

(17)

[23]

Liu

_{J.S., Wong W.H., Kong A.,}

Covariance structure of the Gibbs

_sampler

with

_applications

to the comparisons of estimators and

_{augmentation schemes,}

Biometrika 81

₍₁₉₉₄₎

27-40.

[24]

Metropolis N.,

Rosenbluth

A.W.,

Rosenbluth

M.N.,

Teller

A.H.,

Teller

E.,

Equations

of state calculations

_by

fast

_{computing machines,}

J. Chem.

_Phys.

21

₍₁₉₅₃₎

1087-1091.

[25]

Oakes

_{D., Frailty}

models for

multiple

event-time

data,

in: Klein

_J.P.,

Goel

P.K.

_(eds.),

Survival

_Analysis:

State of the

_Art,

Kluwer Academic

_Publishers,

The

Netherlands, 1992,

pp. 371-380.

[26]

Petersen

_J.H.,

Andersen

_P.K.,

Gill

_R.,

Variance component models for sur-vival

_data,

Statistica Neerlandica 50

₍₁₉₉₆₎

193-211.

[27]

Scott

_D.W.,

Multivariate

_Density

Estimation:

Theory, Applications,

and

Visualisation,

John

_Wiley,

New

York,

1992.

[28]

Sorensen

D.A.,

Andersen

_{S., Gianola D., Korsgaard LR., Bayesian}

inference

in threshold models

using

Gibbs

sampling,

Genet. Sel. Evol. 27

₍₁₉₉₅₎

229-249.

[29]

Tanner

_{M.A., Wong W.H.,}

The calculation of

posterior

distributions

by

data

augmentation,

₍₁₉₈₇₎

528-540.

[30]

Yashin

A.,

Iachine

_I.,

Survival of related individuals: an extension of some fundamental results of

_{heterogeneity analysis,}

Math.

_Popul.

Studies 5

₍₁₉₉₅₎

321-340.

[31]

Yashin

_{A.L, Vaupel J.W.,}

Iachine

_LA.,

Correlated individual

_frailty:

an

ad-vantageous

approach

to survival

_analysis

of bivariate

_data,

Math.

_Popul.

Studies 5

(1995)

145-160.

[32]

Vaupel J.W.,

Manton

_K.G.,

Stallard E., The _impactof

_{heterogeneity}

in

individual

frailty

on the

dynamics

of

mortality, Demography

16

₍₁₉₇₉₎

439-454.

APPENDIX: Convention used for gamma, inverse gamma and

log

normal distributions

The

_following

convention is used for gamma and inverse gamma

distri-butions: let

_{X - r(a, 0),}

with

_density

_p(x)

=

e-

3 (

x/

l

-

a

x

[f(

a

),8

a

r

1 .

Then

E(X)

=

a,8

and

V(X) = a,0

2

=

,8E(X).

Y =

X-’

has _aninverse _orinverted

gamma distribution Y -

IG(a,,8)

with

density

p(y)

=

-(

a)

y

e

l

&dquo;+

W

-1

[f(a),8

a

r

1

and

_E(Y)

_{_}

_{[(a -}

_1),8]-

₁

for a > 1 and

_V(Y)

₌

_{[(a -}

₁₎

₂

(

a

-

2 )

0

2 ]-l

=

_(E(I’))

₂

_{(a -}

₂₎

_for

a > 2.

If X -

_N(p,,

_a

₂

_),

then Y =

exp{X}

is

said _tohave _a

log

normal

Bayesian inference in the semiparametric log normal frailty model using Gibbs sampling

HAL Id: hal-00894208

https://hal.archives-ouvertes.fr/hal-00894208

Submitted on 1 Jan 1998

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Bayesian inference in the semiparametric log normal

frailty model using Gibbs sampling

Inge Riis Korsgaard, Per Madsen, Just Jensen

To cite this version:

Original

article

Bayesian

inference

in the

semiparametric

log

normal

frailty

model

using

Gibbs

sampling

Inge

Riis

Korsgaard*

Per

Madsen Just Jensen

Department

Breeding

Genetics,

Foulum,

Agricultural Sciences,

Tjele,

(Received

1997; accepted

April

1998)

Bayesian analysis

semiparametric

log

frailty

using

sampling.

posterior

describing

sampler

log

adaptive rejection sampling

Using

augmentation, marginal posterior

breeding

example,

performance testing

analysed.

entering

respiratory

respiratory

during

analysing

right

increasing

entering

increasing

degree

heterozygosity

crossbreeding.

importation

genetic

log frailty

_{semiparametric}

_log

_using

_sampling

_Korsgaard*

_Foulum,

_{Agricultural Sciences,}

_April

₁₉₉₈₎

_{Bayesian analysis}

_{semiparametric}

_frailty

_using

_log

_breeding

_example,

_analysed.

_entering

_respiratory

_during

_analysing

_increasing

_increasing

_{heterozygosity}

_importation

_genetic

_{log frailty}

_©

_/

_{model / disease}

_reprints

_{[email protected]}

_{[email protected]}

_{analyse complètement}

_{l’échantillonnage}

_{semiparamétrique log-normal.}

_posteriori

_{l’échantillonnage}

_rejet

_{exemple analysé}

_risque

_génétiques

_génétiques

_d’origine

_{année-saison).}

_©

_{Inra/Elsevier,}

_/

_/

_/

_happens,

_analysed,

_probability

_failing

_t,

_proportional

_[5]

_A

_i

_{(t) = A}

_o

_(t)

_{semiparametric models,}

_A

_O

_(t)