HAL Id: hal-00894208
https://hal.archives-ouvertes.fr/hal-00894208
Submitted on 1 Jan 1998
Bayesian inference in the semiparametric log normal
frailty model using Gibbs sampling
Inge Riis Korsgaard, Per Madsen, Just Jensen
Original article
Department of Animal Breeding and Genetics, Research Centre Foulum, Danish Institute of Agricultural Sciences, P.O. Box 50, DK-8830 Tjele, Denmark
(Received 16 October 1997; accepted 23 April 1998)
Abstract - In this paper, a full Bayesian analysis is carried out in a semiparametric log normal frailty model for survival data using Gibbs sampling. The full conditional posterior distributions describing the Gibbs sampler are either known distributions or shown to be log concave, so that adaptive rejection sampling can be used. Using data augmentation, marginal posterior distributions of breeding values of animals with and without records are obtained. As an example, disease data on future AI-bulls from the Danish performance testing programme were analysed. The trait considered was 'time from entering test until first time a respiratory disease occurred'. Bulls without a respiratory disease during the test and those tested without disease at date of analysing data had right censored records. The results showed that the hazard decreased with increasing age at entering test and with increasing degree of heterozygosity due to crossbreeding. Additive effects of gene importation had no influence. There was genetic variation in log frailty as well as variation due to herd of origin by period and year by season. © Inra/Elsevier, Paris

survival analysis / semiparametric log normal frailty model / Gibbs sampling / animal model / disease data on performance tested bulls

* Correspondence and reprints (I.R. Korsgaard)
E-mail: snfirk@genetics.sh.dk or IngeR.Korsgaard@agrsci.dk
Résumé - Bayesian inference in a semiparametric log normal survival model using Gibbs sampling. A fully Bayesian analysis using Gibbs sampling was carried out in a semiparametric log normal survival model. The full conditional posterior distributions used by the Gibbs sampler were either known distributions or log concave distributions, so that adaptive rejection sampling could be used. Using data augmentation, marginal posterior distributions of breeding values of animals with or without records are obtained. One example analysed concerned health data on future AI-bulls in the Danish performance test stations. Bulls without a respiratory disease, or which had not yet had one at the date of analysis, were treated as carrying right censored information. The results showed that the hazard decreased as age at entering the station or the degree of heterozygosity due to crossbreeding increased. Additive effects of the different sources of imported genes had no influence. The hazard of disease was found to be subject to genetic and non-genetic influences (herd of origin and year-season). © Inra/Elsevier, Paris

survival analysis / semiparametric model / Gibbs sampling / animal model / disease resistance

1. INTRODUCTION
When survival data, the time until a certain event happens, is analysed, very often the hazard function is modelled. The hazard function, $\lambda_i(t)$, of an animal $i$, denotes the instantaneous probability of failing at time $t$, if risk exists. In Cox's proportional hazards model [5] it is assumed that $\lambda_i(t) = \lambda_0(t)\exp\{x_i'\beta\}$, where, in semiparametric models, $\lambda_0(t)$ is any arbitrary baseline hazard function common to all animals. Covariates of animal $i$, $x_i$, are supposed to act multiplicatively on the hazard function by $\exp\{x_i'\beta\}$, where $\beta$ is a vector of regression parameters. In fully parametric models the baseline hazard function is also parameterized. The proportional hazards model assumes that conditional on covariates, the event times are independent and attention is focused on the effects of the explanatory variables. The baseline hazard function is then regarded as a nuisance factor.

Frailty models are mixed models for survival data. In frailty models it is assumed that there is an unobserved random variable, a frailty variable, which is assumed to act multiplicatively on the hazard function. Sometimes a frailty variable is introduced to make correct inference on regression parameters. In other situations the parameters of the frailty distribution are of major interest. In shared frailty models, introduced by Vaupel et al. [32], groups of individuals (or several survival times on the same individual) share the same frailty variable. Frailties of two individuals have a correlation equal to 1 if they come from the same group and equal to 0 if they come from different groups. Mainly for reasons of mathematical convenience, the frailty variable is often assumed to follow a gamma distribution. In the animal breeding literature, this method has been used to fit sire models for survival data using fully parametric models (e.g. [8, 10]).

Several papers deal with correlated gamma frailty models (e.g. [22, 26, 30, 31]). In these models individual frailties are linear combinations of independent gamma distributed random variables constructed to give the desired variance covariance matrix among frailties. From a mathematical point of view these models are convenient because the EM algorithm [7] can be used to estimate the parameters.

Because of the infinitesimal model often assumed in quantitative genetics, frailties may be log normally distributed; thereby conditional random [...] immediate to use the EM algorithm in log normally distributed frailty models as stated by several authors and shown in Korsgaard [21].
In this paper we show how a full Bayesian analysis can be carried out in a semiparametric log normal frailty model using Gibbs sampling and adaptive rejection sampling. It is shown that by using data augmentation, marginal posterior distributions of breeding values of animals without records can be obtained. The work is very much inspired by the works of Kalbfleisch [19], Clayton [4], Gauderman and Thomas [11] and Dellaportas and Smith [6]. Kalbfleisch [19] presented a Bayesian analysis of the semiparametric regression model. Gibbs sampling was used by Clayton [4] for Bayesian inference in the semiparametric gamma frailty model and by Gauderman and Thomas [11] for inference in a related semiparametric log normal frailty model with emphasis on applications in genetic epidemiology. Finally Dellaportas and Smith [6] demonstrated that Gibbs sampling in conjunction with adaptive rejection sampling gives a straightforward computational procedure for Bayesian inferences in the Weibull proportional hazards model.

The semiparametric log normal frailty model is defined in section 2 of this paper. In this part we show how a full Bayesian analysis is carried out in the special case of the log normal frailty model, where the model of log frailty is a variance component model. The full conditional posterior distributions required for using Gibbs sampling are derived for a given set of prior distributions. In section 3, we analyse disease data on performance tested bulls as an example and section 4 contains a discussion.
2. BAYESIAN INFERENCE IN THE SEMIPARAMETRIC LOG NORMAL FRAILTY MODEL - USING GIBBS SAMPLING
Let $T_i$ and $C_i$ be the random variables representing the survival time and the censoring time of animal $i$, respectively. Then data on animal $i$ are $(y_i, \delta_i)$, where $y_i$ is the observed value of $Y_i = \min\{T_i, C_i\}$ and $\delta_i$ is an indicator random variable, equal to 1 if $T_i \le C_i$ and 0 if $C_i < T_i$. In the semiparametric frailty model, it is assumed that, conditional on frailty $Z_i = z_i$, the hazard function, $\lambda_i(t)$, of $T_i$; $i = 1, \ldots, n$, is given by

$$\lambda_i(t \mid z_i) = z_i\,\lambda_h(t)\exp\{x_i(t)'\beta\} \qquad (1)$$

where $\lambda_h(t)$ is the common baseline hazard function of animals that belong to the $h$th stratum, $h = 1, \ldots, H$, where $H$ is the number of strata. $x_i(t)$ is a vector of possible time-dependent covariates of animal $i$ and $\beta$ is the corresponding vector of regression parameters. $Z_i$ is the frailty variable of animal $i$. This is an unobserved random variable assumed to act multiplicatively on the hazard function. A large value, $z_i$, of $Z_i$ increases the hazard of animal $i$ throughout the whole time period.

Definition: let $w = (w_1, \ldots, w_n)'$; if $w \mid \Sigma \sim N_n(0, \Sigma)$ and the frailty variable $Z_i$ in equation (1) is given by $Z_i = \exp\{w_i\}$, i.e. $Z_i$ is log normally distributed; $i = 1, \ldots, n$, then the model given by equation (1) is called a semiparametric log normal frailty model.

This is the definition of a semiparametric log normal frailty model in broad generality.
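To make the definition concrete, the following minimal sketch simulates one censored record under a log normal frailty model. It is an illustration only: the constant baseline hazard, the independent log frailties ($\Sigma = \sigma_w^2 I$), the fixed censoring time and all function and argument names are assumptions that do not come from the paper.

```python
import math
import random

def simulate_survival(x_beta, sigma_w, base_rate, censor_time, rng):
    """Draw one (y, delta) pair under a log normal frailty model.

    Illustrative assumptions (not from the paper): a constant baseline
    hazard `base_rate`, a time-independent linear predictor `x_beta`,
    log frailty w ~ N(0, sigma_w^2) and a fixed censoring time.
    """
    w = rng.gauss(0.0, sigma_w)                # log frailty w_i
    z = math.exp(w)                            # frailty Z_i = exp{w_i}
    hazard = z * base_rate * math.exp(x_beta)  # conditional hazard, constant in t
    t = rng.expovariate(hazard)                # survival time T_i
    y = min(t, censor_time)                    # observed time Y_i = min{T_i, C_i}
    delta = 1 if t <= censor_time else 0       # 1 = failure observed, 0 = censored
    return y, delta
```

A call such as `simulate_survival(0.0, 0.5, 0.01, 100.0, random.Random(1))` returns one observed time and its censoring indicator; a large frailty draw shortens the expected survival time, as described above.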
However, special attention is given to a subclass of models where the distribution of log frailty is given by a variance component model:

$$w = Q_u u + Q_a a + e \qquad (2)$$

or in scalar form, $w_i = u_j + a_i + e_i$, where $j$ is the class of the random effect, $u$, that animal $i$ belongs to; $j \in \{1, \ldots, q\}$. $a_i$ is the random additive genetic value and $e_i$ the random value of environmental effect not already taken into account. It is assumed that $u \mid \sigma_u^2 \sim N_q(0, I_q\sigma_u^2)$, $a \mid \sigma_a^2 \sim N_N(0, A\sigma_a^2)$ and $e \mid \sigma_e^2 \sim N_n(0, I_n\sigma_e^2)$. $Q_u$ and $Q_a$ are known design matrices of dimension $n \times q$ and $n \times N$, respectively, where $N$ is the total number of animals defining the additive genetic relationship matrix, $A$, and $n$ is the number of animals with records. Here, $(u, \sigma_u^2)$, $(a, \sigma_a^2)$ and $(e, \sigma_e^2)$ are assumed to be mutually independent. Generalizations will be discussed later. From equation (2), the hazard of $T_i$ is:

$$\lambda_i(t \mid u, a, e) = \lambda_0(t)\exp\{x_i'\beta + u_j + a_i + e_i\} \qquad (3)$$

assuming that the covariates are time independent and that there is no stratification. The vector of parameters and hyperparameters of the model is $\psi = (\Lambda_0(\cdot), \beta, u, \sigma_u^2, a, \sigma_a^2, e, \sigma_e^2)$, where $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ is the integrated hazard function.

Note that log frailty, $w_i$, of animal $i$, is an unobserved quantity which is modelled. This is analogous to the threshold model (e.g. [28]), where an unobserved quantity, the liability, is modelled. In the threshold model, a categorical trait is considered, but heritability is defined for the liability of the trait. In the semiparametric log normal frailty model the trait is a survival time, but heritability is defined for log frailty of the trait. The semiparametric log normal frailty model is not a log linear model for the survival times $T_i$, $i = 1, \ldots, n$. The only log linear models that are also proportional hazards models are the Weibull regression models (including exponential regression models), where the error term is $\varepsilon/p$, with $p$ being a parameter of the Weibull distribution and $\varepsilon$ having the extreme value distribution [20]. Without restriction on the baseline hazard, the proportional hazards model postulates no direct relationship between covariates (and frailty) and time itself. This is unlike the threshold model, where the observed value is determined by a grouping on the underlying scale.

2.1. Prior distributions
In order to carry out a full Bayesian analysis, the prior distributions of all parameters and hyperparameters in the model must be specified. A priori, it is assumed (by definition of the log normal frailty model) that $u$, given the hyperparameter $\sigma_u^2$, follows a multivariate normal distribution: $u \mid \sigma_u^2 \sim N_q(0, I_q\sigma_u^2)$. A priori elements in $\beta$ are assumed to be independent and each is assumed to follow an improper uniform distribution over the real numbers; i.e. $p(\beta_b) \propto 1$; $b = 1, \ldots, B$, where $B$ is the dimension of $\beta$. The hyperparameters $\sigma_u^2$, $\sigma_a^2$ and $\sigma_e^2$ are assumed to follow independent inverse gamma distributions; i.e. $\sigma_u^2 \sim IG(\mu_u, \nu_u)$, $\sigma_a^2 \sim IG(\mu_a, \nu_a)$ and $\sigma_e^2 \sim IG(\mu_e, \nu_e)$, where $\mu_u$, $\nu_u$, $\mu_a$, $\nu_a$, $\mu_e$ and $\nu_e$ are values assigned according to prior belief. The convention used for inverse gamma distributions is given in the Appendix.

The baseline hazard function $\lambda_0(t)$ will be approximated by a step function on a set of intervals defined by the different ordered survival times, $0 < t_{(1)} < \ldots < t_{(M)} < \infty$: $\lambda_0(t) = \lambda_{0m}$ for $t_{(m-1)} < t \le t_{(m)}$; $m = 1, \ldots, M$, with $t_{(0)} = 0$ and $M$ the number of different uncensored survival times. The integrated hazard function is then continuous and piecewise linear. A priori it is assumed that $\lambda_{01}, \ldots, \lambda_{0M}$ are independent and that the prior distribution of $\lambda_{0m}$ is given by $p(\lambda_{0m}) \propto \lambda_{0m}^{-1}$; $m = 1, \ldots, M$. The prior distribution of $\Lambda_{0m} = \Lambda_0(t_{(m)}) - \Lambda_0(t_{(m-1)}) = \lambda_{0m}(t_{(m)} - t_{(m-1)})$ is then $p(\Lambda_{0m}) \propto \Lambda_{0m}^{-1}$ and

$$p(\Lambda_{01}, \ldots, \Lambda_{0M}) \propto \prod_{m=1}^{M}\Lambda_{0m}^{-1}$$

by having assumed independence of $\lambda_{01}, \ldots, \lambda_{0M}$ a priori. Based on these assumptions and, assuming furthermore that a priori $(\Lambda_{01}, \ldots, \Lambda_{0M})$, $\beta$, $(u, \sigma_u^2)$, $(a, \sigma_a^2)$ and $(e, \sigma_e^2)$ are mutually independent, the prior distribution of $\psi$ can be written

$$p(\psi) \propto \prod_{m=1}^{M}\Lambda_{0m}^{-1}\; p(u \mid \sigma_u^2)\,p(\sigma_u^2)\; p(a \mid \sigma_a^2)\,p(\sigma_a^2)\; p(e \mid \sigma_e^2)\,p(\sigma_e^2) \qquad (4)$$
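The step-function approximation of the baseline hazard can be sketched in a few lines. The function below evaluates the resulting piecewise linear integrated hazard $\Lambda_0(t)$; the function name, the argument names and the choice to cap $t$ at $t_{(M)}$ are assumptions made for this illustration.

```python
def integrated_hazard(t, knots, rates):
    """Integrated hazard Lambda_0(t) of a step-function baseline hazard.

    knots: ordered distinct failure times t_(1) < ... < t_(M), with t_(0) = 0.
    rates: lambda_01, ..., lambda_0M, the hazard on (t_(m-1), t_(m)].
    The hazard is not defined beyond t_(M) in this sketch, so t is capped
    at t_(M) (an assumption of the example).
    """
    t = min(t, knots[-1])
    total, left = 0.0, 0.0
    for right, rate in zip(knots, rates):
        if t <= left:
            break
        # Add the part of the interval (left, right] that lies below t.
        total += rate * (min(t, right) - left)
        left = right
    return total
```

With `knots = [1.0, 3.0, 6.0]` and `rates = [0.5, 0.1, 0.2]`, the increments $\Lambda_{0m} = \lambda_{0m}(t_{(m)} - t_{(m-1)})$ are 0.5, 0.2 and 0.6, and the function interpolates linearly between the knots.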
2.2. Likelihood and joint posterior distribution

The usual convention that survival times tied to censoring times precede the censoring times is adopted. Furthermore, as in Breslow [3], it is assumed that censoring occurring in the interval $[t_{(m-1)}, t_{(m)})$ occurs at $t_{(m-1)}$; $m = 1, \ldots, M+1$, with $t_{(M+1)} = \infty$. Under the assumption that, conditional on $u$, $a$ and $e$, censoring is independent (e.g. [1, 2]), the partial conditional (censoring omitted) likelihood is

[...] (5)

(e.g. [15]). Under the assumptions given above, equation (5) becomes

$$p((y, \delta) \mid \psi) \propto \prod_{m=1}^{M} \Lambda_{0m}^{d(t_{(m)})} \exp\Big\{\sum_{i \in D(t_{(m)})} (x_i'\beta + u_j + a_i + e_i)\Big\} \exp\Big\{-\Lambda_{0m} \sum_{i \in R(t_{(m)})} \exp\{x_i'\beta + u_j + a_i + e_i\}\Big\} \qquad (6)$$

where $D(t_{(m)})$ is the set of animals that failed at time $t_{(m)}$, $d(t_{(m)})$ is the number of animals that failed at time $t_{(m)}$, and $R(t_{(m)})$ is the set of animals at risk of failing at time $t_{(m)}$.

Furthermore assuming that, conditional on $u$, $a$ and $e$, censoring is non-informative for $\psi$, then the joint posterior distribution of $\psi$ is, using Bayes' theorem, obtained up to proportionality by multiplying the conditional likelihood and the prior distribution of $\psi$

$$p(\psi \mid (y, \delta)) \propto p((y, \delta) \mid \psi)\,p(\psi) \qquad (7)$$

where $p((y, \delta) \mid \psi)$ is the conditional likelihood given by equation (6) and $p(\psi)$ is the prior distribution of parameters and hyperparameters given by equation (4).
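A conditional likelihood of this kind can be evaluated directly from the data. The sketch below assumes that, for a piecewise constant baseline hazard with increments $\Lambda_{0m}$, the likelihood is proportional to a product over the distinct failure times of $\Lambda_{0m}^{d(t_{(m)})}$, times the exponential of the linear predictors of the animals failing at $t_{(m)}$, times $\exp\{-\Lambda_{0m}\sum_{i \in R(t_{(m)})}\exp\{\eta_i\}\}$, with $\eta_i = x_i'\beta + u_j + a_i + e_i$. That form, and all names in the code, are assumptions of the example rather than a transcription of the paper.

```python
import math

def log_cond_likelihood(y, delta, eta, fail_times, increments):
    """Log conditional likelihood, up to an additive constant.

    y, delta:    observed times and failure indicators of the animals.
    eta:         eta_i = x_i'beta + u_j + a_i + e_i for each animal.
    fail_times:  ordered distinct uncensored times t_(1) < ... < t_(M).
    increments:  Lambda_01, ..., Lambda_0M (baseline hazard increments).
    """
    animals = range(len(y))
    loglik = 0.0
    for t_m, lam in zip(fail_times, increments):
        failed = [i for i in animals if delta[i] == 1 and y[i] == t_m]
        # Failures precede tied censorings, so y_i >= t_(m) means "at risk".
        at_risk = [i for i in animals if y[i] >= t_m]
        loglik += len(failed) * math.log(lam)
        loglik += sum(eta[i] for i in failed)
        loglik -= lam * sum(math.exp(eta[i]) for i in at_risk)
    return loglik
```

Because each animal contributes $\exp\{\eta_i\}\Lambda_{0m}$ for every $t_{(m)} \le y_i$, the same sums reappear when the expression is viewed as a function of a single parameter, which is how full conditional distributions are obtained.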
2.3. Marginal posterior distributions and Gibbs sampling

If $\varphi$ is a parameter or a subset of parameters of interest from $\psi$, the marginal posterior distribution of $\varphi$ is obtained by integrating out the remaining parameters from the joint posterior distribution. If this can not be performed analytically for one or more parameters of interest, Gibbs sampling [12, 14] can be used to obtain samples from the joint posterior distribution, and thereby also from any marginal posterior distribution of interest. Gibbs sampling is an iterative method for generation of samples from a multivariate distribution which has its roots in the Metropolis-Hastings algorithm [17, 24]. The Gibbs sampler produces realizations from a joint posterior distribution by sampling repeatedly from the full conditional posterior distributions of the parameters in the model. Geman and Geman [14] showed that, under mild conditions, and after a large number of iterations, samples obtained are from the joint posterior distribution.
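The mechanics of Gibbs sampling can be seen on a toy target that is unrelated to the frailty model: a standard bivariate normal distribution with correlation $\rho$, whose full conditionals are $x \mid y \sim N(\rho y, 1 - \rho^2)$ and $y \mid x \sim N(\rho x, 1 - \rho^2)$. The target, the burn-in length and the function name are assumptions chosen for illustration.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_rounds, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    sd = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    x, y = 0.0, 0.0                   # starting values
    samples = []
    for r in range(n_rounds):
        x = rng.gauss(rho * y, sd)    # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)    # draw y from p(y | x)
        if r >= burn_in:              # discard the early rounds
            samples.append((x, y))
    return samples
```

Each round updates every component from its full conditional given the current values of the others; marginal summaries are then computed from the retained samples.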
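2.4. Full conditional posterior distributions

In order to implement the Gibbs sampler, the full conditional posterior distributions of all the parameters in $\psi$ must be derived. The following notation is used: $\psi_{\setminus\varphi}$ denotes $\psi$ except $\varphi$; e.g. if $\varphi = \beta$, then $\psi_{\setminus\beta}$ is $(\Lambda_{01}, \ldots, \Lambda_{0M}, u, \sigma_u^2, a, \sigma_a^2, e, \sigma_e^2)$. The full conditional posterior distribution of $\varphi$ given data and all the remaining parameters, $\psi_{\setminus\varphi}$, is proportional to the joint posterior distribution of $\psi$ given by equation (7). The full conditional posterior distribution of $u_j$, $j = 1, \ldots, q$, is, up to proportionality, given by

$$p(u_j \mid (y, \delta), \psi_{\setminus u_j}) \propto \exp\Big\{d(u_j)\,u_j - \exp\{u_j\}\sum_{i \in S(u_j)}\Theta_i^u - \frac{u_j^2}{2\sigma_u^2}\Big\} \qquad (8)$$

where $\Theta_i^u = \sum_{m:\,t_{(m)} \le y_i}\exp\{a_i + e_i + x_i'\beta\}\Lambda_{0m}$ and $d(u_j)$ is the number of animals that failed from the $j$th class of $u$ and $S(u_j)$ is the set of animals belonging to the $j$th class of $u$. For $i$, an animal with records, the full conditional posterior distribution of $a_i$ is given by

$$p(a_i \mid (y, \delta), \psi_{\setminus a_i}) \propto \exp\Big\{\delta_i a_i - \Theta_i^a\exp\{a_i\} - \frac{1}{2\sigma_a^2}\Big(A^{ii}a_i^2 + 2a_i\sum_{i' \ne i}A^{ii'}a_{i'}\Big)\Big\} \qquad (9)$$

where $\Theta_i^a = \sum_{m:\,t_{(m)} \le y_i}\exp\{u_j + e_i + x_i'\beta\}\Lambda_{0m}$ and $\{A^{ii'}\}$ are the elements of $A^{-1}$. For an animal, $i$, without record, the full conditional posterior distribution of $a_i$ follows a normal distribution according to

$$a_i \mid (y, \delta), \psi_{\setminus a_i} \sim N\Big(-\frac{1}{A^{ii}}\sum_{i' \ne i}A^{ii'}a_{i'},\ \frac{\sigma_a^2}{A^{ii}}\Big) \qquad (10)$$

The full conditional posterior distribution of $e_i$, $i = 1, \ldots, n$, is, up to proportionality, given by

$$p(e_i \mid (y, \delta), \psi_{\setminus e_i}) \propto \exp\Big\{\delta_i e_i - \Theta_i^e\exp\{e_i\} - \frac{e_i^2}{2\sigma_e^2}\Big\} \qquad (11)$$

where $\Theta_i^e = \sum_{m:\,t_{(m)} \le y_i}\exp\{u_j + a_i + x_i'\beta\}\Lambda_{0m}$ and the full conditional posterior distribution of each regression parameter $\beta_b$, $b = 1, \ldots, B$ is given by

$$p(\beta_b \mid (y, \delta), \psi_{\setminus \beta_b}) \propto \exp\Big\{\beta_b\sum_{i=1}^{n}\delta_i x_{ib} - \sum_{i=1}^{n}\ \sum_{m:\,t_{(m)} \le y_i}\exp\{u_j + a_i + e_i + x_i'\beta\}\Lambda_{0m}\Big\} \qquad (12)$$

The full conditional posterior distribution of each of the hyperparameters $\sigma_u^2$, $\sigma_a^2$ and $\sigma_e^2$ is inverse gamma, according to:

$$\sigma_u^2 \mid (y, \delta), \psi_{\setminus \sigma_u^2} \sim IG\Big(\frac{q}{2} + \mu_u,\ \Big[\frac{u'u}{2} + \frac{1}{\nu_u}\Big]^{-1}\Big) \qquad (13)$$

$$\sigma_a^2 \mid (y, \delta), \psi_{\setminus \sigma_a^2} \sim IG\Big(\frac{N}{2} + \mu_a,\ \Big[\frac{a'A^{-1}a}{2} + \frac{1}{\nu_a}\Big]^{-1}\Big) \qquad (14)$$

and

$$\sigma_e^2 \mid (y, \delta), \psi_{\setminus \sigma_e^2} \sim IG\Big(\frac{n}{2} + \mu_e,\ \Big[\frac{e'e}{2} + \frac{1}{\nu_e}\Big]^{-1}\Big) \qquad (15)$$

The full conditional posterior distribution of $\Lambda_{0m}$, $m = 1, \ldots, M$, is a gamma distribution:

$$\Lambda_{0m} \mid (y, \delta), \psi_{\setminus \Lambda_{0m}} \sim \Gamma\Big(d(t_{(m)}),\ \Big[\sum_{i \in R(t_{(m)})}\exp\{x_i'\beta + u_j + a_i + e_i\}\Big]^{-1}\Big) \qquad (16)$$

Sampling from gamma, inverse gamma and normally distributed random variables is straightforward. The full conditional posterior distribution of $u_j$, of $a_i$, for $i$, an animal with records, of $e_i$ and of regression parameters, given by equations (8), (9), (11) and (12), respectively, can all be shown [21] to be log concave, and therefore adaptive rejection sampling [16] can be used to sample from these distributions. Adaptive rejection sampling is useful in order to sample efficiently from densities of complicated algebraic form. It is a method for rejection sampling from any univariate log-concave probability density function, which need only be specified up to proportionality.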
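Log concavity of such a full conditional can be checked by differentiating its log density twice. The short derivation below is a sketch for the class effect $u_j$, under the assumption that its full conditional has a kernel of the form $\exp\{d\,u_j - \Theta e^{u_j} - u_j^2/(2\sigma_u^2)\}$ with $d \ge 0$ a failure count and $\Theta > 0$ a sum of positive terms; the symbols $d$ and $\Theta$ are shorthand introduced here.

```latex
\begin{align*}
\log p(u_j \mid \cdot) &= d\,u_j - \Theta e^{u_j} - \frac{u_j^2}{2\sigma_u^2} + \text{const},\\
\frac{\partial}{\partial u_j}\log p(u_j \mid \cdot) &= d - \Theta e^{u_j} - \frac{u_j}{\sigma_u^2},\\
\frac{\partial^2}{\partial u_j^2}\log p(u_j \mid \cdot) &= -\Theta e^{u_j} - \frac{1}{\sigma_u^2} < 0 .
\end{align*}
```

The second derivative is negative for every $u_j$, so the log density is strictly concave. The same argument applies to any kernel that combines a linear term, a negative multiple of an exponential and a negative quadratic term.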
3. AN EXAMPLE
3.1. Data
As an example, disease data on future AI-bulls from the Danish performance testing programme for beef traits of dairy and dual purpose breeds were analysed. The trait considered was 'time from entering test until first time a respiratory disease occurred'. The bulls of the Danish Red breed were all performance tested in the 15-year period 1982-1996 and entered the Aalestrup test station between 23 and 74 days of age. Bulls which did not experience a respiratory disease during the test period or which were still undergoing testing on the date of data analysis have right censored records. For these animals, it is only known that the time at first occurrence of a respiratory disease, $T_i$, will be greater than the time at censoring, $C_i$, that is, either the time at the end of the test (336 days of age) or the time at the date of data analysis or the time at being culled before end of test (a very rare event). Data on animal $i$; $i = 1, \ldots, n$ is $(y_i, \delta_i)$, where $y_i$ is the observed value of $Y_i = \min\{T_i, C_i\}$ and $\delta_i$ is a random indicator variable, equal to 1 if a respiratory disease occurred during test, and 0 otherwise. Data on all animals is $(y, \delta)$.
3.2. Model
It is assumed that the hazard function, $\lambda_i(t)$, of $T_i$, is given by

$$\lambda_i(t \mid h_j, s_k, a_i, e_i) = \lambda_0(t)\exp\{x_i'\beta + h_j + s_k + a_i + e_i\} \qquad (17)$$

where $t$ is time (in days) from entering test. In (17), $\lambda_0(t)$ is the baseline hazard function; $x_i' = (x_{i1}, x_{i2}, x_{i3}, x_{i4})$ is a vector of covariates of animal $i$; $x_{i1}$ ranges between 23 and 74 days of age in the data and is the animal's age at entering test; $x_{i2}$ ranges between 0.0 and 1.0 and $x_{i3}$ ranges between 0.0 and 0.78125 and are proportions of genes from foreign populations (American Brown Swiss and Red Holstein cattle) and $x_{i4}$ (which ranges between 0.0 and 1.0) is the degree of heterozygosity due to crossbreeding. $x_{i1}$ is included in order to take into account that bulls are entering test at different ages; $x_{i2}$ and $x_{i3}$ in order to take additive effects of gene importation into account and $x_{i4}$ in order to take account of heterosis due to dominance. $\beta' = (\beta_1, \ldots, \beta_4)$ is the corresponding vector of regression parameters. $Z_i = \exp\{h_j + s_k + a_i + e_i\}$ is the log normally distributed frailty variable of animal $i$. $h_j$ is the effect of the $j$th herd of origin by period combination, $j = 1, \ldots, J$, where $J$ is the number of herd of origin by period combinations, and $s_k$ is the effect of entering test in the $k$th year-season (one season is 1 month), $k = 1, \ldots, K$, where $K$ is the number of year-seasons. $a_i$ is an additive genetic effect of animal $i$ and $e_i$ is an effect of environment not already taken into account; $i = 1, \ldots, n$, where $n$ is the number of animals with records. In this example $J$ is 540, $K$ is 170 and $n$ is 1 635. The relationship among the test bulls was traced back as far as possible, leading to a total of $N$ = 5 083 animals defining the additive genetic relationship matrix.

3.3. Implementation of the Gibbs sampler and results

The Gibbs sampler was implemented with prior distributions according to the previous section. The prior distributions of the hyperparameters $\sigma_h^2$, $\sigma_s^2$, $\sigma_a^2$ and $\sigma_e^2$ were given by inverse gamma distributions with parameters [...] and [...]. That is, the prior means of $\sigma_h^2$ and $\sigma_a^2$ were 0.1 and the prior means of $\sigma_s^2$ and $\sigma_e^2$ were 0.8. The prior variance of all the hyperparameters is 10 000. The following starting values were assigned to the parameters: $h^{(0)} = (0, \ldots, 0)'$, $\sigma_h^{2(0)} = 0.1$, $s^{(0)} = (0, \ldots, 0)'$, $\sigma_s^{2(0)} = 0.8$, $a^{(0)} = (0, \ldots, 0)'$, $\sigma_a^{2(0)} = 0.1$, $e^{(0)} = (0, \ldots, 0)'$, $\sigma_e^{2(0)} = 0.8$, $\beta^{(0)} = (0, 0, 0, 0)'$. Sampling was carried out from the respective full conditional posterior distributions in the following order, describing one round of the Gibbs sampler:
1) sample $\Lambda_{0m}$; $m = 1, \ldots, M$ from the gamma distribution given by equation (16);
2) sample $h_j$; $j = 1, \ldots, J$ from equation (8) with $u_j = h_j$ and using adaptive rejection sampling;
3) sample $\sigma_h^2$ from the inverse gamma distribution given by equation (13) with $\sigma_u^2 = \sigma_h^2$, $q = J$, $u = h$ and $(\mu_u, \nu_u) = (\mu_h, \nu_h)$;
4) sample $a_i$ from the normal distribution given by equation (10) if $i$ is an animal without records; if $i$ is an animal with records, $a_i$ is sampled from equation (9) with $h_j + s_k$ substituted for $u_j$ in $\Theta_i^a$ and using adaptive rejection sampling;
5) sample $\sigma_a^2$ from the inverse gamma distribution given by equation (14);
6) sample $e_i$; $i = 1, \ldots, n$ from equation (11) with $h_j + s_k$ substituted for $u_j$ in $\Theta_i^e$ using adaptive rejection sampling;
7) sample $\sigma_e^2$ from the inverse gamma distribution given by equation (15);
8) sample $\beta_b$; $b = 1, 2, 3, 4$ from equation (12) with $h_j + s_k$ substituted for $u_j$ and using adaptive rejection sampling;
9) sample $s_k$; $k = 1, \ldots, K$ from equation (8) with $u_j = s_k$ and using adaptive rejection sampling;
10) sample $\sigma_s^2$ from the inverse gamma distribution given by (13) with $\sigma_u^2 = \sigma_s^2$, $q = K$, $u = s$ and $(\mu_u, \nu_u) = (\mu_s, \nu_s)$.
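The variance component steps of the round (3, 5, 7 and 10) and the prior moments quoted at the start of this section can be sketched as follows. The code assumes the Appendix convention, in which $Y \sim IG(\alpha, \beta)$ has $E(Y) = [(\alpha - 1)\beta]^{-1}$ and $V(Y) = E(Y)^2/(\alpha - 2)$, and a normal prior on the effects, which gives a conjugate update of the form $IG(n/2 + \alpha, [Q/2 + 1/\beta]^{-1})$ with $Q$ a quadratic form in the effects. The function names and this reading of the update are assumptions of the example, not the authors' code.

```python
import random

def ig_params_from_moments(mean, variance):
    """(alpha, beta) of IG(alpha, beta) with the given mean and variance.

    Uses E(Y) = 1/((alpha - 1) beta) and V(Y) = E(Y)^2 / (alpha - 2).
    """
    alpha = 2.0 + mean * mean / variance
    beta = 1.0 / ((alpha - 1.0) * mean)
    return alpha, beta

def draw_inverse_gamma(alpha, beta, rng):
    """Y = 1/X with X ~ Gamma(shape alpha, scale beta)."""
    return 1.0 / rng.gammavariate(alpha, beta)

def draw_variance(quadratic_form, n_effects, prior_alpha, prior_beta, rng):
    """Conjugate draw of a variance component given its effects.

    quadratic_form: u'u (or a'A^{-1}a for genetic effects).
    n_effects:      number of effects in the vector.
    """
    alpha = n_effects / 2.0 + prior_alpha
    beta = 1.0 / (quadratic_form / 2.0 + 1.0 / prior_beta)
    return draw_inverse_gamma(alpha, beta, rng)
```

For a prior mean of 0.1 and a prior variance of 10 000, `ig_params_from_moments(0.1, 10000.0)` gives a shape only slightly above 2, which corresponds to a very diffuse prior.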
After 40 000 rounds of the Gibbs sampler, 8 000 samples of model parameters were saved with a sampling interval of 20; i.e. a total chain length of 200 000. After each round of the Gibbs sampler, the following standardized parameters, of log frailty, were computed

$$h^2 = \frac{\sigma_a^2}{\sigma_z^2}, \qquad c_h^2 = \frac{\sigma_h^2}{\sigma_z^2}, \qquad c_s^2 = \frac{\sigma_s^2}{\sigma_z^2}, \qquad e^2 = \frac{\sigma_e^2}{\sigma_z^2}$$

where $\sigma_z^2 = \sigma_h^2 + \sigma_s^2 + \sigma_a^2 + \sigma_e^2$ is the variance of log frailty (not of survival time). Summary statistics of selected parameters are shown in table I.

The rate of mixing of the Gibbs sampler was investigated by estimating lag-correlations in a standard time series analysis. Lag 1 and lag 10 correlations (lag 1 corresponds to 20 rounds of the Gibbs sampler) are given in table I. $N_e$ is the effective sample size, derived from the method of batching (e.g. [13]). The chain of samples from the marginal posterior distribution of $\sigma_a^2$ has very slow mixing properties. This is reflected in the standardized parameters as well, [...]
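Mixing diagnostics of this kind can be computed from a saved chain in a few lines. The sketch below shows a sample lag correlation and one common batch-means estimate of the effective sample size, $N_e = n s^2/(b\,s_b^2)$, where $s^2$ is the sample variance of the chain, $b$ the batch size and $s_b^2$ the sample variance of the batch means. The paper does not spell out its batching details, so the estimator, the number of batches and the AR(1) test chain are assumptions of the example.

```python
import random

def lag_correlation(chain, lag):
    """Sample autocorrelation of `chain` at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var

def effective_sample_size(chain, n_batches):
    """Batch-means estimate N_e = n * s^2 / (b * s_b^2)."""
    b = len(chain) // n_batches          # batch size
    used = chain[:b * n_batches]         # drop any incomplete last batch
    n = len(used)
    mean = sum(used) / n
    s2 = sum((v - mean) ** 2 for v in used) / (n - 1)
    batch_means = [sum(used[k * b:(k + 1) * b]) / b for k in range(n_batches)]
    s2_b = sum((m - mean) ** 2 for m in batch_means) / (n_batches - 1)
    return n * s2 / (b * s2_b)

def ar1_chain(rho, n, rng):
    """Hypothetical slowly mixing chain: x_t = rho * x_{t-1} + noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

For a chain such as `ar1_chain(0.9, 8000, random.Random(0))`, the lag 1 correlation is high and the estimated effective sample size is much smaller than the number of saved samples, which is the pattern described for the additive genetic variance.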
The marginal posterior density of $\sigma_a^2$ and the standardized parameters $h^2$, $c_h^2$, $c_s^2$ and $e^2$ (of log frailty) are shown in figure 1. The densities were estimated by the methodology of Scott [27]. The mean of the marginal posterior density of $h^2$ is 0.14, where $h^2$ is heritability of log frailty. If herd of origin by period and year by season had been considered as fixed effects rather than random effects, then heritability of log frailty would have been higher.

The marginal posterior densities of $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are shown in figure 2. Because the marginal posterior mean of $\beta_1$, the effect of age at entering test, is negative (-0.0061), the hazard is decreased by increasing age at entering test. That is, for a bull entering test at 23 days of age, the conditional hazard is always $\exp\{-0.0061 \times 23\}/\exp\{-0.0061 \times 74\} = 0.87/0.64 = 1.36$ times that of a bull entering test at 74 days of age; conditional on frailty and other covariates being the same for the two bulls. Similarly, because the marginal posterior mean of $\beta_4$, the effect of heterozygosity, is negative (-0.22), the hazard is decreased by increasing the degree of heterozygosity. The marginal posterior means of additive effects of gene immigration from American Brown Swiss and Red Holstein cattle are close to 0.0. The marginal posterior mean of $h^2$, the heritability of log frailty, is 0.1406; of $c_h^2$, the proportion of variation in log frailty due to herd of origin by period combination, is 0.0913 and of $c_s^2$, the proportion of variation in log frailty caused by a year by season effect, is 0.2766.

4. DISCUSSION
This paper illustrates how a Bayesian analysis can be carried out in the semiparametric log normal frailty model using Gibbs sampling. In the version of the Gibbs sampler implemented here, samples were repeatedly taken from univariate full conditional posterior distributions. This is only one possible implementation. With highly correlated univariate components it could be preferable to sample from the joint conditional posterior distribution of these components. The advantage of this method is its greater speed of convergence to the joint posterior distribution [23]. The methodology is quite general and could obviously be used in full parametric models and in models with stratification and/or time-dependent covariates. It is time consuming to sample thousands of observations from thousands of distributions; but, analysing the relatively small dataset in section 3 left us optimistic about possibilities for analysing larger datasets.

Another possibility is to perform a Bayesian analysis using Laplace integration. This was carried out by Ducrocq and Casella [9] with special emphasis on obtaining the marginal posterior distribution of parameters, $\tau$, of the distribution of frailty terms. Their prior distributions of frailty terms were either gamma or log normal. They point out that the computations may quickly become too heavy and propose to summarize the approximate marginal posterior of $\tau$ through its first 3 moments using the Gram-Charlier series expansion of a function. The moments are found using numerical integration based on Gauss-Hermite quadrature followed by an iterative strategy to obtain precise estimates. When frailty terms are gamma distributed, these can be integrated out algebraically to obtain the marginal posterior distribution of the remaining parameters. From a likelihood point of view, gamma distributed frailties can [...] This is so in correlated gamma frailty models as well (e.g. [22]) but not with log normally distributed frailty terms. If $\tau$ is a vector of variance components of log frailty terms from a log normal frailty model such as equation (2), Laplace integration may be considered too costly [9]. These authors suggested letting some frailty terms be gamma distributed and others log normally distributed in order to integrate out algebraically some frailty terms. Implementation of the Gibbs sampler in a log normal frailty model avoids making the mathematically tractable but somewhat artificial choice of a gamma distribution of frailty terms.
In this paper some of the parameters are modelled with improper prior [...] as pointed out by Hobert and Casella [18], the mathematical intractability that necessitates use of the Gibbs sampler also makes demonstrating the propriety of the posterior distribution a difficult task. The fact that the full conditional posterior distributions defining the Gibbs sampler are all proper distributions does not guarantee that the joint posterior distribution will be proper. The question of which improper prior distributions will yield proper joint posterior distributions in hierarchical linear mixed models was addressed by Hobert and Casella [18]. The important question of propriety of the posterior distribution using improper prior distributions must be dealt with in a Bayesian analysis using a Laplace integration as well. One way to avoid improper posteriors is to use proper prior distributions.

The analysis could have been carried out without augmenting [29] by additive genetic values of animals without records. The number of animals defining the additive genetic relationship matrix $A$ was 5 083, but only 1 635 had records. Let $\tilde{a}$ be the vector of additive genetic values of animals with records, then taking the part $\tilde{A}$ of $A$ relating to animals with records, the analysis could have been carried out using the prior $\tilde{a} \mid \sigma_a^2 \sim N_n(0, \tilde{A}\sigma_a^2)$. The number of distributions needing sampling from would have been 5 083 - 1 635 lower at the expense of obtaining marginal posterior distributions of breeding values on animals without records. However $\tilde{A}^{-1}$ is much denser than $A^{-1}$, and is more difficult to compute.
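The density argument can be illustrated with a tiny hypothetical pedigree that is not taken from the data: animal 1 is the sire of two half sibs, animals 2 and 3, with unknown dams, and only the half sibs have records. The sketch inverts the full relationship matrix and the sub-matrix for animals with records using exact rational arithmetic; the pedigree and the helper `invert` are assumptions of the example.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, in exact fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Additive relationships: sire-offspring 1/2, half sibs 1/4.
A = [[Fraction(1), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(1)]]

A_inv = invert(A)                               # all three animals
A_sub_inv = invert([row[1:] for row in A[1:]])  # animals with records only
```

In `A_inv` the element linking the two half sibs is zero, because they are conditionally independent given their sire, whereas the corresponding element of `A_sub_inv` is -4/15. Dropping ancestors without records therefore removes zeros from the inverse, which is the effect described above.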
The model of log frailty, specified by equation (2), can easily be generalized to include more independent variance components and/or the assumption of independence can be relaxed; for example in equation (2), $u$ could represent a maternal effect. Assuming that $(u', a')' \mid G \sim N_{2N}(0, G \otimes A)$, where $G$ is the $2 \times 2$ matrix of additive genetic covariances, and that the prior distribution of $G$ is inverse Wishart distributed, then the full conditional posterior distribution of $G$, required for Gibbs sampling, will also be inverse Wishart distributed.

Furthermore, it should be possible to carry out a multivariate analysis of a survival trait, a quantitative trait and a categorical trait, by assuming that the log frailty of the survival trait, the quantitative trait and the liability of the categorical trait follow a multivariate normal distribution. It should also be possible to generalize to an arbitrary number of survival traits, quantitative traits and categorical traits.

By definition, the trait considered in the example: 'time until first occurrence of a respiratory disease during test' can occur at most once for each animal. These data are, however, only a subset of the data being collected on bulls during the test period. Each repeated occurrence of a respiratory disease is recorded, as well as other categories of diseases and several quantitative traits and other traits of interest. Oakes [25] gives a survey of frailty models for multiple event time data, and it would be interesting to extend the log normal frailty models to allow for multiple survival times for each animal as well as a multivariate analysis of censored survival time data and other traits of interest.

ACKNOWLEDGEMENTS

We wish to thank Anders Holst Andersen, Department of Theoretical Statistics, [...]
REFERENCES
[1] Andersen P.K., Borgan Ø., Gill R.D., Keiding N., Statistical Models Based on Counting Processes, Springer, New York, 1992.
[2] Arjas E., Haara P., A marked point process approach to censored failure data with complicated covariates, Scand. J. Stat. 11 (1984) 193-209.
[3] Breslow N.E., Covariance analysis of censored survival data, Biometrics 30 (1974) 89-99.
[4] Clayton D.G., A Monte Carlo method for Bayesian inference in frailty models, Biometrics 47 (1991) 467-485.
[5] Cox D.R., Regression models and life tables (with discussion), J. R. Stat. Soc. B 34 (1972) 187-220.
[6] Dellaportas P., Smith A.F.M., Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling, Appl. Statist. 42 (1993) 443-459.
[7] Dempster A.P., Laird N.M., Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Stat. Soc. Ser. B 39 (1977) 1-38.
[8] Ducrocq V., Statistical analysis of length of productive life for dairy cows of the Normande breed, J. Dairy Sci. 77 (1994) 855-866.
[9] Ducrocq V., Casella G., A Bayesian analysis of mixed survival models, Genet. Sel. Evol. 28 (1996) 505-529.
[10] Ducrocq V., Quaas R.L., Pollak E.J., Casella G., Length of productive life of dairy cows. 2. Variance component estimation and sire evaluation, J. Dairy Sci. 71 (1988) 3071-3079.
[11] Gauderman W.J., Thomas D.C., Censored survival models for genetic epidemiology: A Gibbs sampling approach, Genet. Epidemiol. 11 (1994) 171-188.
[12] Gelfand A.E., Smith A.F.M., Sampling-based approaches to calculating marginal densities, J. Am. Stat. Assoc. 85 (1990) 398-409.
[13] Gelman A., Carlin J.B., Stern H.S., Rubin D.B., Bayesian Data Analysis, Chapman and Hall, London, 1995.
[14] Geman S., Geman D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Analysis and Machine Intelligence 6 (1984) 721-741.
[15] Gill R.D., Johansen S., A survey of product-integration with a view towards application in survival analysis, Ann. Stat. 18 (1990) 1501-1555.
[16] Gilks W.R., Wild P., Adaptive rejection sampling for Gibbs sampling, Appl. Stat. 41 (1992) 337-348.
[17] Hastings W.K., Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1970) 97-109.
[18] Hobert J.P., Casella G., The effect of improper priors on Gibbs sampling in hierarchical linear mixed models, J. Am. Stat. Assoc. 91 (1996) 1461-1473.
[19] Kalbfleisch J.D., Nonparametric Bayesian analysis of survival time data, J. R. Stat. Soc. B 40 (1978) 214-221.
[20] Kalbfleisch J.D., Prentice R.L., The Statistical Analysis of Failure Time Data, John Wiley, New York, 1980.
[21] Korsgaard I.R., Genetic analysis of survival data, Ph.D. thesis, University of Aarhus, Denmark, 1997.
[22] Korsgaard I.R., Andersen A.H., The additive genetic gamma frailty model, [...]
[23] Liu J.S., Wong W.H., Kong A., Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, Biometrika 81 (1994) 27-40.
[24] Metropolis N., Rosenbluth A.W., Rosenbluth M.N., Teller A.H., Teller E., Equations of state calculations by fast computing machines, J. Chem. Phys. 21 (1953) 1087-1091.
[25] Oakes D., Frailty models for multiple event-time data, in: Klein J.P., Goel P.K. (eds.), Survival Analysis: State of the Art, Kluwer Academic Publishers, The Netherlands, 1992, pp. 371-380.
[26] Petersen J.H., Andersen P.K., Gill R., Variance component models for survival data, Statistica Neerlandica 50 (1996) 193-211.
[27] Scott D.W., Multivariate Density Estimation: Theory, Applications, and Visualisation, John Wiley, New York, 1992.
[28] Sorensen D.A., Andersen S., Gianola D., Korsgaard I.R., Bayesian inference in threshold models using Gibbs sampling, Genet. Sel. Evol. 27 (1995) 229-249.
[29] Tanner M.A., Wong W.H., The calculation of posterior distributions by data augmentation, J. Am. Stat. Assoc. 82 (1987) 528-540.
[30] Yashin A., Iachine I., Survival of related individuals: an extension of some fundamental results of heterogeneity analysis, Math. Popul. Studies 5 (1995) 321-340.
[31] Yashin A.I., Vaupel J.W., Iachine I.A., Correlated individual frailty: an advantageous approach to survival analysis of bivariate data, Math. Popul. Studies 5 (1995) 145-160.
[32] Vaupel J.W., Manton K.G., Stallard E., The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography 16 (1979) 439-454.

APPENDIX: Convention used for gamma, inverse gamma and log normal distributions
The following convention is used for gamma and inverse gamma distributions: let $X \sim \Gamma(\alpha, \beta)$, with density $p(x) = x^{\alpha-1}e^{-x/\beta}[\Gamma(\alpha)\beta^{\alpha}]^{-1}$. Then $E(X) = \alpha\beta$ and $V(X) = \alpha\beta^2 = \beta E(X)$. $Y = X^{-1}$ has an inverse or inverted gamma distribution $Y \sim IG(\alpha, \beta)$ with density $p(y) = y^{-(\alpha+1)}e^{-1/(\beta y)}[\Gamma(\alpha)\beta^{\alpha}]^{-1}$ and $E(Y) = [(\alpha - 1)\beta]^{-1}$ for $\alpha > 1$ and $V(Y) = [(\alpha - 1)^2(\alpha - 2)\beta^2]^{-1} = (E(Y))^2/(\alpha - 2)$ for $\alpha > 2$. If $X \sim$ [...]