HAL Id: hal-00902184
https://hal.archives-ouvertes.fr/hal-00902184
Submitted on 1 Jan 1994
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Advantages and inconveniences of the Cox model
compared with the logistic model: application to a study
of risk factors of nursing cow infertility
F Bugnard, C Ducrot, D Calavas
To cite this version:
Advantages
and
inconveniences
of the
Cox
model
compared
with
the
logistic
model:
application
to
a
study
of
risk factors of
nursing
cow
infertility
F
Bugnard
C
Ducrot D Calavas
Centre
d’Écopathologie
Animale,
26, rue de la Baisse, 69100 Villeurbanne, FranceSummary ―The
survival Cox model and thelogistic
model werecompared
on a data set obtained from anecopathological survey
relative to the risk factors ofnursing
cowinfertility.
The risk factorsresulting
from the 2 models were the same. The Cox model has the
advantage
ofpreserving
the variable in itsoriginal quantitative
form, and ofusing
a maximum of information. However, very restrictive condi-tions ofapplication
of this model make its use rather limited.ecopathology
I survivalmodel 1 logistic
modelRésumé -
Avantages
et inconvénients du modèle de Cox par rapport au modèlelogistique :
application
à l’étude des facteurs derisque
de l’infécondité des vaches allaitantes. Le modèle de survie de Cox et le modèlelogistique
ont étécomparés
surun jeu
de données issu d’uneenquête
d’écopathologie
relative aux facteurs derisque
de l’infécondité des vaches allaitantes. Les facteurs derisque
issus des 2 modèles étaient les mêmes. Le modèle de Coxprésente l’avantage
de conserver à la variable étudiée sa formequantitative d’origine
et d’utiliser le maximum d’information.Cepen-dant,
les conditionsd’application
très restrictives de ce modèle sont une limite à son utilisation.écopathologielmodèle
de survielmodèlelogistique
INTRODUCTION
Logistic
regression
iswidely
usedfor
investi-gation
of risk factors inepidemiology
ingen-eral and in the
Centre
d’Ecopathologie
Ani-male inparticular.
There are 2 main reasonsfor this.
Firstly,
thepathologies
studied areoften characterized as absent or
present
and we wish to
explain
the risk of an occur-renceof
the disease. In this sense, the logis-*Correspondence
andreprints
tic model is
appropriate
as it allows thepre-diction of the
probability
of an event. On the otherhand,
anadvantage
of thelogistic
model is that theparameters
can beinter-preted
asbeing
thelogarithms
of the odds ratios ofexplanatory
variables.However,
when the
pathology
studied is character-izedby
a time interval(eg, calving
interva-landbreeding-to-conception period),
logis-tic
regression
seems less suitable.Indeed,
of
time into discreteclasses,
which leads toan
important
loss of information and poses aproblem
of whichgroup
endpoints
tochoose. The survival models
and,
inpartic-ular the Cox
model,
appear,
therefore,
tobe better
adapted.
Thesemodels,
initially
created for cancer research in order to
study
the survival
time of
patients
after
therapy,
allow thestudy
of
time intervals without divi-sion intoclasses.
Inagreement
withthe
logistic
model,
theparameters
of the Cox model can beeasily interpreted
sincethey
are
the
logarithms
of
the relative risks ofexplanatory
variables. Tocompare
theirresults,
both thelogistic
and the Cox modelwere
applied
to the same data set collectedin a
survey
carried outby
theCentre
d’Eco-pathologie
Animale in order tohighlight
the riskfactors
ofnursing
cowinfertility.
MATERIALS AND METHODS
The data were collected
during
a survey carriedout from 1987 to 1989
by
the Centred’fco-pathologie
Animale in 116 farms in theRh6nes-Alpes
and Centreregions
and in the district of Yonne(Ducrot, 1993).
In the course of thisperiod,
3 583
mating
cows wereindividually
monitored after their firstcalving during
the survey. Theinfertility
of cows was determinedby
thecalving
interval. The covariates introduced in thelogis-tic and the Cox model were selected from the
hypotheses
of the risk factors with thehelp
of univariateanalyses
(x
2
,
Logrank
and Wilcoxontest)
andonly
those with a 20%significant
link withinfertility
werekept.
The 20% level was cho-sen in order not to eliminate covariates which had little effect on the disorder when studiedsep-arately,
but, if associated,might
have astrong
influence upon the disorder(Hosmer
andLemeshow,
1989).
The selected factors were: the characteristics of the cow and the calf(parity,
breed of the cow, number, sex,weight,
and pre-sentation ofcalves);
thecalving
conditions(calv-ing difficulty,
characterizedby
thetype
ofinter-vention,
retainedplacenta,
characterizedby
the fact that theplacenta
is not eliminatedsponta-neously
within 24 h ofcalving,
and acute metritis withgeneral
symptoms
appearing
within 2 d ofcalving);
and themating
conditions(calving
toturning-out-to-graze period,
whether or not the cows turned out to graze beforemating,
cleanli-ness score of the cows at the time ofturning
them out to graze measured on 4 anatomic zones of the hindquarters
(Faye
and Barnouin,1985),
type
of basal ration after
calving, quantity
ofconcen-trates distributed after
calving, fattening
score at the time ofturning
out to graze determinedby
manualhandling (Agabriel
etal,
1986), fattening
variation
during winter).
Two additional data ele-ments were taken into consideration asadjust-ing
factors: thegeographic region
and thetype
of winter
housing.
In the
following,
the vectorzi=
= (Zj
1,Z
j
2,...,
z
i
P
)
refers to the p covariates measured for theindi-vidual j U
1, ...,n).
The
logistic
modelThe
calving
interval was divided into 2 discrete classes; the cows were considered infertile if thecalving
interval was more than 368d,
or ifthey
were not fertilized, or ifthey
were fertilizedbe-latedly,
ie if the zootechnicalobjective
of one calf per year was not reached. Thus, we tried toexplain
an event Y coded aspresent
or absent(infertile/fertile).
).
If
P(Z)
is theprobability
of a cowbeing
infertileknowing
Z== (
Z1’
z2, ...,zp),
thelogistic
model isdefined as follows:
where b
(bo, b
1
, ...,
bp)
is the vector of the modelparameters.
The
logistic
model wasadjusted
to the datausing
the maximum likelihood estimation method. Theresulting
model wasanalyzed by
comparing
globally
all estimated parametersb
i
to 0,using
the likelihood ratio test, and alsoby comparing
eachparameter
to 0,using
the Wald test(Hosmer
and
Lemeshow,
1989).
Theparameters
notsig-nificantly
different from 0 at the 5% level were eliminated.Finally,
thegoodness
of fit of the final model was determinedby
theX
2
test of Hosmer rand Lesmeshow
(1989)
which compares the num-ber of thepredicted
infertility
cases to the number of the observed cases. Thisanalysis
was carried out with the SAS software, LOGISTICprocedure
The Cox model
The
calving
interval was studied withoutbeing
divided into classes. The studied interval was from
calving
to the end of the surveyperiod
for the cows that did not have a secondcalving
before the end of the survey(censored cows).
Let the hazard function
h(t,Z)
be theproba-bility
ofcalving
at the time tfor a cowknowing
Z=
(
z
i,
z2,...,zp).
The Cox model is:where b’
= (b’
1’
b’
2
,
...,b’p)
is the vector of themodel
parameters.
Theseparameters
are esti-matedby
the maximum likelihood methodusing
the Coxpartial
likelihood;
l1o(t)
remains an unspec-ified function which does not need to be esti-mated(Cox
andOakes,
1984).
The
analysis
wasperformed
in 2stages.
The utilization of the Cox model is bound to the veri-fication of theproportional
hazardsassumption
(which
means, in otherwords,
that the effect of the factors must beindependent
oftime),
and so the firststage
consisted inverifying
thishypothesis by
representing
thelogarithm
of thenegative
loga-rithm of the survival function of each class as a function of time for each covariate(fig 1
Accord-ing
to theproportional
hazardsassumption
thecurves must not cross each other
(Lee
etal,
1989).
Covariatespresenting
many classes did notverify
thishypothesis,
and so wegrouped
classes in order to obtain variablessatisfying
thisapplication
condition. For the cleanliness score, the curves of the 2 classes did cross each other(fig 2).
Sincegrouping
wasimpossible,
this vari-able wasintegrated
in the model but the esti-matedparameters
could not beinterpreted.
In the course of a secondstage,
the Cox model wasapplied
to the whole data set. As for thelogistic
model, the estimated
parameters
wereanalyzed
by
the likelihood ratio test and the Wald test(Cox
and
Oakes,
1984).
The covariates were elimi-nated from the model in the same way as in thelogistic
model(parameters
notsignificantly
dif-ferent from 0 at the 5%level).
Thisanalysis
was carried out with the SASsoftware,
PHREG pro-cedure(SAS
InstituteInc,
1991
).
The methods that allow an estimation of the
goodness
of fit of the Cox model have notyet
beencompletely developed. They
are basedmainly
ongraphical
methods and were notimple-mented in this
study.
RESULTS
The 2 final models
gave
the same risksig-nificant for
parity,
acutemetritis,
and clean-liness score. For retainedplacenta,
the 2models
gave almost
the same P values(logistic
model: P =0.04;
Cox model: P =0.0521
).
However,
there are 2 differences. In thelogistic
model,
for the variable’calving
diffi-culty’,
theparameters
ofthe
classes
’as-sistance withcalf
puller (1 person)’
and ’forcible extraction’ were notsignificantly
different from
0,
butthey
were in the Cox model. On thecontrary,
for
the variable’calv-ing
toturning-out-to-graze period’ the
param-eters of the classes ’< 1month’,
’1-2months’ and ’2-3 months’ were not
signifi-cantly
different
from 0 in theCox
model,
butthey
were in thelogistic
model.Finally,
for the
2adjusting
variables,
itwas observed that:
(i)
for thetype
of winterhousing,
the same results were obtained in the 2models;
and(ii)
for thegeographic
region,
the classes ’Loire’ and ’Centre/ Yonne’ werepositioned
in the same wayrelative
to ’Rh6ne-Alpes’ (Loire being
morefavorable,
’Centre/Yonne’ lessfavorable).
However,
’Loire’ was significant only for the
logistic
model,
while ’Centre/Yonne’ wassignificant
only
for the Cox model.DISCUSSION
Considering
riskfactors,
the results of bothmodels
wereequivalent.
In ourexample,
the loss of information due to the division into classes of the
calving
interval did notseem to influence the results.
The
Cox
model has 3 mainadvantages.
First,
it allows us to take into account all thecows even if
they
were not observed for thesame
period
of time and even if thestud-ied event
(next
calving)
did not takeplace
before
the end of the survey. Thisadvan-tage
wasarbitrarily
blurred to a certainextent in our
example
insofar as the cowsthat did not have a second
calving
could beintegrated
in thelogistic
model(as they
weredeclared
infertile).
Second,
it does notrequire
the division of the studiedvariable
into discrete classes. This is anadvantage
as it allows us tokeep
eliminates the
always
difficultproblem,
of whichgroup
endpoints
to choose.Finally,
it allows us toformulate
theresults
according
tothe increase
in the stud-ied time interval in d when the factor ispre-sent
(Lee
et al,
1989).
Forexample,
acutemetritis increases the
calving
intervalby
23 d. Forpopularization,
thisway
ofpre-senting
the results is morecomprehensible
than the
presentation
in terms of the odds ratio or evenof
relative risk.Despite
all theseadvantages
the use ofthe
Cox
model isvery
limitedby
itsvery
restrictive
application
conditions(the
haz-ards should beproportional).
Indeed,
wenoticed
in ourexample
that if the studied variable consisted of alarge
number ofclasses,
the hazards wererarely
propor-tional all
along
the curve. This leads to groupclasses. In extreme cases, this condition
can
prevent
us fromintegrating
certain vari-ables into the model.However,
problems
only
arise if the devi-ation from theproportional
hazards assump-tion islarge
and thefrequency
of the studiedevent is
high (the
secondcalving,
in ourcase).
In this case estimatedparameters
are
greatly
influenced
(Ingram
andKlein-man,
1989)
and therefore cannot beinter-preted.
As for the power of the 2
models,
the Cox model and thelogistic
model seem tobe
statistically equivalent
foridentifying
the riskfactors
when thepercentage
of the stud-ied events islow,
whereas the Cox model ismore
powerful
when thepercentage
ishigh
(Annesi
et al,
1989).
The conclusion is that when thefollow-up
time issufficiently
shortor the survival rate is
high,
the choice of model mustdepend
on convenience(for
theuser), availability
ofcomputer
software,
and
expertise.
REFERENCES
Agabriel
J, Giraud JM, Petit M(1986)
Détermi-nation et utilisation de la note d’étatd’engraissement
en6levage
allaitant. Bull Techn CRZV Theix INRA 66, 43-50Annesi
I, MoreauT,
Lellouch J(1989) Efficiency
of the
logistic regression
and Coxproportional
hazards models in
longitudinal
studies. StatMed 8, 1515-1521
Cox
DR,
Oakes D(1984) Analysis
of Survival Data.Chapman
and Hall, London, 201 p Ducrot C(1983) Approche biométrique
desfac-teurs de
risque
despathologies d’élevage
apartir
desenqubtes d’écopathologie.
Appli-cation a l’infbcondit6 des vaches allaitantes. Université Claude Bernard,
Lyon,
These dedoctorat,
118 p pFaye
B,
Barnouin J(1985) Objectivation
de lapropreté
des vaches laitibres et des stabula-tions - L’indice depropreté.
Bull Techn CRZVTheix INRA 59, 61-67
Hosmer DW, Lemeshow S
(1989) Applied
Logis-ticRegression. Wiley,
New York, 307 pIngram
DD, Kleinman JC(1989)
Empirical
com-parisons
ofproportional
hazards andlogistic
regression
model. Stat Med 8, 525-538 Lee LA,Ferguson JD, Galligan
DT(1989)
Effectof disease on
days
open assessedby
survivalanalysis. J Dairy Sci 72,
1020-1026 SAS Institute Inc(1990)
SAS/STA T user’sguide
2Version 6,