HAL Id: hal-00893984
https://hal.archives-ouvertes.fr/hal-00893984
Submitted on 1 Jan 1993
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
A marginal quasi-likelihood approach to the analysis of Poisson variables with generalized linear mixed models
Jl Foulley, S Im
To cite this version:
Jl Foulley, S Im. A marginal quasi-likelihood approach to the analysis of Poisson variables with
generalized linear mixed models. Genetics Selection Evolution, BioMed Central, 1993, 25 (1), pp.101-
107. �hal-00893984�
Note
A marginal quasi-likelihood approach
to the analysis of Poisson variables with generalized linear mixed models
JL
Foulley S Im
1 INRA, Institut National de la Recherche
Agronomique,
Station de Genetique Quantitative et Appliquee, 78352 Jouy-en-Josas Cedex;
2 INRA, Station de Biométrie et d’Intelligence Artificielle,
31326 Castanet Tolosan Cedex, France
(Received
19 June 1992; accepted 16 November1992)
Summary - This paper extends to Poisson variables the approach of Gilmour, Anderson
and Rae
(1985)
for estimating fixed effects by maximum quasi-likelihood in the analysisof threshold discrete data with a generalized linear mixed model.
discrete variable / Poisson distribution / generalized linear mixed model / quasi-
likelihood
Résumé - Une approche de quasi-vraisemblance pour l’analyse de variables de Poisson en modèle linéaire mixte généralisé. Cet article généralise à des variables de Poisson l’approche de Gilmour, Anderson et Rae
(1985!
destinée à l’estimation par maximum de quasi-vraisemblance des ef,!’ets fi,!és lors de l’analyse de variables discrètes à seuils sous un modèle mixte.variables discrètes / distribution de Poisson / modèle linéaire mixte généralisé
INTRODUCTION
As shown
by Ducrocq (1990),
there has beenrecently
some interest in non linear statisticalprocedures
ofgenetic
evaluation.Examples
of suchmodelling procedures
involve:
1)
the thresholdliability
model forcategorical
data(Gianola
andFoulley, 1983;
Harville andMee, 1984)
and forranking
data incompetitions (Tavernier,
* Correspondence and reprints
1991); 2)
Cox’sproportional
hazard model for survival data(Ducrocq
etal, 1988);
and
3)
a Poisson model forreproductive
traits(Foulley
etal’, 1987).
InFGI,
estimation of fixed
(II)
and random(u)
effects involved in the model is basedon the mode of the
joint posterior
distribution of those parameters. As discussedby Foulley
and Manfredi(1991),
thisprocedure
islikely
to have some drawbacksregarding
estimation of fixed effects due to the lack ofintegration
of random effects.A
popular
alternative is thequasi-likelihood approach (Mc Cullagh
andNelder, 1989)
forgeneralized
linear models(GLM)
whichonly requires
thespecification
ofthe mean and variance of the distribution of data. This
procedure
has been usedextensively by
Gilmour etal b (1985)
ingenetic
evaluation for threshold traits.In
particular,
GAR derived a veryappealing algorithm
forcomputing
estimates of fixed effects which resembles the so-called mixed modelequations
of Henderson(1984).
The purpose of this note is to show how the GARprocedure
can be extendedto Poisson variables.
THEORY
The same model as in FGI is
postulated.
Let Yk be the random variable(with
realized value y! =
0,1,2...) pertaining
to the kth observation(k
= 1,2,...,K).
Given
A k ,
theY!s
haveindependent
Poisson distributions with parameterA k ,
ie:As the canonical link for the Poisson distribution is the
logarithm (Me Cullagh
and
Nelder, 1989), A k
is modelled as:where p
and u are(p x 1 )
and(q x 1 )
vectors of fixed and random effectsrespectively,
and
x’ and z’
are thecorresponding
row incidence vectors, theparameterization in P being
assumed to be of full rank.Notice that
[2]
is an extension to mixed models of the structure of &dquo;linearpredictors&dquo; originally
restricted to fixed effects in the GLMtheory:
see eg Breslow andClayton (1992)
andZeger
et al(1988)
for more detail about the so-calledgeneral
linear mixed models(GLMM). Moreover,
it will be assumed as in other studies(Hinde,
1982;Im, 1982; Foulley
etal, 1987)
that u has a multivariate normal distributionN(0, G)
with mean zero and variance covariance matrixG,
thus
resulting
in what is called thePoisson-lognormal
distribution(Reid, 1981;
Aitchinson and
Ho, 1989).
FGI showed that the mixed model structure in
[2]
can cope with mostmodelling
situations
arising
in animalbreeding
such as, eg,sire,
sire and maternalgrand sire,
and animal models on onehand,
and direct and maternal effects on the other hand.A
simple example
of that is the classical animal model In(a2! )
=x!. p +
ai +pi, forthe
jth performance
of the ith female(eg
ovulation rate of anewe)
as a function ofa
From now on referred to as FGI
(Foulley,
Gianola andIm) ; b from
now on referred toas GAR
(Gilmour,
Anderson andRae).
the usual fixed effects
(eg
herd x year,parity),
the additivegenetic
value ai and a permanent environmental component pi for female i.According
to the GLMtheory,
thequasi-likelihood estimating equations for 13
areobtained
by differentiating
thelog quasi-likelihood
functionQ(j;
y,G) (G being
assumed
known),
with respect to(3,
andequating
thecorresponding quasi-score
function to zero, vix.
As
clearly
shownby
theexpression
in[3],
thequasi-likelihood approach only requires
thespecification
of themarginal
meanvector 11
and of the variancecovariance matrix V of the vector Y of observations.
Given the moment
generating
function of the multivariate normal distribution ieE(exp (t’Y)
= exp[t’1I
+(t’Vt/2)j,
it can be shown that:(Hinde,
1982;Zeger
etal, 1988)
and(Aitchinson
andHo, 1989), 8!l being
the Kroneckerdelta, equal
to 1 if k =l,
and0 otherwise.
Generally
models used in animalbreeding yield,
in the absence ofinbreeding, homogeneous
variances so that for anyk, a 2 = zkGz!
=a 2 ,
andIn ( ILk )
=x!!-1- (a
2
/2). Moreover, letting L!Kxx) _ {exp(z!Gzl) - 1!
andM lKx x)
=Diag 1,U kl ,
variances and covariances of observations defined in
[9]
can beexpressed
in matrixnotations as:
Using
Fisher’sscoring
method based on thegradient
vectorâQ(.)/âfi,
andminus the
expected
value of the Hessian matrix-E[,9’Q(.)Iai3alY],
onegets
aniterative
algorithm
which can beexpressed
under the form ofweighted least-squares
equations:
[t] being
the round of iteration.As in
GAR,
one may consider toapproximate
V. This can beaccomplished
here
using
a first orderTaylor
expansion of exp(z!Gz1)
around G = 0, iereplace
exp
(z!Gz1) -1
1 inL, by z!Gzl’
This approximation islikely
to be realistic aslong
as the u- part of
variation
remains smallenough
in the total variation.Doing
so, V in[10] becomes V
= M +MZGZM,
withZ!Kxql
=, ( Zl
z2, ... , z! , ... ,Zk )’ being
the overal incidence matrix of u.
Putting
this formula into the inverse of W in!12!,
one has
This formula exhibits the classical form
(R
+ ZGZ’ in the usualnotation)
of avariance
cori_ance
matrix of data describedby
a linear mixedmodel;
this allows us to solvefor?
in[11] using
the mixed modelequations
of Henderson(Henderson, 1984),
ie here with:or,
alternatively, defining
The
similarity
between[16]
and the formulagiven by
FGI should be noted.Actually,
here tlk =Eu (.),k) replaces A k ,
thusindicating
the way random effectsare
integrated
out in the GARprocedure.
It should bekept
in mind that the mainadvantage
of[15]
is toprovide
estimatesofp
which can becomputed
in a similar wayas with mixed model
equations
of Henderson(1984).
Theseequations
alsoimply
asa
by-product
an estimate of uwhich,
aspointed
outby
Knuiman and Laird(1990)
about the GAR system of
equations,
&dquo;has no apparentjustification&dquo;.
DISCUSSION
The
procedure
assumes G known.Arguing
from the mixed model structure ofequations
in!15J,
GAR haveproposed
an intuitive method forestimating
G whichmimics classical EM
type-formulae
for linear models. FGI advocatedapproximate
marginal
likelihoodprocedures
based on theingredients
of their iterative systemin #
and u.Actually, applying
suchprocedures
would mean to use a third level ofapproximation;
the first one was resorted toquasi-likelihood procedures
and thesecond one to the use of
!15J
instead of!11J. Alternatively,
pure maximum likelihoodapproaches
based on the EMalgorithm
were alsoenvisaged by
Hinde(1982),
viewing u as
missing
andusing
Gaussianquadratures
toperform
the numericalintegration
of the random effects. More details about methods forestimating
variance components in such non-linear models can be found eg in
Ducrocq (1990),
Knuiman and Laird
(1990),
Smith(1990), Thompson (1990),
Breslow andClayton (1992);
Solomon and Cox(1992)
andTempelman
and Gianola(1993).
It must be
kept
in mind that the mixed model structure in[2] applies
to alarge
variety
of situations. Inparticular,
it can be used to remove extra-Poisson variation when the fit due to identifiedexplanatory
variables remains poor. In such cases,some authors
( eg Hinde, 1982; Breslow, 1984)
havesuggested
toimprove
the fitby introducing
an extra variable into the random component part of[2]
ieby modelling
the Poisson mean as In
(A k )
=xk13
+z!.u
+ ek. Thisprocedure
can beapplied
eg to a sire model so as to fit the fraction(3/4)
of thegenetic
variance that is notexplicitly
accounted for in the model.Finally,
ourapproach
can also be used forpartitioning
the observedphenotypic
variance
(Q!)
into itsgenetic (a9)
and residual(or 2 )
components. Let us assume that the trait isdetermined by
apurely
additivegenetic
model on the transformedscale, ie,
In(A)
+ awhere,
as in!2J,
q is the location parameter, and a -N(0, O ra 2)
isthe
genetic
valuenormally
distributed with mean zero and variancea a. 2 Following
Falconer
(1981),
thegenetic
value(g)
on the observed scale can be defined as themean
phenotypic
value of individualshaving
the same genotype,ie, g
=E(Yla).
Now,
with, using [7]
and[91,
Thus,
theheritability
in the broad sense2 [H
=Q9/(!9 + u § )]
on the observed scale can beexpressed
as:The additive
genetic
variance on the observed scale( Q9* )
can be defined as0!2 ! (E(8g/ 8 a)) 2 Q a (see Dempster
andLerner, 1950;
p222),
oralternatively
as09*
_ [Cov (g, a)J2 ja! (see Robertson,
1950; formula 1, p234). Now, E(agloa) =
Ji,and Cov
(g, a) = poa2.
Both formulaegive
the sameresult,
iewith a
heritability
coefficient in the narrow sense(h 2 = Q9*/a!) equal
toNotice
that,
forufl small enough, Q9 -! a9 *
so that[20]
and[22]
tend to0,.2 / (or + J-l-1),
which can be viewed as theexpression
ofheritability
on the linearscale,
asanticipated by
FGI andexpected
from theexpression
of the system in[15].
CONCLUSION
Although
some other moresophisticated procedures ( eg Bayesian
treatment withGibbs
sampling; Zeger
andKarim, 1991)
can beenvisaged
to make inference aboutGLMM parameters, it has been shown that methods based on the
quasi-likelihood
or related concepts are
reasonably
accurate for manypractical
situations(Breslow
and
Clayton, 1992).
ACKNOWLEDGMENTS
The authors are grateful to V Ducrocq, M Perez Enciso and one anonymous reviewer for their
helpful
comments and criticisms on previous versions of this manuscript.REFERENCES
Aitchinson
J,
Ho CH(1989)
The multivariate Poissonlog
normal distribution.Biometrika
76,
643-653Breslow NE
(1984)
Extra-Poisson variation inlog-linear models. Appl
Stat33, 38-44
Breslow
NE, Clayton
DG(1992) Approximate Inference
in Generalixed Linear Mixed Models. TechRep
No106,
UnivWashington,
SeattleDempster ER,
Lerner IM(1950) Heritability
of threshold characters. Genetics35,
212-236
Ducrocq V, Quaas RL,
PollakEJ,
Casella G(1988) Length
ofproductive
life ofdairy
cows. 1. Justification of a Weibull model. JDairy
Sci71,
2543-2553Ducrocq
V(1990)
Estimation ofgenetic
parametersarising
in nonlinear models.In:
4th
WorldCongr
GenetA PP I
LivestockProd, (Hill WG, Thompson R,
WooliamsJA, eds) Edinburgh,
23-27July 1990,
vol 13, 419-428Falconer DS
(1981)
Introduction toQuantitative
Genetics.Longman, London,
2nd ednFoulley JL,
GianolaD,
Im S(1987)
Genetic evaluation of traits distributed asPoisson-binomial with reference to
reproductive
characters. TheorAppl
Genet 73, 870-877Foulley JL,
Manfredi E(1991) Approches statistiques
de 1’6valuationg6n6tique
desreproducteurs
pour des caractères binaires a seuils. Genet Sel Evol 23, 309-338 GianolaD, Foulley
JL(1983)
Sire evaluation for orderedcategorical
data with athreshold model. Genet Sel
Evol 15,
201-224Gilmour
A,
AndersonRD,
Rae A(1985)
Theanalysis
of binomial databy
ageneralized
linear mixed model. Biometrika72,
593-599Harville
DA,
Mee RW(1984)
A mixed modelprocedure
foranalyzing
orderedcategorical
data. Biometrics40,
393-408Henderson CR
(1984) Applications
of linear models in animalbreeding.
UnivGuelph, Guelph,
OntHinde JP
(1982) Compound
Poissonregression
models. In: GLIM 82(Gilchrist R, ed) Springer Verlag,
NY 109-121Im S
(1982)
Contribution a 1’etude des tables decontingence
aparamètres
al6atoires: utilisation en biom6trie. These 3e
cycle,
Universite PaulSabatier,
Toulouse
Knuiman
M,
Laird N(1990)
Parameter estimation in variance component mod- els forbinary
response data. In: Advances in Statistical Methodsfor
Genetic Im- provementof
Livestock(Gianola D,
HammondK, eds) Springer-Verlag, Heidelberg,
177-189
McCullagh P,
Nelder J(1989)
Generalized Linear Models.Chapman
andHall, London,
2nd edn.
Reid DD
(1981)
The Poissonlognormal
distribution and its use as a model ofplankton aggregation.
In: Statistical Distributions inScientific
Work(Taillie C,
Patil
GP, Baldessari, eds) Reidel, Dordrecht, Holland,
6, 303-316Robertson A
(1950)
Proof that the additiveheritability
on the P scale isgiven by
the
expression z; 2 h;/pq.
Genetics35,
234-236Smith SP
(1990)
Estimation ofgenetic
parameters in non-linear models. In:Advances in Statistical Methods
for
GeneticImprovement of
Livestock(Gianola D,
Hammond
K, eds) Springer-Verlag, Heidelberg,
190-206Solomon
PJ,
Cox DR(1992)
Non linear component of variance models. Biometrika?9, 1-11
Tavernier A
(1991)
Genetic evaluation of horses based on ranks incompetitions.
Genet Sel Evol 23, 159-173
Tempelman RJ,
Gianola D(1993) Marginal
maximum likelihood estimation of variance components in Poisson mixed modelsusing Laplace integration.
GenetSel Evol
(submitted)
Thompson
R(1990)
Generalized linear models andapplications
to animal breed-ing.
In: Advances in Statistical Methodsfor
GeneticImprovement of
Livestock(Gi-
anola