A NNALES DE LA FACULTÉ DES SCIENCES DE T OULOUSE
E LENA A. E ROSHEVA
S TEPHEN E. F IENBERG
B RIAN W. J UNKER
Alternative statistical models and representations for large sparse multi-dimensional contingency tables
Annales de la faculté des sciences de Toulouse 6
esérie, tome 11, n
o4 (2002), p. 485-505
<http://www.numdam.org/item?id=AFST_2002_6_11_4_485_0>
© Université Paul Sabatier, 2002, tous droits réservés.
L’accès aux archives de la revue « Annales de la faculté des sciences de Toulouse » (http://picard.ups-tlse.fr/~annales/) implique l’accord avec les conditions générales d’utilisation (http://www.numdam.org/conditions).
Toute utilisation commerciale ou impression systématique est constitu- tive d’une infraction pénale. Toute copie ou impression de ce fichier doit contenir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
- 485 -
Alternative statistical models and representations for large sparse multi-dimensional contingency tables
(*)ELENA A. EROSHEVA
(1),
STEPHEN E.FIENBERG (2),
BRIAN W.
JUNKER (1)
Annales de la Faculte des Sciences de Toulouse Vol. XI, n° 4, 2002
pp. 485-505
RESUME. - Une variete de strategies permet d’approcher l’analyse de larges tableaux creux a 1’aide de modeles presentant un nombre modeste
de parametres. Cet article presente quelques unes de ces strategies, insis-
tant sur le role des modeles log-lineaires et le concept d’echangeabilite :
des generalisations multidimensionnelles du modele de quasi-symetrie du a Henri Caussinus, des versions bayesiennes du modele de Rasch en theorie
des questionnaires et ses generalisations, et le modele de degre d’apparte-
nance.
ABSTRACT. - A variety of strategies allow one to approach the analy-
sis of large sparse contingency tables using models with a modest num-
ber of parameters. This paper gives an overview of some of these strate- gies, emphasizing the role of log-linear models and exchangeability: multi- dimensional generalizations of the model of quasi-symmetry due to Henri Caussinus, Bayesian versions of the Rasch model from item response the- ory and its generalizations, and the Grade of Membership model.
(*) Recu le 18 septembre 2001, accepte le 18 septembre 2002
The preparation of this paper was supported in part by Grant No. lR03 AG18986-01,
from the National Institute on Aging and by National Science Foundation Grant No.
EIA-9876619 to the National Institute of Statistical Sciences.
(1) Department of Statistics, Carnegie Mellon University, Pittsburgh PA 15213-3890, U.S.A. E-mail:[email protected] and [email protected]
(2) Department of Statistics and Center for Automated Learning & Discovery, Carnegie
Mellon University, Pittsburgh PA 15213-3890, U.S.A. E-mail: [email protected]
1. Introduction
The literature on
log-linear
models for theanalysis
of multi-dimensionalcontingency
tables which arose in the 1960s and 1970s built on fundamen- tal contributionsby Birch, Darroch, Good, Goodman,
and others(see
theoverview in
Fienberg [29]).
It wasspurred
onby
theoreticaldevelopments (e.g.,
see Haberman[36]), practical applications involving large
sparse tables(e.g.,
the National HalothaneStudy
in the U.S.[13]),
and advances in com-puting
that madepractical
the use ofapproaches
such asDeming-Stephan
iterative
proportional fitting algorithm,
andapproaches
for the Generalized Linear model(e.g., [53]).
From theoutset,
thelog-linear
model literatureplaced heavy emphasis
on thefitting
ofparsimonious models,
e.g.,by setting
as many
higher-order
interaction termsequal
to zero aspossible.
Anotherway to induce
parsimony
isby invoking
forms ofsymmetry
orexchange- ability (see
forexample, Bishop, Fienberg,
and Holland[10], Bhapkar
andDarroch
[8],
Ten Have and Becker[58],
andFienberg, Johnson,
and Junker[30]),
which is one of the focuses of this paper. Without suchparsimonious representations
forlarge
sparsetables,
the likelihoodgets
maximized on theboundary
of theparameter
space and many of the cell estimates are zero(or
minusinfinity
in thelogarithmic scale).
Henri C aussinus’s landmark 1966
paper [15],
while focusedlargely
ontwo-way
tables and the models ofquasi-independence
andquasi-symmetry,
fits well with those theoretical
developments
and stimulated multidimen- sionalgeneralizations
thatappeared initially
inBishop, Fienberg,
and Hol-land
[10], Chapter
8. Caussinus’insights
on the links betweenquasi-indepen-
dence and
quasi-symmetry
led to the reformulation ofquasi-symmetry
asa
log-linear
model inBishop, Fienberg,
and Holland[10].
This in turn al-lowed the
adaptation
of iterativeproportional fitting algorithm (IPF)
as analternative to the related
algorithm
for estimation underquasi-symmetry presented
in Caussinus[15].
The nextsteps
described in thatchapter
thenfollowed
naturally: (1) re-expressing
IPF forquasi-symmetry
in terms of thestandard IPF
algorithm
for no 2nd-order interaction in athree-way
tableconstructed from the
original two-way table,
and(2)
various multivariategeneralizations
ofquasi-symmetry.
The former awaited thesubsequent
the-oretical work of
Meyer [52]
for a firmfoundation,
and the latter remained somewhat of a hiddencuriosity
until its link to other statistical models wasrecognized
in the 1980s and 1990s.In the late
1970s,
OtisDudley Duncan,
adistinguished sociologist,
dis-covered the Rasch model from the educational
testing
literature and wasapplying
it in innovative ways to surveyanalysis (e.g.,
see Duncan[26]).
But he was
stymied
when it came tounderstanding
the theoretical basisfor the
approach
he hadintuitively adopted. Fortunately,
workby Tjur [60]
provided
thekey
to hisproblem and,
ingiving
it a propercontingency
tablerepresentation
inFienberg
andMeyer [31],
we were able to tie itdirectly
to one of the multivariate
generalizations
ofquasi-symmetry!
These sameideas surfaced
again
in the 1990s in work on models forheterogeneity
in amultiple-recapture setting by Darroch, Fienberg, Glonek,
and Junker[23]
and
Fienberg,
Johnson and Junker[30],
andby Agresti [2]
and Coull andAgresti [19].
For related discussion and links to otherlog-linear
and relatedmodels,
see Cox and Wermuth[20]
and the papersby Agresti [3]
and Good-man
[35]
in this issue.Despite
the 35 years that havepassed
since the emergence of thelog-
linear model
literature,
the interest in succinctparametric
models forlarge
sparse
contingency
tablesremains,
andquasi-symmetry
ideas remain one of the essential tools that arebeing
refined and used. In this paper, we re-view three different
approaches
to themodeling
oflarge
sparsecontingency tables,
standardlog-linear models, including
those withquasi-symmetric
interactions
(Section 2),
the related class of Rasch model extensions(Sec-
tions 3 and
4),
and the class of Grade ofMembership (GoM)
models whichare of
special
interest in many of the samesettings
as are the Rasch andquasi-symmetry approaches (Section 5).
.2.
Log-linear
models andquasi-symmetry
Let n =
(nl,
n2 , ... ,, nt)
be a vector of observed counts for tcells,
struc-tured in the form of a cross-classification for n =
~~-1
nc observations.Now let m =
E[n]
=(ml,
m2, ... ,mt)
be the vector ofexpected
values thatare assumed to be functions of unknown
parameters 8’ - (91, 92,
... ,9S ),
where s t.
Let M denote the
log-linear
modelspecified by
m =m (8 ) .
When the tcells form a J-dimensional cross-classification
corresponding
to Jcategorical variables,
then we can rewrite the mostgeneral
version of model m =m(8)
in its more
recognizable
saturatedlog-linear
form asTo make model
(1) identifiable,
werequire
eachsubscripted
u-term to sumto zero over any
subscript following Bishop, Fienberg,
and Holland[10],
orwe set selected u-terms
equal
to zero as in GLIM or S-Plus.By setting
someof the interaction terms in
(1) equal
to zero, weget
a reduced or unsaturatedlog-linear
model.For the
general log-linear model, A4,
the minimal sufficient statistics(MSSs)
aregiven by
theprojection
of n ontoA4 , Pa4 n (e.g.,
see Haberman[36]).
In the case of model(1),
the MSSs consist of themarginal
totalscorresponding
to thehighest
order terms in the model. Forexample
if weset all 3-factor and
higher-order interactions,
then model(1)
reduces toand the MSSs are the
two-way marginal totals, corresponding
to the two-factor terms in
(2).
The
quasi-symmetry strategy
forreducing
the number ofparameters
in(1)
sets all of thehigher-order
terms of the same orderequal
to oneanother,
e.g.,
where
perm[.]
denotes anypermutation
of the sequence of indices in theargument (e.g.,
seeBishop, Fienberg
and Holland[10],
Darroch[22]).
AsTen Have and Becker
[58] note, quasi-symmetry
in this sense involves aform of conditional
exchangeability
ofexpected
cell values which combines both classparameter
invariance(equivalence
of u-terms of similarorder)
and class
parameter symmetry
of any interactioninvolving
all variables ofa class.
Clearly,
we can alsoseparate
out the variouscomponents
of this model and utilize them in a hierarchicallog-linear
fashion.Because the model which combines
(1)
with(3)
islog-linear,
we canread off the MSSs
directly. They
are thetwo-way marginals
which corre-spond
to the the first-order interactionparameters
in(1)
which are notin
quasi-symmetrization
of(3),
and the sums of countscorresponding
tothe
symmetry
orequivalence
sets definedby (3).
Had we alsosymmetrized
the first-order interaction
terms, Ujlj2(ijl ij2)’
we wouldreplace
thetwo-way marginals by
thecorresponding
sums of counts and include the one-waymarginals
in the MSSs.We can also combine the two
strategies
ofmodeling by setting
somehigher-order
termsequal
to zero andassuming
classparameter symmetry
and/or
invariance for others.Computing
maximum likelihood estimates under any of these models isstraightforward using
anygeneralized
linearmodel
computer
program, such as the ones in S-Plus or SAS.3. Variations on the Rasch model
In education and the social
sciences,
the Jcategorical
variables whosecross-classification leads to the table n =
(nl,
n2 , ... ,, nt)
modelled in equa-tion
(1)
often arises as codedsubject
responses to a set of items(questions,
statements,
ortasks)
on surveyforms, self-report inventories,
and mentaltests.
Subject
iresponds
to itemj, for j
=1, 2, ... , J,
coded asXi;
= 1or
0, indicating
theagreement (1)
ordisagreement (0)
ofsubject
i withproposition j,
success(1)
or failure(0)
atperforming
taskj,
presence(1)
or absence
(0)
of asymptom
orfeature j,
etc. The table n is then the table of counts nx of theunique binary
responsepatterns
x =(~i,~2?..’ ~ 1t j ).
Since t =
2~
growsexponentially
withJ,
n istypically large
and sparse.It is often sensible to
separate subject (row)
effects from item(column)
effects in the matrix of
responses X
= i =1,2,..., n; j
=1,2,..., , J}.
One of the earliest models to do this was
developed by Georg
Rasch[55],
and
specifies
row and column effects in an additivelogistic
model for A’:where ()i represents subject
i’spropensity
torespond positively
to any item relative to othersubjects ;
and{3j represents
thedifficulty
ofresponding
pos-itively
to itemj,
relative to other items.As ()i increases,
theprobability
ofresponding
1 toitem j increases,
and as thedifficulty parameter (3j increases,
the
probability
ofresponding
1 toitem j
decreases. The Raschmodel,
to-gether
with similar modelsusing
different forms fordeveloped by Loevinger [48],
Lord[49],
andothers,
became the core of what is now item responsetheory (e.g.,
see Fischer and Molenaar[32]
and van der Lindenand Hambleton
[62]). Clearly,
the number ofparameters 6i
increaselinearly
with the
sample
size n;the ()i
aretypically
treated as latent variables in the model(conditioned
orintegrated
away when estimation ofthe /3j
is de-sired).
In thepsychometric
literature it is common to talk about asingle
latent variable for
ability
but that assumes arelationship
amongthe ()i
suchas that
they
come from a common distribution(see
Section 3.1below).
We further assume "local
independence"
among item responses, thatis,
we assume allXi;
areconditionally independent, given
the ()’s and .3’s.Letting
Xij denote the observed values ofXij,
we may write the likelihoodfor the Rasch model as
The Rasch model in
equation (4)
has theinteresting
invarianceproperty
that the odds ratiocomparing
twoitems j
and k iswhich is
independent
of8i
Hence any two items may becompared,
inprinci- ple, using
any convenientsample
ofsubjects regardless
of theirpropensities
to
respond positively (in practice,
of course,subjects
still contribute at least"small
sample
bias" to the estimated{3’s). Similarly,
the odds ratio com-paring
twosubjects
isindependent
of the itemdifficulty parameters.
Thesetwo invariance
properties
are instances of"specific objectivity,"
aproperty
Rasch consideredimportant
indefining
a measurementmodel,
and fromwhich
equation (4)
can be derived(c.f.
Fischer and Molenaar[32]).
The Rasch model is also
closely
related to theBradley-Terry [12]
modelof
paired comparisons (see Fienberg
andMeyer [31]).
Consider twoitems j
and k for which
{3j ~3~ .
For anysubject
iresponding positively
toonly
one of these two
items,
it follows fromequation (4)
thatindependent
of01 ;
this isexactly
theBradley-Terry model,
which says that thelog
odds ofchoosing item j
from thepair jand k depends only
on aratio of
positive parameters
of the form:~~ /~~ .
We can use the the lack ofdependence
of theright
hand side ofequation (7) on 03B8i
to constructempirical
tests of the Raschmodel,
since we can estimate theprobability
in
equation (7)
withoutassuming
the Rasch model and this should be thesame in any
subpopulation
ofsubjects
in which the Rasch modelapplies.
3.1. The Rasch model and
quasi-symmetry
for the2~
tableTwo estimation methods are
traditionally
associated with the Raschmodel, joint
maximum likelihood(JML)
and conditional maximum likeli- hood(CML).
Work ofErling
Andersen in the 1970’s summarized in Ander-sen
[4],
shows that JML estimators for(3j
obtainedby maximizing equation
(5) jointly
in the 8’s and~3’s,
areinconsistent,
thatis, asymptotically
biasedas n grows and J remains fixed. On the other
hand,
the CMLapproach, conditioning
on row totals in X to eliminate the92’s,
lead to consistent es-timators of
/3~.
Estimates of both the 8’s and the{3’s
in thejoint
likelihoodin
equation (5)
can be made to be consistent if both n and J are allowed to grow at controlled rates(Haberman [37]).
A more
widespread approach
tosolving
theinconsistency problem
withJML has been to assume the
scalars 03B8i
areindependent
random effectsfollowing
a common(discrete
orcontinuous)
distribution withhy- perparameters
~.Integrating over 01
for eachsubject yields
themarginal
likelihood
Maximizing equation (8)
withrespect
to~32’s
and 77(using
an E-M-like al-gorithm,
forexample,
as in Bock and Aitken[11]) yields
what are calledmaximum
marginal
likelihood(MML)
estimates~3~
and7}.
These MML es-timates are also consistent
(asymptotically unbiased).
The method basedon
equation (8)
can beinterpreted
as anempirical Bayes method;
it alsolinks the Rasch model
directly
with other latent variableapproaches
such asfactor
analysis,
where the are treated as unobserved randomvariables,
or,
equivalently,
asmissing
data.Equation (8)
alsoprovides
animportant
link tolog-linear
models. Underi.i.d.
sampling
ofsubjects
with observed responsepatterns
x =(.ri,
z2 , ... , ,xJ)
andmissing
latent variables0,
theintegral
inequation (8) gives
the cellprobabilities
for theunique binary
responsepatterns
x =(x 1,
x2, ... ,xJ).
.Indeed,
sinceequation (4) implies
that - 0 -/3~,
itfollows that these cell
probabilities
areThis
readily simplifies
to alog-linear
model of the formwhere
(for details,
seeTjur [60],
Cressie andHolland, [21]; Fienberg
andMeyer [31],
Holland[39], Darroch, Fienberg,
Glonek and Junker[23]).
Thelog-
linear model in
equation ( 10)
isequivalent
to the fullquasi-symmetry
modelof
equation (3)
for the2J table, subject
to moment-like order constraints onthe
implied by equation (11)
and detailed in Cressie and Holland[21].
Lindsay, Clogg
andGrego [47] explore
in detail connections between Rasch models andlog-linear quasi-symmetry models,
and consequences for theidentifiability
of the distribution in asemiparametric
formulation ofequation (8).
.3.2. The Rasch model and
quasi-symmetry
for theKJ
tableThe Rasch model
generalizes naturally
to multinomiallogit
modelsdepending
onj3
andB,
when there are more than two levels of response per measure and thecategories
are ordered. Inparticular,
suppose eachXij
takes values in the discrete
set {0,1,....
Ii -1 ~ .
Thepolytomous
Raschmodel,
or"partial
credit model"(Masters [51]) specifies
linearadjacent category logits
asor,
equivalently,
in a somewhatoverparametrized form,
where sums with indices from 1 to 0 are defined to be zero. Under this
model,
the response
categories
arestochastically
orderedby
0: > is anon-decreasing
functionof 03B8i
for eachc. Agresti [1]
considers relatedmodels,
e.g., when
~~~
inequation (12)
isreplaced by
two terms additivein j
and k.Again,
if we assume localindependence
andintegrate
out the latentvariables 01
withrespect
to a common distribution we can write the cellprobability
for thepattern
ofpolytomous
responses x =(xl,
x2, ...in the
KJ
table as theintegrated product
multinomialwhere
Using
the definition inequation (13)
we see thatwhere, again,
x+ =~~ 1 x~ .
Asimplification analogous
to thatleading
from
equation (9)
toequation (10) produces
thelog-linear
modelwhere
bjk = -~=1~
and~(5)
=E[es8lx
=0]. Again,
this is aquasi-
symmetry representation
for theKJ table, subject
to moment constraintson
~y (s )
Unlike the2 ~
case,however,
this is not the mostgeneral K J quasi- symmetry
modelexpressed
inequation (3),
since any two interaction terms with the same sum x+ are constrained to beequal
inequation (16).
Forordinal tables with
"equally spaced scores,"
theimplied log-linear
modelhas a
simpler
main effects form(c.f., Agresti’s [3]
"ordinalquasi-symmetry"
model) .
Variants of this
approach
have been outlined for social survey databy Sobel, Hout,
andDuncan, [57],
Thissen andMooney [59],
and Goodman[34].
Alternative forms of the model inequation (15)
andequation (16)
maybe obtained
by changing
the form of the linearlogits
inequation (12) (e.g.,
see
Huguenard, Lerch, Junker,
Patz and Kass[41]).
4.
Semi-exchangeability
variations on the Rasch model Due to the stochasticordering property,
thatP[Xij
>c[0j] z
is nonde-creasing in 8i
for alli, j
and c, the dichotomous andpolytomous
Raschmodels exhibit
strong positive dependence
in the structurethey
induce for2J
andK~
tables. Holland and Rosenbaum[40] explore
thisdependence
in
detail,
and Junker and Ellis[44]
use relatedproperties
to characterize abroad class of item response models.
However,
in manyapplications, ranging
from
multiple-recapture
census enumeration toepidemiology,
biostatistics and evencognitive testing,
thisstrong positive dependence
structure is too restrictive forpractical modeling.
One way to weaken this structure has been to
generalize
thequasi- symmetry model, especially
in the case of dichotomous responsedata,
forexample by adding two-way
interactionsKelderman
[46]
considered these so-called"generalized log-linear
Rasch mod-els" to
develop hierarchically
nested alternatives to the nullhypothesis
thatthe data follow the
log-linear
Rasch model ofequation (10).
Cormack[18]
provided
anindependent,
alternativedevelopment
of these ideas. These ideas have also proven useful inextending
thelog-linear
Rasch model to accommodatedependence
in the table of counts nx that is Rasch-like butmore
general
than the"exchangeable higher
moments" structure of the Rasch model(see
Darroch andMcCloud, [24], Carriquiry
andFienberg [14],
Biggeri
et al.[9]
and Bartolucci and Forcina[7]).
Inparticular,
the termsbjlj2XjlXj2
allow fornegative dependence
between somepairs
of items thatare excluded
by
the basic Rasch model’s assertion ofequal positive
asso-ciations among all the items. It is also worth
noting
that models of the formequation ( 17)
may contain interactions of allorders,
butthey
may beseverely
constrained bothby
the form of the~(’)
terms inequation ( 17)
and
by
momentinequality
constraints of thetype
discussed afterequation
( 11 )
above.To relate
log-linear
models likeequation (17)
to the Rasch model ofequation (4),
webegin by rewriting
the likelihood for the conditional2J
table of counts
nxl8, given 8,
aswhere
a (8 )
=logU.~Jl 2014 Pj(8)], , and for the Rasch model in particular,
Àj(8)
= 8 + bj (bj
= -{3j
in equation (4) ) .
Darroch,
et al.[23]
note thatinterdependencies
in the observed2~
tablemay either be the result of
collapsing
over 8(a
version ofSimpson’s paradox,
see Holland and Rosenbaum
[40]
orKadane, Meyer
andTukey [45])
orthey
may be a consequence of the items
being truly interdependent
even when thedata are
disaggregated
to the person orobject
level.Examples
can be foundin diverse
applications,
e.g., two web searchengines
may draw from the samepool
of web pages(perhaps
because of a commonindexing strategy),
so thata web page may be more
likely
to show up in one,given
than it is in theother, quite apart
from thevisibility
of the page to searchengines
ingeneral.
In
multiple-recapture
census work based on administrativelists,
lists thatby
their naturepenetrate nearly disjoint subpopulations
induce atendency
toward
negative dependence
in themarginal
distribution 7rx(Asher
andFienberg [6]).
To illustrate models for
item-by-item dependencies
that are not artifactsof
aggregating
over8,
we consideradding
toequation (18)
thetwo-way
interactions in the conditional
(fixed 8)
model:where
a(8)
issimply
the usuallog-linear
modelnormalizing
constant(sum
ofmodel terms over all
binary patterns
xl, x2, ... ,rj)’
We assume, asbefore,
that
~~ (8)
= B +bj ,
and now we also assume that(B)
= 8 + Thisleads to the form
(c.f.
Jannarone[42]
andJannarone,
Yu andLaughlin [43]).
Exponentiating, integrating
withrespect
to the distribution of the ran- domcatchability effects, 8,
andtaking
thelogarithm again, Fienberg,
John-son and Junker
[30]
obtained thelog-linear
modela submodel of
equation (17) [subject
to moment constraintsanalogous
toequation (11)].
A different
development generates
terms like~y’ (x.~, x+ )
inequation ( 17)
and
interprets
them in terms ofheterogeneity
acrosssubjects
that cannotbe described
by
thesimple
Rasch model. Let usbegin again
with the basiclikelihood in
equation (18),
and suppose now that 8 ismultidimensional, i.e.,
8 =
(81, e2, ... ~g). Moreover,
suppose that different itemsdepend
on dif-ferent 8’s
through
the Rasch model. Forexample,
suppose that 8 =(81, 92 )
and we can
partition
the items into I items thatdepend only
on81
andJ - I items that
depend only
on82. Then,
afterpermuting
itemindices,
thelikelihood
given 9
becomesIf,
as wouldusually
seemreasonable,
thedensity
of 8 =(91, 82 )
does notfactor,
then a derivation which is similar to thatleading
fromequation (20)
to
equation (21 )
above now leads us towhere = x~, and = x~ .
Equation (23)
is apartial quasi-symmetry
model in which two sets of itemsparticipate
inseparate quasi-symmetry relationships.
This sort of structure wasemployed by
Dar-roch,
et al.[23]
intriple-system enumeration,
to model thediffering visibility (modelled by 81
and(2)
of persons in administrativelists,
in U.S. Censuslists and in a
post-enumeration
survey also conductedby
the U.S. Census Bureau.Clearly,
the same construction can be used to add terms~+ -1~1 )
tothe basic Rasch
quasi-symmetry
model inequation ( 10) .
This isequivalent
to
adding
the term~y ( 1~ 1,1~+ )
to themodel,
since there is a 1-1 correspon- dence between levels of thepair (A-i, k+ - k1)
and levels of thepair (k1, k+ ) (due
to the constraintk+
=k1
+ ... + Of course, we may also additem-by-item
interactions to this model as inequation (19),
in the end ob-taining
the fullgenerality
ofequation (17).
In addition toproviding
linksbetween the Rasch model and
hierarchically
structuredlog-linear models,
this
approach
is useful inrelaxing
thestrong positive
association constraints of the basic Rasch model andexhibiting
somemeaningful
stochastic order-ing properties,
whileretaining
anintepretation
in terms of latent variable models.5. The Grade of
Membership
modelLog-linear models,
as we introduced them in Section2,
focus on de-pendencies
among variables andthey
assume that the units(subjects)
arehomogeneous.
Latent structure modelsoriginated
fromsearching
for aphe-
nomenon that would describe substantive differences among
subjects
in thesample.
Inparticular,
latent class models(Goodman [33])
assume that aheterogeneous population
iscomposed
of a fewsubpopulations
that are ho-mogeneous in their responses with
respect
to theproblem
ofinterest,
andthat, given
the latentclass,
the responses to the items are considered in-dependent (a
form of the "localindependence" assumption).
When thisassumption
isreasonable, by representing
the likelihood as a mixture of la- tentclasses,
we can obtain agood
fit tomulti-way contingency
table datathat would otherwise
require
a number of not so easy toexplain higher-order
interactions in a
log-linear
model.Rasch models involve
parameters
for two sets ofobjects,
e.g. for itemsand
subjects.
Theproperty
of"specific objectivity" (Rost [56])
assures thatthe estimation of
parameters
for one set ofobjects
isindependent
of theother set. Rasch showed that when
only
two responsecategories
are consid-ered,
the dichotomous Rasch model is theonly parametric
model that fulfills therequirement
ofspecific objectivity (see
Andersen and Olsen[5]). By
as-suming
randomsubject
effects for the Raschmodel,
we can think in termsof
disaggregating
the2J contingency
table intoindependence
tablesby
thevalue of
subject parameter.
It is common to assume a unimodaldistribution,
such as the
normal,
forsubject parameters,
and thenintegrate
them out in order to estimate the itemparameters.
Unimodal distributions are consis- tent with theassumption, plausible
in many cases, that apopulation
canbe
represented
on a latent continuum in such a way that a few individuals would have veryhigh/low
values of thesubject parameters
and the rest of thepopulation
would have values concentrated near the average. But suchunimodality
does not holduniversally,
and we often need alternatives to the Rasch models formodeling
purposes.We often observe another kind of
population heterogeneity
in medicalclassification
problems. Examining
whether each of Jsymptoms
ispresent
or absent for
patients,
clinicians canclassify
agood proportion
of a popu- lation as eitherbeing healthy
orhaving
adisease,
but thediagnosis
for therest of the
population
can be less certain. Theuncertainty
in thediagnosis
can be
incorporated
into asubject parameter
where the value of the pa- rameter would indicate how close eachsubject
is tohaving
a disease. This idea gave rise to formulation of the Grade ofMembership (GoM)
modelproposed originally
in the 1970sby
MaxWoodbury
and described in detailby Manton, Woodbury,
andTolley [50].
The GoM modelregards
the prop-erty
ofconvexity
in the responseprobabilities
in apopulation
asworthy
ofmodeling. Intuitively, convexity
is a two-foldphenomenon: first,
extremalindividual cases or extreme
profiles
mustexist,
at leasttheoretically, and, second,
all other individual cases must be convex conlbinations of extremalcases. Extreme
profiles
are definedby
the conditionalprobabilities
of re-sponse for "certain
diagnosis"
cases. These are the itemparameters
as well.Subject
levelparameters
arecomponents
of the K-dimensional vector ofmembership
scores. Eachcomponent represents
how close an individual is to therespective
extremeprofile.
In the GoM model weagain
invoke thelocal
independence assumption: given
themembership
scores, J responsesare considered
independent. Therefore,
the modeldisaggregates
observed2J
table
according
to the vector ofmembership
scores.As
before,
we consider dataarising
in the form of a vector of J dichoto-mous manifest
variables,
X =( X 1, X 2 , ... XJ), and
we let x =( x 1,
x 2 , ... ,xJ)
denote abinary
responsepattern, where
= 0 or 1 is a response toj th
manifest
variable, for j
=1, 2, ... ,
J. We denoteby
G =(Gi, G2 , ... , , GK)
a vector of
membership
scores with distributionH(g)
such that0,
for k =1,2,.... K,
andGk =
l. Denote the conditionalprobability
ofpositive
response for each extremeprofile by
The
probability
ofnegative
response,Xj
=0,
is one minus theprobability
of
positive
response.The
marginal probability
of response for the manifest variableXj , given
the
membership
scores, is a convex combination of theprobabilities
thatcorrespond
to the extremal cases:where g
=(91 ~ g2 ~ ... , 9x )
E(~~ 1 ) x ~
.The GoM local
independence assumption
states that manifest variablesare
conditionally independent, given
the latent variables.Thus,
for the con-ditional
probability
ofobserving
a responsepattern
x isThis
particular
form of conditionalprobability
function results in a pro-perty
of the GoM model thatdistinguishes
it from other continuous latentvariable models.
Thus,
for the Raschmodel, Ramsay [54]
haspointed
outthat the value of the
ability parameter
in the Rasch model isonly indirectly
related to its
position
within the manifold thatcorresponds
to the modelin the
probability
space. Incontrast,
for the GoMmodel,
Erosheva[27]
has shown that the values of the
membership
scoresrepresent
the distancesalong
the model manifold fromcorresponding
extremeprofiles. Thus,
themembership
scores have theproperty
ofbeing
intrinsic to the model mani-fold,
whichprovides
a natural characterization of the latent continuum.Integrating
the conditionalprobability
inequation (26)
withrespect
to the distribution of themembership
scores, we obtain themarginal
distribu-tion of the data for the GoM model
where the
integration
is over(O,l)K.
Following
asuggestion
of Haberman[38],
Erosheva[28]
shows that we can derive the same distributionby treating
the data asarising
from a con-strained latent class model with J latent variables. Let
Zl, Z2, ... , ZJ
bemultinomial
exchangeable
latent variables with thevalues zj
E{1,2,..., , I~~.
Let the
probability
ofobserving
a latent vector z =(zl,
z2, ... , be theexpected
value of the J-foldproduct
ofcorresponding membership
scores:By assuming
that the conditional distribution of thejth
manifest variabledepends only
on thejth component
of the latent vector andsetting
Erosheva
[28]
has shown that theprobability
to observe responsepattern
x under this constrained latent class model coincides with thecorresponding probability
under the GoM model.We can consider the classical notion of item
independence
in the contextof the GoM model. Assume that
items ji and j2
areindependent
ifthey
areindependent
in the2J contingency
tableaggregated
over the distribution of individualparameters.
In this case, the observed2J
table would showindependence
or nearindependence of j 1
andj2 .
From thegeometric
repre- sentation of the GoMmodel,
it can be seen that two items areindependent
only
if the model manifold is astraight
line on the surface ofindependence
in the
subspace
of theparameters
of thecorresponding
2 x 2 table(Ero-
sheva
[27]).
If the distribution of the GoM scores is notdegenerate,
thiscan
happen only
when one of theitems,
sayj1,
has the sameprobability
ofresponse for all extreme
profiles.
Since the items in the GoM model arefully
characterized
by
their extremeprofile probabilities,
thisimplies
that itemji
is in factindependent
from other J - 2 items as well. In otherwords,
itemji
is irrelevant forexplaining population heterogeneity
in the GoM model.This result is similar to the one for IRT models: item responses can be in-
dependent
in theaggregated 2J contingency
tableonly
if eachsubject
inthe
population
has the sameability; otherwise,
apositive
correlation among items is to beexpected.
While the Rasch model can be used to model pos- itive itemdependencies,
the GoM model can beemployed
to model moregeneral
item interactions in2J
table.By using
the latent classrepresentation
of the GoM model describedabove,
we cangive
the GoM model alog-linear
random effects form. Consi- der the(KJ
x2 )-dimensional
table. Let be theexpected
countin the cell
(z, x 2> ... > x J)~
where z = z 2> ... , Because the modelassumes that the items are
conditionally independent, given
the value ofthe latent vector z, the
log-linear
model willonly
have the main item effectsand the interaction effects between each of the manifest variables and the latent vector-valued variable z,
i.e.,
The
log-linear parameters
are related to conditionalprobabilities
in theGoM model as
or,
equivalently,
asBy using
the conditionalprobability
from the GoM model andimposing
the usual zero-sum
constraints,
we can write thelog-linear
main effect and interactionparameters
asThis random effects
log-linear
formulation has anunderlying partial- symmetry
structure that results fromexchangeability
in thecomponents
of the latent vectors: if z and z* are suchthat zj
=z;, then
=ufll i
Thispartial-symmetry structure, however,
does not resemblequasi-symmetry
asproposed originally by
Caussinus.There has
long
been a fascination in the statistical literature about thedevelopment
of additive asopposed
tolog-additive
models for multi- dimensionalcontingency
tables(e.g.,
see Darroch andSpeed [25]).
Intui-tively,
it is clear that additive models on theprobability
scale can not beadditive on the
log probability
scale. Theremight
beexceptions
to theserule,
e.g., ifthe gk parameters
of the GoM model wereindependent,
thenwe can rewrite
equation (30)
as a constantplus
Jterms,
eachdepending
on
Ajk
and theexpected
values of themembership
scores. It is not clearwhether there
might
exist an alternativeparameterization
that would pr o- vide alog-linear
form for the GoM model.Finally,
we note the similarities of the GoM model and itsrepresentations
to the latent
budget
model of van derHeijden, Mooijaart
and de Leeuw[61],
which is also described in terms of extreme
profiles ( "latent budgets" ),
anda simultaneous latent class model
proposed by Clogg
andGoodman [16]
~ 17~ .
6.
Summary
Henri Caussinus introduced the notion of
quasi-symmetry
in the con-text of
two-way contingency
tables in 1965. Thesubsequent generalizations
of this
quasi-symmetry
model haveproved
to beespecially
useful in themodeling
ofmulti-way contingency
tables. Inparticular, they
represent aninteresting point
ofdeparture
for alternatives to standardlog-linear
modelsfor
large
sparsecontingency
tables. Such models offer a way tosimplify
pa-rameter structures while at the same time
allowing
for interactions in the models.In this paper, we have described some alternative models for
large
sparse tables that are based on latentstructures,
and we have described some of theirrelationships
toquasi-symmetry
andlog-linear
structures. These latent variable models scale tohigh
dimensionalsettings
and retain thesimplicity
of
interpretation
associated with the traditional localindependence
assump- tion.Cox and Wermuth