A NNALES DE LA FACULTÉ DES SCIENCES DE T OULOUSE
W ICHER P. B ERGSMA T AMÁS R UDAS
Modeling conditional and marginal association in contingency tables
Annales de la faculté des sciences de Toulouse 6
esérie, tome 11, n
o4 (2002), p. 455-468
<http://www.numdam.org/item?id=AFST_2002_6_11_4_455_0>
© Université Paul Sabatier, 2002, tous droits réservés.
L’accès aux archives de la revue « Annales de la faculté des sciences de Toulouse » (http://picard.ups-tlse.fr/~annales/) implique l’accord avec les conditions générales d’utilisation (http://www.numdam.org/conditions).
Toute utilisation commerciale ou impression systématique est constitu- tive d’une infraction pénale. Toute copie ou impression de ce fichier doit contenir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
- 455 -
Modeling conditional and marginal association in contingency tables(*)
WICHER P. BERGSMA
(1), TAMÁS
RUDAS (2)Annales de la Faculte des Sciences de Toulouse Vol. XI, n° 4, 2002 pp. 455-468
RESUME. - Les outils classiques de l’analyse de la structure des associ- ations conditionnelles
(moyennes)
de la distribution d’un tableau de con-tingence multiple sont les modeles log-lineaires. Un concept d’association different est celui d’association marginale, et cet article montre com-
ment les parametres marginaux log-lineaires peuvent etre utilises pour
mesurer cet aspect de F association. Cet article discute de facon non tech- nique de ces deux aspects du concept d’association en etudiant leur na- ture complementaire ; il decrit aussi comment le concept d’association conditionnelle est naturellement incorpore dans le cadre fourni par les parametres log-lineaires marginaux. Les proprietes et Finterpretation de
ces parametres sont discutees, en incluant la variation independante des parametres de modeles log-lineaires marginaux hierarchiquement relies.
Enfin, les consequences de ces resultats sur la modelisation sont indiquees.
ABSTRACT. - Standard tools for the analysis of the
(average)
conditional association structure of the distribution on a multiway contingency tableare log-linear models. A different association concept is that of marginal association and this paper demonstrates how marginal log-linear parame- ters can be used to measure this aspect of association. The paper gives a non-technical discussion of these two aspects of association by discussing
their complementary nature and also describes how conditional associ- ation is naturally incorporated in the framework provided by marginal log-linear parameters. The properties and interpretation of these param- eters are discussed, including the variation independence of hierarchically related marginal log-linear parameters, and the modeling implications of
these results are indicated.
~ * ~ Recu le 18 septembre 2001, accepte le 18 septembre 2002
( 1) > Department of Methodology, Faculty of Social and Behavioral Sciences, Tilburg University, 5000 LE Tilburg, The Netherlands. E-mail:[email protected]
(2) Department of Statistics, School of Sociology and Social Policy, Eotvos Lorand
University, Hungary. E-mail: [email protected]
1. Introduction
The first aim of this paper is to
give
an intuitivedescription
of vari-ous
aspects
of association incontingency
tables and of some consequences forparameter
and model definition. Ourapproach
will be introducedby sharply distinguishing
between conditional andmarginal aspects
of asso- ciation.However,
it will be shown that what istraditionally
consideredas conditional
association,
can beincorporated
in ageneralized
frameworkof
marginal
associationby
the introduction ofmarginal log-linear
parame- ters. It ishoped
thatby
thisunderstanding
of therelationship
betweenconditional and
marginal association,
newinsight
can begained
into theassociation structure of multidimensional
contingency
tables.The second aim of this paper, which in
part overlaps
with the first one, is toprovide
an accessibleexposition
of some of the results ofBergsma
andRudas
(2002).
This includes variationindependence
and smoothness condi- tions formarginal log-linear parameters. Furthermore,
asimple
notation formarginal log-linear parameters
is introduced which makes theirapplication
easier.
Section 2 of the paper
briefly
reviews the basic facts and ideas regar-ding
the association structure of a multidimensionalcontingency
table. Theconditional and
marginal aspects
of association are discussed with reference tolog-linear analysis.
Section 3 considerslog-linear parameters
that are standard tools to measure conditional association and introducesmarginal log-linear parameters
that are able torepresent
a moregeneral aspect
of association.Marginal log-linear parameters
aresimply log-linear parameters computed
frommarginals
of the table.Section 4 discusses
parameterizations
of thejoint
distributionusing marginal log-linear parameters.
This is alarge
class of flexibleparameteriza- tions,
with thelog-linear parameterization being
asimple special
case. Condi-tions for such desirable
properties
as theparameterization being
smoothand its
components being variationally independent
are alsogiven.
Theseconditions are formulated in terms of
simple
combinatorialproperties
ofthe subsets of variables involved. Section 5 considers statistical models ob- tained
by imposing
affine restrictions onmarginal log-linear parameters.
The conditions that assure existence and standard
asymptotic
behavior of such models are the same combinatorialproperties.
The paper contains almost no
proofs
and most of the detailedarguments
thatsupport
the claims made here can be found inBergsma
and Rudas(2002).
2. Conditional and
marginal
associationIn real-life
applications
ofstatistics,
the relevantproblems
are almostall multivariate. In such
situations,
it is not so much theseparate
behavior of the variables observed but rather the association among them which is ofprimary
interest.Association,
of course, can have many different forms andsubject
matterknowledge
can often be used topostulate
aparticular
association structure.
A definition of association among a group of
variables,
which is notrelated to any
specific type
ofassociation,
is obtainedby considering
thedifference of information contained in the
joint
distribution and that of the lower ordermarginal
distributions.Here, again,
noparticular
techni-cal
meaning
of information isused,
rather it is said that if all lower ordermarginal
distributions areknown,
the additional information needed to re-construct the
joint
distribution is the association among the variables. The additional information should be sufficient to reconstruct thejoint
distri-bution,
but it also should be necessary(non-redundant)
in the sense thatassociation is
only
information not contained in the lower ordermarginal
distributions. This latter
requirement
is best characterizedby
theconcept
of variationindependence,
thatis,
thejoint
range of themarginals
and ofthe measure of association should be the Cartesian
product
of theseparate
ranges.
As an
example,
consider a 2 x 2contingency
table.Here,
the lower ordermarginal
distributions arerepresented by
themarginal probabilities
and7r-(-i. There are several
expressions
of the cellprobabilities
that carryenough
information to reconstruct the
joint
distribution. Forexample 03C011/ (03C01+03C0+1)
is
intuitively appealing
and is sometimes used as a measure of thestrength
of association. This
quantity, however,
is notvariationally independent
fromthe one way
marginals,
thatis,
its range is effectedby
the actualmarginals.
Therefore,
it lacks calibration and itsvalues,
other than1,
may be diflicult to compare across different tables. It isonly
the odds ratioand its one-to-one functions that are both sufficient and necessary in the above sense to reconstruct the
joint
distribution. Thatis,
every measure of association(if
defined as information not contained in themarginals)
isa one-to-one function of the odds ratio. For a detailed
exposition
of thisargument
see Edwards(1963)
or Rudas(1998).
The
log-linear
association term is a one-to-one function of the odds ratio and is therefore anappropriate
measure of association. In the multivariate case, theargument
abovegeneralizes,
inparallel
to thetheory
oflog-linear representation (Bishop, Fienberg
andHolland, 1975),
in a hierarchical way(see Rudas, 1998).
A crucial
aspect
inunderstanding
andmodeling
the association structure of amultiway table,
is the way ofdefining
subsets of variables of which thestrength
of association is measured. From a technicalpoint
ofview,
thereare two ways of
deriving
lower dimensional subsets from a set of variables:conditioning
andmarginalization.
Inconditioning,
some of the variables arefixed at certain
categories,
and thestrength
of association is measured for theremaining
variables. Theparameter
values obtained willdepend
on theactual
categories
of the fixed variables and refer to association in a subset of thepopulation. Log-linear parameters
can be used to measure the"average"
conditional association over the
categories
of the fixed variable.Analysis
ofthis conditional association structure can therefore be done
by
means oflog-linear analysis (Bishop, Fienberg
andHolland, 1975,
pages33-34).
Forexample,
thelog-linear
model of no-three-variable interaction is the model of constant conditional association between any two variablesgiven
the third.Marginalization,
on the otherhand,
considers subsets of variables with- outpaying
attention to theremaining variables,
no selection is involved and the association for a group of variables refers to the entirepopulation.
Thismarginal approach
tomeasuring
andmodeling
association cannot beimple-
mented in standard
log-linear analysis
and it is the aim of thepresent
paper to illustrate how thetheory developed
inBergsma
and Rudas(2002)
can beused to
analyze
the association structure of atable, including marginal
as-sociations,
but also conditional associations and certain mixtures of these.The
approach presented
here also contains thelog-linear approach,
as asimple special
case.This
general methodology
is based on the introduction of a very flexible class ofparameters
of association that will be discussed in the next section.3. Parameters of association
For
simplicity, log-linear
andmarginal log-linear parameters
will be in- troduced here for an I x J x Kcontingency
tableABC,
but the definitions extend in a natural way tohigher
dimensional tables.The
decomposition
oflog
cellprobabilities log
as a sum oflog-
linear
parameters
is as follows(see Bishop, Fienberg
andHolland,
1975 orAgresti, 1990):
In our
notation,
thesuperscript
of alog-linear parameter
identifies the variables and thesubscript
shows to which variables theparameter
refers to(the
ones notreplaced by
anasterisk)
and these variables arerepresented by
their relevant indices. Forexample, À1ff is usually
denoted as .
The
log-linear parameters
are notyet
identified and cannot therefore beinterpreted. Many
identification methodsexist,
but a common one is the so-called effectcoding,
obtainedby setting
the sum over anysubscript
tozero that
is,
where a
"+"
in thesubscript
means summation over thecorresponding
index. Another
popular
method is the so-calleddummy coding,
where iden-tification is obtained
by setting
certainlog-linear parameters
to zero.It is well known
(Bishop, Fienberg
andHolland, 1975,
pages33-34)
thatthe
log-linear parameters
measure the averagestrength
of conditional asso-ciation. For
example,
in the effectcoding scheme,
if Band C arebinary,
is
equal
to a constant times the average of the values of the odds ratios of BandC,
conditioned on the different values of A.More
generally (in
the abovenotation),
alog-linear parameter represents
thestrength
of association between the non-asteriskedvariables,
conditionedon and then
averaged
over thecategories
of the asterisked variables. If thehigher
orderparameters
are(close to)
zero then the valuesaveraged
do notdiffer
(too much)
and thelog-linear parameters
can beinterpreted
aspartial
associations
(Hagenaars, 1990).
Noticehowever,
that theassumption
thatfor a
multiway contingency
table thehigher
order interactionparameters
are zero is a very
strong
one thatessentially
removes one of the most im-portant
differences between the otherwise unrestricted distribution on thecontingency
table and a multivariate normaldistribution,
in the sense thata basic
property
of the latter is the lack ofhigher
than first order interac- tions. Such anassumption
isusually
called alog-linear
model(Haberman,
1974; Rudas, 1998).
The two-dimensional
marginal
cellprobabilities
are defined asand the one-dimensional
marginal
cellprobabilities
asAnalogously
to thejoint probabilities,
the two-dimensionalmarginal
onescan be
decomposed
asand the one-dimensional ones as
The
log-linear parameters
used in the aboverepresentations
arecomputed
from a
marginal
of theoriginal
table and aretherefore,
calledmarginal log-
linear
parameters.
Themarginal
to which aparameter pertains
is indicatedin the
superscript.
Theparameters
have aninterpretation
similar to the classicallog-linear parameters
discussed above.They
measure thestrength
of the average conditional association among a certain group of variables
(the effect) ,
with some of the other variables omitted and some others fixed(i.e.
conditionedon).
To define amarginal log-linear parameter,
one hasto choose a subset of the variables
(the marginal -
and the variables not selected areomitted)
and within thismarginal
another subset(the effect)
and the variables in the
marginal
but not in the effect are conditioned upon.Such a
marginal log-linear parameter
takes on different valuesdepending
onthe actual
categories
of the variables in the effects and the termparameter
refers to all these values(see Bergsma
andRudas, 2002,
fordetails).
The definition of
marginal log-linear parameters
opens up thepossibility
of
defining
alarge
number ofparameters, depending
on the choice of themarginal
and of the effect and the rest of this section in concerned with dis-cussing
astrategy
ofdefining marginal log-linear parameterizations.
Certainproperties
of theparameterizations
and of the statistical models obtainedby restricting
the ranges of theseparameters
will beinvestigated
later on.It is
always
the substantialproblem
at hand that determines which groups of variables(marginals)
of the table are of interest. These must be orderedhierarchically,
thatis,
in such a way that a latermarginal
is notcontained in an earlier one. For
example,
for a table ABC one may be in-terested in the three
marginals AB, BC,
and ABC. There are twopossible
hierarchical
orderings:
If the
ordering
of themarginals
has beenestablished,
thefollowing
in-ductive scheme can be used to construct a set of
parameters:
1. Calculate the
log-linear parameters
of all the effects(i.e., subsets)
inthe first
marginal
2. For k =
2,
... , n, calculate thelog-linear parameters
for those effects of the kthmarginal
that have not been usedbefore,
where n is the number of
marginals
involved.To illustrate for the
ordering (AB, BC, ABC),
the firststep
is to calcu- late thelog-linear parameters
in table AB from theprobabilities ~r‘~.~8 :
Next,
for the BCmarginal
the effects that have not been used before areincluded, i.e.,
the C and BC effects:Finally,
for the ABCmarginal,
theonly
effects that have not been includedyet
are the AC and ABC effects. HenceThus, combining
the sets obtained in(2), (3),
and(4),
theparameters
gen- eratedby
the sequence(AB, BC, ABC)
areNote that in
(5)
all subsets areincluded,
aseffects,
in the set ofgenerated parameters.
It is easy toverify
that thisonly happens
if the whole table is included in the sequence ofmarginals.
Such a sequence is calledcomplete.
Because of
hierarchy,
the whole table appears at the end of the sequence. No- tice that if there are several hierarchicalorderings
of themarginals possible,
the one selected will determine which subsets appear as effects within the
marginals. Marginal log-linear parameters generated by
acomplete
sequence form aparameterization
of the distribution on thecontingency
table.Two
specific
sets ofparameters generated by complete
hierarchical se-quences of
marginals
have receivedspecial
attention in the literature. The first is the set ofordinary log-linear parameters
that isgenerated by
thewhole table as the
only marginal
involved: alllog-linear
effectspertain
to thefull table. The second is what is called
by
Glonek andMcCullagh (1995)
themultivariate
logistic transform,
and isgenerated by
a hierarchical sequence of all the subsets of the variables asmarginals.
For
three-way
tablesABC,
theordinary log-linear
and multivariate lo-gistic parameterization
aregenerated by
respectively, yielding
thecomplete
hierarchical sets ofparameters
respectively.
Theordinary log-linear parameters
have all the variables in thesuperscript (i.e.,
thesuperscript
ismaximal),
and the multivariatelogistic parameters
have a minimalsuperscript.
The latter contain no asterisks in thesubscript.
In this sense, theordinary log-linear
and multivariatelogistic parameterizations
form theend-points
of all hierarchicalmarginal log-linear parameterizations.
A
huge
number of sequences ofmarginals
ispossible
even for a moderatenumber of variables. For one
variable,
sayA,
there are twopossible complete
sequences. The sequences and the
parameters they generate
are:Note that these are the
log-linear
and univariatelogistic parameters,
respec-tively.
For twovariables,
say A andB,
the ninepossible complete
sequences and theparameters they generate
are:The first and last sets form the
log-linear
and bivariatelogistic parameters, respectively.
Note that the seventh andeighth
sequences are the sameexcept
for the A and Bmarginals
that areinterchanged, yielding
two different setsof
parameters.
In the last sequence A and B can beinterchanged
but thiswould
yield
the same set ofparameters.
4.
Marginal log-linear parameterizations
A mixed
parameterization
of a distribution on acontingency
table con-sists of certain
marginal probabilities
and all thehigher
orderlog-linear
effects within the table
(see Rudas, 1998).
Forexample,
for the ABCtable,
a mixed
parameterization
may consist of themarginal probabilities ~~r‘~.~B ~
and and the
higher
orderlog-linear
effects~~AB,~ ~
andThis is a
large
and flexibleclass,
with thelog-linear parameterization being
a
simple special
case. As iswell-known,
if themarginal probabilities
andlog-linear parameters
in a mixedparameterization
haveprescribed values,
the iterative
proportional fitting (IPF) algorithm
can be used to reconstruct thejoint probability
distribution.In
exponential family terminology,
themarginal probabilities
are calledmean value
parameters,
and thelog-linear parameters
are called canonicalparameters.
Barndorff-Nielsen(1978) proved
severalimportant properties
of mixed
parameterizations
in terms of mean value and canonical param- eters.Firstly, they
are obtained from the distributions via a one-to-one transformation that satisfies certaindifferentiability
conditions. Such a pa- rameterization is called smooth.Secondly,
the mean value and canonicalparameters
are variationindependent.
This meansthat, provided
both themean value and the canonical
parameters
arecompatible
withinthemselves,
then
they
canalways
be combined to form ajoint
distribution. More for-mally,
two(possibly
vectorvalued) components
of aparameterization
arevariation
independent,
if theirjoint
range is the Cartesianproduct
of theirseparate
ranges.The absence of variation
independence
can lead toproblems
in estima-tion,
and may lead to difficulties in theinterpretation
ofparameters (see
the
example
in Section2).
Appropriately
selectedmarginal log-linear parameters
constitute a pa- rameterization of thejoint
distribution(see
Section2).
This is ageneraliza-
tion of the result
concerning
mixedparameterizations.
Consider the
marginal log-linear parameters generated by
acomplete
hi-erarchical sequence
(M1,
... , whereMk
contains all the variables. For1 ~ .7 i k,
let consist of thosevariables,
if any, inMy
which also be-long
toM2,
and let be the set ofmarginal log-linear parameters belong- ing
toMi. Then,
and themarginal probabilities
over ... ,form a mixed
parameterization
of the distribution overMi,
because~(i)
contains classical
log-linear parameters
in themarginal Mi (Bergsma
andRudas, 2002).
This fact allows thefollowing
recursive scheme to reconstruct the distribution overM~
to be established:1. Calculate the
probability
distribution overM1 directly
from the(mar- ginal) log-linear parameters.
2. For i =
2, ... , k, calculate, using IPF,
theprobability
distributionover
Mi
from and the distributions over ... ,Note that it is assumed
that,
for eachi,
the distributions over11~h2),
... , ,are
compatible.
Conditions for this to be the case will be discussed later on. To illustrate the reconstructionprocedure,
consider theparameters generated by
themarginals (AB, BC, ABC) :
To reconstruct the distribution over
ABC,
we first obtain the AB distribu- tionby
direct calculation:Hence,
we are left with the reduced setNow
7rf
=7rtf
so the mixedparameterization
of is included in
(8).
From(9),
can be reconstructedusing
IPF.
Replacement
in(8) yields:
Now this is a mixed
parameterization
of that canagain
be foundusing
IPF.Hence,
thecomplete
distribution over ABC can be reconstructedby applying
IPF to a sequence of mixedparameterizations.
As
long
as theoriginal
set ofparameters
iscompatible,
the constructioncan
always
beperformed.
As shownby
Barndorff-Nielsen( 1978),
eachstep
is a one-to-one and differentiable transformation. The
following
theoremfollows
directly:
THEOREM 1. - The set
of parameters generated by
a hierarchical com-plete
sequenceof marginals of
acontingency
tableT, excluding
the nulleffect, forms
a smoothparameterization of
the distributions over T. .Note that the null effect is redundant because the
probabilities
must sumto one.
It is
important,
that theprevious
theorem starts withparameters
de- rived from anexisting
distribution. If one starts the reconstruction process witharbitrarily
selectedparameter values,
the reconstruction process cansometimes fail. Consider the
parameterization generated by
themarginals (AB, BC, AC, ABC) :
:The first two
steps
are the same as above andyield
Now
~~,
forms a mixedparameterization
of that can bereconstructed
using
IPF.So,
one obtains thefollowing:
However,
now there may occur aproblem.
TheAB, BC,
and ACmarginals
have been
constructed,
butthey
may not becompatible.
Forexample,
there is no
three-way distribution,
that would have thefollowing two-way marginals.
Notice that this is an
example
of lack of variationindependence.
Thevalues
given belong
to therespective
ranges of theparameters
butthey
donot
belong
to thejoint
range. The well-known reason that a set ofmarginals
may have
prescribed
values that areincompatible,
is thatthey
do not forma so called
decomposable
set. A set ofmarginals
isdecomposable
if thereis an
ordering
that satisfies the so-calledrunning
intersectionproperty
( seeHaberman, 1974).
This means that for anymarginal
in theordering,
allthose variables which have
appeared
in any of themarginals
beforeit,
alsoappear in a
single marginal
before it. Forexample,
for the set ofmarginals
~AB, CD, AC~
theordering (AB, CD, AC)
does notsatisfy
therunning
intersection
property,
since the variables from AC which haveappeared
before it are A and
C,
butthey
do not both appear in either AB or CD.However,
theordering (AB, AC, CD)
doessatisfy
therunning
intersectionproperty,
hence the set{AB, AC, CD~
isdecomposable.
On the otherhand,
the elements of the set
{AB, BC, AC~
do not have anordering satisfying
the
running
intersectionproperty,
so~AB, BC, AC~
isnon-decomposable.
The
decomposability concept
can be used togive explicit
conditionsfor
marginal log-linear parameters generated by
a sequence ofmarginals
to be variation
independent.
Inparticular,
all themarginals
that are con-structed at any
given step
in the reconstruction process have to form adecomposable
set(Kellerer, 1964).
Thatis,
forall k
n, the maximal elements of the first kmarginals
in the sequence must bedecomposable.
Such sequences are called ordered
decomposable.
Forexample,
theordering (AB BC, CD ABCD)
is ordereddecomposable,
since~AB~, (AB , BC) ,
~ AB, BC, CD ~,
and(ABCD)
are alldecomposable, respectively.
The or-dering (AB, BC, AC, ABC), however,
is not ordereddecomposable,
sincethe maximal elements of the first three
marginals
are{AJ5, BC, AC~,
thatis not a
decomposable
set. Thefollowing
theorem followsimmediately
fromthe construction process.
THEOREM 2. - The
marginal log-linear parameters generated by
a hi-erarchical sequence
of marginals
are variationindependent if
andonly if
thesequence is ordered
decomposable.
It follows that the
ordinary log-linear parameters (excluding
the redun-dant
ones)
are variationindependent.
This is easy to see, sincegiven
anyprespecified values,
a distribution canimmediately
be foundby appropri-
ate additions such as in
(1).
On the otherhand,
the multivariatelogistic
transform is not variation
independent
if there are more than two variables.However,
for the three variable case,replacing Àff
in(7) by yields
the set
This set is
generated by
the sequence(0, A, B, C, AB, BC, ABC),
that is or-dered
decomposable. Hence,
theparameterization
is variationindependent.
5.
Restricting marginal log-linear parameters
A wide range of
interesting
statistical models are obtainedby impos- ing
affine restrictions onmarginal log-linear parameters.
Two fundamentalquestions
that will be dealt with in this section are,firstly,
when those re-strictions are
feasible, and, secondly,
how totest, using
arandomly
drawnsample,
thehypothesis
that the restrictions hold true for apopulation.
An
example
of infeasible restrictions wasgiven above,
whereprescribed AB, BC,
and ACmarginals
of a table ABC turned out to beincompatible.
The restrictions can be obtained
by prescribing,
forexample,
themarginal log-linear parameters generated by
the sequence(AB, BC, AC) .
. Since thethree
marginals
are not ordereddecomposable,
thegenerated parameters
are not variation
independent by
Theorem 2.Hence,
it ispossible
that re-strictions on them are infeasible. Note that no
ordering
of thesemarginals
makes them ordered
decomposable.
Ingeneral,
if restrictions areplaced
onparameters generated by
an ordereddecomposable
sequence ofmarginals,
those restrictions will be feasible.
Furthermore,
as shownby Bergsma
andRudas
(2002),
linear restrictions onparameters generated by
any sequence ofmarginals
arealways
feasible because the uniform distribution satis- fies them.It may be noted that the
question
offeasibility
is alsoimportant
in a relatedfield, namely
that ofconditionally specified
distributions(Arnold,
B.
C.,
E. Castillo and J-M.Sarabia, 1999). However,
themarginal log-linear parameters
discussed here are most suitable forspecifying
average condi-tional
parameters
or forspecifying constancy
of conditionaldistributions,
rather than for the
complete specification
of the conditional distributions.Therefore the
feasibility problems arising
in the two fields are of a differ-ent
nature,
and some of thecomplex
issuesarising
ingeneral conditionally specified
distributions do not occur here.If it has been determined that a
particular
model isfeasible,
it may be testedby drawing
asample
from thepopulation
andassessing goodness-of-
fit of the model. The most
widely
usedsampling
schemes are multinomial and Poisson. It follows from Theorem 1 that such models form a so-called curvedexponential family,
to which standardasymptotic theory
isapplica-
ble
(Lauritzen, 1996).
A detailed discussion of thistopic
isgiven by Bergsma
and Rudas
(2002).
.6.
Acknowledgments
Bergsma’s
research wassupported by
The NetherlandsOrganization
forScientific Research
(NWO), Project
Number 400-20-001. Rudas’s researchwas
supported
inpart by
Grant T-016032 from theHungarian
NationalScience Foundation.
Bibliography
[1]
AGRESTI(A.).
- Categorical Data Analysis, Wiley, New York, 1990.[2]
ARNOLD (B.C.),
CASTILLO (E.), SARABIA (J.M.). -
Conditional Specification ofStatistical Models, Springer-Verlag, New-York, 1999.
- 468 -
[3]
BARNDORFF-NIELSEN(O. E.). -
Information and Exponential Families in Statis-tical Theory, Wiley, New York, 1978.
[4]
BERGSMA(W. P.). -
Marginal Models for Categorical Data, PhD thesis, Tilburg University, Tilburg, 1997.[5]
BERGSMA(W. P.),
RUDAS(T.).
- Marginal models for categorical data, The Annals of Statistics, 30(2002),
140-159.[6]
BISHOP(Y.
M.M.),
FIENBERG(S. E.),
HOLLAND(P. W.).
- Discrete MultivariateAnalysis: Theory and Practice, MIT Press, Cambridge, MA, 1975.
[7]
COLOMBI(R.).
- A multivariate logit model with marginal canonical association, Communications in Statistics: Theory and Method, 27(1998),
2953-2972.[8]
COLOMBI(R.),
FORCINA(A.).
- Marginal regression models for the analysis of positive association of ordinal response variables, Biometrika, 88(2001),
1007-1019.[9]
EDWARDS(A.
W.F.).
- The measure of association in a 2 x 2 table, Journal of the Royal Statistical Society, Series A, 126(1963),
109-114.[10]
GLONEK(G.
J.N.),
MCCULLAGH(P.). -
Multivariate logistic models, Journal ofthe Royal Statistical Society, Series B, 57
(1995),
533-546.[11]
HABERMAN(S. J.).
- The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974.[12]
HABERMAN(S. J.).
- Tests for independence in two-way contingency tables basedon canonical correlation and on linear-by-linear interaction, The Annals of Statis- tics, 9
(1981),
1178-1186.[13]
HAGENAARS(J. A.).
- Categorical Longitudinal Data, Sage, Newbury Park, 1990.[14]
KELLERER(H. G.).
- Verteilungsfunktionen mit gegebenen Marginalverteilungen, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 3(1964),
247-270.
[15]
LAURITZEN(S. L.).
- Graphical Models, Oxford University Press, Oxford, 1996.[16]
RUDAS(T.).
- Odds Ratios in the Analysis of Contingency Tables, Sage, Thousand Oaks, 1998.[17]
RUDAS(T.),
BERGSMA(W. P.).
- On generalized symmetry, in The Quasi Symme- try Project,http://www.upstlse.fr/PROJET_QS/PAPERS/Rudas2/Rudas2.html,
2001.