F. L OOSEN
Note on the chi-square statistic of association in 2x2 contingency tables and the correction for continuity
Mathématiques et sciences humaines, tome 61 (1978), p. 29-37
<http://www.numdam.org/item?id=MSH_1978__61__29_0>
© Centre d’analyse et de mathématiques sociales de l’EHESS, 1978, tous droits réservés.
L’accès aux archives de la revue « Mathématiques et sciences humaines » (http://
msh.revues.org/) implique l’accord avec les conditions générales d’utilisation (http://www.
numdam.org/conditions). Toute utilisation commerciale ou impression systématique est consti- tutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
NOTE ON THE
CHI-SQUARE
STATISTIC OF ASSOCIATION IN 2x2 CONTINGENCY TABLES AND THE CORRECTION FOR CONTINUITYF.
LOOSEN1
Research workers
frequently
do not have to make a decision butmerely
want some idea of how
likely
orunlikely
the observedsample
result is whensome null
hypothesis
is true.Recently
some authorsargued
that in those situations we can better report the smallest value of a for which thehypo-
thesis would have beenrejected (i.e.
the exceedanceprobability
of thesample
result under thisspecific hypothesis),
rather than the sole fact ofwhether or not this
hypothesis
should berejected
at the traditional .05 or _.01 levels of
significance (see
forexample Hays
and Winkler[1]).
Suppose
weapply
this recommendation to thechi-square
statistic ofassociation in 2x2
contingency
tables withe cell countsA,B,C,C
andgrand
total N.
First,
we would have to calculate theX2-statistic by
means of thewell-known formula :
(see Siegel [2]),
andconsequently
we would determine theprobability
thatX2
exceeds the obtained valueby
means of the tabular values of achi-square’
1.
University
ofLeuven, Belgium.
that,
so far as tests ofindependence
areconcerned,
this restriction remains valid in situations where one or bothmargins
are not fixed.However,
isthis restriction still valid when we
merely
want to know the exceedance pro-bability
of asample
result under theassumption
that thehypothesis
of in-dependence
is true ?In field
studies,
forexample, psychologists
do not know until the endof the classification what the
marginal
totals are. Moreover, most of the timethey
do not know in advance theexact,size
of thesample
since thisdepends
on the freecooperation
of thesubjects.
If apsychologist
wants toknow the
relationship between,
forexample,
sex andsmoking,
he will examinea number of
subjects (e.g.
all first year students!)
and hesubsequently
classifies every observation into the doubledichotomy Men/Women
andSmokers/
Non-smokers. It is clear that such a
psychologist
is not so interested inknowing
theprobability
inX2 values XP Vaalues
which are greater than the observedX2-
p value insamples
with the samemarginal totals,
but rather he wants to knowthe
probability
ofX2
p values greater than the observed2-value XP
insamples
obtainedby repeated application
of the wholesampling
process. The same holds for manyexperimental
and differential studies in which the2
statis-tic is
computed. Mostly only
themargins
related to the ’stimulus alterna-tives’ are fixed in
advance,
whereas themargins
related to the’response
alternatives’ are
only
known at the end of the classification. Also the reader in those domains ismainly
interested in theinterpretation
of theresults
against
thebackground
ofrepeated application
of the wholeexperi-
ment since the
frequency
conceptappeals
to the common user’sunderstanding.
In the present note we want to state that if one wishes
merely
to tellhow ’unusual’ the
sample
result is ascompared
with thesampling
distributionunder the
assumption
that thehypothesis
ofindependence
is true, it is bet-ter to use a
chi-square
statistic which does not include a correction forcontinuity
rather than a test statistic which does include such a correction(as
Formula1),
at least if one wishes to relate thechi-square
statistic tothe
sampling
distribution observedby repeated application
of the wholesampling process2.
However,
we are notagainst
a correction forcontinuity
as such. Weare
only against it
when it iscomputed by
Formula 1(or
a similarformula)
in the above mentioned situation where the
marginal
totals are not fixed inadvance of the
sampling.
In that case weobject
because Formula 1 assumesthat the
margins
are fixed.If,
inspite
of that reason Formula 1 is used in a 2x2 table with randommargins,
the result can nolonger
be considereda ’corrected’
2
in the sense that Formula 1 leads to a better estimation of the exact exceedanceprobability
of thechi-square statistic,
than can aformula without a built-in correction for
continuity.
Theexplanation
forthis can
easily
be found in therelationship
between Yates’ correction and Formula 1.The
principle
ofthe
correction forcontinuity
aspresented originally by
Yates[4]
amounts toreading
theX2 table,
not at thepoint
that corres-ponds
to the value of the2 XP actually
to beevaluated,
but at apoint
half-way between this value and the next lowest
possible
value of2
in asample
of
equal
size. It can beproved
that for 2x2contingency
tables with fixedmarginal
totals and with onedegree
offreedom,
therequired midpoint
can befound without
knowing
the next lowestpossible
value of2 :
it suffices to reduce the absolute value of each difference between the observed and expec-2. If one is interested in the answer to the
question
as to whether thehy- pothesis
ofindependance
should or should not berejected,
the situation is different. In that caseonly
a method isrequired
whichhelps
to decidebetween alternative
hypotheses
and it is clear that there are manypossible
ways of
making
this kind of decision. The reader may bereferred,
concer-ning
thisissue,
to Tocher[5],
Lehman[6],
Plackett[7],
Harkness & Katz[8],
Mantel & Greenhouse[9],
and Clark[10].
sive values of the difference
fo-fe
differby unity
if the estimated expec- tedfrequencies
remain constant over the universe ofsamples. Consequently, decreasing
the absolute différencefo-fe by
0.5 as well asusing
Formula 1only
make sense if thecontingency
table has onedegree
of freedom and ifthe
marginal
totals are considered as fixed in the sense that the same mar-ginal
totals must appear in anyrépétition
of the"experiment"
on N new ran-dom elements of the
population.
For very
large samples,
the above restrictionconcerning*the
use ofFormula 1 does not hold because 00, )
2 ’XP
has the samelimiting
distri-bution for fixed and random
marginal
totals. But this is not so in smallsamples
where Yates’ correction istraditionally
recommended.An
example
for illustration of theforegoing
isprovided by
Table 1.For the sake of
simplicity
we demonstrate our argument for the case N =8,
- but the conclusions holds also for
larger samples,
i.e. situations wherepsychologists
usetraditionally
thechi-square
statistic. In the first column arespecified
the different sets of cellfrequencies
which can beregistered
in asample
with size 8 and with randommarginal
totals >1.~
For each observed
contingency
table the2
value was calculated from Formu- la 2 and from Formula1,
these valuesbeing
indicatedby x§
P andX c
c Incolumn
(3)
are listed the different2 XP
values that can beregistered.
Thecontingency
tablesleading
to these2_values
arespecified
’ in the first twocolumns.
So,
there is one table(2222)
withx2 value 0 ;
four tables(1133,
p
3311,
1313 and3131) with X2
=0 ;
and four tables(1222,’2132,
3221 andXP
3.
By requiring
that themarginal
totals1,
32 of the 165possible
tableswith N - 8 were excluded.
5 N O
N W (P,
0 1 e e ...
d A i b N
P
0 a )t r
U il,
"
t0 -B b AiJ
S s
2013’ w 0
~ t!
u o e - a t0 c c t0 3 H M
41 v 0 ’~
u G e 0 M ., a
4j 3 o*
0 w
4j 1-4
M « .,, C Q ’ V M
-5
1 ’"
0
e ?
0 0!
M W 4-4 Cu ~
;zw E3 Co 0 CJ tt
%
through
the traditional Formula 1.When
comparing
columns(3), (4)
and(5),
one notes that differences in uncorrected values are notnecessarily accompanied by
differences in correc-ted values. Also the order of corrected values calculated from Formula 1
correspondsonly partially
with the order of the uncorrected values.Hence,
the results obtainedthrough
Formula 1 do not conform to theprinciple
of thecorrection for
continuity
if thé row and column totals are random variables.In columns
(6), (7)
and(8)
the exceedanceprobabilities
of the valuesfrom columns
(3), (4)
and(5)
are estimated from the tabularX2
distribution.The accuracy of these estimates can be evaluated
by comparing
theseprobabi-
lities with the
respective
exact upper tailprobabilities
calculated from the multinomial rule. These values are listed in columns(9), (10)
and(12).4
Since the
X2
c values in column(5)
are notgiven
inincreasing order,
we havealso
listed,
for the sake ofclarity,
the exactprobabilities
of theseX2
cvalues in column
(11).
Forexample,
the value 0.074 at the top of column(11)
is theprobability
ofobserving
aX2
value of 0.500. The value 0.074c
was obtained
by adding
the exactprobabilities
of all tables for whichX2
= 0.500. In this case, the calculation was made from the tables 2222(x 1)
c
and 1331
(x 2).
The exactprobabilities
for these tables are :4. It was taken into account that 32 of the 165
possible
tables were notcounted.
Therefore,
theprobabilities
obtainedthrough
the multinomial rulewere
always
dividedby 0.984,
i.e. 1 minus the sum of theprobabilities
ofthe 32 non-calculated tables.
and hence
p(X2
c =0.500) =
0.039 + 0.035 = 0.074. From these orderedX2
cvalues,
the exact exceedanceprobability
of the differentX2 values
c caneasily
be calculated.Table 1 shows that the exact exceedance
probabilities
of the uncor-rected
x2
values(i.e.
the values in column(9))
are betterapproximated by
c
the tabular exceedance
probabilities
of the uncorrected2 XP
values(i.e.
thevalues in column
(6))
thanby
the tabular upper tailprobabilities
of theX2
values obtainedthrough
Formula 1 listed in column(8).
Hence it may bec
contended that if the
marginal
totals are randomvariables,
the use of For-mula 1 cannot be considered a device for a more
precise
evaluation of the exceedanceprobability
of an observed(uncorrected) chi-square
statistic.This claim is in agreement with the ideas
proposed by
Pearson[11],
Grizzle[12J,
and Conover[13, 14, 15].
However,
when the correction forcontinuity
isperformed according
tothe
general principle,
the correction stillimproves
the estimation of the exceedanceprobability
of the uncorrected statistic. Table 1 shows that the values in column(7),
i.e. the tabular upper tailprobabilities
of the chi-square values corrected
according-to
thegeneral principle, usually
corres-pond
better with the exact exceedanceprobabilities
of the uncorrected chi- square values(see
column(9))
than do the values from column(6) (i.e.
thetabular exceedance
probabilities
of the uncorrectedvalues) correspond
withthe exact exceedance
probabilities
of the uncorrectedchi-square
values incolumn
(9).
This result shows thatConover’s [13]
statement "that the Yatescontinuity
correction should not be used in 2x2contingency
tables unlessrow and column totals are nonrandom" is rather
misleading. 5
The proper con-5. In the same
journal
issue Conover’s article as attendedby
three comments.(Starmer,
Grizzle and Sen[16] ;
Mantel[17] ;
Miettinen[18])
which areessentially
not a discussion of Conover’s article. Two years later Mantel[19] protested again
from an otherpoint
of viewagainst improper justifica-
tion for use of non-corrected statistics.
babilities. Formula
(1)
agreesonly
with thegeneral principle
of Yates’correction for
continuity
if row and column totals are fixed.~
REFERENCES
[1]
HAYS W.L. & WINKLERR.L., Statistics. Probability, Inference, and Decision,
Vol.I,
NewYork, Holt,
Rinehart &Winston,
1970.[2]
SIEGELS., Nonparametric
Statistics in the BehavioralSciences,
NewYork, McGraw-Hill,
1956.[3]
FISHERR.A.,
StatisticalMethods for
Research Workers.(5th ed.), Edinburgh,
Oliver andBoyd,
1934.[4]
YATESF., "Contingency
tablesinvolving
small numbers and theX2 test", Suppl.
Jr. R. Statist.Soc. ,
1(1934) ,
217-235 .[5]
TOCHERK.D.,
"Extension of theNeyman-Pearson theory
of tests to dis-continous
variates", Biometrika,
37(1950),
130-144.[6]
LEHMANE.L., Testing Statistical Hypotheses,
NewYork, Wiley,
1959.[7]
PLACKETTR.L.,
"Thecontinuity
correction in 2x2tables", Biometrika,
51
(1964),
327-338.[8]
HARKNESS W.L. & KATZL., "Comparison
of the power functions for the test ofindependence
in 2x2contingency tables",
Ann.Math. Statist.,
35(1964),
1115-1127.[9]
MANTEL N. & GREENHOUSES.W.,
"What is thecontinuity
correction?",
Amer.
Statistician,
22(1968),
27-30.[10]
CLARKR.E.,
Criticalvalues
in 2x2tables, Pennsylvania, Pennsylvania
State
University,
1969.[11]
PEARSONE.S.,
"The choice of a statistical test illustrated on theinterpretation
of data classed in a 2x2table", Biometrika,
34(1947),
139-167.
[12]
GRIZZLEJ.E., "Continuity
correction in theX2-test
for 2x2tables",
Amer.
Statistician,
21(1967),
28-32.[13]
CONOVERW.J., Practical Nonparametric Statistics,
NewYork, Wiley,
1971.[14]
CONOVERW.J.,
"Some reasons for notusing
the Yatescontinuity
correctionon 2x2
Contingency Tables",
Jr. Amer. Stat.Ass.,
69(1974),
374-376.[15]
CONOVERW.J. , "Rejoinder",
Jr. Amer. Stat.Ass.,
69(1974) , 382.
[16]
STARMERC.F.,
GRIZZLE J . E . & SEN P.K.,"Comnent" ,
Jr. Amer. Stat.Ass.,
69
(1974),
376-378.[17]
MANTELN.,
"Comment andsuggestion",
Jr. Amer. Stat.Ass.,
69(1974),
378-380.
[18]
MIETTINENO.S., "Comment",
Jr. Amer. Stat.Ass.,
69(1974),
380-382.[19]
MANTELN., "The continuity correction",
Amer.Statistician,
30(1976),
103-104.