A Categorical Data Analysis of Health Practices, Health Status, and Hospital Utilization in Metropolitan St. John's
By
© Barbara M. Veitch
A practicum report submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Master of Applied Statistics
Department of Mathematics and Statistics
Faculty of Science
Memorial University of Newfoundland
February, 1991
St. John's Newfoundland Canada
Abstract
Questionnaires were administered to adults from a sample of households in the Metropolitan St. John's area to gather data on their lifestyle, health habits and utilization of medical care services.
Health practices, as described in the social medical literature (eating breakfast, smoking, drinking, sleeping, correctness of weight, and exercising), are explored. A variety of statistical measures of association are used to gauge the strength of the relationships between these variables and one's health status.
The relationship between sleeping habits and one's health is examined using logistic regression. This analytical technique is again employed to study the effect of alcoholic consumption on health and to further explore its effect once educational level is controlled for.
From individual health practices, a weighted health practice index is developed.
Using loglinear analysis we build models so as to examine the association between this score and hospital utilization, controlling for sex, age and education.
Acknowledgements
Funding for the research was provided by the National Health Research and Development Program, Department of Health and Welfare, Canada.
The author gratefully acknowledges the assistance of Dr. Roy Bartlett, Associate Professor, Department of Mathematics and Statistics, who in his role as supervisor, provided guidance and advice with infinite patience throughout this study.
Thanks are also extended to Dr. Jorge Segovia, the principal medical researcher on this research team and to the Division of Community Medicine for providing the opportunity to work on this project. Finally, I wish to express my gratitude to my family for their support and encouragement.
Preface
The author joined this study after the proposal for the survey was accepted. She was employed as a research assistant for the duration of the study and was responsible for assisting in the pre-testing and revision of the questionnaire and the hiring and training of interviewers. As a member of the field office and research team she was involved with editing questionnaires, quality control and data cleaning in addition to the other tasks required of this team. As well, she was responsible for the initial analysis of data.
During the course of the author's involvement with the study, two presentation papers were written by the research team and presented by one of the principal investigators, Dr. Jorge Segovia: one to the American Public Health Association in June of 1986 and the other to the Canadian Public Health Association in September of the same year. The Federal Government Report was submitted in January, 1987.
Contents
Abstract
Acknowledgements
Preface
Table of Contents
List of Figures
1 Survey Design and Sampling
  1.1 Introduction
  1.2 Sampling
    1.2.1 Population
    1.2.2 Frame
    1.2.3 Sampling Procedure
    1.2.4 Sample Size
    1.2.5 Selection of Sampling Units
2 The Survey Execution
  2.1 Pre-testing
  2.2 Types of Households
  2.3 Training: The Interviewer's Manual
  2.4 The Commencement of Interviewing
    2.4.1 Interviewers - Keeping Tabs
  2.5 Data Entry, Processing, Checking and Cleaning
  2.6 Refusals and Non-respondents
  2.7 Linkage
  2.8 Summary Suggestions for Future Telephone Surveys
3 The Analysis of Data
  3.1 Contingency Table Analysis
    3.1.1 Tests of Independence in Two-Way Tables
    3.1.2 Partitioning $\chi^2$ Test Statistics in Two-Way Tables
    3.1.3 Measures of Association
  3.2 Design Effects
    3.2.1 The Design Effect in a 2×2 Health Table
    3.2.2 Using Design Effects to Correct for $\chi^2$ in a 2×7 Health Table
  3.3 Logistic Regression for Health Status and Two Health Practices
    3.3.1 Sleep and Health Status
    3.3.2 Drinking and Health Status
    3.3.3 Drinking and Health Status, Controlling for Education
  3.4 Hospital Utilization: A Loglinear Analysis
    3.4.1 The Health Practice Score
    3.4.2 Modelling for Hospitalizations and Grouped Health Practices
    3.4.3 Examination of the Row Effects Model
    3.4.4 Fitting a more complex model
    3.4.5 Residuals for the Hierarchical Model [AE,PE,PSA,HSA]
    3.4.6 Summary
4 Conclusions
References
Appendix: Questionnaire
List of Figures
3.1 Log Odds versus Number of Hours Sleep (Linear Model)
3.2 Log Odds versus Number of Hours Sleep (Quadratic Model)
3.3 Standardized Residuals Plot (Sleep)
3.4 Probability Plot (Sleep)
3.5 Log Odds versus Number of Drinks per Week
3.6 Probability Plot (Drinking)
3.7 Standardized Residuals Plot (Drinking, controlling for Education)
3.8 Probability Plot (Drinking, controlling for Education)
3.9 Log Odds versus Number of Drinks per Week, controlling for Education
3.10 Adjusted Residuals versus Expected Normal Values, [AE,PE,PSA,HSA]
3.11 Expected Counts versus Adjusted Residuals, [AE,PE,PSA,HSA]
Chapter 1
Survey Design and Sampling
1.1 Introduction
This was a study, using a telephone survey, of lifestyles, health practices, and medical care utilization. It was designed in part to consider health indicators and how these indicators are related to health status and medical care utilization. A unique feature of this study was the linkage of the survey data with data pertaining to hospital discharges and physicians' services.
While there is some explanation of the execution of the survey from the standpoint of the field office work, in this report we will concentrate on some of the statistical issues, from the sampling procedure to the examination of design effects and the analysis of data. Data are studied primarily with association measures, logistic regression and loglinear analysis.
1.2 Sampling
One of the first things to address, once it was decided what was wanted from the study, was the sample design.
1.2.1 Population
The population to which the survey results apply consists of all people 20 years of age or older in St. John's, Newfoundland. The sample was selected and the questionnaires were administered in the spring/summer of 1985. As will be outlined, there were restrictions placed on the population definition due to the sampling frame and the accessibility of some of the would-be respondents. Given that the limitations were not very severe, we need not be exceedingly cautious in generalizing to the population initially defined.
1.2.2 Frame
Essentially the frame for this study was one section of the Newfoundland and Labrador Telephone Directory published in March of 1985, immediately prior to the selection of the sampling units. The section of interest in the directory covered the St. John's area. It should be noted that the frame, as such, exceeds the frame of interest. For example, this section contained a small number of telephone numbers outside of the Metropolitan St. John's area. Because of this, the definition of Metropolitan St. John's was limited to those residences having St. John's exchanges as listed at the front of the directory. This in itself presented some difficulties since in certain areas some residences had these exchanges while others did not. It was questionable whether or not such areas should belong to the frame but it was felt that the definition was easiest to apply if held consistent for all St. John's exchanges. Any deviations were considered minimal and seldom occurred.
Using boundaries, the area of interest could have been defined geographically. This was ruled to be too time-consuming as many street addresses would have to be manually checked for the region to which they belonged. In defining the area by exchange codes then, such neighboring places as Bay Bulls, Petty Harbour, and Outer Cove, for example, were sometimes included; 'sometimes' as in these places some homes had a St. John's exchange while others did not. This was not regarded to be a serious problem as these households were not considered to be very different from those of St. John's, per se. Also, residents of these areas, being geographically very near St. John's, have the same medical facilities available to them. Such households appeared in the sample relatively infrequently.
As well as the above, the frame as defined exceeded the frame of interest in terms of private households in that it included telephone numbers such as those belonging to businesses and institutions.
There were two groups of people who were perhaps under-represented or excluded altogether. Elderly people are likely to be under-represented since old age homes were excluded. This should be qualified. Old age homes which house the elderly in self-contained apartments with private telephone numbers and which provide minimal nursing care were included. The elderly are probably further under-represented in that many of the non-respondents in the study were non-respondents because they were hard-of-hearing and since the survey was a telephone survey it would be especially difficult to interview such a person. It is perhaps a fair assumption that the majority of such persons were elderly. Those who were not well enough to answer the questionnaire were also excluded. Although our non-response rate was reasonably low, these people should be kept in mind together with those who, at the time of the survey, resided in an 'excluded institution' such as a hospital.
A group of people who were ignored altogether were those with no telephones or those with unlisted telephone numbers. According to the Newfoundland Telephone Company, telephone coverage in St. John's is approximately 99%, of which about 4% are unlisted telephone numbers. Given that the number of unlisted telephones is small, their exclusion from the survey is unlikely to bias the overall results. However, these persons would have had the same chance of being selected as those with listed numbers had random digit dialing been used. Since no automatic random digit dialing equipment was available, the procedure would have had to have been carried out manually. The number of 'useless' numbers generated by a computer program could be substantially reduced providing that only St. John's extensions were permitted for the first three digits, but non-existent numbers would be generated nonetheless. Also, many business numbers would be generated and would only be discarded once the number was called and the place was identified as such. Sampling methods for random digit dialing which reduce the number of useless telephone numbers in the sample have been proposed by Waksberg and Mitofsky (1978), for example. In a later paper, Potthoff (1987) generalizes their technique. Although considered, random digit dialing was not implemented as at this time no bank of numbers from which telephone numbers could be generated was available for release from the telephone company.
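The idea of restricting generated numbers to permitted three-digit prefixes can be illustrated with a minimal sketch. It is not the Waksberg-Mitofsky procedure, only plain random digit generation within a fixed set of prefixes. The prefixes in `EXCHANGES` and the function name are hypothetical placeholders, since the actual St. John's exchanges are not reproduced in this report.

```python
import random

# Hypothetical three-digit prefixes; the real St. John's exchanges were
# listed at the front of the 1985 directory and are not given here.
EXCHANGES = ["555", "556", "557"]

def random_digit_numbers(count, exchanges=EXCHANGES, seed=None):
    """Generate `count` distinct seven-digit numbers restricted to the given prefixes."""
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < count:
        prefix = rng.choice(exchanges)
        suffix = rng.randrange(10_000)  # last four digits, 0000-9999
        numbers.add(f"{prefix}-{suffix:04d}")
    return sorted(numbers)
```

Numbers produced this way would still include non-existent and business lines, which is the drawback noted above.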
We were comfortable with the assurance from the telephone company of almost total listed telephone coverage in the survey area.
Other excluded numbers, and hence possible would-be respondents, were those associated with prisons, hotels, and institutions such as hospitals.
Ideally our frame would have consisted of an enumerated list of all St. John's exchange telephones, preferably including unlisted numbers and excluding those not of interest such as government departments and businesses. Unfortunately the telephone company could not release such a list.
1.2.3 Sampling Procedure
Stratification of the population on a suitable variable may have been valuable. In some cases telephone exchanges are identical with some characteristics of the population and such exchange numbers may be used to stratify the population. But in the present survey, stratification by telephone exchange numbers is not related to any study variable of interest. Our procedure was to take a random sample of households as determined by telephone number selection.
The medical researcher was interested in collecting information on all adult members of a given household. A simple random sample was taken on households in the St. John's area and once a household was selected, all persons in that household 20 years of age or older were approached to be interviewed. That is, we took a single-stage cluster sample with clusters of unequal size (see Cochran 1977).
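A single-stage cluster sample of this kind can be sketched in a few lines. The frame below, the household labels and the function name are all hypothetical; the sketch only illustrates that households are drawn by simple random sampling and that every adult in a drawn household enters the sample.

```python
import random

def cluster_sample(households, n_households, min_age=20, seed=None):
    """Simple random sample of households; every adult in a selected household is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(households), n_households)
    return {hh: [age for age in households[hh] if age >= min_age] for hh in chosen}

# Hypothetical frame: household id -> ages of the people living there.
frame = {"A": [45, 43, 17], "B": [71], "C": [30, 28, 5, 2], "D": [22, 21, 20]}
sample = cluster_sample(frame, 2, seed=0)
```

Because households differ in the number of adults they contain, the resulting clusters are of unequal size.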
One sampling method which was considered was systematic sampling. Since names were to be selected from the physical telephone directory, this would have facilitated the task of actually selecting members for the survey. That is, only one random number would be generated from which point every kth household would be selected. In this way systematic sampling tends to distribute the sample over the population frame more evenly. In the same way, this can also result in periodicity.
The listed population, however, was the telephone directory and it is likely that alphabetization alone does not group persons in any fashion; they are still randomly listed from the point of view of the study.
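The contrast between the two selection schemes can be shown with a short sketch. The function names are illustrative, and the choice of the interval k as the integer quotient of frame size by sample size is an assumption made here for simplicity.

```python
import random

def systematic_sample(frame_size, sample_size, seed=None):
    """Pick one random start, then every k-th line (k = frame_size // sample_size)."""
    rng = random.Random(seed)
    k = frame_size // sample_size
    start = rng.randrange(k)
    return [start + i * k for i in range(sample_size)]

def simple_random_sample(frame_size, sample_size, seed=None):
    """Draw `sample_size` distinct line positions with equal probability."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(frame_size), sample_size))
```

The systematic version needs a single random number and spaces the selections evenly, while the simple random version needs one draw per selection and leaves the spacing irregular.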
The nature of the variables in this study is such that most of the data are categorical. The intention was that contingency tables would be examined by means of measures of association and logistic and loglinear analyses. Given this, it was desirable to have as close to a simple random sample as possible. Hence, although a systematic scheme of sampling would have made the job of selection somewhat easier, the need for a simple random sample from the analysis viewpoint outweighed this factor. The sampling procedure, however, was not a simple random sample of survey members but rather a simple random sample of clusters of unequal sizes, the average cluster size per sampling unit or household being approximately two people 20 years of age or older. What this meant in terms of violation of underlying assumptions of the analytical techniques shall be discussed in a later section on design effects.
1.2.4 Sample Size
To make a reasonable determination of sample size, it is advantageous to have some idea of which statistical methods will be employed in the analysis of data. The intention of this study was to model and test for association between variables in two-way and multi-way tables (Segovia et al. 1987). A commonly used test in this type of analysis is the $\chi^2$ test. Based on this type of test one can determine the sample size required provided one fixes the desired significance level, $\alpha$ (the probability of mistakenly rejecting the null hypothesis, $H_0$), the power of the test, $1-\beta$ (where $\beta$ is the probability of mistakenly accepting $H_0$), and the 'effect' size (an index of degree of departure from $H_0$).
If our hypothesis of independence between variables is correct, we would expect certain frequencies of occurrence, or proportions thereof, in each cell of the contingency table. Specifically, the proportion of occurrence in each cell would be $\frac{1}{m}$, where $m$ is the number of cells. If the experimental or observed contingency table exhibits the same proportions as that which would be expected under the null hypothesis, then we would not reject the hypothesis of independence or no association. The strength of association between variables is reflected in the degree of departure from the expected proportion. Cohen (1977) uses $W$ to index the size of such departures, or effect size.
$$W = \sqrt{\lambda / n}$$
where $n$ is the sample size and $\lambda$ is the noncentrality parameter of the noncentral $\chi^2$ distribution. Cohen provides tables of sample sizes required for the analysis of contingency tables when $\alpha$, $1-\beta$, and $W$ are fixed.
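As a small illustration of how an effect size of this kind is computed, the sketch below uses the conventional definition of Cohen's w as the square root of the sum, over cells, of the squared difference between alternative and null proportions divided by the null proportion, together with the relation between w, the sample size and the noncentrality parameter. The cell proportions are hypothetical and are not taken from the study.

```python
import math

def cohen_w(p_null, p_alt):
    """Cohen's effect size w between null and alternative cell proportions."""
    return math.sqrt(sum((p1 - p0) ** 2 / p0 for p0, p1 in zip(p_null, p_alt)))

def noncentrality(w, n):
    """Noncentrality parameter of the chi-square test: lambda = n * w**2."""
    return n * w ** 2

# Hypothetical 2x2 example, cells listed row by row.
p_null = [0.25, 0.25, 0.25, 0.25]
p_alt = [0.40, 0.10, 0.10, 0.40]
w = cohen_w(p_null, p_alt)  # 0.6 for these proportions
```

Converting a given w, significance level and power into a required sample size needs the noncentral chi-square distribution, which is what Cohen's tables provide; that step is not attempted here.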
It was known at the outset of the study that several contingency tables would be analyzed and these were considered when choosing an appropriate sample size. One such table cross-classified the frequency of doctors' visits (broken down into three categories) with three levels of health practices. To determine the sample size, $\alpha$ was set at .05, the power at .80, and the effect size $W$ at .30, a 'medium' value suggested by Cohen for contingency table analysis using $\chi^2$ tests. With these fixed, for a 3×3 table with 4 degrees of freedom, Cohen's tables give the sample size of $n = 133$.
In the Alameda County Study (Belloc and Breslow 1972, Belloc 1973, Breslow and Enstrom 1980) respondents were asked to report on a total of six health habits. The outcome was that 12.4% practised 0-3 of the health habits, 52.3% practised 4-5, and 35.3% practised 6-7. The Medical Care Plan (MCP) files from St. John's were examined to ascertain the marginal distribution of doctors' consultations. Of the patients who had a visit to a doctor, 63.7% had 1-5 visits, 20.4% had 6-10, and 15.9% had ≥11. It seems reasonable to use these marginal distributions as approximations for the distributions of these variables in our study. So then, this a priori information was used to calculate the expected values in the cells of the contingency table. These calculated proportions suggested that a 'large' effect may have sufficed. With $\alpha = .05$ and $1-\beta = .80$ as before, together with Cohen's suggested value of $W = .50$ for a 'large' effect, the sample size was $n = 48$.
It was decided that this contingency table be controlled for by sex and age. Let us think of our two-way contingency table, doctors' consultations × health practices, as being one layer of the four-way table sex × age × doctors' consultations × health practices. The 1981 census gave the marginal distribution for gender in St. John's as males 45% and females 55%. For age the marginal distribution was 56.2%, 27.6%, and 16.1% for 20-44 years, 45-64 years, and ≥65 years, respectively. All else being equal, with the addition of sex and age, the cell with the smallest expected proportion of occurrence for doctors' consultations × health practices would be that for males ≥65 years of age. It seems reasonable therefore, to make the following calculation to obtain a minimum required sample size.
$$n = \frac{133}{.45 \times .161} = 1836 \quad \text{if } W = .30$$
or
$$n = \frac{48}{.45 \times .161} = 663 \quad \text{if } W = .50$$
Given that we anticipated a response rate of 90% and a 90% linkage rate with medical utilization files, the sample size was inflated to
$$n = \frac{1836}{.90 \times .90} = 2267 \quad \text{if } W = .30$$
or
$$n = \frac{663}{.90 \times .90} = 819 \quad \text{if } W = .50$$
The following was also considered when determining the sample size. Theoretically, in the analysis of contingency tables using $\chi^2$ tests, the sample size should be sufficiently large so as to avoid cell frequencies that are too small. There is no definitive value, however, for 'small'. Fisher (1970) is one of many who recommend a minimum expected cell frequency of 5. The literature on the subject tends to use this value as an acceptable rule of thumb (Hays 1981, Freeman 1987, Kraemer and Thiemann 1987) although Hays, for one, suggests a minimum value of 10 in 2×2 tables. It is also put forward, particularly if the number of degrees of freedom is large, that provided no more than 1 out of 5 cells has a frequency of less than 5, a minimum expected frequency of 1 is permissible in these cells (see Hays, for example). Camilli and Hopkins' (1978) empirical studies found that even with small expected cell frequencies in 2×2 tables, Pearson's $\chi^2$ test is very robust.
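The rule of thumb on expected cell frequencies can be checked mechanically for any two-way table. The sketch below computes expected counts under independence and then applies the rule as stated above (no expected count below 1, and no more than one cell in five below 5). The function names and default thresholds are illustrative choices, not part of the original analysis.

```python
def expected_counts(table):
    """Expected counts under independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def passes_rule_of_thumb(table, low=5, floor=1, max_share=0.2):
    """True if no expected count is below `floor` and at most `max_share` of cells are below `low`."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(e < low for e in cells)
    return min(cells) >= floor and small / len(cells) <= max_share
```

For example, `passes_rule_of_thumb([[20, 30], [30, 20]])` holds because every expected count is 25, whereas a table with a very sparse row and column fails the check.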
Considering the four-way table, sex × age × doctors' consultations × health practices, recall the marginal distribution of each of these categorical variables. The minimum proportion for each was as follows: sex, 45% (males); age, 16.1% (≥65 years); doctors' consultations, 15.9% (≥11 visits); health habits, 12.4% (0-3 habits).
Hence, for a minimum expected value of 1, the sample size required is
$$n = \frac{1}{.45 \times .161 \times .159 \times .124} = 700.$$
To ensure a minimum expected frequency of 5 would require a sample size of
$$n = 5 \times 700 = 3500.$$
Recalling our anticipated response rate of 90% and linkage rate of 90%, we need a sample size of
$$n = \frac{3500}{.90 \times .90} = 4321.$$
With all the above calculations in mind and considering restrictions on time, cost and manpower, a sample size of 3000 was chosen.
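The sample size arithmetic in this section can be reproduced with a short script. The proportions are those quoted in the text; the helper name `inflate` and the constant names are illustrative only.

```python
def inflate(n, *proportions):
    """Divide a required count by a product of proportions, rounded to the nearest integer."""
    product = 1.0
    for p in proportions:
        product *= p
    return round(n / product)

P_MALE, P_65_PLUS = 0.45, 0.161      # smallest sex and age margins (1981 census)
P_VISITS, P_HABITS = 0.159, 0.124    # smallest consultation and health-habit margins
RESPONSE, LINKAGE = 0.90, 0.90       # anticipated response and linkage rates

n_medium = inflate(133, P_MALE, P_65_PLUS)                  # W = .30
n_large = inflate(48, P_MALE, P_65_PLUS)                    # W = .50
n_medium_final = inflate(n_medium, RESPONSE, LINKAGE)
n_large_final = inflate(n_large, RESPONSE, LINKAGE)

n_min1 = inflate(1, P_MALE, P_65_PLUS, P_VISITS, P_HABITS)  # minimum expected count of 1
n_min5 = 5 * n_min1                                         # minimum expected count of 5
n_min5_final = inflate(n_min5, RESPONSE, LINKAGE)
```

Running it gives 1836 and 663 before inflation, 2267 and 819 after, and 700, 3500 and 4321 for the minimum expected frequency calculations, in agreement with the figures above.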
1.2.5 Selection of Sampling Units
As we were to use a simple random sample to select our sampling units, a FORTRAN program was used to generate a series of random numbers. The numbers generated were associated with a given line of the directory in the section of interest. It was quite accurately estimated that approximately 50% of the numbers would correspond to non-residential numbers; therefore, the quantity of numbers generated was twice that which would be required for the sample. This program did the following:
• generated a random sample of given size from 72,333 (the number of lines in the St. John's section of the directory).
• printed these numbers together with their corresponding page, column, and column position in the directory.
• randomly assigned an equal quantity of these numbers to a given number of interviewers.
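The original FORTRAN program is not reproduced in this report. As a rough Python sketch of the first and third steps listed above, the function below draws distinct directory lines and deals them out to interviewers. The function name is hypothetical, and the mapping from line number to page, column and column position is omitted because the directory layout is not given.

```python
import random

FRAME_LINES = 72_333  # lines in the St. John's section of the directory

def draw_lines(sample_size, n_interviewers, seed=None):
    """Draw distinct directory lines at random and deal them out evenly to interviewers."""
    rng = random.Random(seed)
    lines = rng.sample(range(1, FRAME_LINES + 1), sample_size)
    # rng.sample returns the lines in random order, so dealing them out
    # round-robin gives each interviewer a random, near-equal share.
    return [lines[i::n_interviewers] for i in range(n_interviewers)]
```

When the sample size is not an exact multiple of the number of interviewers, the shares differ by at most one line.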
We originally intended on a sample size of 3000 individuals. Using the Statistics Canada figure of 2.3 adults 20 years of age and older per household in St. John's at that time, this translated into 1304 households. To obtain this we had to select 1304/.5 = 2608 lines. Prior to the commencement of the survey execution, this number was increased when reconsideration of our assumed response rate of 90% led us to decide that a rate of 80% would perhaps be more realistic. Therefore the number of lines to be selected by the program was recalculated to be 2608(.9)/.8 = 2934.
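The conversion from individuals to households to directory lines can be checked with a few lines of arithmetic. The function name is illustrative; the inputs are the figures quoted in the paragraph above.

```python
def lines_to_select(individuals, adults_per_household, residential_share):
    """Convert a target number of individuals into households and directory lines."""
    households = round(individuals / adults_per_household)
    return households, round(households / residential_share)

households, lines = lines_to_select(3000, 2.3, 0.5)  # 1304 households, 2608 lines
revised_lines = round(lines * 0.90 / 0.80)           # response rate revised from 90% to 80%
```

The revised figure works out to 2934 lines, matching the recalculated value.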
Chapter 2
The Survey Execution
2.1 Pre-testing
At this point the questionnaire was ready for pre-testing. Several people were selected at random from the telephone directory and the questionnaire was administered over the telephone to them. This was done to ensure that all questions were phrased in a way that was clearly understood and not ambiguous. As well, it was important to check that the layout of the questionnaire was logical and easy for the interviewers to follow. It was also necessary to check the length of time required to administer the questionnaire. Additional questions pertaining to salary and MCP number were included after the pre-testing.
The pre-testing suggested some minor alterations to the phrasing of some of the questions, and a couple of sections had the actual layout of the questions altered to make it easier for the interviewers to follow the question sequence. An additional section was placed at the end of the questionnaire. This section contained the information from the household sheet and the information regarding the total number of refusals and non-respondents in a given household which was to be filled out after the interview was completed.
2.2 Types of Households
Although the pre-testing was only done on individuals, the survey was to be carried out on all adult members of the selected households. Consequently a household sheet was required. Upon first contact with a household, the interviewer would be required to get a list of those people in the unit eligible to be interviewed and each person's relationship with the 'head' of the household. It was not possible in this survey to have a household sheet that could clearly categorize each type of dwelling which could be encountered. A form was constructed which would categorize households as accurately as possible without being so complicated that it would cause confusion or inconsistency on the part of the interviewers.
In the end we allowed for three types of households: a family household, a household of unrelated people, and a single adult household. Even then, of course, not every household could be expected to conform exactly to one of these set types.
In a 'family' household, for instance, people living within the dwelling but unrelated to household members were not considered as part of the unit. In a case where two unrelated families were residing together, the family whose name was listed in the telephone directory was taken to be the selected 'family' household. Where a married couple was living with parents/in-laws and the name listed in the telephone directory was that of the younger married couple, then that couple constituted the 'husband' and 'wife' and the parents of this couple were entered as such. In a household of five residents, if only two were siblings then that household would be recorded as an 'unrelated' household and the siblings would not be recorded as related. On the other hand, if four of these people were siblings, they would become members of a 'family' household and the fifth person would not be considered as a member of that dwelling.
When it was unclear as to which category a household belonged, interviewers were instructed to contact the supervision office where a decision would be made and recorded so that in the event that similar households came into the survey, they would be classified consistently.
2.3 Training: The Interviewer's Manual
Once the questionnaire was finalized the next step was to write an Interviewer's Manual. It was comprised of information on the following:
1. Interviewing Skills:
This briefly stressed the importance of the role of the interviewer in a survey.
2. Ethics of Interviewing:
The duty of the interviewer to be discreet and ensure confidentiality was emphasized. Information obtained from respondents was to be disclosed to no one with the exception of supervisors.
The importance of initiating and maintaining a comfortable but professional interaction was discussed. Interviewers were not to express approval or disapproval of a subject's response, nor were they to give leading probes such as "You do ... don't you?"
3. "Do's" and "Don'ts" of Interviewing:
Primarily, this summarized in point form that which had already been mentioned. It also mentioned that interviewers were not to interview friends, acquaintances or relatives and highlighted some of the things to be kept in mind when editing completed questionnaires.
4. Field Work Procedures:
This included information on the number of households which would be allotted weekly to each interviewer and the number of individual interviews this would be expected to yield. It also informed the interviewers that weekly meetings would be held to assess progress, sort out problems, deliver completed questionnaires and collect new assignments. As part of standard practice, spot checks would be made by supervisors with the respondents of completed questionnaires.
Once households were assigned, the procedures were outlined for making contact and returning completed questionnaires. In order to make an initial contact with a household, interviewers were to make up to seven attempts on different days and times of the day: five calls within the first month and two in the next month. In the case of a refusal of an entire household or an individual within a household, a letter requesting participation was to be sent from the field office and the interviewer was to call back four working days later. At the end of each week, questionnaires from completed households, together with household sheets and interviewer record forms, would be turned in to the field office. Interviewers were instructed to call the field office whenever they had a query or problem so that such queries would be handled immediately and consistently.
5. Completing the Household Sheet:
One household sheet was to be completed for each household. The three types of household classifications (family, unrelated, and single adult) were defined. Not every household would fall neatly into one of these categories and instructions were given as to what to do in this eventuality. In addition, an explanation was given regarding how to assign an identification number to each household member.
6. Question Instructions:
This section addressed each of the 69 questions in the questionnaire. It clarified questions, explaining for example, that 'animal fats' include food such as dairy cream, table or 'real' butter, whole milk, fatty meat and gravy. It gave instructions on how to record answers and how to use probes.
7. The First Contact with the Household:
This gave the initial statement to be used upon first contact with a household.
8. The Informed Consent Statement:
Given here was the informed consent statement which was to be read to each individual before commencing the interview.
9. Editing:
Included among the instructions regarding editing, interviewers were directed to:
• edit their questionnaires as soon as practicable following the completion of the interview, ensuring that every appropriate question was answered.
• transfer all information to the coding blocks on the questionnaires, using '9', '99' and so on if the question was inapplicable or if the subject did not know how to answer or refused to answer a question.
10. Questions the Interviewer Might be Asked:
A list of several questions that a respondent might ask, together with suggested responses, was included. If, for example, a subject expressed concern about the confidentiality of the study, the interviewer could respond by saying that everything she is told is confidential, is seen only by the staff, and that no person is ever identified in any reports.
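The coding convention mentioned under the editing instructions, where '9', '99' and so on mark an inapplicable, unknown or refused answer, can be sketched as follows. The function names are hypothetical, and the zero-padding of valid answers is an assumption, since the text does not describe how valid answers were laid out in the coding blocks.

```python
def missing_code(width):
    """Return the all-nines code for a coding block of the given width, e.g. '9' or '99'."""
    return "9" * width

def code_answer(answer, width):
    """Zero-pad a numeric answer to the block width, or use the all-nines code if absent."""
    if answer is None:
        return missing_code(width)
    return str(answer).zfill(width)
```

A convention of this kind only works if the all-nines value cannot occur as a legitimate answer for the question concerned.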
This guide was considered to be an important document to use in the training of interviewers and for their reference throughout the course of the field work.
Seven female interviewers, two of whom had previous interviewing experience in survey-type studies, were initially hired. Shortly after selecting these interviewers, a one-week training period was scheduled immediately prior to the commencement of the field work. During this week, the Interviewer's Manual was covered methodically to ensure that everyone understood the skills, ethics and so forth, involved in survey interviewing. Each item in the questionnaire was discussed.
Interviewers then practised administering the questionnaire on each other and edited and corrected each other's work. Queries were encouraged and discussed.
Each interviewer was given a list of households which she was expected to contact over the course of two or three evenings. These were for practice only and not included in the analysis. These households had also been selected at random from the telephone directory but were not taken from the list of households to be used in the household survey. That is, they were selected independently of the survey sample (although checked to ensure there was no overlap). With these 'practice' households, questionnaires and household sheets were to be completed and editing was to be done immediately upon finishing each interview. Each day interviews completed during the previous evening were discussed among the group.
2.4 The Commencement of Interviewing
Once the training period was over, the survey started in earnest. Interviewers were instructed to complete as close to 40 questionnaires per week as possible. They were not to go beyond this quota since there were only two field supervisors who, among their other duties, were responsible for editing questionnaires after delivery to the field office. In addition, there was an upper limit on the number of questionnaires which would be entered onto the computer system each week at Newfoundland and Labrador Computer Services. It would be best if the interviewing were carried out at the same rate as the editing and data entry so that when errors occurred or clarification was required on a given questionnaire, this fact would be uncovered as close to the time of the interview as possible. The reason for this was twofold. First, if the interviewer herself could answer the query, she would be much more likely to be able to do so shortly after the interview than after a period of a week or more. Second, if a follow-up call to the respondent were required, it should be done as soon as possible.
2.4.1 Interviewers - Keeping Tabs
Originally it was intended that a certain percentage of the interviews would take place at the field office under direct supervision, but unfortunately it did not turn out to be viable. Physical space limitation was such that the only room available to us in which on-campus telephone interviews could be conveniently made, was only large enough to accommodate one interviewer with one supervisor. Although it was a disadvantage that interviews could not take place under direct supervision, it was hoped that other supervisory methods would suffice.
In addition to keeping track of the team's work through meetings and careful editing, some 'running tabulations' were kept. Each week and for each interviewer the number of households was recorded, together with the number of people in each household less than 20 years of age, the number at least 20 years of age, and the number of refusals, non-respondents and respondents. All this information was obtained from the household sheets. From individual questionnaires several variables were recorded. With these few variables, some comparisons could be made between interviewers and with census information. We will discuss later a problem uncovered by these running tabulations.
Interviewers were compared for number of refusals, non-respondents, respondents and number of completed questionnaires. Small discrepancies in the number completed per week were both expected and accepted. Concern over differences in the number of questionnaires completed was not as great as that over differences in ratios of refusals and/or non-respondents with the total number of possible respondents from the households. For the most part, such ratios did not exhibit statistical differences between interviewers although some interviewers generally appeared to elicit more responses than others.
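One way such ratios could be compared between interviewers is with a Pearson chi-square test of homogeneity on a table of interviewers by outcome. The report does not say which test was used, so this is only an illustrative sketch, and the counts below are hypothetical. It also ignores the clustering of respondents within households.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: one row per interviewer, columns = (respondents, refusals).
counts = [[90, 10], [85, 15], [95, 5]]
stat, df = pearson_chi_square(counts)
```

For these made-up counts the statistic is about 5.56 on 2 degrees of freedom, just under the .05 critical value of 5.99, so the three interviewers would not be judged to differ.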
As well as comparing interviewers regarding the above, the research team was interested in the response rate itself since, of course, the projected response rate influenced the sample size. Also, regardless of the number of responses, it was obviously a concern that the refusal rate be as low as possible so as to reduce possible bias in the results.
The interviewers' distributions on variables such as sex, marital status, height, and number of people per household at least 20 years of age, were compared. Any consistent and significant differences between interviewers would warrant closer inspection. If a given interviewer deviated consistently from her co-workers, it might suggest that the questionnaire was not being administered in the way it was intended
or that short-cuts were being taken. Although the questions recorded were perhaps not the best to uncover if an interviewer were taking short-cuts, they still served their intended purpose to some degree. In large part, the reason for the choice of these questions among all the possible questions was simply that census data, while generally not readily available on most variables, were available on these. This also allowed the research team to check that the data from the sample selected were in keeping with census data on these variables specifically and, therefore, hopefully on other variables in general. In particular, the number of people per dwelling who were at least 20 years old was of interest; the census figure of 2.3 adults per household was used in calculating the number of households to select. A deviation from this could greatly influence the sample size since it was households and not individuals which were selected from the directory. Our average was slightly less than this and, to compensate for the reduction in the number of possible respondents that this caused, we generated several more random numbers. It was assumed that the slight discrepancy only indicated a minor change in the population since the census of 1985 or a slightly different definition of a household for our survey than that used by the census. Hence, increasing the number of households to sample would not bias our results. The variable sex was of interest since the ratio of males to females was another factor in our choice of sample size. Knowing the sex was also important in that other statistics (such as marital status and height) were available in the census broken down by gender. As well, the variable height was not useful unless the sex of the respondent was known.
2.5 Data Entry, Processing, Checking and Cleaning
Newfoundland and Labrador Computing Services (NLCS) was approached in the early days of the study when the proposal for the project itself was being drafted.
Their services were employed for data entry on the understanding that they would receive approximately 200 questionnaires per week. Although we requested 40 questionnaires per week from each of our seven interviewers, we were correct in our assumption that we would not exceed this number on a weekly basis.
The coding area of the questionnaire was designed in consultation with their staff so as to maximize facility of data entry and hence reduce the number of data entry errors. In addition, they were to enter the information twice and flag any non-matching entries. A program also checked for a limited number of 'out-of-bounds' data points.
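The two safeguards just described, double keying and a range check, can be sketched as follows. This is only an illustration: the field names and valid ranges are hypothetical, since the report does not list the actual coding scheme or the program NLCS used.

```python
# Hypothetical field names and valid ranges (assumptions, not the study's coding scheme).
VALID_RANGES = {"sex": (1, 2), "health_status": (1, 4), "age": (20, 110)}


def compare_entries(first_pass, second_pass):
    """Return the fields whose two independently keyed values disagree."""
    return sorted(f for f in first_pass if first_pass[f] != second_pass.get(f))


def out_of_bounds(record, valid_ranges=VALID_RANGES):
    """Return the fields whose value is absent or outside its allowed range."""
    flagged = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(field)
    return sorted(flagged)


first = {"sex": 1, "health_status": 3, "age": 45}
second = {"sex": 1, "health_status": 8, "age": 45}  # keying slip on the second pass

print(compare_entries(first, second))  # fields to re-check against the paper form
print(out_of_bounds(second))           # fields holding an impossible code
```

Either check alone would flag the slip here; a keying error that lands on a valid code would be caught only by the double-entry comparison.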
Each week when questionnaires were brought to NLCS, the previous week's work was collected and returned to the field office, together with any tapes onto which the data had been transferred. The tapes were then copied onto the university's computer system. Once there, programs were run to test whether the measurements on the aforementioned 'running tabulation' variables were statistically the same among the interviewers. These tests brought to light the rather disturbing fact that data from one of the interviewers were consistently and statistically differing from the others. This prompted the field office staff to make callbacks to a sample of respondents for each interviewer. Respondents were informed that this was a standard random check to ensure that the interviews had been conducted properly by the interviewers and they were requested to answer again a selection of the questions. It quickly became evident that in the case of six of the seven interviewers the questions were being answered by respondents to the field office staff as they had been to the interviewers.
For one of the interviewers, however, this was not so. Of course, one might expect and accept slight discrepancies between the first and second interview, especially if more than a week had passed, but such discrepancies were much more pronounced in the case of the one interviewer. Unfortunately this interviewer had to be dismissed.
The approximately 500 questionnaires which she had completed were redistributed among the other interviewers and readministered. A statement was prepared for the interviewers to read to these respondents explaining that it had been discovered that the questionnaire had perhaps not been carried out correctly in the first instance and requesting that they repeat it. These were completed again with surprising results; rather than refusing to repeat the interview or being aggravated by the request, the majority of these respondents were very obliging. In fact, many seemed pleased that the research team was being careful regarding their data; others were relieved, stating that they had not been impressed at the way the questionnaire had been administered in the first instance. The response rate was very good. In retrospect, the fact that the problem only became evident after several weeks makes it more clear that every effort should be made in the future to have at least a percentage of the interviews administered under direct supervision.
Although the response rate from these questionnaires was very good, it was important to check that they were not different from the other completed questionnaires. Several variables were tested for statistical difference between the repeated questionnaires and the others. When no significant differences surfaced, the research team was satisfied to pool the data from these questionnaires with the data from the others.
As the data became available to the research team, the data cleaning continued.
Errors to be checked included those uncovered through the program which flagged errors during data entry, 'coding' errors such as a 3 being coded where there could only be a 1 or a 2, and 'logical' errors such as a person who reported having never smoked later stating that he smoked a package of cigarettes each day. Suspected outliers were also checked. The questionnaires from which the errors surfaced were examined. If the values on the questionnaire and in the data file corresponded but
were impossible or extremely unlikely, a callback was sometimes in order; otherwise, in the event of an impossible answer, the value was recoded as 'missing'.
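A minimal sketch of the 'coding' and 'logical' checks described above is given here. The variable names, codes and the missing-value marker are hypothetical, chosen only to mirror the two examples in the text; the study's actual cleaning programs are not reproduced in this report.

```python
MISSING = None  # hypothetical missing-value marker


def clean_record(record):
    """Recode an impossible code as missing and list problems needing follow-up."""
    cleaned = dict(record)  # leave the original record untouched
    problems = []

    # 'Coding' error: only codes 1 and 2 are possible for this item.
    if cleaned.get("sex") not in (1, 2):
        problems.append("sex: impossible code")
        cleaned["sex"] = MISSING

    # 'Logical' error: reported never smoking (code 2 here) yet reports daily cigarettes.
    if cleaned.get("ever_smoked") == 2 and (cleaned.get("cigarettes_per_day") or 0) > 0:
        problems.append("smoking: never smoked but reports cigarettes per day")

    return cleaned, problems


record = {"sex": 3, "ever_smoked": 2, "cigarettes_per_day": 25}
cleaned, problems = clean_record(record)
print(cleaned)
print(problems)
```

The logical contradiction is only reported, not recoded, because the text indicates that such cases were first checked against the paper questionnaire and sometimes resolved by a callback.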
2.6 Refusals and Non-respondents
It was anticipated that some household members would request additional information about the study before agreeing to participate or would require more information pertaining to the request for their MCP number. Hence two letters were drawn up: one explaining the nature of the study and the other justifying the request of MCP numbers. Both letters repeated the promise of confidentiality. For would-be respondents who refused to answer the questionnaire, two additional letters were prepared: one for complete household refusals and one for individual refusals.
When any of these situations arose, interviewers were instructed to contact the field office immediately. From there the appropriate letter would be mailed. After several days the interviewer was to contact that household again. If the person still declined to participate in the study, the household sheet, together with any completed questionnaires from that household, was to be returned to the office. Once all the households in the survey were contacted and interviews completed, the refusals were pooled and redistributed among the interviewers. No interviewer was to receive her own refusals to readminister. This yielded good results with many people granting interviews to a different interviewer. Once this stage was complete, there was a 90%
response rate among those households in which at least one person answered the questionnaire. The remaining 10% were not all refusals, per se, but rather some were 'non-respondents'. These included people who were perhaps too ill to come to the telephone, but this subgroup seemed to be largely made up of the hard-of-hearing.
With respect to not being able to make even an initial contact with a household
or an individual, interviewers were instructed to try at least seven times before regarding the household or individual as unobtainable. Sometimes one member of the household was temporarily absent so had to be contacted several days or even weeks after the initial contact. If he were to be gone for longer than this, he was considered unobtainable.
Summary: Households for which there was ≥ 1 response

                    Frequency    % of Total Number of Subjects
Households            1675
Subjects              3649
Respondents           3300              90.4
Refusals               195               5.3
Non-Respondents        154               4.2
The above summary refers only to those households for which there was at least one response. These correspond to households for which the household sheet (which recorded the number of responses, non-responses and refusals) was completed. It ignores entirely the households where no response could be obtained. The sample listing consisted of 2076 households. Of these, 1675 had at least one respondent. Of the remaining 401 households, 179 were contacted and of these, 148 were complete household refusals and 31 were household non-respondents. The remaining 222 consisted of households for which the telephone number was no longer in service (N/S) or for which no contact could be made after seven attempts. Two were households in which all residents were under 20 years of age.
Based on knowledge of the sample cluster size of 2.18 adults per household ($\bar{m} = 2.18$), the number of adults can be estimated for the households where no household sheet was completed. These estimates appear below:
Summary

                                            Frequency
                                 •          o                           Total
Households                     1675
Subjects                       3649       401($\bar{m}$) = 873.58      4522.58
Respondents                    3300                                    3300
Refusals                        195       148($\bar{m}$) = 322.42       517.42
Non-Respondents                 154        31($\bar{m}$) =  67.53       221.53
Other (N/S, No Answer, etc.)              222($\bar{m}$) = 483.63       483.63

• Households for which there was ≥ 1 response
o Households in which there were no respondents
Therefore we estimate the following response rates:

Households for which there was ≥ 1 response:   3300/3649 = 90.40%
Including all households where contact was made (i.e., excluding N/S, No Answer, but including complete household refusals and non-responses):   3300/4038.95 = 81.70%
Including all households:   3300/4522.58 = 72.97%
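The estimates and response rates above can be retraced with a few lines of arithmetic. The sketch below uses only counts quoted in this section; the variable names are illustrative. It assumes, as an inference from the printed figures rather than something the report states, that the unrounded cluster size 3649/1675 (about 2.18) was used in the multiplications.

```python
households_with_sheet = 1675   # households with at least one response
adults_with_sheet = 3649       # subjects in those households
respondents = 3300

# Sample cluster size: mean number of adults per household with a completed sheet.
m_bar = adults_with_sheet / households_with_sheet

# Households with no respondent, by reason.
refusal_households = 148
non_respondent_households = 31
other_households = 222         # not in service, no answer, etc.

est_refusals = refusal_households * m_bar
est_non_respondents = non_respondent_households * m_bar
est_other = other_households * m_bar

contacted = adults_with_sheet + est_refusals + est_non_respondents
everyone = contacted + est_other

rate_sheet = respondents / adults_with_sheet
rate_contacted = respondents / contacted
rate_all = respondents / everyone

for label, rate in [("at least one response", rate_sheet),
                    ("all contacted households", rate_contacted),
                    ("all households", rate_all)]:
    print(f"{label}: {rate:.1%}")
```

The three rates come out at roughly 90.4%, 81.7% and 73.0%, which shows how strongly the reported response rate depends on which households are counted in the denominator.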
2.7 Linkage
The data having been collected, two stages of data linkage were carried out. Linkage refers to the joining of the survey data with data from another source. It was done via MCP numbers which were available for 2994 (or 90.7%) of the respondents. The remaining 306 respondents were those who refused to provide their MCP number or did not have one (foreign students or members of the security forces, for example).
The data on the 2994 people were then linked with the data from two external sources.
The first source, termed 'hospital utilization' data, was extracted from computer tapes from the Department of Health and added to the survey data base.
These data provided the number of days a respondent spent in hospital (excluding hospitalizations due to pregnancy or delivery) for the four-year period from April 1981 to March 1985 and were the most up-to-date that could be obtained. The reason for the hospitalization was not used.
The second external source was termed 'physician consultation' data. This was obtained through computer records of doctors' insurance claims made to the Newfoundland Medical Care Commission. Due to the very strict confidentiality of this information, an Order in Council from the Provincial Cabinet was required before it could be released, a process which took approximately three months. Once released, it provided the number of physician consultations that a respondent had in the one year period corresponding very closely with calendar year 1985. Since diagnostic information was not made available, these were for all consultations, including pregnancy related visits. Again this was the most up-to-date information available.
Summary
                                                                   Frequency
Number of Respondents                                                3300
Number Linked with MCP Data Files                                    2994

Summary of Those Linked With MCP Files
                                                                   Frequency
Number with ≥ 1 Doctor Visit over 1 year (including pregnancy)       2434
Number with ≥ 1 Hospital Day over 4 years (excluding pregnancy)       599
2.8 Summary Suggestions for Future Telephone Surveys
The following are several suggestions which should be kept in mind when telephone surveys similar to this one are undertaken. This list is not intended to be exhaustive.
• In TELEPHONE SURVEY METHODS: Sampling, Selection and Supervision, Lavrakas (1987) suggests that selecting from the telephone directory is inadvisable if the proportion of non-coverage is estimated to be more than 10-15% and one is intending to generalize his results to the population at large. In this study the rate of non-coverage was estimated by the Newfoundland Telephone Company to be approximately 4%. If this proportion were to increase much beyond this point, random digit dialing (rdd) should be seriously considered since with rdd those with unlisted telephone numbers would be as likely to fall into the sample as those with listed numbers. This is important when a large proportion of households have unlisted numbers and people belonging to these households tend to exhibit certain characteristics. For example, according to Lavrakas, in the United States the most likely group of people to have unlisted numbers are lower income minority Americans.
• In estimating the number of telephone numbers required in our sampling pool to achieve a given number of completed questionnaires, the cluster size, estimated response rate and the estimated number of residential numbers in the section of interest in the telephone directory were considered. In addition to these, through a pilot study or possibly by contacting the telephone company, the number of 'not-in-service' numbers among the eligible households could have been estimated. An inflation factor might have also been used to compensate for other 'non-respondents', such as those whose numbers produced no answer after seven calls and those who could not answer the questionnaire due to illness, for example. This might have eliminated the need to increase our sample pool size after the study had started. When this was increased it was done to compensate for the 'not-in-service' and 'no answer' numbers and for the decrease in the sample cluster size from the quoted census cluster size of 2.3 from Statistics Canada. Replacing such households if they were refusals could bias the results, but replacing them due to 'not-in-service' and 'no answer' numbers should not have this effect unless persons belonging to such households are not randomly distributed throughout the population. Before replacing these numbers in the future, it would be worthwhile to contact the telephone company for a breakdown of reasons for, and proportion of, 'not-in-service' lines.
• With respect to the field work it is strongly advised that the effort be made to directly supervise interviewers, particularly less experienced interviewers. In the event of space limitations, on each day one or two interviewers should be scheduled to conduct their interviews at the field office while the others carry theirs out at home. This should take place with as many interviews as possible at the beginning of the study with the frequency of supervised interviews dropping off as it progresses.
• For future questionnaires it would be advisable to break down the 'non-response' rate for each eligible member of the household into several categories, such as 'no response' after seven attempts, due to illness or due to absence during the survey period. A more thorough breakdown of reasons for 'non-response' could be useful when planning a similar type of survey in the future.
In addition, all eligible households should have a household sheet completed even when no response is elicited from the unit so as to record whether this was due to a not-in-service number or due to complete household refusal or non-response.
For complete household refusals, the attempt should be made to find out the number of eligible household members. This would increase the accuracy of the estimated number of refusals among these households. As it was, the number of refusals was estimated based on the sample cluster size from those 1675 households where the information was available.
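The second suggestion in the list above, sizing the sampling pool with an inflation factor, can be illustrated with a rough calculation. This is a sketch under stated assumptions, not the procedure used in the study: the function name is hypothetical, and the in-service proportion in the example is an invented value, since the report gives no such figure.

```python
from math import ceil


def households_to_select(target_completes, adults_per_household,
                         response_rate, in_service_rate=1.0):
    """Rough number of directory listings needed for a target number of interviews.

    Each listing is expected to yield adults_per_household * response_rate
    completed questionnaires, discounted by the share of listings that are
    still in service and answered.
    """
    yield_per_listing = adults_per_household * response_rate * in_service_rate
    return ceil(target_completes / yield_per_listing)


# Cluster size 2.18 and a 90% response rate are taken from this chapter;
# the 0.90 in-service proportion is purely hypothetical.
without_inflation = households_to_select(3300, 2.18, 0.90)
with_inflation = households_to_select(3300, 2.18, 0.90, in_service_rate=0.90)
print(without_inflation, with_inflation)
```

Building the in-service proportion into the initial calculation is what would avoid enlarging the pool after field work has begun.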
Despite these practical problems, the survey was highly successful with a very low non-response rate.
Chapter 3
The Analysis of Data
The purpose of this study is to examine many socio-medical questions pertaining to people's lifestyles, health practices and utilization of health services. As such, information was obtained on some of the many variables associated with these aspects of people's lives.
The data were collected and first briefly explored by looking at frequency distributions and descriptive statistics. As is the case in many studies in the social sciences, the data collected in the present survey were, for the most part, categorical.
This chapter, therefore, will deal with analytical tools for categorical data. Since categorical data are often presented as cross-tabulations, we will look at two-way and multi-way tables. Tests of hypotheses of independence will be considered as will several of the many measures of association developed for just such analysis of categorical data. Strengths and weaknesses of these measures will be discussed. As the emphasis is on the application as opposed to the mathematical development of these measures of association, they are not rigorously dealt with from the mathematical point of view. After this preliminary analysis we will further examine the manner in which variables interact with one another. To this end we generate models for given sets of variables. To do so we employ such statistical techniques as logistic regression and loglinear analysis.
Many interesting questions existed for the research team so that during the analysis many different variables were explored. From the perspective of the research team and from the socio-medical point of view, all those explored are of interest. It is not the primary purpose of this report, however, to present medical findings. For this reason, only a small number of the variables are focused upon since to do otherwise would result in much repetition in this chapter. This subset of variables will suffice as illustrations throughout the remainder of this report and will be discussed in varying detail at the time of illustration.
SPSS-X and BMDP were the primary statistical packages used in the analysis.
Minitab was also used to a lesser extent. All analyses were done on the VAXCluster running VMS in the Department of Computing Services at Memorial University of Newfoundland.
3.1 Contingency Table Analysis
A contingency table classifies data according to some categorical criterion. We may have an $r \times c$ contingency table, for example, which crosses $r$ levels of variable A with $c$ levels of variable B. Our data are classified according to the particular category of A and B to which they belong. The categories of a given variable are mutually exclusive and any given person or item can fall into one and only one cell of the contingency table.
In exploring our data in this study, we wanted to see if two variables in our table were independent, and if not independent, to what degree they could be considered related or associated. Our hypothesis is given as:
$H_0$: There is no association, versus $H_1$: The null hypothesis is not true
In the sections which follow, we shall test this hypothesis and discuss, in general, the measures of association which may be used to examine to what degree the variables may be related to one another.
3.1.1 Tests of Independence in Two-Way Tables
In studies such as this one, a simple random sample (of households in this instance) is taken and only the sample size $n$ is fixed. A variety of questions are asked of those in the sample. This is as opposed to the instance when marginal totals are fixed.
This would be the case, for example, if prior to the study one were to fix the number of males and females to interview. No marginal totals were fixed in our study.
We consider a two-way ($r \times c$) cross-tabulation of two discrete categorical variables, A and B, where $f_{ij}$ is the frequency of observations in the cell of the contingency table corresponding to row $i$ and column $j$, that is, corresponding to levels $i$ and $j$ of variables A and B, respectively. The marginal row frequency $f_{i.} = \sum_{j=1}^{c} f_{ij}$ is the sum of the frequencies of level $i$ of variable A over all levels of variable B. Similarly, the marginal column frequency is $f_{.j} = \sum_{i=1}^{r} f_{ij}$. The total frequency of all subjects is given by $f_{..} = n$. Expressed in terms of observed proportions, $p_{ij}$ is the observed proportion in row $i$ and column $j$. $P(A = i) = P_{i.} = \sum_{j=1}^{c} P_{ij}$ and $P(B = j) = P_{.j} = \sum_{i=1}^{r} P_{ij}$. Under the assumption of independence of A and B, $P(A = i, B = j) = P_{i.} P_{.j} = P_{ij}$ where $P_{i.}$ and $P_{.j}$ are the marginal probabilities and $P_{ij}$ is the joint probability. In what follows, $f_{ij}$ and $p_{ij}$ will denote observed frequencies and proportions, respectively, while $F_{ij}$ and $\hat{P}_{ij}$ will denote the corresponding estimated expected values. The standard maximum likelihood estimates of the marginal probabilities are $\hat{P}_{i.} = f_{i.}/n$ and $\hat{P}_{.j} = f_{.j}/n$.
We may test the independence of A and B by looking at the $\chi^2$ test statistic, $X^2$, which is commonly used to test for homogeneity or independence of variables. As is well known, under our null hypothesis of independence, $X^2$ has an approximate $\chi^2_{(r-1)(c-1)}$ distribution where

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - F_{ij})^2}{F_{ij}}$$
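As a minimal sketch of the statistic just defined, the code below computes the estimated expected frequencies as $F_{ij} = f_{i.} f_{.j} / n$ (the usual estimate under independence, consistent with the marginal estimates given earlier) and then $X^2$. The two small tables are hypothetical and chosen only to show the extremes of the statistic: a table whose cells equal their expected values, and a perfectly associated $2 \times 2$ table.

```python
def expected_frequencies(table):
    """Estimated expected counts F_ij = (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson X^2: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum((f - F) ** 2 / F
               for obs_row, exp_row in zip(table, expected)
               for f, F in zip(obs_row, exp_row))


independent = [[10, 20], [20, 40]]   # hypothetical: rows are exactly proportional
perfect = [[10, 0], [0, 10]]         # hypothetical: perfect association, n = 20

print(pearson_x2(independent))
print(pearson_x2(perfect))
```

For the proportional table the statistic is zero, and for the perfectly associated table it equals $n$, the largest value a $2 \times 2$ table with non-zero margins can take.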
The lower bound on this statistic is 0, which is achieved when $f_{ij} = F_{ij}$ for all $i, j$. Provided there are no zero marginal totals, the upper bound is $n(q-1)$ where $q = \min\{r, c\}$. Cramer (1946) states that any row or column consisting entirely of zeros may be discarded and Blalock (1972) shows how, under this assumption of non-zero marginal frequencies, the upper bound on $X^2$ is $n(q-1)$. Without this restriction there is no upper bound. Although $X^2$ is easy to calculate and apply and is frequently used, it should be used with caution when the sample size is large, as is the case here. Being sensitive to sample size, the test statistic will grow as $n$ increases and hence the null hypothesis may well be rejected simply because $n$ is large, rather than because the hypothesis is not true. In discussing $X^2$, Reynolds (1977a), for example, comments that "one can always find a significant relationship by making the sample large enough. In public opinion surveys, where n often exceeds 1500, the difficulty of separating substantive from statistical significance is particularly acute." Also recall from our section on sample size that we must be careful in our reliance on $X^2$ if our table contains cells with zero frequency. As mentioned in that section, it is generally suggested that this test statistic be used only if there are no cells with zero frequency and a minimum of 80% of the cells have 5 or more observations.

Yates' $\chi^2$ corrected for continuity: $X_c^2$
$X^2$ is known as the Pearson $X^2$ test statistic. Theoretically it is appropriate only when the expected values in the contingency table are large, as only then can it be assumed to have an approximate $\chi^2$ distribution. Therefore the suitability, or indeed the validity, of this test statistic may be questionable when these values are small. Yates suggested a factor to correct for this situation in a $2 \times 2$ table. We will denote Yates' corrected $X^2$ by $X_c^2$ where

$$X_c^2 = \frac{n \left( |f_{11} f_{22} - f_{12} f_{21}| - \frac{n}{2} \right)^2}{f_{1.} f_{2.} f_{.1} f_{.2}}$$
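To make the correction concrete, the following sketch evaluates the $2 \times 2$ shortcut form of $X^2$ with and without the continuity term. The counts are those of the dichotomized smoking by health status table reported later in this chapter (Table 3.1e); the function name is illustrative.

```python
def x2_fourfold(table, yates=False):
    """Pearson X^2 for a 2x2 table, optionally with Yates' continuity correction."""
    (f11, f12), (f21, f22) = table
    n = f11 + f12 + f21 + f22
    cross = abs(f11 * f22 - f12 * f21)
    if yates:
        cross -= n / 2  # the continuity correction term
    margins = (f11 + f12) * (f21 + f22) * (f11 + f21) * (f12 + f22)
    return n * cross ** 2 / margins


# Rows: not current / current smoker; columns: poor or fair / good or excellent.
table_3_1e = [[321, 1788], [280, 911]]

plain = x2_fourfold(table_3_1e)
corrected = x2_fourfold(table_3_1e, yates=True)
print(round(plain, 3), round(corrected, 3))
```

With cell counts this large the corrected value is only slightly smaller than the uncorrected one, so the choice between them would not change the conclusion for a table of this size.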
Maxwell (1961, 1978) is a proponent of $X_c^2$, claiming that it should be favoured over $X^2$ even if the expected values are at least 5 and that, in any event, it must be used when the sample size is small. Everitt (1977) also recommends it while pointing out that there has been debate regarding its use in all cases. Fingleton (1984) avoids using it in his discussion, citing Fienberg. And Fienberg (1977) suggests that the use of $X_c^2$ may not be appropriate if the reason for using it is to correct $X^2$ so that it more closely approximates a $\chi^2$ distribution when the sample size is large. He, like Grizzle (1967) and Conover (1974) before him, warns that $X_c^2$ may lead to a test that is too conservative; the null hypothesis is not rejected as frequently as it should be in $2 \times 2$ tables. There are many contributions to the literature which debate the merits of the continuity correction $X_c^2$ over $X^2$.

Fisher's exact test
Another alternative to $X^2$ for $2 \times 2$ tables is Fisher's exact test (Everitt 1977, Reynolds 1977a, Upton 1978) which is given by

$$P = \frac{f_{1.}! \, f_{2.}! \, f_{.1}! \, f_{.2}!}{f_{11}! \, f_{12}! \, f_{21}! \, f_{22}! \, f_{..}!}$$

Rather than approximating a $\chi^2$ distribution, this calculates the exact probabilities.
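The factorial expression can be evaluated directly for small tables. The sketch below does so for a hypothetical $2 \times 2$ table (not from the survey); it returns the probability of that one table given its margins, whereas a complete test would sum such probabilities over the observed table and those more extreme.

```python
from math import factorial


def fisher_table_probability(table):
    """Probability of a 2x2 table with fixed margins, from the factorial formula."""
    (f11, f12), (f21, f22) = table
    n = f11 + f12 + f21 + f22
    numerator = (factorial(f11 + f12) * factorial(f21 + f22)
                 * factorial(f11 + f21) * factorial(f12 + f22))
    denominator = (factorial(f11) * factorial(f12) * factorial(f21)
                   * factorial(f22) * factorial(n))
    return numerator / denominator


hypothetical = [[3, 1], [1, 3]]  # illustrative counts only
p = fisher_table_probability(hypothetical)
print(p)
```

The factorials grow very quickly with the cell counts, which is why the calculation becomes cumbersome for tables as large as those in this study.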
As with $X_c^2$, this may be used when the expected cell frequencies are small. This test statistic may be recommended when the sampling scheme involves fixing marginal totals. Fisher's exact test is a one-tailed test as opposed to the two-tailed $X^2$ and $X_c^2$ tests. In tables with large values for cells and for row and column marginals, this is cumbersome to calculate. For $2 \times 2$ tables, statistical packages such as SPSS-X and BMDP calculate $P$ only when the minimum expected cell frequency is less than 20; if this frequency is at least 20, then test statistics which have approximate $\chi^2$ distributions are substituted.
When examining fourfold tables in our study, we do not require such alternatives to the $X^2$ test statistic since with our large sample size and our variables under consideration, we should not have cells with such small expected cell frequencies as would warrant these alternatives.
Likelihood-Ratio Test: $G^2$
The likelihood-ratio test, $G^2$, is also used to test for independence. Again, if the expected cell frequencies are large, it approximates a $\chi^2_{(r-1)(c-1)}$ distribution. It is given by

$$G^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} f_{ij} \log \left( \frac{f_{ij}}{F_{ij}} \right)$$

where log is the natural logarithm. $G^2$, like $X^2$, should be used with caution, if at all, when expected cell frequencies are small. We are not, as a general rule, seriously affected by this in our study, particularly in lower dimensional tables.
It has been known for some time that smoking adversely affects one's health.
Given the amount of public awareness of and concern about the effects of smoking on health, the research team was interested in studying the relationship between smoking habits of the general public and their self-assessed health status. Self-assessed health status is a measure of health that has been proposed in the social medical literature
as a valid substitute for the very costly evaluation of health by a medical team. Respondents were asked questions pertaining to smoking habits, and from their answers a variable was constructed which categorized each respondent as having never smoked, as a former smoker (having given up smoking for at least one year), or as a current smoker. The respondents were also asked to rate their health as poor, fair, good, or excellent. From these two variables we consider the 3x4 table below, where the values in parentheses are the expected values.
Table 3.1
                           Health Status
Smoke              poor      fair      good    excellent   Totals
never smoked        10       172       696       414        1292
                  (20.8)   (214.6)   (696.9)   (359.8)
former smoker       14       125       428       250         817
                  (13.1)   (135.7)   (440.7)   (227.5)
current smoker      29       251       656       255        1191
                  (19.1)   (197.8)   (642.4)   (331.7)
Totals              53       548      1780       919        3300

$G^2$ = 64.322   p = .0000
$X^2$ = 63.087   p = .0000   (df = 6)
The observed significance level, or p-value, which we denote by p, is the probability of getting a test statistic value at least as extreme as the value observed. Here we reject the hypothesis of independence between smoking habit and self-assessed health status; these two variables appear to be related in some way. With the $X^2$ and $G^2$ statistics we cannot assume causality although, from a medical perspective, one would probably surmise that if dependence is indicated then it is more likely that self-assessed health status is dependent upon smoking habit than the reverse.
In this particular example there are no cells with a frequency less than 5, but the sample size is quite large and it could be that our test statistics were large enough to cause us to reject our hypothesis not because the variables are truly dependent but because $X^2$ and $G^2$ are sensitive to the large sample size. Because of this, with large
sample sizes we should not rely entirely on values of $\chi^2$ statistics. In a later section we will discuss statistics which try to compensate for this and will also consider measures of association which may shed more light on the relationship which exists between variables which are apparently not independent.
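The statistics reported for Table 3.1, and their sensitivity to sample size, can be checked with a short calculation. The sketch below recomputes $X^2$ and $G^2$ from the observed counts and then repeats the $X^2$ calculation on a purely hypothetical table with the same proportions but one tenth of the sample size. The value 12.592 is the tabulated 5% critical value of $\chi^2$ with 6 degrees of freedom; the function names are illustrative.

```python
from math import log

# Observed counts from Table 3.1 (rows: never, former, current smoker;
# columns: poor, fair, good, excellent).
TABLE_3_1 = [[10, 172, 696, 414],
             [14, 125, 428, 250],
             [29, 251, 656, 255]]


def expected_frequencies(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells(table):
    """Yield (observed, expected) pairs for every cell."""
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        yield from zip(obs_row, exp_row)


def pearson_x2(table):
    return sum((f - F) ** 2 / F for f, F in cells(table))


def likelihood_ratio_g2(table):
    return 2 * sum(f * log(f / F) for f, F in cells(table) if f > 0)


x2 = pearson_x2(TABLE_3_1)
g2 = likelihood_ratio_g2(TABLE_3_1)

# Thought experiment: identical proportions, one tenth of the sample size.
tenth = [[f / 10 for f in row] for row in TABLE_3_1]
x2_tenth = pearson_x2(tenth)

CRITICAL_5PCT_6DF = 12.592
print(round(x2, 3), round(g2, 3), round(x2_tenth, 3))
print(x2 > CRITICAL_5PCT_6DF, x2_tenth > CRITICAL_5PCT_6DF)
```

Because $X^2$ scales in direct proportion to $n$ when the cell proportions are held fixed, the same pattern of association that is highly significant with 3300 respondents would fall short of the 5% critical value with 330.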
3.1.2 Partitioning $X^2$ Test Statistics in Two-Way Tables
Often we are not interested only in the hypothesis of independence between variables in a contingency table but also in subhypotheses within this table. For a medical researcher this is the case with the hypothesis we have just explored. It was an important table and further analysis was attempted by examining subhypotheses through partitioning.
There are methods for partitioning tables which enable one to divide the original table into subtables on which subhypotheses may be subsequently tested using a $\chi^2$ test statistic such as Pearson's $X^2$ or the likelihood-ratio $G^2$. Although different methods exist for doing so, we shall only give an example using the method used by Goodman (1968) (see Reynolds 1977a or Agresti 1984, for example). As pointed out many times in the literature, a $\chi^2$ statistic can be decomposed into component parts such that the degrees of freedom of the overall statistic is equal to the sum of the degrees of freedom of those parts. In an $r \times c$ table, for example, we can partition our overall $\chi^2$ into as many as $(r-1)(c-1)$ component parts since there are that many degrees of freedom. In this case, each component part would correspond to a $2 \times 2$ table, each of which would be tested for independence with a $\chi^2$ test statistic with 1 degree of freedom. Pearson's $X^2$ has been used with such partitions; however we use $G^2$ since when partitioned the component parts of $X^2$ sum approximately to the $X^2$ of the original table whereas the component parts of $G^2$ sum exactly to the overall $G^2$. Let us look again at our table of smoking x health status. In that original table we rejected our hypothesis of independence between these two variables. Prior to examining that table we were interested in the independence of these variables with smoking as a dichotomy: either one smokes or does not. With this variable still dichotomous, we were also interested in the independence of the two variables when an individual assesses his health status as either poor or fair, or as good or excellent.
To this end, let us re-examine our 3x4 table, applying a method of partitioning given by Goodman as stated in Reynolds (1977a). Under this method we partition the original 3x4 table into two parts. One subtable consists of the first 2 rows and all 4 columns to give us a 2x4 table (3.1a). That is, we drop the current smokers from our table. Our second subtable is also a 2x4 table (3.1b) where one of the rows will be the row ignored in the first subtable (the current smokers) and the other row is the sum of the rows used in that first subtable, namely the former smokers and those who never smoked. Let us look at the first subtable of our partition.
Table 3.1a
                           Health Status
Smoke              poor      fair      good    excellent   Totals
never smoked        10       172       696       414        1292
                  (14.7)   (181.9)   (688.6)   (406.8)
former smoker       14       125       428       250         817
                   (9.3)   (115.1)   (435.4)   (257.2)
Totals              24       297      1124       664        2109

$G^2$ = 5.681   p = .1282
$X^2$ = 5.824   p = .1205   (df = 3)
Noting the values for the test statistics for this table, we say that they are not significant and therefore we do not reject the subhypothesis of independence of the two variables when current smokers are not considered. This is a rather interesting finding as it implies that those who have given up smoking for at least one year do not appear to rate their health status differently than those who never smoked.
Now let us look at table 3.1b, the second partition of our 3x4 table.
Table 3.1b
                           Health Status
Smoke              poor      fair      good    excellent   Totals
not current         24       297      1124       664        2109
(former/never)    (33.9)   (350.2)  (1137.6)   (587.3)
current smoker      29       251       656       255        1191
                  (19.1)   (197.8)   (642.4)   (331.7)
Totals              53       548      1780       919        3300

$G^2$ = 58.641   p = .0000
$X^2$ = 58.567   p = .0000   (df = 3)
This is significant, so we reject the subhypothesis of independence of the two variables when smoking is dichotomized as current and not current smokers. Those who do not smoke currently, whether they have never smoked or are former smokers, appear to rate their health status differently than those who are current smokers.
Recall that prior to examining the original table we were also interested in the independence of the two variables when an individual rates his health status as either poor or fair, or as good or excellent. Continuing to partition table 3.1b, we consider the tables which follow. In each case the smoking variable is dichotomized as in table 3.1b. In the first subtable, 3.1c, we only look at those people with poor or fair self-assessed health status.

Table 3.1c
                     Health Status
Smoke              poor      fair     Totals
not current         24       297       321
                  (28.3)   (292.7)
current smoker      29       251       280
                  (24.7)   (255.3)
Totals              53       548       601

$G^2$ = 1.539   p = .2148
$X^2$ = 1.543   p = .2141   (df = 1)
This is not significant, hence we do not reject the subhypothesis of independence of the two variables as they stand here. It is interesting that for those who rate their health as less than good, the fact that they are current smokers or not current smokers is independent of whether they rate their health as either poor or fair.
In our next subtable,3.1d, we consider the remainder of our respondents, namely those who rated their health as good or excellent.
Table 3.1d

                    Health Status
Smoke              good       excellent   Totals
not current        1124        664         1788
                  (1179.2)    (608.8)
current smoker      656        255          911
                   (600.8)    (310.2)
Totals             1780        919         2699

G² = 22.866  p = .0000    X² = 22.477  p = .0000    (df = 1)
As this is significant, we reject the subhypothesis of independence of the variables when only those with good or excellent self-assessed health status are considered.
For this sub-group, those who do and do not currently smoke appear to rate their health status differently.
Finally we examine more closely another subtable (3.1e) in which we were particularly interested and which prompted the second stage of partitioning. In this instance, with all respondents included, the self-assessed health status variable is coded as either poor or fair, or as good or excellent.
Table 3.1e

                    Health Status
Smoke              poor/fair   good/excellent   Totals
not current          321         1788            2109
                    (384.1)     (1724.9)
current smoker       280          911            1191
                    (216.9)      (974.1)
Totals               601         2699            3300

G² = 34.236  p = .0000    X² = 35.111  p = .0000    (df = 1)
Since this is significant, we again reject the subhypothesis of independence of the two variables when they are both dichotomized as seen in the subtable. Those who do not currently smoke rate their health differently from those who do smoke. The non-current smokers are more inclined than the current smokers to rate their health as good or excellent rather than poor or fair. This confirms previous work which has acknowledged for some time that smoking has detrimental effects on health. Although we cannot assume causality here, we can state that using this dichotomy a person's smoking status is not independent of his self-assessed health rating.
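The direction of the association in table 3.1e can be read off the row proportions. The sketch below uses the counts from that table; the helper name is hypothetical, and the sample odds ratio is an illustrative addition rather than a statistic reported for this table.

```python
# Table 3.1e: rows = not current, current smoker;
# columns = poor/fair, good/excellent.
table_3_1e = [[321, 1788],
              [280, 911]]

def share_good_or_excellent(row):
    """Proportion of a row's respondents rating their health good or excellent."""
    return row[1] / sum(row)

p_not_current = share_good_or_excellent(table_3_1e[0])
p_current = share_good_or_excellent(table_3_1e[1])

# Sample odds ratio: odds of a good/excellent rating for those not currently
# smoking, relative to the same odds for current smokers (illustrative only).
(a, b), (c, d) = table_3_1e
odds_ratio = (b / a) / (d / c)

print(f"not current: {p_not_current:.3f}, current: {p_current:.3f}")
print(f"odds ratio: {odds_ratio:.3f}")
```

A proportion that is higher in the first row than in the second, and an odds ratio above 1, both correspond to the statement that those not currently smoking are more inclined to rate their health as good or excellent.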
Note that the component subtables of table 3.1b, namely tables 3.1c, 3.1d and 3.1e, give G² values which sum exactly to the G² value for table 3.1b. In this partitioning, the G² associated with table 3.1c contributes much less to the G² of table 3.1b than does the G² of table 3.1d or 3.1e. The contributions of the G² and X² statistics obtained from the subtables of the original table are summarized below:
Table 3.1f

Table   Subtable                              df    G²        X²        p(G²)   p(X²)

Initial Partitioning of Table 3.1
3.1     original                              6     64.322    63.081    .0000   .0000
3.1a    never vs former smokers               3      5.681     5.824    .1282   .1205
        on assessing health as
        poor, fair, good or excellent
3.1b    not current vs current smokers        3     58.641    58.567    .0000   .0000
        on assessing health as
        poor, fair, good or excellent

Further Partitioning of Table 3.1b
3.1c    not current vs current smokers        1      1.539     1.543    .2148   .2141
        on assessing health as
        poor or fair
3.1d    not current vs current smokers        1     22.866    22.477    .0000   .0000
        on assessing health as
        good or excellent
3.1e    not current vs current smokers        1     34.236    35.111    .0000   .0000
        on assessing health as
        poor/fair or good/excellent
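The additivity of G² over tables 3.1c, 3.1d and 3.1e can be checked numerically. The following is a minimal sketch with hypothetical function names; the counts are those of the subtables, and the 1-df p-value uses the identity P(χ² > x) = erfc(√(x/2)).

```python
import math

def g2(table):
    """Likelihood-ratio statistic G2 = 2 * sum O * ln(O / E) for a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:  # an empty cell contributes nothing
                total += o * math.log(o * n / (rows[i] * cols[j]))
    return 2.0 * total

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

# Rows are always: not current, current smoker.
table_3_1b = [[24, 297, 1124, 664], [29, 251, 656, 255]]  # poor, fair, good, excellent
table_3_1c = [[24, 297], [29, 251]]                        # poor, fair
table_3_1d = [[1124, 664], [656, 255]]                     # good, excellent
table_3_1e = [[321, 1788], [280, 911]]                     # poor/fair, good/excellent

components = [g2(t) for t in (table_3_1c, table_3_1d, table_3_1e)]
for name, value in zip(("3.1c", "3.1d", "3.1e"), components):
    print(f"{name}: G2 = {value:.3f}, p = {p_value_df1(value):.4f}")
print(f"sum = {sum(components):.3f}, table 3.1b G2 = {g2(table_3_1b):.3f}")
```

The same check applied to X² would not be expected to give an exact sum, which is consistent with the X² column of table 3.1f.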
The method used by Goodman can be further extended so that any r x c table can be partitioned into (r-1)(c-1) 2 x 2 tables. For a nice illustration on how to do this, see Reynolds (1977a).
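One standard way to carry out such a partition is sketched below. It is an assumption that this matches the illustration in Reynolds (1977a); the scheme shown collapses all rows above and all columns to the left of each cell (i, j), for i, j ≥ 2, and the function names are hypothetical. The resulting 2 x 2 tables have G² values that add up to the G² of the full table.

```python
import math

def g2(table):
    """Likelihood-ratio statistic G2 for a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                total += o * math.log(o * n / (rows[i] * cols[j]))
    return 2.0 * total

def partition_2x2(table):
    """Split an r x c table into (r-1)(c-1) 2 x 2 tables with additive G2."""
    subtables = []
    for i in range(1, len(table)):
        for j in range(1, len(table[0])):
            a = sum(table[p][q] for p in range(i) for q in range(j))  # block above-left
            b = sum(table[p][j] for p in range(i))                    # column j, rows above
            c = sum(table[i][q] for q in range(j))                    # row i, columns to the left
            d = table[i][j]                                           # the cell itself
            subtables.append([[a, b], [c, d]])
    return subtables

# Table 3.1b (2 x 4) yields (2-1)(4-1) = 3 subtables.
table_3_1b = [[24, 297, 1124, 664],
              [29, 251, 656, 255]]

parts = partition_2x2(table_3_1b)
for part in parts:
    print(part, f"G2 = {g2(part):.3f}")
print(f"sum = {sum(g2(p) for p in parts):.3f}, full table G2 = {g2(table_3_1b):.3f}")
```

Applied to table 3.1b, this scheme compares poor with fair, then poor/fair with good, then poor/fair/good with excellent. Its first subtable coincides with table 3.1c, while the other two differ from tables 3.1d and 3.1e, which shows that the choice of partition should be guided by the subhypotheses of interest.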