Can respondent-driven sampling be used for epidemiological surveys of hard-to-reach or hidden populations?

(1)


Yann Le Strat, Marie Jauffret-Roustide

Department of Infectious Diseases, Institut de Veille Sanitaire (InVS)

Saint-Maurice, France www.invs.sante.fr

1/27

(2)

Our project

- Conduct an epidemiological survey in Paris in 2013, as part of the Coquelicot survey, among drug users who do not attend any service facility.

- No sampling frame is available on which to build the sample!

- How can we proceed, given that we do not want a non-random sample but random sampling seems difficult?

- We explored respondent-driven sampling (RDS).

2/27

(3)

Observation

- Since 1997 (Heckathorn, Social Problems), practice has outpaced methodology.

- This sampling technique was enthusiastically embraced by researchers seeking to reach hard-to-reach or hidden individuals, because it undeniably succeeded in reaching them, and therefore in building a sample.

- However, inference, bias and the design effect are rarely (if ever) mentioned in the articles, which is quite striking compared with what is usually reported when random sampling designs are used.

3/27

(4)

Biological and behavioral studies that used RDS

- RDS is widely used in public health

- RDS has recently been applied in more than 120 studies in more than 20 countries

- More than 32,000 participants were recruited in these studies

4/27

(5)

HIV: studies that used RDS

(59%) of these studies combined RDS samples with samples collected using other sampling techniques; five (16%) failed to generate a minimum of three referral waves; four (13%) either did not report whether they had collected data on size of the social networks or reported them inconsistently; two (6%) did not analyze their data using proper RDS techniques; one (3%) did not provide sufficient information about RDS recruitment requirements; and one (3%) combined samples from two different RDS studies.

One hundred twenty-three studies met all of our eligibility criteria. Of these, one study was completed in 2003, nine studies in 2004, 34 in 2005, 65 in 2006, and 14 in 2007. Studies were conducted in 28 different countries and five continents: Europe (59, 48%), Asia (40, 33%), Latin America (14, 11%), Africa (7, 6%) and Oceania (3, 2%) (Table 1). Sixty-five studies (52%) were among IDUs (Table 2), 39 (32%) among MSM (Table 3), 18 (15%) among SW (Table 4), and one (1%) among HRH men (Table 5). Between 2003 and October 2007, a total of 32,298 participants were surveyed, of whom 17,434 (54.0%) were IDUs, 10,101 (31.0%) were MSM, 4,342 (13.5%) were SWs, and 421 (1.5%) were HRH men.

One hundred six studies (86%) reported collecting both HIV biological and behavioral data concurrently, and the remaining 17 (14%) were solely behavioral surveys. Sixty-four (53%) collected dried blood spots, 44 (36%) venous blood, 6 (5%) oral fluid and 25 (21%) urine or penile or vaginal swabs. Of the 112 studies with available information, 101 (90%) reported conducting some degree of a priori formative research. Although face-to-face methods were the most common means of interviewing (110 studies, 89%), audio computer-assisted structured interviews (ACASI) and self-administered instruments were used in eight (7%) and five (3%) studies, respectively. Participants were enrolled at a variety of sites including governmental hospitals, public health clinics, public health departments, non-governmental organizations providing services for target groups, voluntary counseling and testing clinics, hotel rooms, rented store fronts, and mobile vans. Of the 114 studies that reported the number of recruitment sites, 92 (81%) used a single site, but as many as five sites were used. Only six (5%) studies reported using mobile vans as recruitment sites; and in one study, two vans were used but in stable locations.

One hundred twenty (99%) studies reported that seeds were diversified (i.e., selected differently from each other) based on key demographic or risk behavior characteristics; three studies did not report on diversification. Thirty-one (43%) of 72 studies with available data reported adding seeds beyond the original seeds. All but three studies set the allowable number of recruits per participant at three. Of 103 studies with available data, 59 (57%) did not limit the time during which participants were allowed to refer their recruits. Among 44 studies that did limit time for recruits to respond, the recruitment period ranged from 7 to 60 days.

Studies used a wide range of primary and secondary incentives for recruitment. Of the 107 studies that reported using primary incentives, a majority of 89 (83%) used cash incentives, 11 (10%) gave cash equivalents (e.g., food stamps) or small goods with minimal monetary value and 3 (3%) gave condoms and lubricants; 4 (4%) did not offer any primary incentive. Seventy-eight studies reported data on secondary incentives, and 72 (92%) offered them; these incentives were usually monetary (58 studies, 74%). Seventy-eight studies reported data on both primary and secondary incentives; the value of the primary incentive was higher than that of the secondary incentive in 52 (67%) studies, the same in 14 (18%), lower in seven (13%) and undetermined in four (5%) studies. One (1%) study did not offer any kind of incentive. Of these 78 studies, 55 (71%) gave money as both primary and secondary incentives, 8 (11%) provided money only for one of them, and 15 (19%) did not offer monetary incentives at all. In addition to incentives, studies offered a wide range of additional services, such as free HIV testing and counseling, referral for clinical follow-up, condoms, lubricants and information and educational materials.

We also summarize how successfully studies were able to recruit participants (Table 6). On average, RDS studies used 10 seeds (range, 2–32, median 8.0, intra-quartile range [IQR] 6.0–13.0) and had 1.6 (range 0–19, median 0, IQR 0–2.0) unsuccessful seeds per study. Of 86 studies with available data, 51 (59%) reported having no unsuccessful seeds. The median proportion of unsuccessful seeds per study was lower among studies of IDUs (0%, IQR, 0–5%) than among SWs (20%, IQR 14–30%, z score -3.872, P < 0.0005). There was no significant difference in the median proportion of unsuccessful seeds per study between MSM and IDUs (z score -0.915) or MSM and SWs (z score -1.916). The greatest number of referral waves was among IDUs (34); the average number [...]

Table 1 HIV biological and behavioral studies that used RDS by risk group and continent, 2003–2007^a

Continent       IDU        MSM        SW         HRH men    Total
Africa          2          3          1          1          7 (6%)
Asia            19         14         7          0          40 (33%)
Europe          42         16         1          0          59 (48%)
Latin America   2          5          7          0          14 (11%)
Oceania         0          1          2          0          3 (2%)
Total           65 (53%)   39 (31%)   18 (14%)   1 (1%)     123 (100%)

^a Studies conducted outside the United States only

Key: HRH = high-risk heterosexual; IDU = injecting drug user; MSM = men who have sex with men; SW = sex worker

Source: Malekinejad et al., AIDS Behav (2008) 12:S105–S130 (excerpt from p. S108)

5/27

(6)

Major contribution of Matthew Salganik

- Salganik MJ, Heckathorn DD (2004) Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34, 193-239.

- Salganik MJ (2006) Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling. Journal of Urban Health, 83, i98-i112.

- Goel S, Salganik MJ (2009) Respondent-driven sampling as Markov chain Monte Carlo. Statistics in Medicine, 28, 2202-2229.

- Goel S, Salganik MJ (2010) Assessing respondent-driven sampling. PNAS, 107, 6743-6747.

6/27

(7)

Design effects: range 5.7-58.3, median = 11

stability of repeated cross-sectional RDS estimates has been examined, again yielding ambiguous results. For example, in studies of men who have sex with men (MSM) in Beijing, China in 2004, 2005, and 2006 (22), year-to-year RDS estimates for age and employment status were stable, whereas estimates of education and sexual orientation were suspiciously volatile.^¶

In contrast to previous approaches, we evaluate the performance of RDS by simulating the sampling and estimation process on 85 real populations mapped in two previous studies. In all cases, both the network structure of the population and demographic traits for each individual are available. We are thus able to directly compare empirical RDS estimates to true population values and, in particular, to measure the variability of estimates.

Data and Methods

Our first source of data, Project 90, was a large, multiyear study that began in 1987 as a prospective examination of the influence of network structure on the propagation of infectious disease (23). As such, researchers attempted to construct a network census of high-risk heterosexuals in Colorado Springs, focusing particularly on sex workers and drug injectors and their sexual and drug partners (23–26). We restrict attention to the giant component of the network, comprising 4,430 individuals and 18,407 edges, representing social, sexual, and drug affiliation.

Our second data source, the National Longitudinal Study of Adolescent Health (Add Health), mapped the friendship networks of 84 middle and high schools in the United States (27–29). The giant components for these school networks range in size from 25 to 2,539 students, with a median size of 753, and collectively include a total of 72,262 individuals and 258,688 edges.

Rather than attempting to model the complex social dynamics that play out during the RDS recruitment process, in our simulations we assume the same, idealized sampling conditions considered in the theoretical RDS literature (4, 8, 10, 30). Specifically, (i) initial sample members are chosen independently and proportional to network degree; (ii) relationships within the population are symmetric (i.e., if A is a contact of B, then B is also a contact of A); (iii) participants recruit uniformly at random from their contacts; (iv) those who are recruited always participate in the study; (v) individuals can be recruited into the sample more than once; (vi) the number of recruits per participant does not depend on individual traits; and (vii) respondents accurately report their social network degree. The remaining parameters of our simulations are modeled after common RDS study features (5). Starting from ten initial seeds, each participant recruits between 0 and 3 other individuals. The exact recruitment distribution mimics an RDS study of drug injectors in Tijuana and Ciudad Juarez, Mexico (31), in which 1/3 of participants recruited no one, 1/6 recruited one other participant, 1/6 recruited two other participants, and 1/3 recruited three other participants, the maximum allowed. The simulated recruitment procedure continues until a sample size of 500 is reached. The entire sampling process was repeated 10,000 times on each network to generate replicate estimates.

Results

From each simulated sample, the RDS estimator (Eq.1) was used to infer the population proportion of a given trait—for example, the proportion of drug dealers in the Project 90 network or the proportion of students on the soccer team in a particular Add Health school. Consistent with theoretical results (8, 10), we find RDS generates approximately unbiased estimates: Across all networks and traits, both the mean and median bias are less than 0.0005.

The variability of RDS estimates, however, is significantly larger than generally acknowledged. We quantify this variability in terms of design effect (32), which benchmarks the performance of RDS against that of simple random sampling (SRS). Specifically, the design effect $d$ is $\mathrm{Var}(\hat{p}_{\mathrm{RDS}}) / \mathrm{Var}(\hat{p}_{\mathrm{SRS}})$, where $\hat{p}_{\mathrm{RDS}}$ is the RDS estimate and $\hat{p}_{\mathrm{SRS}}$ is the estimate obtained from SRS.

It follows that an RDS estimate with sample size $n$ and design effect $d$ has the same variance as a simple random sample of size $n/d$. A design effect of 10, for example, effectively reduces an RDS sample of nominal size 500 to an SRS sample of size 50.

Consistently large design effects are seen in both Project 90 and Add Health (Fig. 1). The 13 binary traits in Project 90 have design effects that range from 5.7 to 58.3, with a median design effect of 11.0. As a consequence, estimating the 17% unemployment rate in Project 90, which has a design effect of 10, with reasonable precision (±5%, 95% confidence) requires an RDS sample of approximately 2,300 people, a sample size 5 times larger than in nearly all previous RDS studies (5). We observe a similar phenomenon for each of the 46 binary traits in Add Health. The median design effect for traits ranges from 4.2 to 14.4, where for each trait the median is taken across the 84 Add Health schools. The overall median design effect for all traits in all schools is 5.9.
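The figure of roughly 2,300 follows from the usual sample-size formula for a proportion, multiplied by the design effect. A worked check, assuming the ±5% is an absolute margin $e = 0.05$ and a normal approximation (the excerpt does not state which multiplier $z$ was used):

$$
n = d \, \frac{z^2 \, p(1-p)}{e^2} = 10 \times \frac{1.96^2 \times 0.17 \times 0.83}{0.05^2} \approx 2{,}170
$$

With $z$ rounded to 2 the same formula gives about 2,260; either way the result is close to the quoted 2,300, whereas a simple random sample ($d = 1$) would need only about 220 people.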

All traits in the Project 90 and the Add Health networks yield design effects larger than what is commonly assumed in the planning stages of RDS studies. A review of 91 studies found that more than half assumed a design effect less than 1.5, and all assumed a design effect less than 2.5 (5). Furthermore, a rule-of-thumb design effect of 2 had been suggested by Salganik (12). Given that we find typical design effects greater than 5, even

[Figure 1: design effect (x-axis) for each binary trait; panel A lists the Project 90 traits, panel B the Add Health traits.]

Fig. 1. Design effects for the 13 binary traits in Project 90 (A) and the 46 binary traits in Add Health (B); in Add Health, each circle indicates the design effect of a trait in one of 84 schools.

¶Year-to-year estimates for the proportion of the population with less than a high school education went from 33% to 70% to 66%, and estimates for the proportion bisexual went from 30% to 55% to 52%.

6744 | www.pnas.org/cgi/doi/10.1073/pnas.1000261107 Goel and Salganik

7/27

(8)

Illustration: CDC survey of IDUs

Morbidity and Mortality Weekly Report

www.cdc.gov/mmwr

Weekly April 10, 2009 / Vol. 58 / No. 13

HIV-Associated Behaviors Among Injecting-Drug Users — 23 Cities, United States, May 2005–February 2006

Since the late 1980s, incidence of human immunodeficiency virus infection (HIV) has declined 80% among injecting-drug users (IDUs) in the United States; in 2006, an estimated 6,600 (12%) of new HIV infections occurred among IDUs (1).

To assess HIV-associated behaviors among IDUs at risk for HIV infection, CDC analyzed data from the National HIV Behavioral Surveillance System (NHBS) collected during May 2005–February 2006 (the most recent data available).

The results of that analysis indicated that, during that period, 31.8% of participating IDUs reported sharing syringes, and 62.6% had unprotected vaginal sex; 71.5% had been tested for HIV, and 27.4% had participated in an HIV behavioral intervention. These data can help guide local, state, and national prevention services tailored to IDUs at risk for HIV infection and other bloodborne or sexually transmitted infections.

NHBS is an ongoing behavioral surveillance system, established by CDC in 2003 in cities where approximately 60% of all cases of acquired immunodeficiency syndrome (AIDS) had been reported. NHBS assesses trends in HIV risk behaviors, testing, and HIV prevention services among three groups:

IDUs, men who have sex with men (MSM), and heterosexuals. NHBS data are collected in rotating cycles, approximately once every 3 years from each of the three groups; however, the groups are not mutually exclusive (e.g., MSM who inject drugs might be participants in both the MSM and IDU cycles).

For this report, interviews were conducted with IDUs in 23 metropolitan statistical areas,* using respondent-driven sampling, a peer-referral sampling method (2). Recruitment chains began in each city with fewer than 20 initial participants who either were referred from programs serving the local IDU community or were recruited by NHBS staff members through outreach. Initial participants who completed the interview were asked to recruit three other IDUs through the use of a coded coupon system to track the referrals. Recruitment continued for multiple waves of peer referrals; all participation was voluntary.

Participants were paid $25 for their interview time; those who recruited others were paid an additional $10 for each eligible IDU they recruited to participate. Persons were eligible to participate if they had injected drugs during the preceding 12 months,^† resided in the metropolitan statistical area where they were interviewed, were aged ≥18 years, and were able to give informed consent. Trained interviewers administered a standardized questionnaire in person, using a handheld computer; the survey took approximately 45 minutes to complete.

* Atlanta, Georgia; Baltimore, Maryland; Boston, Massachusetts; Chicago, Illinois; Dallas, Texas; Denver, Colorado; Detroit, Michigan; Fort Lauderdale, Florida; Houston, Texas; Las Vegas, Nevada; Los Angeles, California; Miami, Florida; Nassau-Suffolk, New York; New Haven, Connecticut; New York, New York; Newark, New Jersey; Norfolk, Virginia; Philadelphia, Pennsylvania; San Diego, California; San Francisco, California; San Juan, Puerto Rico; St. Louis, Missouri; and Seattle, Washington.


† Interviewees were asked, “Have you ever in your life shot up or injected any drugs other than those prescribed for you? By shooting up, I mean any time you might have used drugs with a needle, either by mainlining, skin popping, or muscling.” Those who said “yes” were then asked, “When was the last time you injected any drug? That is, how many days or months or years ago did you last inject?” Those who had injected during the 12 months preceding the date of the interview were eligible to participate.

Department of Health and Human Services Centers for Disease Control and Prevention


Please note: An erratum has been published for this issue.

8/27

(9)

CDC survey: IDUs

- Questions: sharing of syringes and/or injection equipment, unprotected sex, previous HIV and HCV tests, ...

- Eligibility criteria: having injected drugs during the past 12 months, being an adult, living in the city, ...

- Fewer than 20 seeds per city

- Each participant asked to recruit 3 others

- Participants paid $25, plus $10 for each IDU recruited

- Final sample size: 10,301

Note: if the design effect is ≈11, the precision is only that of a simple random sample (SRS) of about 936 people (10,301/11)!
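The remark above applies the rule that an RDS sample of size n with design effect d has the precision of a simple random sample of size n/d. A minimal Python sketch of that arithmetic, assuming the design effect of 11 is known and using the worst-case proportion p = 0.5 for the margin of error (both assumptions for illustration; the function names are hypothetical):

```python
import math


def effective_sample_size(n: int, design_effect: float) -> float:
    """Size of the simple random sample with the same variance as the RDS sample."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect


def margin_of_error(n_eff: float, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n_eff)


# CDC NHBS IDU survey: 10,301 participants; design effect of about 11 (assumed)
n_eff = effective_sample_size(10301, 11)
print(f"effective sample size: {n_eff:.0f}")
print(f"worst-case margin of error: +/-{margin_of_error(n_eff):.3f}")
print(f"same margin if 10,301 had been an SRS: +/-{margin_of_error(10301):.3f}")
```

With these assumptions the worst-case margin of error is roughly three times wider than the nominal sample size would suggest.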

9/27

(10)

Objective

The objective of this presentation is to evaluate respondent-driven sampling based on:

- a simulation study

- a review of the literature from the past 2 years

The evaluation generally focuses on:

- the existence of bias and its magnitude, especially when assumptions are violated

- variance (design effect)

10/27

(11)

Part 1: simulation study

11/27

(12)

Construction of social networks

We built 9 networks following the procedure described in Salganik and Heckathorn (2004, Appendix A), with the following parameters:

- Population size: 2000

- Disease prevalence: $p_A$ ranging from 0.1 to 0.9

- Interconnection: $I = 0.6$

- Degree distribution: $d_i \sim \exp(1/6)$

We used the R package network.
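To make the parameters above concrete, here is a simplified Python sketch of a two-group network generator. It is not the Appendix A procedure of Salganik and Heckathorn, nor the R code used for the slides: reading exp(1/6) as an exponential with rate 1/6 (mean degree 6) and modelling the interconnection I as the probability of keeping a tie between the two groups are both our assumptions, and `simulate_network` is a hypothetical name.

```python
import random


def simulate_network(n=2000, p_a=0.3, interconnection=0.6, mean_degree=6.0, seed=None):
    """Simulate an undirected two-group network (simplified, hypothetical procedure).

    group[i] == 1 marks group A (e.g. HCV-positive), 0 otherwise.
    Cross-group ties are kept with probability `interconnection`, so values
    below 1 create homophily (assumption, not the published recipe).
    """
    rng = random.Random(seed)
    n_a = round(p_a * n)
    group = [1] * n_a + [0] * (n - n_a)
    rng.shuffle(group)

    # Target degrees from an exponential distribution with mean `mean_degree`,
    # rounded, with a minimum of 1 contact.
    target = [max(1, round(rng.expovariate(1.0 / mean_degree))) for _ in range(n)]

    # Configuration-model style pairing of half-edges ("stubs").
    stubs = [i for i in range(n) for _ in range(target[i])]
    rng.shuffle(stubs)

    adj = [set() for _ in range(n)]
    for k in range(0, len(stubs) - 1, 2):
        i, j = stubs[k], stubs[k + 1]
        if i == j or j in adj[i]:
            continue  # drop self-loops and duplicate ties
        if group[i] != group[j] and rng.random() >= interconnection:
            continue  # reject some cross-group ties (homophily)
        adj[i].add(j)
        adj[j].add(i)  # ties are symmetric
    return group, adj


group, adj = simulate_network(p_a=0.3, seed=1)
mean_deg = sum(len(a) for a in adj) / len(adj)
print(f"{sum(group)} of {len(group)} in group A, mean realised degree {mean_deg:.1f}")
```

Because rejected and duplicate ties are discarded, realised degrees are somewhat lower than the target degrees; a faithful replication should follow the published procedure instead.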

12/27

(13)

Illustration of a simulated social network

Red: HCV-positive people; blue: HCV-negative people

13/27

(14)

Construction of samples

- For each network, 1000 samples of size 100

- Individuals were drawn with or without replacement

- Several numbers of seeds and of distributed coupons were tested

- Estimators (proportion, variance) and the design effect were computed.
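The sampling step described above can be sketched in Python as an idealized recruitment process on a network stored as adjacency sets. This is an illustration under stated assumptions, not the code behind the slides: seeds are drawn uniformly among people with at least one contact (Goel and Salganik instead draw them proportionally to degree), every coupon goes to a contact chosen uniformly at random, and everyone recruited participates. `rds_sample` is a hypothetical name.

```python
import random
from collections import deque


def rds_sample(adj, sample_size=100, n_seeds=2, n_coupons=1,
               with_replacement=False, seed=None):
    """Idealized RDS recruitment; returns the list of sampled node indices.

    adj[i] is the set of contacts of node i. With replacement, a person can be
    recruited (and recruit again) several times.
    """
    rng = random.Random(seed)
    candidates = [i for i, contacts in enumerate(adj) if contacts]
    seeds = rng.sample(candidates, n_seeds)
    sample = list(seeds)
    in_sample = set(seeds)
    queue = deque(seeds)  # participants still holding coupons, in wave order

    while queue and len(sample) < sample_size:
        recruiter = queue.popleft()
        for _ in range(n_coupons):
            if len(sample) >= sample_size:
                break
            if with_replacement:
                pool = sorted(adj[recruiter])
            else:
                pool = sorted(j for j in adj[recruiter] if j not in in_sample)
            if not pool:
                break  # no eligible contact left: remaining coupons are lost
            recruit = rng.choice(pool)
            sample.append(recruit)
            in_sample.add(recruit)
            queue.append(recruit)
    # If every chain dies out, the sample is smaller than sample_size
    # (real studies would then add seeds).
    return sample


ring = [{(i - 1) % 20, (i + 1) % 20} for i in range(20)]
print(rds_sample(ring, sample_size=10, n_seeds=2, n_coupons=2, seed=3))
```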

14/27

(15)

Distributions of $\hat{p}_A$ by true prevalence (2 seeds, 1 coupon)

[Figure: estimated proportion in group A (y-axis) against the true proportion of the population in group A (x-axis, 0.1 to 0.9), with reference lines at p-0.1, p and p+0.1.]

15/27

(16)

Conclusion 1

- When all the assumptions (described by Volz and Heckathorn) hold, no bias is observed, whatever the prevalence.
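The slides do not give the formula of the estimator studied under these assumptions. For reference, the RDS-II (Volz-Heckathorn) estimator weights each respondent by the inverse of the reported network size (degree), $\hat{p}_A = \sum_{i \in A} d_i^{-1} / \sum_i d_i^{-1}$. A minimal Python sketch (the function name and input layout are illustrative):

```python
def rds_ii_estimate(in_group, degrees):
    """RDS-II (Volz-Heckathorn) estimate of the proportion in group A.

    in_group[i] is True if respondent i belongs to group A; degrees[i] is the
    self-reported network size, which must be positive. Weighting by 1/degree
    corrects for the higher inclusion probability of well-connected people.
    """
    if not degrees or len(in_group) != len(degrees):
        raise ValueError("need one degree per respondent")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    numerator = sum(w for w, a in zip(weights, in_group) if a)
    return numerator / sum(weights)


print(rds_ii_estimate([True, False, False, False], [2, 2, 4, 4]))
```

In this toy call the naive sample proportion is 0.25, but the group A respondent reports fewer contacts and is up-weighted, giving an estimate of 1/3.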

16/27

(17)

Distributions of design effects by prevalence

[Figure: histograms of the simulated design effects for p = 0.1, 0.2, ..., 0.9, and a summary panel showing the median of the design effects (y-axis) against the proportion p in group A (x-axis).]

17/27

(18)

Conclusion 2

The design effect (ρ) is relatively high compared with the design effects observed in epidemiological surveys:

- ρ increases as prevalence increases

- Median ρ: 2.6-3 for 0.1 ≤ p ≤ 0.5

- Median ρ: 3-14 for 0.6 ≤ p ≤ 0.9
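The design effect ρ, like the bias and mean absolute error (MAE) used in this simulation study, can be computed from the replicate estimates obtained on one network. A minimal Python sketch, assuming ρ is the Monte Carlo variance of the estimates divided by the simple random sampling variance p(1-p)/n for the same sample size n (finite-population correction ignored); `simulation_summary` is a hypothetical name:

```python
import statistics


def simulation_summary(estimates, true_p, n):
    """Bias, MAE and design effect of replicate estimates of a proportion.

    estimates: proportions estimated on repeated samples of size n from a
    population whose true proportion is true_p (needs at least 2 replicates).
    """
    bias = statistics.fmean(estimates) - true_p
    mae = statistics.fmean(abs(e - true_p) for e in estimates)
    var_srs = true_p * (1 - true_p) / n  # SRS variance for the same n
    design_effect = statistics.variance(estimates) / var_srs
    return {"bias": bias, "mae": mae, "design_effect": design_effect}


# e.g. 1000 replicate RDS-II estimates on one network, samples of size 100:
# summary = simulation_summary(replicate_estimates, true_p=0.3, n=100)
```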

18/27

(19)

Bias, mean absolute error and design effects

RDS-II estimator. Sampling with or without replacement, 2 or 6 seeds. Green: 1 coupon, red: 2 coupons, blue: 3 coupons.

[Figure: bias, mean absolute error (MAE) and design effect against prevalence (0.1 to 0.9), in four panels: r = 2 seeds with replacement, r = 2 without replacement, r = 6 with replacement, r = 6 without replacement.]

19/27

(20)

Conclusion 3

- With or without replacement, with 2 or 6 seeds, bias and variance increase as the number of coupons increases.

20/27

(21)

Bias, mean absolute error and design effects

RDS-II estimator. Sampling without replacement, with 2 or 3 coupons. Blue: 2 seeds, red: 6 seeds.

[Figure: bias, MAE and design effect against prevalence (0.1 to 0.9), in two rows of panels: coupons = 2 and coupons = 3.]

21/27

(22)

Conclusion 4

- The choice of 2 or 6 seeds has little impact on the mean absolute error and the design effect.

- What matters is the number of coupons!

- It is a more important parameter than the type of draw (with or without replacement) or the number of seeds.

Recommendation: limit the number of coupons and use enough seeds to diversify the sample.

22/27

(23)

Part 2: review of the recent literature

23/27

(24)

Articles

- Lee R, Ranaldi J, Cumming M et al (2011) Given the increasing bias in random digit dial sampling, could respondent-driven sampling be a practical alternative? Annals of Epidemiology, 21, 272-279.

- Rudolph AE, Crawford ND, Latkin C et al (2011) Subpopulations of illicit drug users reached by targeted street outreach and respondent-driven sampling strategies: implications for research and public health practice. Annals of Epidemiology, 21, 280-289.

- Gile K (2011) Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. JASA, 106, 135-146.

- McCreesh N, Frost SDW, Seeley J et al (2012) Evaluation of respondent-driven sampling. Epidemiology, 23, 138-147.

- Salganik MJ (2012) Respondent-driven sampling in the real world. Epidemiology, 23, 148-150.

- Lu, Bengtsson, Britton et al (2012) The sensitivity of respondent-driven sampling method. JRSS A, 175, 191-216.

24/27

(25)

Conclusion (1)

- Sampling without replacement (as used in practice) reduces the design effect!

- The mean absolute error (MAE), the variance and the design effect decrease as network interconnection increases

- The number of coupons clearly affects the variance and the MAE, which both increase with the number of coupons (with fewer coupons, for a given sample size, recruitment chains are longer and the sample tends to be more representative of the population, because homogeneous subgroups are broken up)
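The reasoning in the last point (fewer coupons mean longer chains for a fixed sample size) can be made concrete with an idealized count. This Python sketch assumes every coupon is redeemed and no chain dies out, which real studies never achieve; `waves_needed` is a hypothetical name.

```python
def waves_needed(sample_size: int, n_seeds: int, n_coupons: int) -> int:
    """Recruitment waves needed to reach sample_size, counting the seeds as wave 0.

    Idealized: every participant hands out all n_coupons and each coupon
    brings in exactly one new participant.
    """
    if n_seeds < 1 or n_coupons < 1:
        raise ValueError("need at least one seed and one coupon")
    total, wave_size, waves = n_seeds, n_seeds, 0
    while total < sample_size:
        wave_size *= n_coupons  # each person in the last wave recruits n_coupons people
        total += wave_size
        waves += 1
    return waves


for coupons in (1, 2, 3):
    print(coupons, waves_needed(100, n_seeds=2, n_coupons=coupons))
# 1 49
# 2 5
# 3 4
```

With 2 seeds and a sample of 100, a single coupon forces chains about 49 waves deep, whereas 3 coupons reach the target in 4 waves, leaving the sample closer to the seeds' own subgroups.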

25/27

(26)

Conclusion (2)

- No major impact from errors in reported network size or in recruitment, provided these errors are independent of the variable of interest.

- However, if recruitment is correlated with the variable of interest, very large bias and MAE can be observed.

- If respondents prefer to give their coupons to close friends, larger errors can be observed.

26/27

(27)

Conclusion

"Respondent-driven sampling should be seen as a (potentially superior) form of convenience sampling, and caution is needed when interpreting results based on this sampling method."

McCreesh et al (2012) Evaluation of respondent-driven sampling. Epidemiology, 23, 138-147.

27/27
