
VALIDITY AND RELIABILITY OF SELF-REPORTED MEASURES OF

Summary

The objective of this systematic review was to evaluate the evidence of validity and reliability of self-reported dietary measures during pregnancy. Food-frequency questionnaires (FFQ) had acceptable evidence of validity when compared with biomarkers (͞r: 0.04-0.58; k=19), 24-hour recalls (͞r: 0.12-0.63; k=11) and food records (͞r: 0.28-0.65; k=12). The dietary history (͞r: 0.07-0.47; k=7) and food records (͞r: 0.25-0.53; k=7) had acceptable evidence of validity when compared with biomarkers. Twenty-four-hour recalls had weak evidence of validity when compared with biomarkers. Evidence of reliability was good for FFQ, acceptable for the dietary history and inconclusive for 24-hour recalls. The results suggest that FFQ and food records have the strongest evidence of validity for measuring diet during pregnancy.

Abstract

This systematic review aims to critically appraise the evidence on the validity and reliability of self-reported measures of foods and nutrients in pregnancy. PubMed and EMBASE were searched, and fifty-four studies were included. Food-frequency questionnaires had acceptable evidence of validity when compared with biomarkers (͞r between 0.04 and 0.58; k=19), 24-hour recalls (͞r between 0.12 and 0.63; k=11) and food records (͞r between 0.28 and 0.65; k=12). The dietary history (͞r between 0.07 and 0.47; k=7) and food records (͞r between 0.25 and 0.53; k=7) had acceptable evidence of validity when compared with biomarkers. Twenty-four-hour recalls had poor evidence of validity against biomarkers. Evidence on reliability was good for food-frequency questionnaires, acceptable for the dietary history and inconclusive for 24-hour recalls. The results suggest that food-frequency questionnaires and food records have the strongest evidence of validity for assessing nutrition during pregnancy, and that more studies are needed to validate 24-hour recalls and the dietary history.

Introduction

Pregnancy is a period during which many physiological changes occur, such as an increase in the woman's nutritional needs and the possibility of experiencing nausea1. Moreover, there is evidence that social desirability can affect the amount of food that women report consuming2. While there is evidence on the validity of different self-reported measures of diet in the general population3-5, including food-frequency questionnaires (FFQ)6,7 and 24-hour recalls8, this does not seem to be the case for pregnant women. In fact, to our knowledge, only two reviews have been conducted among pregnant women. The first reported information on the validation of dietary assessment methods for micronutrient intake in pregnant women9, while the second reported the relationship between zinc intake and serum/plasma zinc concentrations in pregnant and lactating women10. Notwithstanding the very useful information provided by those reviews, they only reported information on micronutrients and on validity. The objective of the present systematic review was thus to fill this gap in the literature by critically appraising the evidence on the validity and reliability of self-reported measures of foods and nutrients in pregnancy.

Methods

Study Eligibility Criteria

To be included in the systematic review, studies had to report evidence of validity and/or reliability for a self-reported measure of nutrition in pregnancy (see Table 1 for definitions of the different types of validity and reliability). Studies validating biomarkers were not included, as biomarkers are not self-reported measures of nutrition; however, studies validating a self-reported measure against biomarkers were included. Studies reporting validation of dietary patterns, dietary indexes or dietary scores were excluded. Studies that validated self-reported measures of a specific beverage, such as alcohol or coffee, were not included, but studies on a specific food, such as fish or fruits and vegetables, were included. Studies on the use of supplements, such as iron and folic acid, were excluded, as were studies of pregnant teenagers or of women with eating disorders. Finally, studies that reported pre-pregnancy or post-pregnancy intake were not included.

Search Strategy

Two databases were searched: MEDLINE/PubMed (1950 onward) and EMBASE (1974 onward). No restriction was placed on the year of publication. The search was performed by LAVI on February 3, 2014, and covered articles published up to January 31, 2014. In MEDLINE/PubMed, a combination of keywords and MeSH terms was used; in EMBASE, only Emtree terms were used (see Table 2 for a complete list of keywords). The search was limited to articles published in English. Additional studies were identified by checking the reference lists of the included articles (i.e., secondary references).

Study Selection and Data Extraction

All articles were first screened by LAVI on the basis of their title and abstract, and clearly irrelevant articles were excluded. The full texts of the remaining articles were retrieved, and the two authors independently assessed them for eligibility. Disagreements were resolved by discussion.

Data were independently extracted by the two authors using a standardized data extraction form that was pre-tested on a random sample of five of the included articles. Types of validity were classified according to the new Standards for Educational and Psychological Testing11. The quality of the studies was assessed using four criteria: 1) whether the tool to be validated and the comparison tool assessed nutrition in the same trimester of pregnancy, given the evidence that food preferences and intake vary during pregnancy12; 2) whether cross-sectional studies had a response rate ≥ 60%13 and longitudinal studies had an attrition at follow-up ≤ 20%14; 3) whether cross-sectional studies compared their respondents with non-respondents, and longitudinal studies compared their completers with their dropouts, on the main socio-demographic characteristics (e.g., age, body mass index, socio-economic status); and 4) whether a study reported both crude and energy-adjusted correlations15.
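The four quality criteria can be read as simple decision rules. The sketch below is a hypothetical illustration, not the authors' extraction form: the `StudyRecord` fields and the `assess_quality` function are invented names, `None` is assumed to mean "not reported", and a study with both cross-sectional and longitudinal components would need one record per design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """Hypothetical extraction record; None means 'not reported'."""
    design: str                               # "cross-sectional" or "longitudinal"
    same_trimester: Optional[bool]            # criterion 1
    response_rate: Optional[float] = None     # cross-sectional only, proportion 0-1
    attrition: Optional[float] = None         # longitudinal only, proportion 0-1
    compared_dropouts: Optional[bool] = None  # criterion 3
    crude_reported: bool = False              # criterion 4, crude correlations
    adjusted_reported: bool = False           # criterion 4, energy-adjusted correlations


def assess_quality(s: StudyRecord) -> dict:
    """Return True (met), False (not met) or None (not reported) per criterion."""
    if s.design == "cross-sectional":
        participation = None if s.response_rate is None else s.response_rate >= 0.60
    elif s.design == "longitudinal":
        participation = None if s.attrition is None else s.attrition <= 0.20
    else:
        raise ValueError(f"unknown design: {s.design}")
    return {
        "same_trimester": s.same_trimester,
        "participation": participation,
        "compared_dropouts": s.compared_dropouts,
        "crude_and_adjusted": s.crude_reported and s.adjusted_reported,
    }


# Hypothetical longitudinal study: 15% attrition, only crude correlations reported.
example = StudyRecord("longitudinal", True, attrition=0.15,
                      compared_dropouts=False, crude_reported=True)
print(assess_quality(example))
```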

Data Analyses

Descriptive statistical analyses were performed using SAS version 9.3 (SAS Institute, Cary, NC, USA). For studies reporting multiple correlations, such as those reporting a separate correlation for each nutrient, an overall effect size (͞r) was computed using the Hunter-Schmidt method16,17 (see Supplementary file 2 for the formula). For studies reporting information on validity, correlations were classified according to Cohen's criteria18; that is, a correlation of 0.10 is considered a small effect size, 0.30 a medium effect size and 0.50 a large effect size. For studies reporting information on reliability, intra-class correlations (ICC)19 were classified according to Fermanian's criteria20; that is, an ICC between 0 and 0.30 is considered very bad or null, between 0.31 and 0.50 mediocre, between 0.51 and 0.70 moderate, between 0.71 and 0.90 good and above 0.91 very good. Kappa statistics were classified according to Landis & Koch's criteria21; that is, a kappa between 0 and 0.20 indicates slight agreement, between 0.21 and 0.40 fair, between 0.41 and 0.60 moderate, between 0.61 and 0.80 substantial and between 0.81 and 1 almost perfect agreement.
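As a sketch of these two steps, the snippet below assumes the usual Hunter-Schmidt "bare-bones" mean, in which each correlation is weighted by its sample size (the exact formula used in this review is in Supplementary file 2 and is not reproduced here), and then maps values onto the Cohen, Fermanian and Landis & Koch labels. Because the published cut-offs leave small gaps (e.g., 0.30 vs 0.31), treating each upper bound as inclusive is an assumption; all function names are hypothetical.

```python
from typing import Sequence


def hunter_schmidt_mean(rs: Sequence[float], ns: Sequence[int]) -> float:
    """Sample-size-weighted mean correlation: sum(N_i * r_i) / sum(N_i)."""
    if len(rs) != len(ns) or len(rs) == 0:
        raise ValueError("need one sample size per correlation")
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)


def cohen_label(r: float) -> str:
    """Validity correlations, by magnitude, using Cohen's 0.10 / 0.30 / 0.50."""
    a = abs(r)
    if a < 0.10:
        return "below small"
    if a <= 0.30:
        return "small-to-medium"
    if a < 0.50:
        return "medium-to-large"
    return "large"


def fermanian_label(icc: float) -> str:
    """Reliability ICC, using Fermanian's bands (upper bounds inclusive)."""
    for upper, label in [(0.30, "very bad or null"), (0.50, "mediocre"),
                         (0.70, "moderate"), (0.90, "good")]:
        if icc <= upper:
            return label
    return "very good"


def landis_koch_label(kappa: float) -> str:
    """Kappa agreement bands of Landis & Koch (negative values labelled 'poor')."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical study: three nutrient correlations, all from the same 120 women.
r_bar = hunter_schmidt_mean([0.21, 0.35, 0.40], [120, 120, 120])
print(round(r_bar, 2), cohen_label(r_bar))
```

When all correlations in a study come from the same participants, the weights are equal and the weighted mean reduces to the simple average of the correlations.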

Results

The results of the search strategy are presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow-chart22 in Figure 1. A total of 54 studies were included in the review. In the rest of the text, k denotes the number of studies (not to be confused with κ, which denotes a kappa statistic) and s denotes the number of samples.

Characteristics of the Studies

A summary of the studies is presented in Table 3. Of the 54 studies included in the systematic review, 24 were conducted in North America (United States: 18 studies23-40, Canada: 5 studies41-45 and Mexico: 1 study46), 16 in Europe (United Kingdom: 3 studies47-49, Denmark: 3 studies50-52, Norway: 2 studies53,54, Portugal: 2 studies55,56, Spain: 2 studies57,58, Belgium: 1 study59, Finland: 1 study60, Slovenia: 1 study61 and The Netherlands: 1 study62), 8 in Asia (Japan: 4 studies63-66, China: 1 study67, Indonesia: 1 study68, Israel: 1 study69 and Malaysia: 1 study70), 3 in South America (Brazil71-73), 2 in Oceania (Australia74,75) and 1 in Africa (Nigeria76). Nineteen studies24,25,28,31,32,34,37,40,42,43,45,47,50,51,54-56,58-61,67,68,71,72,74,76 were longitudinal and 9 studies36,49,63-66,70,73 were both cross-sectional and longitudinal.

Quality of the Studies

Most of the studies24-30,32-36,39-46,48,52,53,55-66,68-71,73-76 (k = 43) assessed nutrition over the same period of pregnancy with both the tool to be validated and the comparison tool (see Table 4). Five studies23,31,37,38,49 did not report this information, and the remaining six studies47,50,51,54,67,72 compared dietary intakes obtained with tools covering different trimesters of pregnancy. Twenty-five studies24,29,32,38,39,43,45-48,50-54,58,60,62-64,66-68,70,76 had a response rate ≥ 60% or an attrition at follow-up ≤ 20%. Fourteen studies23,26,30,33,35,37,40,41,44,49,55,57,69,75 did not report this information, and the remaining 15 studies25,27,28,31,34,36,42,56,59,61,65,71-74 had lower response rates or higher attrition at follow-up. Only 12 studies24,25,27,29,33,36,45,50,51,53,56,68 compared respondents or completers with non-respondents or dropouts. The other studies either did not report this information23,26,30,37,40,41,44,57 (k = 8) or did not make this comparison in their sample28,31,32,34,35,38,39,42,43,46-49,52,54,55,58-67,69-76 (k = 34). Finally, 25 studies25,27,28,36,38,41,46,48,50-54,56,58,60,63-68,70,71,73 reported both crude and energy-adjusted correlations, 26 studies23,26,29-35,37,40,42-45,47,49,55,57,59,61,62,72,74-76 reported only crude correlations, 2 studies39,69 did not report this information because their results were not presented as correlations, and 1 study24 reported only energy-adjusted correlations.

Characteristics of the Participants

The mean age of participants at baseline was provided for 36 samples23,26,27,30,31,33,37,40-43,45,47,48,53,54,56,57,60,62-66,68,70-73,75. The pooled mean age of the pregnant women in those samples was 29.6±3.4 years (range: 21.7 to 35.3). Twenty-six samples27,31,41,47,48,53-56,59,60,63-67,70-72 provided the pre-pregnancy body mass index (BMI) of their participants or the information needed to calculate it (i.e., mean weight and height). The pooled mean BMI in those samples was 22.3±1.7 kg/m² (range: 20.2 to 25.3), indicating that participants were, on average, of normal weight at baseline. Twenty-two samples32,42,46,47,57,59,61-66,71,72,74,76 consisted of healthy women and one sample37 consisted of obese women (≥ 20 lbs above ideal pre-pregnancy weight). One study41 reported information on both non-diabetic and insulin-dependent women. Thirty samples30,32,39,41-43,45,47-52,57,61-66,70-73,75 specified baseline gestational age. The pooled mean gestational age was 22.5±8.0 weeks (range: 11.0 to 40.0), indicating that women were, on average, in their second trimester of pregnancy at the first assessment of dietary intake.
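The two interpretive statements above rest on simple conversions. A minimal sketch, assuming the WHO adult BMI cut-offs (normal weight: 18.5 to 24.9 kg/m²) and trimester boundaries at 14 and 28 completed weeks of gestation (conventions vary slightly between sources); the function names are hypothetical. Note that a BMI computed from a sample's mean weight and mean height only approximates that sample's mean BMI.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m²."""
    return weight_kg / height_m ** 2


def who_bmi_category(value: float) -> str:
    """WHO adult BMI categories (assumed cut-offs)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


def trimester(gestational_weeks: float) -> int:
    """Trimester, assuming boundaries at 14 and 28 completed weeks."""
    if gestational_weeks < 14:
        return 1
    if gestational_weeks < 28:
        return 2
    return 3


# Hypothetical sample reporting only mean weight (60 kg) and mean height (1.65 m).
sample_bmi = bmi(60.0, 1.65)
print(round(sample_bmi, 1), who_bmi_category(sample_bmi), trimester(22.5))
```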

Evidence on Validity

Sixty-six samples23,24,26,27,29,30,32-36,38-67,69-76 provided information on criterion-related validity (i.e., validity in relation to another tool) (see Table 5). This number exceeds the number of included studies (k = 54) because some studies validated a nutritional assessment tool against several other tools (e.g., an FFQ against both 24-hour recalls and biomarkers); for the same reason, results are reported as numbers of samples. Thirty-six samples reported the validity of self-reported intakes during pregnancy for nutrients only23,26,30,40-42,44,46,51,52,55,57,59,62-66,72,74-76 (s = 26) or for energy and nutrients24,27,32,35,36,38,45,51,56,71 (s = 10), with a mean of 5.2±7.2 nutrients (range: 1 to 24). One sample69 reported validity for foods only, 5 samples50,54 for fruit and vegetable consumption, 6 samples29,33,34,39,43,61 for fish consumption, 4 samples73 for the consumption of 51 polyphenol-rich foods and 1 sample49 for the consumption of 10 main food allergens. Eleven samples37,47,48,53,60,67,70 reported validity for both foods and nutrients, and 2 samples54,58 for both fruits and vegetables and nutrients.

Food-Frequency Questionnaire vs. Biomarkers. Among the 42 samples24,27,30,32-36,38,39,41-43,45-51,53-56,58-61,67,69-73 reporting validation of an FFQ, 19 samples30,33,34,39,41-43,46,48,50,51,53-55,58,61,69,72,73 reported information on an association with biomarkers, making this the comparison with the most evidence on validity. Seventeen samples30,33,34,41-43,46,48,50,51,53-55,58,61,72,73 reported a total of 27 correlations between dietary intakes obtained from an FFQ and biomarkers. Most correlations indicated either small-to-medium (͞r between 0.12 and 0.30; n = 18) or medium-to-large (͞r between 0.31 and 0.47; n = 5) effect sizes. The studies of Innis et al.42 and of Brantsaeter et al.53 had large effect sizes (͞r = 0.54 and 0.58). Only two correlations, one between an FFQ measuring fatty acid consumption and a gluteal subcutaneous adipose tissue sample55 and one between an FFQ measuring omega-3 and omega-6 PUFA and maternal erythrocyte PUFA concentration30, were below a small effect size (͞r =


between the frequency of foods (bread, milk, meat, fish, eggs, fruits, vegetables, fruit or vegetable juice, etc.) estimated by the FFQ and hemoglobin levels. The study of Williams et al.39 also found a significant (p < 0.001) association between the frequency of fish consumption estimated by the FFQ and erythrocyte fatty acid levels.

Among the samples that reported both crude and energy-adjusted correlations, energy-adjusted correlations were higher for all samples50,51,58 except one41, in which the crude and energy-adjusted correlations were identical. In one study58, correlations with biomarkers were higher when assessing fruits and vegetables than when assessing nutrients. One study54 reported a slightly higher correlation with biomarkers among women not taking supplements, while another study58 found higher correlations when assessing intake from diet and supplements than when assessing nutrients from diet alone. Finally, another study53 reported a higher correlation between dietary intakes obtained from an FFQ and biomarkers when the analysis was restricted to women without nausea.

Food-Frequency Questionnaire vs. 24-Hour Recalls. Eleven samples24,32,35,36,38,47,67,70-73 reported a total of 20 correlations between dietary intakes assessed by an FFQ and by 24-hour recalls. Most correlations indicated either small-to-medium (͞r between 0.12 and 0.29; n = 6) or medium-to-large (͞r between 0.31 and 0.46; n = 10) effect sizes. The studies of Vian et al.73, Mouratidou et al.47 and Cheng et al.67 had four correlations representing large effect sizes (͞r between 0.51 and 0.63). Among the samples that reported both crude and energy-adjusted correlations, energy-adjusted correlations were lower for all of them38,67,70,71,73 except one study36, in which the correlation was higher after adjustment for energy intake. Correlations for energy intake ranged from 0.06 to 0.68. One study70 observed a higher crude correlation for nutrients than for foods, but this difference disappeared after energy adjustment. In contrast, another study47 observed a higher correlation for foods than for nutrients.

Food-Frequency Questionnaire vs. Food Records (Weighted and Estimated). Twelve samples27,45,48-51,53,54,56,59,60,73 reported a total of 35 correlations between dietary intakes assessed by an FFQ and by weighted or estimated food records. The vast majority of correlations represented medium-to-large effect sizes (͞r between 0.32 and 0.48; n = 30). The studies of Vian et al.73, Mikkelsen et al.50 and De Vriese et al.59 presented four correlations indicating large effect sizes (͞r between 0.52 and 0.65). One sample had a small-to-medium effect size (͞r = 0.28). The study of Venter et al.49 reported a kappa statistic representing fair agreement (κ = 0.40) between an FFQ and a food record.
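Unlike a correlation, a kappa statistic measures how often two tools place the same participants in the same category, beyond what chance alone would produce. The sketch below computes Cohen's kappa from a cross-classification table; the function name is hypothetical, and the 2×2 counts are invented for illustration (e.g., consumers vs non-consumers of a food according to each tool). They are not taken from Venter et al.49 and were simply chosen so that they give κ = 0.40.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a square cross-classification table.

    table[i][j] = number of participants placed in category i by tool A
    (e.g., an FFQ) and in category j by tool B (e.g., a food record).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n            # raw agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)                # chance-corrected


# Invented counts: rows = FFQ (consumer, non-consumer), columns = food record.
example_table = [[20, 5],
                 [10, 15]]
print(round(cohens_kappa(example_table), 2))
```

Here the tools agree for 70% of participants, but half of that agreement would be expected by chance alone, which is why kappa is much lower than the raw agreement.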

Among the samples that reported both crude and energy-adjusted correlations, most27,48,50,51,53,54,60 had higher correlations when adjusted for energy intake; in three samples56,60,73 the correlations were lower after energy adjustment, and in two samples53 they were identical. Correlations for energy intake ranged from 0.19 to 0.49. One study54 had similar correlations for fruits and vegetables and for nutrients from diet only. Another study60 had identical correlations for foods and nutrients once they were adjusted for energy intake. In three different studies48,51,53, crude and energy-adjusted correlations for nutrients were always higher when they included the use of supplements. Finally, one study53 also reported higher correlations among women without nausea.

Dietary History vs. Biomarkers. Among the 8 samples23,37,40,62-66 reporting validation of a dietary history, 7 samples23,40,62-66 reported an association with biomarkers, for a total of 19 correlations. Most correlations indicated small-to-medium effect sizes (͞r between 0.12 and 0.30; n = 13). Four correlations represented medium-to-large effect sizes (͞r between 0.31 and 0.47) and two were below a small effect size (͞r = 0.07 and 0.10). Among the samples that reported both crude and energy-adjusted correlations, all of them64-66 had higher correlations when adjusted for energy intake except one sample63, in which the correlations were lower after energy adjustment. Correlations between a dietary history and biomarkers were also higher when only women without nausea were included63,65,66. The study of van den Berg et al.37 compared a dietary history with a food record and found that the mean values for foods and nutrients estimated by the dietary history differed significantly (p < 0.001) from those estimated by the food record, except for milk (p = 0.90) and for meat and fish (p = 0.23).


24-Hour Recalls vs. Biomarkers. Four samples26,72,73,76 reported a total of five correlations between dietary intakes obtained by 24-hour recalls and biomarkers. The correlations either represented small-to-medium effect sizes (͞r = 0.19 and 0.22) or were below a small effect size (͞r = 0.06). One study73 reported both crude and energy-adjusted correlations, which were identical.

Food Records vs. Biomarkers. Seven samples48,50,51,53-55,74 reported a total of 10 correlations between dietary intakes estimated from food records and biomarkers. Most represented medium-to-large effect sizes (͞r between 0.35 and 0.44; n = 6). Two correlations represented small-to-medium effect sizes (͞r = 0.25 and 0.28). The studies of Brown et al.74 and of Brantsaeter et al.54 had two correlations indicating large effect sizes (͞r = 0.51 and 0.53). In one study50, adjustment for energy intake resulted in a slightly higher correlation, while in another study51 this adjustment lowered the correlation between dietary intakes assessed from food records and biomarkers. In one study54, the correlation was lower when only women not taking supplements were analyzed. The study of Moscovitch et al.44 reported a correlation between dietary intakes from an estimated food record and the measured food content of the diet, which indicated a large effect size (͞r = 0.73).

Other Types of Dietary Assessments. Two samples52,57 reported a total of three correlations between dietary intakes assessed by interview and biomarkers. Both assessed fatty acid intake and had small-to-medium effect sizes (͞r between 0.16 and 0.20). One study52 had a slightly higher correlation when adjusted for energy intake. The study of Dar et al.29 reported a correlation between a questionnaire on fish consumption and serum polychlorinated biphenyl levels, and observed a large effect size (͞r = 0.67). Finally, the study of Zhou et al.75 reported correlations between an iron checklist and a dietary history, and observed large effect sizes (͞r = 0.69 and 0.99). In this latter study75, including the use of supplements greatly

Evidence on Reliability

Twenty-one studies24,25,28,31,36,42,49,56,58-60,63-68,70,72,73 provided information on temporal stability, a type of reliability (see Table 6). Fifteen studies assessed the reliability of nutritional assessment tools during pregnancy for nutrients only28,42,59,64,72 (k = 5) or for energy and nutrients24,25,31,36,56,63,65,66,68 (s = 10), with a mean of 10.2±9.8 nutrients (range: 1 to 29). One study73 assessed the reliability of a self-reported measure of the consumption of 51 polyphenol-rich foods, another study49 of the consumption of 10 main food allergens, and 4 studies58,60,67,70 of both foods and nutrients.

Food-Frequency Questionnaire. Fourteen studies25,28,31,36,42,49,56,58-60,67,70,72,73 reported information on the temporal stability of an FFQ. The interval between administrations ranged from 2 weeks to 7 years. Notably, in the study28 with intervals of up to 7 years, women had to recall their diet during a past pregnancy, which distinguishes it from the other studies reporting information on reliability. Five studies25,60,70,72,73 reported a total of 10 intra-class correlations (ICC) for FFQs. Five ICC were moderate (between 0.51 and 0.66), three were good (0.73, 0.86 and 0.88) and two were mediocre (0.41 and 0.47). One study25 reported both crude and energy-adjusted ICC, with lower ICC after adjustment for energy intake. The ICC for energy intake ranged from 0.63 to 0.93. One study25 also reported somewhat lower ICC when nutrient intake included supplements.

Eight studies28,31,36,42,56,58,59,67 reported a total of 15 correlations between two administrations of an FFQ among the same individuals. Seven correlations indicated large effect sizes (͞r between 0.51 and 0.87), seven indicated medium-to-large effect sizes (͞r between 0.34 and 0.49) and one indicated a small-to-medium effect size (͞r = 0.29). Among the studies that reported both crude and energy-adjusted correlations, energy-adjusted correlations were lower for all of them36,56,58,67, except for one study where the adjustment slightly increased the


Dietary History. Four studies63-66 reported information on the temporal stability of a dietary history. The interval between administrations ranged from 4 to 5 weeks. Three studies had moderate ICC (0.62, 0.64 and 0.68) and one had a good ICC (0.71). The ICC for energy intake was 0.60.

24-Hour Recalls. Three studies24,31,68 reported information on the temporal stability of 24-hour recalls for assessing nutrients (range: 8 to 24 nutrients). The interval between administrations ranged from 7 to 37 days. The study of Baer et al.24 had an energy-adjusted correlation (͞r = 0.33) representing a medium-to-large effect size, while the study of Forsythe et al.31 had a correlation (͞r = 0.04) below a small effect size. The correlation for energy intake was 0.04. The study of Persson et al.68
