Performance of a food-frequency questionnaire in the US NIH–AARP (National Institutes of Health–American Association of Retired Persons) Diet and Health Study

Frances E Thompson; Victor Kipnis; Douglas Midthune; Laurence S Freedman; Raymond J Carroll; Amy F Subar; Charles C Brown; Matthew S Butcher; Traci Mouw; Michael Leitzmann; Arthur Schatzkin

doi:10.1017/S1368980007000419

Published online by Cambridge University Press: 01 February 2008

Frances E Thompson*: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Victor Kipnis: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Douglas Midthune: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Laurence S Freedman: Affiliation:
Gertner Institute for Epidemiology, Tel Hashomer, Israel
Raymond J Carroll: Affiliation:
Department of Statistics, Texas A&M University, College Station, TX, USA
Amy F Subar: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Charles C Brown: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Matthew S Butcher: Affiliation:
Information Management Services, Inc., Silver Spring, MD, USA
Traci Mouw: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Michael Leitzmann: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
Arthur Schatzkin: Affiliation:
US National Cancer Institute, EPN 4016, 9000 Rockville Pike, Bethesda, MD 20893-7344, USA
*: Corresponding author: Email thompsof@mail.nih.gov

Abstract

Objective

We evaluated the performance of the food-frequency questionnaire (FFQ) administered to participants in the US NIH–AARP (National Institutes of Health–American Association of Retired Persons) Diet and Health Study, a cohort of 566 404 persons living in the USA and aged 50–71 years at baseline in 1995.

Design

The 124-item FFQ was evaluated within a measurement error model using two non-consecutive 24-hour dietary recalls (24HRs) as the reference.

Setting

Participants were from six states (California, Florida, Pennsylvania, New Jersey, North Carolina and Louisiana) and two metropolitan areas (Atlanta, Georgia and Detroit, Michigan).

Subjects

A subgroup of the cohort consisting of 2053 individuals.

Results

For the 26 nutrient constituents examined, estimated correlations with true intake (not energy-adjusted) ranged from 0.22 to 0.67, and attenuation factors ranged from 0.15 to 0.49. When adjusted for reported energy intake, performance improved; estimated correlations with true intake ranged from 0.36 to 0.76, and attenuation factors ranged from 0.24 to 0.68. These results compare favourably with those from other large prospective studies. However, previous biomarker-based studies suggest that, due to correlation of errors in FFQs and self-report reference instruments such as the 24HR, the correlations and attenuation factors observed in most calibration studies, including ours, tend to overestimate FFQ performance.

Conclusion

The performance of the FFQ in the NIH–AARP Diet and Health Study, in conjunction with the study’s large sample size and wide range of dietary intake, is likely to allow detection of moderate (≥1.8) relative risks between many energy-adjusted nutrients and common cancers.

Keywords

Diet; Epidemiological methods; Validation study; Cancer; Questionnaires

Type: Research Paper
Information: Public Health Nutrition, Volume 11, Issue 2, February 2008, pp. 183–195

DOI: https://doi.org/10.1017/S1368980007000419
Copyright: Copyright © The Authors 2007

In 1995 the National Institutes of Health (NIH) and the AARP, formerly the American Association of Retired Persons, initiated a large prospective cohort study, the NIH–AARP Diet and Health Study, to study relationships between diet and cancer. The study population consisted of members of AARP in six states (California, Florida, Pennsylvania, New Jersey, North Carolina and Louisiana) and two metropolitan areas (Atlanta, Georgia and Detroit, Michigan) who were 50–71 years of age at baseline.

One criticism of some cohort studies is the lack of heterogeneity in reported diet. In order to address this problem, the original design of the NIH–AARP Diet and Health Study called for a two-stage sampling frame: collection of baseline food intake data for a large pool of respondents, and selection into the cohort based on these data, with over-sampling in extreme categories of dietary intake¹. Ultimately, this two-stage plan was abandoned, as two unanticipated factors emerged. First, the actual response rate to the baseline questionnaire (17.6%) was lower than the anticipated response (33.0%); second, the range of intake observed was greater than expected. Thus, the decision to include as subjects the entire baseline population led to statistical power approaching that expected for the two-stage design¹.

Baseline dietary intake was assessed by a food-frequency questionnaire (FFQ). Measurement error, both random and systematic, in the FFQ method may be substantial²,³, generally leading to bias in observed relative risks and loss of power to detect diet–disease relationships⁴,⁵. Thus, a calibration sub-study was carried out in approximately 2000 subjects, with two 24-hour dietary recalls (24HRs) as the reference instrument. The main purpose was to evaluate the performance of the FFQ instrument in this study and to correct estimated diet–disease relationships for measurement error. We present here validation results for 24 nutrients and for servings of fruits and vegetables. Evaluation of the FFQ-reported intake for most other food groups requires a different methodology due to a non-trivial probability of zero reported intake on any given 24HR and will be presented in a separate paper.

Methods

Design

The design of the NIH–AARP Diet and Health Study is described in detail elsewhere¹,⁶. Briefly, a baseline questionnaire which included a 124-item FFQ was mailed to 3.5 million members of AARP, in three waves. The first wave of 250 000 questionnaires was mailed in October 1995. The calibration sub-study participants were selected from the 46 970 subjects who had responded to the first wave as of January 1996. For purposes of defining the eligible sample for selection into the calibration sub-study, we excluded individuals for a variety of reasons⁶, leaving a pool of 38 691 eligible for selection into the calibration study.

The calibration study was designed to match the original two-stage design of the entire cohort study. A stratified sampling design using FFQ-derived estimates of intake of percentage energy from fat, fruit and vegetables, fibre and red meat to form strata was implemented⁶. For each of these dietary variables, five categories of intake were specified for each gender, consisting of approximately 10% in each of the lowest and highest categories, with the remaining 80% distributed equally among the middle three categories. The five categories and four dietary variables were used to partition eligible respondents into 20 strata. Two additional strata were added to reflect high alcohol intakes (>90 g per day). A combination of fixed and random sampling within strata was implemented to ensure approximate representation from each gender, enhance minority inclusion and increase dietary heterogeneity for these four dietary variables. Extreme intakes (strata 1 and 5) on any of the four variables were sampled at the highest rate; proportionately fewer were sampled from strata 2 and 4; and even fewer were sampled from stratum 3.
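The five-category scheme for a single dietary variable can be sketched as follows. This is a minimal illustration only, assuming percentile cutpoints at 10%, about 36.7%, about 63.3% and 90% computed within one gender. The function name `intake_category`, the cutpoint constants and the toy data are hypothetical and are not taken from the study's sampling programs, which also combined fixed and random sampling within strata.

```python
from bisect import bisect_left, bisect_right

# Assumed cutpoints: lowest 10%, three equal middle bands (~26.7% each), highest 10%.
CUTPOINTS = (0.10, 0.10 + 0.80 / 3, 0.10 + 1.60 / 3, 0.90)


def intake_category(value, sorted_intakes):
    """Return category 1-5 for `value` from its percentile rank among `sorted_intakes`."""
    # Percentile rank: fraction of respondents with strictly lower intake.
    rank = bisect_left(sorted_intakes, value) / len(sorted_intakes)
    # Category is 1 plus the number of cutpoints at or below the rank.
    return 1 + bisect_right(CUTPOINTS, rank)


if __name__ == "__main__":
    intakes = sorted(range(100))  # toy data: 100 hypothetical FFQ-derived intakes
    counts = {c: 0 for c in range(1, 6)}
    for x in intakes:
        counts[intake_category(x, intakes)] += 1
    print(counts)
```

Sampling at different rates from the extreme and middle categories would then be a separate step applied to these labels.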

In order to attain a final sample of 2000, a pool of 6150 individuals was chosen, and telephone numbers for 3901 individuals obtained. Recruitment calls were attempted on 3647 and 2923 were reached; 128 were excluded because they could not be located, had language problems, or were too ill to participate. Of the 2795 individuals invited to participate, 2055 agreed to participate. Because of changes in an FFQ exclusion criterion (exclude if >10 rather than >40 scanning errors), two participants were excluded subsequently. Trained interviewers administered two non-consecutive, unannounced 24HRs by telephone, randomly assigned by day of the week. The two recalls were well separated in time (a median of 21 days apart and 75% separated by 14 or more days) from March to September 1996. Of the 2053 participants who completed the first 24HR, 1986 provided a second 24HR. All 2053 were mailed a second FFQ, similar to the baseline instrument, in October 1996; 1445 were returned. The lack of temporal agreement between the recalls and the FFQs may create error, although there is little evidence to suggest that nutrient intakes vary dramatically from season to season.

For these analyses, we excluded 100 individuals from the calibration sub-study, consistent with exclusions made on the overall baseline cohort, for the following reasons: subsequent drop outs (n = 78); pre-baseline registry reports of cancer (n = 9); and death-only reports of cancer (n = 13). Thus, the analysed calibration sample consisted of 1953 individuals (987 men, 966 women) with at least the baseline FFQ and one 24HR. Data collection for the NIH–AARP Diet and Health Study was approved by the National Cancer Institute (NCI) and the Westat, Inc. institutional review boards.

Study instruments

The FFQ used in the NIH–AARP Diet and Health Study was an early version of the Diet History Questionnaire (DHQ) developed at the NCI⁷. Frequency responses were asked for 124 food items; portion sizes for 116. An additional 21 questions asked about specific food choices and cooking practices. The US Department of Agriculture’s (USDA) Continuing Survey of Food Intakes of Individuals (CSFII) survey databases (1989–91 initially, and 1994–96 as it became available) were used to develop a nutrient composition database for the FFQ⁸. Estimates of individual intake were calculated for 29 nutrients and 30 food groups.

In the 24HR interviews, participants were asked to report all foods and beverages consumed on the day before the interview. Interviewers used a Food Probe List containing standardised probes specific to foods in over 100 food categories. Data were coded using the Food Intake Analysis System (FIAS), version 2.3, developed at the University of Texas; the same nutrient composition database is used for both FIAS and USDA’s CSFII. Data checks were performed on reports with extremely high values for fat, total energy and total fruit and vegetable intakes, and corrections were made to the data when extreme values were due to coding errors.

Statistical methods

Two parameters are important for evaluating the performance of a dietary assessment instrument (Q), in this case, the FFQ: the Pearson correlation coefficient ρ_Q,T between reported and true intake, often referred to as the validity coefficient, and the attenuation factor λ_Q, which is the slope in the regression of true intake on reported intake. If an FFQ-measured exposure is categorised into quantiles, the correlation ρ_Q,T determines the extent of misclassification due to measurement error⁹. In a univariate model relating dietary exposure to disease, the observed relative risk for a given change in the exposure is equal to the true relative risk raised to the power λ_Q. In dietary studies, λ_Q is usually confined between zero and one, therefore attenuating (biasing toward one) true relative risks. An attenuation factor close to zero indicates severe attenuation and an attenuation factor close to one signals very little attenuation. For example, a true relative risk of 2 for a given change in the exposure would be observed as 2^0.5 = 1.41 if λ_Q = 0.5 and as only 2^0.1 = 1.07 if λ_Q = 0.1. In addition, the correlation with true intake is related to statistical power to detect diet–disease relationships. Compared with using true intake, the required sample size to reach the desired statistical power should be increased by the inverse of the squared correlation with true intake. For example, if ρ_Q,T = 0.5, the required sample size should be increased 1/0.5² = 4 times to achieve the nominal power.
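The two relationships in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are ours and the inputs are the example values from the text, not estimates from the study.

```python
def observed_relative_risk(true_rr, attenuation):
    """Observed RR in the univariate model: true RR raised to the power lambda_Q."""
    return true_rr ** attenuation


def sample_size_inflation(correlation_with_truth):
    """Factor by which the sample size must grow: 1 / rho_{Q,T} squared."""
    return 1.0 / correlation_with_truth ** 2


if __name__ == "__main__":
    for lam in (0.5, 0.1):
        rr = observed_relative_risk(2.0, lam)
        print(f"lambda_Q = {lam}: observed RR = {rr:.2f}")
    print(f"rho_QT = 0.5: sample size x {sample_size_inflation(0.5):.0f}")
```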

Ideally, one would like to measure true intake in the calibration study to estimate the correlation with true intake and attenuation factor. Unfortunately, diet cannot be measured precisely in free-living populations, either using dietary assessment instruments or biomarkers of intake. Estimation of both parameters in this situation can still be performed using statistical modelling but requires administration of a reference instrument. The reference instrument does not have to measure intake perfectly and may contain error of its own, but this error is required to be (a) independent of true intake and (b) independent of error in the FFQ¹⁰. Throughout this paper we shall refer to these two conditions as defining a correct reference instrument. Under conditions (a) and (b), the correlation with true intake is equal to the correlation between the FFQ and reference instrument adjusted for within-person random error in the reference measure¹¹. Its estimation in the calibration study requires that the reference instrument be administered at least twice. The attenuation factor is equal to the slope of the regression of the reference measure on the FFQ, and its estimation requires only one administration of the reference instrument¹².
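To make conditions (a) and (b) concrete, the sketch below simulates an FFQ and two repeats of a correct reference instrument and recovers both parameters by the method of moments. It is a toy illustration under assumed values (FFQ slope 0.5, unit error variance, reference error SD 1.5), not the maximum likelihood model fitted in this study; all names and parameter values are hypothetical.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance; cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def moment_estimates(q, r1, r2):
    """Return (attenuation factor, correlation with truth) from FFQ q and reference repeats."""
    r_bar = [(a + b) / 2 for a, b in zip(r1, r2)]
    var_q = cov(q, q)
    cov_qr = cov(q, r_bar)   # equals cov(Q, T) under conditions (a) and (b)
    var_t = cov(r1, r2)      # repeats share only true intake, so this estimates var(T)
    attenuation = cov_qr / var_q              # slope of reference on FFQ
    correlation = cov_qr / sqrt(var_q * var_t)  # deattenuated correlation
    return attenuation, correlation


def simulate(n, seed=0):
    """Simulate true intake T, an FFQ with slope 0.5, and two unbiased reference repeats."""
    rng = random.Random(seed)
    q, r1, r2 = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        q.append(0.5 * t + rng.gauss(0.0, 1.0))
        r1.append(t + rng.gauss(0.0, 1.5))
        r2.append(t + rng.gauss(0.0, 1.5))
    return q, r1, r2


if __name__ == "__main__":
    lam, rho = moment_estimates(*simulate(50000))
    print(f"attenuation ~ {lam:.3f} (theory 0.400), correlation ~ {rho:.3f} (theory 0.447)")
```

In this simulation the theoretical values are λ_Q = 0.5/1.25 = 0.40 and ρ_Q,T = 0.5/√1.25 ≈ 0.447. Only the correlation needs the second reference administration, because it depends on the covariance between repeats.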

In the literature, the correlation with true intake and the attenuation factor are usually estimated by applying the method of moments to those individuals in the calibration study who have fully completed the FFQ and necessary reference measurements¹³–¹⁵. Alternatively, both parameters can be estimated by fitting a measurement error model for the FFQ and reference measurements to all participants of the calibration study using the method of maximum likelihood¹⁰. Under a multivariate normal distribution, the second approach produces the same results as the method of moments if there are no missing data, but otherwise leads to more efficient estimation. We adopted the latter approach in the present study using the general measurement error model for the FFQ that was suggested by Kipnis et al.¹⁰ (see Appendix B for a more detailed explanation).

In the NIH–AARP calibration study, the reference instrument was the 24HR. The distributions of most nutrient intakes derived from the 24HRs and FFQs were typically skewed and contained some extreme values that might overly influence parameter estimation. Therefore, the measurement error model was fit after first removing outliers and transforming intakes to approximate normality using the Box–Cox transformation family¹⁶, as explained in detail in Appendix A. Besides normalising the data, the transformation also aimed to improve properties of the 24HR as a reference instrument by making its within-person variation constant and independent of its between-person variation. For FFQ-reported data, the number of removed outliers ranged from one for protein and cholesterol to 24 for vitamin C. For 24HR data, the number of removed outliers ranged from three for saturated fat and vitamin C to 56 for vitamin E.
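For readers unfamiliar with the Box–Cox family, the following sketch shows the transformation, its inverse, and how a mean computed on the transformed scale maps back to the original scale. It is illustrative only: the transformation parameters and outlier rules actually used are those of Appendix A and are not reproduced here, and the intake values and the parameter `lam` below are hypothetical.

```python
from math import exp, log


def box_cox(x, lam):
    """Box-Cox transform of a positive intake x with parameter lam (log when lam == 0)."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    return log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a transformed value back to the original intake scale."""
    return exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def back_transformed_mean(intakes, lam):
    """Mean on the transformed scale, expressed on the original scale."""
    transformed = [box_cox(x, lam) for x in intakes]
    return inverse_box_cox(sum(transformed) / len(transformed), lam)


if __name__ == "__main__":
    fat_g = [38.0, 52.0, 61.0, 70.0, 95.0, 160.0]  # hypothetical daily intakes (g)
    print(back_transformed_mean(fat_g, 0.0))   # log transform gives the geometric mean
    print(back_transformed_mean(fat_g, 0.25))
```

When the transformed distribution is approximately symmetric, the back-transformed mean approximates the median on the original scale, which is how the median intakes in the Results are described as being obtained.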

We estimated correlations with true intake and attenuation factors for both absolute and energy-adjusted nutrient intakes, using the residual energy-adjustment method¹⁷. Due to substantial day-to-day variation in the 24HR, nutrient residuals calculated from its repeat administrations may have substantially correlated within-person random errors, therefore violating requirements for a correct reference measure, even if those requirements were satisfied for the absolute nutrient and total energy intakes. To address this problem, we estimated correlations and attenuation factors for a nutrient residual by fitting simultaneous measurement error models¹⁸ to the nutrient of interest and total energy intakes reported on the FFQ and 24HR, as outlined in Appendix B. The parameters of the measurement error model were estimated using the SAS Mixed procedure¹⁹.
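The residual method itself amounts to regressing nutrient intake on total energy intake and keeping the residuals. The sketch below shows that basic step for a single instrument, using ordinary least squares on hypothetical data. It does not implement the simultaneous measurement error models described above, which were used precisely because naive residuals from repeat 24HRs can have correlated errors.

```python
def energy_adjusted_residuals(nutrient, energy):
    """Residuals from the least-squares regression of nutrient intake on energy intake."""
    n = len(nutrient)
    mean_e = sum(energy) / n
    mean_x = sum(nutrient) / n
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (x - mean_x) for e, x in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_x - slope * mean_e
    # Each residual is the part of nutrient intake not predicted by energy intake.
    return [x - (intercept + slope * e) for e, x in zip(energy, nutrient)]


if __name__ == "__main__":
    energy_kcal = [1500.0, 2000.0, 2500.0, 3000.0]  # hypothetical reported energy
    fat_g = [50.0, 70.0, 85.0, 120.0]               # hypothetical reported fat
    print(energy_adjusted_residuals(fat_g, energy_kcal))
```

By construction the residuals sum to zero and are uncorrelated with reported energy in the sample.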

As described above, in AARP the calibration study subjects were selected using stratified sampling; i.e. subjects from the first wave in the main study were selected into the calibration study with probabilities based on their reported intakes on the baseline FFQ for the five strata-determining dietary variables (percentage energy from fat, dietary fibre, fruit and vegetables, red meat and alcohol). Since the missing-data mechanism depends only on the value of the baseline FFQ, and not on the unobserved 24HR and second FFQ, the data are missing at random. The maximum likelihood estimator ignores missing data as long as they are missing at random, at the expense of including data for the nutrient(s) of interest as well as the five strata-determining dietary variables from all subjects in the main study (not just those in the calibration study)²⁰. On the other hand, the distributions of dietary intake in the calibration study were similar to those in the final baseline cohort, justifying a simple random sample analysis. We fit the measurement error models using both stratified and simple random sample estimation procedures. Since results were similar, we present the simple random sample estimates only.

Results

Comparison of baseline cohort and calibration sub-sample

Demographic and selected exposure characteristics of the eligible baseline cohort and the calibration sub-sample are shown in Table 1. The calibration sub-sample reflects the full baseline cohort with regard to demographic composition (age, race/ethnicity, educational attainment), environmental exposure status (smoking, physical activity, diet, body mass index, hormone use), self-reported health status and family history of cancer. In addition, despite the stratified sampling into the calibration study, the median values of the dietary factors presented (Table 1) and their distributions (results not shown) were similar for both groups. This indicates that the calibration study was practically a simple random sample from the main cohort.

Table 1 Cohort and calibration sub-sample characteristics at baseline by gender: NIH–AARP Diet and Health Study, 1995–1996

NIH – National Institutes of Health; BMI – body mass index; HRT – hormone replacement therapy; FFQ – food-frequency questionnaire.

* Those reporting poor health were excluded from the baseline cohort.

Comparison among median intakes

Estimates of median intakes reported on the FFQ and 24HR after adjustment for within-person variation are presented in Table 2. We present medians rather than means because the distributions are skewed. The estimates for both instruments were obtained by estimating the means on the transformed scale with approximately normal (symmetric) distribution (see Methods section) and then back-transforming the means to the original scale. Under the assumption of a correct reference instrument, this procedure applied to the 24HR produces estimates of median usual intake on the original scale.

Table 2 Estimates of median daily intakes of nutrients/dietary constituents reported on the FFQ and 24HR in the calibration sub-sample, after adjustment for within-person variation: NIH–AARP Diet and Health Study, 1995–1996

FFQ – food-frequency questionnaire; 24HR – 24-hour dietary recall; NIH – National Institutes of Health; RE – retinol equivalents; α-TE – α-tocopherol equivalents.

* Values for nutrients are from dietary intakes only and do not include reported supplements.

† The first FFQ value is ≥15% different from the first 24HR value, on the transformed scale.

Comparison of the first and second administrations of each instrument – the FFQ and the 24HR – generally showed lower reported intakes on the second administration (Table 2). However, most differences were trivial. Comparisons between FFQ and 24HR generally showed few large differences. When we compared the median of the first FFQ with the median of the first 24HR (on the transformed scale), few differences of 15% or more were observed: intakes of protein (men), thiamin (men) and riboflavin (men) on the 24HR were higher than on the FFQ; reports of vitamin B₆ (women), vitamin B₁₂ (women), and fruits and vegetables were higher on the FFQ than the 24HR. Estimates of other nutrients including energy, carbohydrate, fat, fibre and all energy-adjusted macronutrients were similar for the two instruments.

Estimated correlations with true intake and attenuation factors

Using 24HR as a reference instrument, estimates of correlation coefficients between FFQ-reported and true nutrient intakes and of attenuation factors for FFQ are presented in Table 3. Correlations for energy were 0.39 and 0.22 for men and women, respectively. For men, correlations for absolute nutrient intakes ranged from 0.37 (vitamin E) to 0.67 (dietary cholesterol). For women, correlations ranged from 0.23 (vitamin E) to 0.56 (dietary cholesterol). Attenuation factors for energy were 0.26 among men and 0.15 among women. For men, attenuation factors for absolute nutrient intakes ranged from 0.24 (protein) to 0.53 (cholesterol); for women, they ranged from 0.16 (protein) to 0.43 (vitamin C). After energy adjustment, estimated correlations with true intake and attenuation factors generally rose, especially for women. Correlation coefficients ranged from 0.40 (vitamin E) to 0.76 (saturated fat) among men, and from 0.36 (vitamin E) to 0.70 (vitamin B₆) among women. Attenuation factors ranged from 0.26 (protein) to 0.68 (saturated fat) among men and from 0.24 (vitamin E) to 0.62 (vitamin B₆) among women.

Table 3 Estimated correlation coefficients and attenuation factors between the FFQ and truth using a measurement error model, unadjusted and adjusted for energy intake: NIH–AARP Diet and Health Study, 1995–1996

FFQ – food-frequency questionnaire; NIH – National Institutes of Health.

Discussion

The NIH–AARP Diet and Health Study is among the largest prospective studies of diet and cancer in the world, providing large numbers of incident cases for a variety of cancers. It also provides the largest calibration dataset in the USA, with 24HR data on nearly 2000 individuals.

The FFQ used in this study is an early version of the NCI DHQ, which has been evaluated in two other studies²¹–²³. The principal difference between the NIH–AARP Diet and Health Study FFQ and the DHQ is the response format: the AARP FFQ uses a grid while the DHQ does not. In the Eating at America’s Table Study (EATS)²¹, four 24HRs were used as a reference instrument to evaluate the DHQ as well as the Willett and Block questionnaires. The estimated correlations between DHQ-reported and true nutrient intakes and the attenuation factors for DHQ in EATS were generally high and similar to those in the current study, especially after energy adjustment (see Table 4 below). A second study, the Observing Protein and Energy Nutrition (OPEN) study²²,²³, used urinary nitrogen and doubly labelled water as established reference biomarkers for intakes of protein and energy, respectively, to estimate the measurement error structure in the DHQ and 24HR, and to evaluate the performance of the 24HR as a reference instrument for protein, energy and energy-adjusted protein intakes. Implications of the OPEN study, as well as other biomarker studies, for the current analysis are discussed below.

Table 4 Correlation coefficients for nutrients, unadjusted and adjusted for energy intake, by gender and study

NCI – National Cancer Institute; ACS – American Cancer Society; WHI – Women’s Health Initiative; EATS – Eating at America’s Table Study; DHQ – Diet History Questionnaire.

Stram et al.²⁵ used energy from non-alcohol sources only. Estimates in this table are those for whites only.

Flagg et al.²⁶ used retinol for vitamin A.

Patterson et al.²⁸ used retinol for vitamin A.

In the NIH–AARP Diet and Health Study, estimated correlations with true absolute nutrient intakes were generally in the range of 0.4–0.6. Energy adjustment generally improved correlation coefficients and attenuation factors, particularly among women, where relatively low validity coefficients for energy were observed. We compare these correlation coefficients with correlations attained in previously published validation studies (Table 4). Studies included for comparison are those which present gender-specific data with ‘deattenuated’ and/or energy-adjusted correlation coefficients. These studies compared FFQs with multiple days of 24HRs²¹,²⁴–²⁶ or food records²⁷ or both²⁸. Most studies presented deattenuated energy-adjusted estimates²¹,²⁵–²⁸. Munger et al.²⁴ presented energy-adjusted estimates without first deattenuating. In general, within the range of correlations observed among these studies, those from the NIH–AARP Diet and Health Study are among the higher ones. For example, for energy-adjusted total fat, correlations in other studies range from 0.55 to 0.77, compared with AARP correlations of 0.72 (men) and 0.62 (women). Similarly, for energy-adjusted fibre, correlations in other studies range from 0.24 to 0.80, compared with AARP correlations of 0.72 (men) and 0.66 (women). Although the multitude of studies does give a qualitative sense of how well various nutrients are estimated using FFQs, one should exercise caution in interpreting the results of the direct comparison of validity coefficients from these studies conducted in different populations with different dietary assessment reference instruments²⁹.

Correlations with true intake and attenuation factors in the study were estimated using 24HR as a reference instrument, i.e. assuming that 24HR-reported intake is unbiased at the individual level and includes only within-person random error that is independent of true intake and error in the FFQ. Recent studies using reference biomarkers¹⁰,²³,³⁰ have demonstrated that these assumptions may not hold: for protein, total energy and energy-adjusted protein, the 24HR has both intake-related and person-specific biases correlated with their counterparts in the FFQ. Thus, using the 24HR as a reference instrument may lead to biased estimates of the FFQ performance. For example, Kipnis et al. derived a formula relating the true attenuation factor to the one estimated using the 24HR as a reference instrument²³. They demonstrate that intake-related bias in the 24HR that produces the flattened slope phenomenon, i.e. a slope smaller than one in the regression of 24HR-reported on true intake, leads to underestimation of the attenuation factor (overestimation of true attenuation). On the other hand, person-specific bias in the 24HR, as well as its correlation with person-specific bias in the FFQ, leads to an overestimated attenuation factor (underestimated true attenuation). The resulting bias in the estimated attenuation factor therefore depends on the relative values of these three parameters that characterise deviations of the 24HR from the requirements for a correct reference measure. In the OPEN study, the interplay of the flattened slope and person-specific bias in 24HR as well as correlation between person-specific biases in the 24HR and FFQ led to a minimal overestimation of the FFQ performance (20% or less for both correlation and attenuation) for absolute protein intake. The resulting overestimation for total energy intake, however, ranged from 120% (correlation with true intake in men) to 230% (attenuation in women)²³. Biases this large would make it difficult to estimate moderate diet–cancer relationships using non-energy-adjusted models, even in a study as large as AARP. For example, if the estimated attenuation factor of 0.28 for absolute fat intake in women were biased upward by 20%, then the true attenuation factor would be about 0.23, and a true fat–breast cancer relative risk of 1.80 would be observed as 1.14. If the estimate were biased upward by 100%, then the true attenuation factor would be about 0.14, and a relative risk of 1.80 would be observed as only 1.09.
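The arithmetic behind these examples can be reproduced as follows. This sketch assumes, as the quoted figures imply, that the true attenuation factor is the estimate divided by one plus the proportional upward bias and is rounded to two decimals before the relative risk is computed. The function names are ours.

```python
def debiased_attenuation(estimated, upward_bias):
    """True attenuation factor if `estimated` overstates it by the fraction `upward_bias`."""
    return round(estimated / (1.0 + upward_bias), 2)


def observed_relative_risk(true_rr, attenuation):
    """Observed RR in the univariate model: true RR raised to the attenuation factor."""
    return true_rr ** attenuation


if __name__ == "__main__":
    for bias in (0.20, 1.00):
        lam = debiased_attenuation(0.28, bias)
        rr = observed_relative_risk(1.80, lam)
        print(f"bias {bias:.0%}: true attenuation {lam:.2f}, observed RR {rr:.2f}")
```

The same calculation applied to an estimate of 0.45 with a 40% upward bias gives a true attenuation factor of about 0.32 and an observed relative risk of about 1.21.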

Fortunately, energy adjustment appears to improve the situation substantially. The original rationale for this procedure was to evaluate the nutrient effect independent of overall energy intake³¹. A second rationale for the procedure is to reduce measurement error³². Because the error in reported energy intake is correlated with the error in reported intake for most other nutrients on the same instrument, controlling for energy reduces this correlated error. The published OPEN results²³ used the density method rather than the residual method to energy-adjust protein, but the results for residual protein were similar to those for protein density. In OPEN, energy-adjusted protein on 24HR had substantially reduced person-specific bias and somewhat more pronounced flattened slope compared with absolute reported protein intake. Correlation between person-specific biases in the FFQ and 24HR, however, increased twofold in men and fourfold in women. As a result of the interplay among these factors, using the 24HR as a reference instrument produced an essentially unbiased estimate of the attenuation factor for energy-adjusted protein intake in men, while in women the estimated attenuation factor was biased upward by about 40%. If similar biases are assumed for energy-adjusted fat intake in the AARP calibration study, the estimated attenuation factor of 0.45 for residual fat intake in women would correspond to a true attenuation factor of about 0.32. Without residual confounding from energy, a true relative risk of 1.8 between the medians of the first and fifth quintiles of the residual fat distribution would be observed as 1.21. With a projected 3700 cases of breast cancer after 5 years of follow-up¹, the statistical power to detect this relative risk using a two-sided test with a 5% significance level would be 98%.
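
The figures for residual fat intake can be traced with a short worked calculation. It is a sketch that assumes the observed log relative risk equals the true log relative risk multiplied by the attenuation factor; the symbols λ and RR are shorthand introduced here for illustration:

```latex
\begin{align*}
\lambda_{\text{true}} &\approx \frac{0.45}{1 + 0.40} \approx 0.32,\\
RR_{\text{obs}} &= RR_{\text{true}}^{\,\lambda_{\text{true}}} = 1.8^{0.32} \approx 1.21 .
\end{align*}
```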

While we do not have reference biomarkers for nutrients other than protein and energy, it seems likely that the 24HR-based estimates of the FFQ performance for some other nutrients are biased. Thus, we think it best to consider the estimated correlations with true intakes and attenuation factors in Table 3 as relative rather than absolute measures of the FFQ performance, and emphasise comparison with corresponding estimates in other studies. However, similar to protein in the OPEN study, the estimated correlation coefficients and attenuation factors improved substantially for most nutrients in Table 3 after energy adjustment. This improvement might be expected to hold for true correlations and attenuation factors.

While we do not know the precise direction and magnitude of biases in evaluating the FFQ using 24HR as a reference instrument, it is reasonable to conclude that reported energy intake may be most useful as an adjusting covariate for estimating intake of energy-adjusted nutrients. The FFQ’s ability to capture energy-adjusted intake appears to be superior to its ability to capture absolute intake. Thus, energy adjustment not only allows one to assess the presence of an isoenergetic nutrient effect, but may also increase the statistical power to detect diet–disease relationships. Based on the estimated correlations with true intake and attenuation factors in Table 3 for energy-adjusted nutrients, and even allowing that some of them may overestimate the actual values, the sample size of the NIH–AARP cohort study appears to be large enough, and its range of intake wide enough, to compensate for the loss of power due to measurement error in the FFQ and to detect moderate associations (relative risks ≥1.8) between diet and common cancers.

Acknowledgements

Sources of funding: This research was supported (in part) by the Intramural Research Program of the NIH, NCI.

Conflict of interest declaration: None.

Authorship responsibilities: Study design – A.S., C.C.B., L.S.F., R.J.C., A.F.S., F.E.T.; study management – T.M., A.S., M.L.; analysis – V.K., D.M., C.C.B., F.E.T., M.S.B., L.S.F., R.J.C.; writing – F.E.T., V.K., D.M., A.F.S., A.S., L.S.F., R.J.C., M.L.

Acknowledgements: We are indebted to the participants in the NIH–AARP Diet and Health Study for their outstanding cooperation. In addition, the authors gratefully acknowledge the work of Westat, Inc. in conducting the NIH–AARP Diet and Health Study.

Appendix A

Prior to modelling, we performed the following steps for each nutrient and for each gender, in order to appropriately transform the data and remove outliers:

1. For each of the two 24HRs, temporarily remove values that fall below the 25th percentile of the distribution of reported intake minus three interquartile ranges or above the 75th percentile plus three interquartile ranges.
2. Find the Box–Cox power transformation¹⁶ of the remaining 24HR values that maximises the Shapiro–Wilk test statistic for normality³³,³⁴ for the average of the two transformed 24HRs.
3. Apply this transformation to the values of each 24HR and each FFQ, after adding back the values removed in step (1).
4. For each 24HR and each FFQ, remove as outliers the transformed values that fall below the 25th percentile of the distribution of transformed reported intake minus two interquartile ranges or above the 75th percentile plus two interquartile ranges.
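
The four steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names and the grid of candidate powers are hypothetical, a Shapiro–Francia-type statistic with Blom's approximate normal scores stands in for the normality statistic, quartiles come from the inclusive method of `statistics.quantiles`, and in step 2 only persons whose two recalls both lie inside the fences contribute to the average. Reported intakes are assumed to be strictly positive and not all equal.

```python
import math
from statistics import NormalDist, quantiles


def iqr_fences(values, k):
    """Return (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def box_cox(x, lam):
    """Box-Cox power transform of a positive value (log when lam is 0)."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def shapiro_francia(values):
    """Squared correlation between the ordered data and approximate normal scores."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # Blom's approximation to expected normal order statistics (an assumption here).
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxm = sum((x - mean_x) * m for x, m in zip(xs, scores))
    smm = sum(m * m for m in scores)
    return sxm ** 2 / (sxx * smm)


def choose_lambda(recall1, recall2, grid):
    """Steps 1-2: drop values outside three-IQR fences, then pick the power
    whose averaged transformed recalls look most normal."""
    low1, high1 = iqr_fences(recall1, 3)
    low2, high2 = iqr_fences(recall2, 3)
    pairs = [(a, b) for a, b in zip(recall1, recall2)
             if low1 <= a <= high1 and low2 <= b <= high2]

    def score(lam):
        return shapiro_francia([(box_cox(a, lam) + box_cox(b, lam)) / 2 for a, b in pairs])

    return max(grid, key=score)


def transform_and_trim(values, lam):
    """Steps 3-4: transform all values, then remove those outside two-IQR fences."""
    transformed = [box_cox(v, lam) for v in values]
    low, high = iqr_fences(transformed, 2)
    return [t for t in transformed if low <= t <= high]
```

In use, `choose_lambda` would be called once per nutrient and gender on the paired 24HR values, and `transform_and_trim` then applied with the chosen power to each 24HR and each FFQ separately.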

The correlations with true intake and attenuation factors were estimated for each nutrient on their respective transformed scales. Because the Box–Cox transformation is monotonic, adjustment for energy does not depend on the scale to which energy is transformed. Therefore, the correlations with true intake and attenuation factors for energy-adjusted nutrients should be interpreted on the scale to which each particular nutrient was transformed.

Appendix B Measurement error model

For a given person, let T_N and T_E denote true intakes of nutrient N of interest and total energy E, respectively. The FFQ- and 24HR-reported intakes are indexed by nutrient and by occasion of administration. On appropriately transformed scales where all random variables are (approximately) normally distributed, the simultaneous measurement error model¹⁸ for nutrients N and E, referred to below as model (B1), has the following components.

Among the random variables, true intake of each nutrient has its own mean and variance. Intercept parameters denote overall biases, independent of true intake, for the FFQ and 24HR; for the same nutrient, they are allowed to differ between administrations to reflect the tendency to report lower mean intakes on repeat administrations of the same instrument. It is assumed that the first 24HR is unbiased, so that its overall bias is zero. Slope parameters reflect intake-related bias in the FFQ; values smaller than 1 lead to the well-known flattened slope phenomenon, in which people with lower than average intake overreport it and people with higher than average intake underreport it. Random variables r_N and r_E, with means of zero and nutrient-specific variances, denote person-specific biases in the FFQ that are independent of true intake (but not of each other) and represent the differences between total within-person biases and their intake-related components. For each nutrient and each repeat administration, further random variables denote within-person random errors in the FFQ and 24HR, which are assumed to be independent of true intakes and person-specific biases. Because the FFQs and 24HRs were administered well separated in time from each other and between their respective repeats, within-person random errors for two different instruments and for two different administrations of the same instrument are assumed to be independent. However, for the same administration of each instrument, within-person random errors for nutrients N and E are allowed to be correlated. It is critical that, under this model, the 24HR is assumed to be a correct reference instrument with errors that are independent of true intakes, of errors in the FFQ and, for repeat administrations, of each other.
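
As a sketch only, a model of the kind described above can be written as follows. The symbols Q, F, β, ε and u are assumed notation chosen for illustration (only T_N, T_E, r_N and r_E appear in the text), with L = N, E indexing the nutrient and j = 1, 2 the administration:

```latex
\begin{align*}
Q_{Lj} &= \beta_{Q0Lj} + \beta_{Q1L}\,T_L + r_L + \varepsilon_{Lj},\\
F_{Lj} &= \beta_{F0Lj} + T_L + u_{Lj}, \qquad \beta_{F0L1} = 0 .
\end{align*}
```

In this form the 24HR equation has unit slope and no person-specific bias, which is how the assumption of a correct reference instrument enters.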

Since we are mainly interested in estimating the FFQ correlations with true intake and attenuation factors, it is convenient to re-parameterise measurement error model (B1) in terms of the first and second moments of the joint distribution of the intakes reported on two FFQs and two 24HRs. Define the observed data vector by . Denote its mean vector by μ and its variance covariance matrix by Σ. Note that, according to the model assumptions, and . Define

From model (B1), it follows that the variance–covariance matrix of vector D is then given by

so that the observed data follow the model

The maximum likelihood estimates for the unknown first and second moments are computed by fitting model (B3) using the SAS Mixed procedure.

For a given nutrient L (L = N, E), the correlation with true intake and attenuation factor are given by

and

respectively. Under our assumptions, it follows from (B2) that

so that the correlation with true intake and attenuation factor for nutrients N and E are estimated as

and

respectively.
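
To illustrate how such quantities can be expressed through observable moments, suppose (in assumed notation, not taken from the text) that Q_L denotes an FFQ report and F_{L1}, F_{L2} the two 24HR reports of nutrient L, and that each 24HR report equals true intake T_L plus a constant and an error independent of T_L, of the FFQ errors and of the other 24HR error. A sketch of the resulting identities for the correlation with true intake ρ and the attenuation factor λ is:

```latex
\begin{align*}
\operatorname{cov}(Q_L, F_{L1}) &= \operatorname{cov}(Q_L, T_L), \qquad
\operatorname{cov}(F_{L1}, F_{L2}) = \operatorname{var}(T_L),\\
\rho_{Q_L,T_L} &= \frac{\operatorname{cov}(Q_L, T_L)}{\sqrt{\operatorname{var}(Q_L)\,\operatorname{var}(T_L)}}
 = \frac{\operatorname{cov}(Q_L, F_{L1})}{\sqrt{\operatorname{var}(Q_L)\,\operatorname{cov}(F_{L1}, F_{L2})}},\\
\lambda_L &= \frac{\operatorname{cov}(Q_L, T_L)}{\operatorname{var}(Q_L)}
 = \frac{\operatorname{cov}(Q_L, F_{L1})}{\operatorname{var}(Q_L)} .
\end{align*}
```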

For nutrient residual R, we have the following respective expressions for the true residual and the residual calculated using the FFQ:

and

with

and

From formulas (B2) it follows that, for the nutrient residual, the correlation with true intake and attenuation factor are estimated as

and

respectively.

References

1. Schatzkin, A, Subar, AF, Thompson, FE, Harlan, LC, Tangrea, J, Hollenbeck, AR, et al. Design and serendipity in establishing a large cohort with wide dietary intake distributions: the National Institutes of Health–American Association of Retired Persons Diet and Health Study. American Journal of Epidemiology 2001; 154: 1119–1125.

2. Plummer, M, Clayton, D. Measurement error in dietary assessment: an investigation using covariance structure models. Part I. Statistics in Medicine 1993; 12: 925–935.

3. Plummer, M, Clayton, D. Measurement error in dietary assessment: an investigation using covariance structure models. Part II. Statistics in Medicine 1993; 12: 937–948.

4. Freudenheim, JL, Marshall, JR. The problem of profound mismeasurement and the power of epidemiologic studies of diet and cancer. Nutrition and Cancer 1988; 11: 243–250.

5. Freedman, LS, Schatzkin, A, Wax, Y. The impact of dietary measurement on planning sample size required in a cohort study. American Journal of Epidemiology 1990; 132: 1185–1195.

6. US National Cancer Institute, Division of Cancer Epidemiology and Genetics. NIH–AARP Diet and Health Study [homepage]. http://www.dietandhealth.cancer.gov.

7. US National Cancer Institute, Division of Cancer Control and Population Sciences. Risk Factor Monitoring and Methods, Diet History Questionnaire [online], 8 February 2007. Available at http://riskfactor.cancer.gov/DHQ/. Accessed June 2007.

8. Subar, AF, Midthune, D, Kulldorff, M, Brown, CC, Thompson, FE, Kipnis, V, et al. Evaluation of alternative approaches to assign nutrient values to food groups in food frequency questionnaires. American Journal of Epidemiology 2000; 152: 279–286.

9. Walker, AM, Blettner, M. Comparing imperfect measures of exposure. American Journal of Epidemiology 1985; 121: 783–790.

10. Kipnis, V, Midthune, D, Freedman, LS, Bingham, S, Schatzkin, A, Subar, A, et al. Empirical evidence of correlated biases in dietary assessment instruments and its implications. American Journal of Epidemiology 2001; 153: 394–403.

11. Rosner, B, Willett, WC. Interval estimates for correlation coefficients corrected for within-person variation: implications for study design and hypothesis testing. American Journal of Epidemiology 1988; 127: 377–386.

12. Rosner, B, Willett, WC, Spiegelman, D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine 1989; 8: 1051–1069.

13. Spiegelman, D, Schneeweiss, S, McDermott, A. Measurement error correction for logistic regression models with an ‘alloyed gold standard’. American Journal of Epidemiology 1997; 145: 184–196.

14. Wong, MY, Day, NE, Bashir, SA, Duffy, SW. Measurement error in epidemiology: the design of validation studies I: univariate situation. Statistics in Medicine 1999; 18: 2815–2829.

15. Wong, MY, Day, NE, Wareham, NJ. Measurement error in epidemiology: the design of validation studies II: bivariate situation. Statistics in Medicine 1999; 18: 2831–2845.

16. Box, G, Cox, D. An analysis of transformations. Journal of the Royal Statistical Society 1964; B26: 211–252.

17. Willett, WC. Nutritional Epidemiology, 2nd ed. New York: Oxford University Press, 1998.

18. Carroll, RJ, Midthune, D, Freedman, LS, Kipnis, V. Seemingly unrelated measurement error models with application to nutritional epidemiology. Biometrics 2006; 62: 75–84.

19. SAS Institute, Inc. SAS/STAT User’s Guide. Cary, NC: SAS Institute, Inc., 2000.

20. Carroll, RJ, Freedman, L, Pee, D. Design aspects of calibration studies in nutrition, with analysis of missing data in linear measurement error models. Biometrics 1997; 53: 1440–1457.

21. Subar, AF, Thompson, FE, Kipnis, V, Midthune, D, Hurwitz, P, McNutt, S, et al. Comparative validation of the Block, Willett, and National Cancer Institute food frequency questionnaires: the Eating at America’s Table Study. American Journal of Epidemiology 2001; 154: 1089–1099.

22. Subar, AF, Kipnis, V, Troiano, RP, Midthune, D, Schoeller, DA, Bingham, S, et al. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN Study. American Journal of Epidemiology 2003; 158: 1–13.

23. Kipnis, V, Subar, AF, Midthune, D, Freedman, LS, Ballard-Barbash, R, Troiano, RP, et al. Structure of dietary measurement error: results of the OPEN Biomarker Study. American Journal of Epidemiology 2003; 158: 14–21.

24. Munger, RG, Folsom, AR, Kushi, LH, Kaye, SA, Sellers, TA. Dietary assessment of older Iowa women with a food frequency questionnaire: nutrient intake, reproducibility, and comparison with 24-hour dietary recall interviews. American Journal of Epidemiology 1992; 136: 192–200.

25. Stram, DO, Hankin, JH, Wilkens, LR, Pike, MC, Monroe, KR, Park, S, et al. Calibration of the dietary questionnaire for a multiethnic cohort in Hawaii and Los Angeles. American Journal of Epidemiology 2000; 151: 358–370.

26. Flagg, EW, Coates, RJ, Calle, EE, Potischman, N, Thun, MJ. Validation of the American Cancer Society Cancer Prevention Study II Nutrition Survey Cohort Food Frequency Questionnaire. Epidemiology 2000; 11: 462–468.

27. Rimm, EB, Giovannucci, EL, Stampfer, MJ, Colditz, GA, Litin, LB, Willett, WC. Reproducibility and validity of an expanded self-administered semiquantitative food frequency questionnaire among male health professionals. American Journal of Epidemiology 1992; 135: 1114–1126.

28. Patterson, RE, Kristal, AR, Tinker, LF, Carter, RA, Bolton, MP, Agurs-Collins, T. Measurement characteristics of the Women’s Health Initiative food frequency questionnaire. Annals of Epidemiology 1999; 9: 178–187.

29. Plummer, M, Kaaks, R. Commentary: An OPEN assessment of dietary measurement error. International Journal of Epidemiology 2003; 32: 1062–1063.

30. Kipnis, V, Midthune, D, Freedman, L, Bingham, S, Day, NE, Riboli, E, et al. Bias in dietary-report instruments and its implications for nutritional epidemiology. Public Health Nutrition 2002; 5: 915–923.

31. Willett, W, Stampfer, MJ. Total energy intake: implications for epidemiologic analyses. American Journal of Epidemiology 1986; 124: 17–27.

32. Willett, W. Dietary diaries versus food frequency questionnaires – a case of undigestible data. International Journal of Epidemiology 2001; 30: 317–319.

33. Shapiro, SS, Wilk, MB. An analysis of variance test for normality (complete samples). Biometrika 1965; 52: 591–611.

34. Shapiro, SS, Francia, RS. An approximate analysis of variance test for normality. Journal of the American Statistical Association 1972; 67: 215–216.

Table 1 Cohort and calibration sub-sample characteristics at baseline by gender: NIH–AARP Diet and Health Study, 1995–1996

Table 4 Correlation coefficients for nutrients, unadjusted and adjusted for energy intake, by gender and study
