The use of FFQ to assess the habitual intake of populations is the subject of discussion, because they may lack the ability to capture the full heterogeneity in reported intake. One of the major reasons is that respondents are not well able to summarise their food intake over a longer period(Reference Kipnis, Subar and Midthune1). Because of this measurement error, FFQ may be a less powerful instrument to detect relationships between diet and disease in epidemiological studies than food records or 24 h recalls(Reference Bingham, Luben and Welch2–Reference Schatzkin, Kipnis and Carroll4). On the other hand, FFQ still appear to be an essential instrument for epidemiological studies, because they are the cheapest and most feasible method to assess long-term consumption. As FFQ can be applied in larger populations than recalls or records, their use improves the power of a study(Reference Thompson, Kipnis and Midthune5). It has also been shown that for some purposes, reports by FFQ are better than those by other instruments, and therefore some researchers have recommended not abandoning FFQ but combining them with open-ended methods(Reference Lissner, Troiano and Midthune6).
The performance of FFQ may be improved by adjusting reported nutrients for energy intake(Reference Subar, Thompson and Kipnis7). Also, to assess the independent effect of nutrients on disease, energy adjustment is needed(Reference Livingstone and Black8, Reference Willett, Howe and Kushi9). However, it is questionable whether this can be done without introducing error. Validation studies have shown that energy intake by FFQ is misreported(Reference Livingstone and Black8, Reference Andersen, Tomten and Haggarty10–Reference Subar, Kipnis and Troiano12). But because of limitations in the studies, it is not clear how large this error is. An important limitation is that the time frame of the reference method does not match that of the FFQ. Often methods with a shorter reference period than the FFQ, such as 24 h recalls, food records or biomarkers, are used for comparison. In addition, the number of subjects in many validation studies is limited.
At our division, we apply dietary assessment methods before starting a controlled dietary trial to estimate the energy level that participants require to maintain stable body weights during the trial. Until 1995, we used a 3 d estimated or weighed food record(Reference de Vries, Zock and Mensink13). Since 1996, the FFQ has been used because the method is less time-consuming. As the controlled experimental diets had known energy contents and the trials lasted at least 3 weeks, we have information from an almost ‘gold standard’ reference method for habitual energy intake reported by the FFQ for participants with a stable body weight. By combining the results of eleven trials, we were able to evaluate self-reported intake by FFQ in 516 subjects. For this evaluation, we aimed to assess how accurately participants report their energy intake by FFQ, both for estimating absolute individual levels of energy intake and for ranking individuals according to energy intake.
Methods
Study design
We compared energy intakes reported by FFQ with those required to maintain body weights during controlled dietary trials. We obtained data of subjects participating in eleven controlled dietary trials (Table 1). The present study was conducted according to the guidelines laid down in the Declaration of Helsinki, and all procedures involving human participants were approved by the Ethics Committee of Wageningen University. Written informed consent was obtained from all participants. Other details and results of the trials have been described elsewhere(Reference Alles, Hartemink and Meyboom14–Reference Winkels, Brouwer and Siebelink24).
We asked participants to fill in a FFQ 2–6 weeks before entering the trial in order to estimate their energy needs. At the start of the trial, participants were allocated to a diet with an energy content according to their FFQ report. During the trials, body weights of the participants were kept stable. Their body weights were measured twice a week, and if their body weight decreased or increased by more than 0·2 kg between two measurements, the participants received a diet with a higher or lower energy content, respectively.
Participants
Participants were recruited from Wageningen University and from the population of the city of Wageningen and its surroundings. In total, 604 participants were included in the eleven trials. For the present analysis, we left out participants who withdrew before the end of the trial (n 11) and those whose body weights changed by more than 2·0 kg from the end of week 2 until the end of the trial (n 34). Of those who participated in more than one trial, only the data from the first trial were used (n 43 repeat participations excluded). Among the remaining 516 volunteers, 342 were women and 174 were men (Table 2). Most volunteers were students or staff members of Wageningen University. On average, they had a normal body weight, and 67 % were younger than 25 years.
Assessment of energy intake by FFQ
During a screening visit, the participants filled out a FFQ. Trained dietitians checked whether it was filled out properly, and if necessary, additional information was obtained about unusual or missing reports.
The FFQ were developed by selecting, from the latest food consumption data of the Dutch National Food Consumption Surveys(25–Reference Hulshof and Van Staveren27), foods that contributed >0·5 % to the intake of fat, fatty acids and cholesterol and that covered at least 90 % of energy intake. Thereafter, foods were added to achieve face validity. The reference period of the questionnaire was 4 weeks, and portion size questions were included for spreads, cheese, milk in coffee, gravy, candy bars and beer.
In trials 1–8, a FFQ with a 104-item food list in a table format was used. This FFQ was a modification of the Vet Express(Reference Feunekes, Van Staveren and De Vries28), a FFQ developed at Wageningen University in 1992 to assess energy, total fat, fatty acids and cholesterol. This FFQ was updated several times using data from a more recent food consumption survey(25, 26) and newer versions of the Dutch Nutrient Databases(29, 30). Also, the table format was abandoned, and the questions in the FFQ were asked according to a nested approach(Reference Subar, Thompson and Smith31). This resulted in a new questionnaire with 125 items, which was used for trial 9. For other studies at our division, this 125-item FFQ was extended with questions to enable the assessment of dietary fibre and specific micronutrients(Reference Verkleij-Hagoort, de Vries and Stegers32). The result was a 183-item questionnaire, which was used in trials 10 and 11.
Energy intake during the trials
Based on our previous study with food records(Reference de Vries, Zock and Mensink13), we assumed that the participants needed the first 2 weeks of the trials to adapt to the experimental diets. Therefore, we defined actual required energy intake as the mean energy intake calculated from provided experimental diets from day 14 until the end of the trial.
The energy content of the experimental diets ranged from 7 to 20 MJ. In trials 1–3, the experimental diets were supplied at twenty-seven energy levels, in increments of 0·5 MJ (120 kcal), and in the other trials at fourteen energy levels, in increments of 1 MJ (239 kcal).
The experimental diets were composed with nutrient contents according to the specific demands of each trial, while the other nutrients met the RDA of the Dutch Health Council(33). Actual energy intakes during the trials were calculated from all consumed foods and beverages using the most recent Dutch food composition database(29, 30, 34, 35). For calculations of the experimental diets, we used the Food Calculation System (BAS nutrition software 2004, Arnhem, The Netherlands) in which the most recent Dutch food composition database was included.
In general, the diets consisted of conventional foods, but in some trials, specific test foods were provided such as margarines with a specific fat composition or milk fortified with folic acid. A total of eighty-nine participants (17 %), who were lacto-vegetarians or disliked some types of meat, received meat replacers, resulting in a diet with a similar nutrient composition to that of their non-vegetarian counterparts.
During weekdays at lunch time, the participants consumed their hot meal at the division. All other foods were supplied daily as a package and consumed at home. On Fridays, the participants received a package with foods and beverages for the breakfast, lunch and hot meals of the weekend plus instructions for the preparation of these foods. We provided about 90 % of the total daily energy. The remaining 10 % of energy had to be chosen by the participants from a so-called free-food item list. For each dietary trial, this list was adapted to the specific demands of the trial (e.g. items low in fat or in β-carotene in trials 3 and 8, respectively). Furthermore, we allowed participants unrestricted consumption of non-energy foods such as coffee and tea without milk and sugar, water, herbs and spices, lemon juice, vinegar and non-energy soft drinks. The daily choice of free-food items and any deviations from the guidelines were recorded by each subject in a diary. As participants visited the division each weekday, it was possible to advise them about their diets on these days if necessary. Participants were urged not to change their physical activities or smoking habits and asked to record any change in lifestyle in their diary.
Checks of body weight and diets
Body weights were measured twice a week before participants ate their hot meal, with participants wearing no shoes or heavy clothing and with empty pockets. If body weight had changed by more than 0·2 kg, energy intake was adjusted to a higher or lower energy level. On average, body weight decreased by 0·25 (sd 0·22) kg between days 1 and 7, and by 0·17 (sd 0·17) kg between days 8 and 14. Between day 14 and the end of the trial, the participants lost on average 0·20 (sd 0·72) kg of body weight, or 6·2 (sd 39·0) g/d. During this period of 1–8 weeks, eighty-one (16 %) participants lost or gained between 1·0 and 2·0 kg body weight; for the other participants (84 %), weight loss or gain was less than 1·0 kg.
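The adjustment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, and moving exactly one energy level per weight check is an assumption, since the text does not state how many levels a diet was shifted. The 7–20 MJ range and the 0·5 or 1 MJ increments are taken from the description of the experimental diets.

```python
def adjust_energy_level(current_mj, weight_change_kg, step_mj=1.0,
                        threshold_kg=0.2, min_mj=7.0, max_mj=20.0):
    """Return the energy level (MJ/d) to provide after a weight check.

    weight_change_kg is the change since the previous measurement
    (negative = weight loss). Assumption: one level is moved per check.
    """
    if weight_change_kg < -threshold_kg:
        # Weight dropped by more than the threshold: provide more energy.
        new_level = current_mj + step_mj
    elif weight_change_kg > threshold_kg:
        # Weight rose by more than the threshold: provide less energy.
        new_level = current_mj - step_mj
    else:
        new_level = current_mj
    # Stay within the range of energy levels that was supplied (7-20 MJ).
    return min(max_mj, max(min_mj, new_level))


# Example: a participant on 11 MJ/d who lost 0.4 kg moves up one level.
print(adjust_energy_level(11.0, -0.4))               # 1 MJ increments
print(adjust_energy_level(11.0, -0.4, step_mj=0.5))  # 0.5 MJ increments
```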
Duplicate portions of the provided experimental diets were collected every day for an imaginary participant with a daily energy intake of 11 MJ, stored at −20°C and analysed for protein and fat after the trials. Carbohydrates were calculated by difference (carbohydrates (g) = 100 − fat (g) − protein (g) − ash (g) − water (g)). Energy content was calculated from the macronutrient composition of the duplicate portions using Atwater factors(Reference Southgate and Durnin36) and combined with the calculated energy content of the free-choice items using the most recent Dutch food composition database. The daily energy content of the provided experimental diets according to chemical analysis of duplicate portions and calculated composition of the free-choice items was on average 10·6 MJ (2536 kcal), which was lower than the a priori calculated mean energy content of 11 MJ.
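As an illustration of the calculation of carbohydrate by difference and of energy from macronutrients, a minimal Python sketch follows. It assumes the general Atwater factors of 4, 9 and 4 kcal/g for protein, fat and carbohydrate; the text only states that Atwater factors were used, so the exact values applied in the trials are an assumption here. The sample composition is hypothetical, and the function names are illustrative.

```python
KJ_PER_KCAL = 4.184

# Assumption: general Atwater factors (kcal per g); the exact factors
# used in the trials are not given in the text.
ATWATER_KCAL_PER_G = {"protein": 4.0, "fat": 9.0, "carbohydrate": 4.0}


def carbohydrate_by_difference(fat_g, protein_g, ash_g, water_g):
    """Carbohydrate (g per 100 g) = 100 - fat - protein - ash - water."""
    return 100.0 - fat_g - protein_g - ash_g - water_g


def energy_kj_per_100g(fat_g, protein_g, ash_g, water_g):
    """Energy content (kJ per 100 g) from analysed macronutrients."""
    carbohydrate_g = carbohydrate_by_difference(fat_g, protein_g, ash_g, water_g)
    kcal = (ATWATER_KCAL_PER_G["protein"] * protein_g
            + ATWATER_KCAL_PER_G["fat"] * fat_g
            + ATWATER_KCAL_PER_G["carbohydrate"] * carbohydrate_g)
    return kcal * KJ_PER_KCAL


# Hypothetical duplicate-portion composition per 100 g (not from the paper).
print(carbohydrate_by_difference(fat_g=5.0, protein_g=4.0, ash_g=1.0, water_g=70.0))
print(energy_kj_per_100g(fat_g=5.0, protein_g=4.0, ash_g=1.0, water_g=70.0))
```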
The participants’ diaries were checked regularly, and anonymous questionnaires on compliance were filled out in trials 2 and 10. Neither revealed deviations from the protocol that might have affected the results.
Statistical analysis
The reported average energy intakes and their 95 % CI were computed from the FFQ reports. To show systematic differences, we plotted the differences between the reported energy intake and the required energy intake against the average of the two methods in a so-called Bland–Altman plot. The difference between men and women in reported intake expressed as a percentage of actual intake was tested by the unpaired Student's t test. Because of a non-normal distribution of the biases, the difference in bias between men and women was tested by the Mann–Whitney test. To assess associations between reported and actual energy intakes, we used Pearson's correlation coefficients and applied Fisher's Z transformation to calculate the 95 % CI of the correlation coefficients. We classified the participants as reporting <90 %, 90–110 % (regarded as accurate) or >110 % of actual energy intake during the trials, and tested the differences in BMI between these groups using one-way ANOVA and the post hoc Tukey test. Regression analysis was used to determine the relationships of sex, age, BMI, type of list (table format v. nested approach) and season to the difference between reported and actual intakes. All statistical tests were performed in SPSS for Windows version 15.0 (SPSS, Inc., Chicago, IL, USA).
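The Bland–Altman quantities and the three reporting categories can be sketched as follows. This is a hedged Python illustration, not the SPSS procedure used in the study: the function names are hypothetical, the limits of agreement are assumed to be the conventional mean difference ± 1·96 sd, the category bounds of 90 and 110 % are treated as inclusive, and the input values are invented for demonstration.

```python
from statistics import mean, stdev


def bland_altman(reported, actual):
    """Return plot coordinates, mean bias and 95 % limits of agreement.

    Assumption: limits are bias +/- 1.96 * sd of the differences,
    the conventional Bland-Altman definition.
    """
    diffs = [r - a for r, a in zip(reported, actual)]
    averages = [(r + a) / 2 for r, a in zip(reported, actual)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return averages, diffs, bias, (bias - spread, bias + spread)


def reporting_category(reported, actual):
    """Classify a report as <90 %, 90-110 % or >110 % of actual intake."""
    pct = 100.0 * reported / actual
    if pct < 90.0:
        return "under"
    if pct > 110.0:
        return "over"
    return "accurate"


# Invented energy intakes in MJ/d, for illustration only.
reported = [8, 10, 12, 14]
actual = [9, 10, 11, 15]
averages, diffs, bias, limits = bland_altman(reported, actual)
print(bias, limits)
print([reporting_category(r, a) for r, a in zip(reported, actual)])
```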
Results
The reported intake as a percentage of actual intake required to maintain a stable body weight ranged from 92·9 to 100·7 % in the eleven trials (Table 1). Mean reported energy intake was significantly lower than actual energy intake for all participants and for men, but not for women (difference between sexes: P = 0·004; Table 3). As a consequence, the FFQ underestimated the difference in energy intake between sexes compared with the reference method. The difference between the reported energy intakes of men and women was 3·2 (95 % CI 2·9, 3·5) MJ and that between their actual energy intakes was 3·8 (95 % CI 3·0, 4·6) MJ.
* Mean value was significantly different from the bias in women (P = 0·004).
† There are two missing values for age.
The reported energy intake was highly correlated with the actual intake: Pearson's correlation coefficients were 0·82 (95 % CI 0·80, 0·85) for all participants, 0·74 (95 % CI 0·69, 0·78) for women and 0·80 (95 % CI 0·74, 0·85) for men.
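The confidence intervals above follow from Fisher's Z transformation, which can be sketched in Python as below. The function name is hypothetical, and the sketch starts from the rounded correlation coefficients and the group sizes given in the text (342 women, 174 men), so the last digit can differ slightly from intervals computed on the unrounded data.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """95 % CI for a Pearson correlation via Fisher's Z transformation."""
    z = math.atanh(r)              # Fisher's Z of the correlation
    se = 1.0 / math.sqrt(n - 3)    # standard error on the Z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for label, r, n in [("women", 0.74, 342), ("men", 0.80, 174)]:
    low, high = pearson_ci(r, n)
    print(label, round(low, 2), round(high, 2))
```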
Some participants misreported energy intakes more than others. The individual reported energy intakes as a percentage of actual energy intakes showed a large variation and ranged from 56·3 to 159·6 % for women and from 43·8 to 151·0 % for men.
On the individual level, the Bland–Altman plot (Fig. 1) showed both over- and under-reporting of energy intake. The lower and upper 95 % limits of agreement were −3·3 and 3·8 MJ, respectively. The plot also showed a general trend of under-reporting at lower intakes and over-reporting at higher intakes for both men and women.
Significant differences between reported and actual intakes were found for participants aged ≤ 30 years, with a BMI>25 kg/m2, and for those who reported in autumn or by a FFQ with a table format, but no statistically significant differences were found between strata. Subsequent analyses showed that reported energy intake was inversely associated with BMI for women (r − 0·13; P = 0·013) and men (r − 0·26; P = 0·001), and with age for men (r − 0·21; P = 0·006) but not for women (r − 0·0002; P = 0·971). In addition, mean BMI of those reporting >110 % of actual energy intake was statistically significantly lower than in those reporting accurately (90–110 %) or below 90 % of actual intake (Table 4).
a,b Mean values with unlike superscript letters were significantly different (P < 0·05).
The correlation coefficient between reported and actual energy intakes for the FFQ in a table format (trials 1–8: n 350, r 0·84) was similar to those for the two versions of the FFQ with questions according to a nested approach (trial 9: n 70, r 0·82; trials 10–11: n 96, r 0·73).
Also, of the variables introduced into the regression model, sex, age and BMI contributed statistically significantly to the model, whereas the type of list and season (spring, summer, autumn or winter) in which the FFQ were filled out did not.
The regression analysis provided the following regression equation:
where y is the difference between the reported and actual energy intakes in MJ, sex is 0 for men and 1 for women, age is reported in years and BMI in kg/m2.
Discussion
Energy intake reported by FFQ showed, on average, very good agreement with actual energy intake during controlled feeding trials, while body weights were kept stable. Also, FFQ were very well able to rank the participants according to energy intakes, but on the individual level, we found large differences. The present study provided a unique design to evaluate the reported energy intake of a large sample of participants using an almost ‘gold standard’ reference. Studies using references of this quality, including doubly labelled water and indirect calorimetry, are often too expensive or too difficult to apply in a large sample. However, it may be questioned whether our reference method is truly a ‘gold standard’.
The energy intakes assessed by the reference method might have been affected by changes in body weight or physical activity during the controlled trials and errors in the food composition table. To avoid the effects of changes in body weight, we used only the data of the participants after day 14 of the trials when they were used to the test diets and expected to have stable body weights. The average decrease in body weight from day 14 to the end of the study was only 6·2 (sd 39·0) g/d. This means that on average, the actual energy requirement of the participants was 0·2 MJ/d higher than the energy intake calculated from the provided diets, assuming that 1 kg of body weight equals 30 124·8 kJ (7200 kcal)(Reference Shils, Shile and Ross37). In addition, we used the same food composition tables for calculating the energy intake from the FFQ as for calculating the actual energy intake during the trials. Therefore, errors in both methods originating from the food composition table are not independent. Assuming that systematic errors introduced by the food composition table in FFQ and reference method were the same, these errors would not have changed our conclusions for the mean estimated differences between the methods on the group and individual levels. Also, random errors in the food composition tables would not have affected these differences. However, because of these correlated errors, ranking of individuals according to their energy intake estimated from the FFQ may have agreed better with the reference method than with true intake, resulting in higher correlation coefficients. Yet, comparison of the energy content of the diets by chemical analysis for the energy level of 11 MJ only showed a small overestimation of, on average, 0·4 MJ for calculated energy intake during the trials. 
Although chemical analysis was only conducted for one average energy level of all test diets, we may expect the same difference for the other energy levels as diets were devised in a very standardised way. Therefore, we think that our reference method can be regarded as an almost ‘gold standard’ of energy intake.
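The correction of about 0·2 MJ/d for the residual weight loss can be checked with a short calculation. The Python sketch below uses only the figures given in the text (a mean loss of 6·2 g/d and an energy equivalent of 30 124·8 kJ per kg of body weight); the function and constant names are illustrative.

```python
KJ_PER_KG_BODY_WEIGHT = 30124.8  # 7200 kcal x 4.184 kJ/kcal, as assumed in the text


def energy_equivalent_mj_per_day(weight_change_g_per_day):
    """Convert a daily weight change (g/d) to its energy equivalent (MJ/d)."""
    kg_per_day = weight_change_g_per_day / 1000.0
    return kg_per_day * KJ_PER_KG_BODY_WEIGHT / 1000.0


# Mean weight loss of 6.2 g/d between day 14 and the end of the trial.
shortfall = energy_equivalent_mj_per_day(6.2)
print(round(shortfall, 3))  # energy shortfall in MJ/d
```

Rounded to one decimal place, this gives the 0·2 MJ/d by which the actual requirement exceeded the energy calculated from the provided diets.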
Another limitation may have been a change in energy requirement between the period of filling out the FFQ and the trial because of a change in lifestyle. The FFQ were filled out about 2–6 weeks before the start of the trial. However, we do not expect large differences, because of the short interval between the FFQ reports and the start of the trials and because participants were asked not to change their smoking habits and physical activity. According to the food diaries, the participants complied well with these guidelines.
We evaluated the three types of FFQ and information from different seasons in one analysis. We think this is justified because the FFQ were developed in a similar way and all accounted for at least 90 % of the energy intake. Also, we did not find differences in performance between different types of FFQ or FFQ applied in different seasons. However, FFQ including questions with a nested approach(Reference Subar, Thompson and Smith31) may be easier to fill out, and longer FFQ may perform better than shorter FFQ(Reference Molag, de Vries and Ocke38). In the present study, we could not confirm this. An explanation may be that the longer FFQ in other studies were completed by older participants, in whom under-reporting was more common(Reference Tooze, Subar and Thompson39). According to the literature(Reference Livingstone and Black8, Reference Tooze, Subar and Thompson39), accurate reporting of energy intake is influenced by several factors including sex, age, educational level, BMI, psychosocial factors and lifestyle. Inclusion of sex, age and BMI in our model confirmed these associations, even in our rather homogeneous population.
The FFQ in the present study were developed to cover at least 90 % of actual energy intake. Theoretically, this implies that reports by our FFQ may underestimate energy intake by at most 10 %. Thus, if for this reason we had compared the FFQ reports with 90 % of actual energy intake during the trials, the conclusion of our evaluation would have been that the FFQ, on average, slightly overestimate energy intake.
It might be expected that our participants would yield good reports of their food consumption, because they were young and mostly highly educated(Reference Black, Prentice and Goldberg40). They were motivated to enter a controlled dietary trial and to fill out a FFQ, and they were aware that the purpose of the FFQ was to estimate their required energy intake during the trial. It is conceivable that some over-reported their consumption because they were afraid of receiving too little food during the trial. Thus, our FFQ may not provide similarly good results in other studies or populations as in the present study.
Reports of energy intakes by the FFQ in our study showed better results than other studies, both on the group mean level and for ranking participants according to their intake. In other studies, underestimation on the group level ranged from 10 to 36 %. Andersen et al. (Reference Andersen, Tomten and Haggarty10) reported an underestimation of 11 % in a group of seventeen women of an age comparable with that of our population. Subar et al. (Reference Subar, Thompson and Kipnis7) found an underestimation of 36 % for women (n 206) and 34 % for men (n 245) in the age range of 40–69 years. Kroke et al. (Reference Kroke, Klipstein-Grobusch and Voss11) found an underestimation of 19 % for a group of twenty-eight women and men aged between 35 and 67 years. Our participants were on average much younger, but if we compare the results of a subgroup of our population of similar age, the difference between reported intake and the reference level is also much smaller than in other studies. We found a mean overestimation of 1·9 % for women (n 57) and an underestimation of 4·6 % for men (n 37) in the participants aged ≥ 40 years (results not shown). In a subgroup of our population (n 17) aged 65–86 years with an average BMI of 24·5 kg/m2, the underestimation was somewhat larger, at 13 % (1·5 MJ), but still in the lower range compared with other studies(Reference de Vries, de Groot and van Staveren41). When evaluating ranking of participants, we found correlation coefficients of 0·74 for women and 0·80 for men, whereas Subar et al. (Reference Subar, Thompson and Kipnis7) reported correlation coefficients of 0·10 (n 206) for women and 0·19 (n 245) for men. Even in the older men (n 17), the correlation coefficient of 0·67 between the reported and actual energy intakes can still be considered reasonably good compared with other studies(Reference de Vries, de Groot and van Staveren41).
Although the agreement for ranking individuals was good, the differences between reported and actual energy intakes on the individual level, ranging from −50 to +50 %, were large but similar to those of other studies(Reference Andersen, Tomten and Haggarty10, Reference Kroke, Klipstein-Grobusch and Voss11). If we define reported energy intakes on the individual level within ± 10 % of actual energy intakes as acceptable, only 57 % of the participants reported within that range. Men had higher requirements and wider ranges in misreporting than women. The bias appeared to be intake-related, with under-reporting at lower intakes and over-reporting at higher intakes for both men and women.
Another explanation for the better performance of our FFQ compared with other FFQ for assessing mean group intake and ranking individuals according to their intake may be their shorter reference period. It was 1 month, whereas many other FFQ use 1 year. In general, people find it hard to report their intake over a long period, taking all seasonal variation into account. Although for energy intake a month is expected to be sufficient(Reference Margetts, Thompson and Key42), this may be different for nutrients or foods with a larger variation. On the other hand, we used a relatively long reference period for our reference method (1–8 weeks), whereas other studies had a maximum reference duration of 14 d using the doubly labelled water method(Reference Andersen, Tomten and Haggarty10–Reference Subar, Kipnis and Troiano12).
It is surprising that our FFQ perform better on the group level and for ranking of individuals than other FFQ, but that their performance on the individual level is equally inaccurate. A better performance on the group level may be explained by the fact that in our selected population, under- and overestimation occurred to the same extent, resulting in, on average, a small difference. The better ranking that we found may be explained, but only to a small extent, by correlated errors due to the use of the same food composition table to calculate energy intake for both the FFQ and the reference method.
Thus, reported intake by our FFQ, on average, equals the actual energy intake to maintain body weight and ranks the participants reasonably well according to their energy intake but is not accurate at the individual level. For adjustment of energy intake when studying nutrient–disease relationships(Reference Willett, Howe and Kushi9), it is required that FFQ accurately determine absolute energy intakes. Data from the Observing Protein and Energy Nutrition study(Reference Kipnis, Subar and Midthune1) showed a failure of FFQ to provide a sufficiently accurate report of absolute energy intakes to enable the detection of their moderate associations with disease. Yet, it was also shown that because of correlated errors in reporting protein and energy, energy-adjusted protein was less affected by measurement error than absolute protein intake. The results of our validation could suggest that adjustment of nutrients to energy intake reported by FFQ may result in the introduction of substantial error in epidemiological studies. However, in case of correlated errors between the nutrient of interest and energy, error might be reduced by adjusting for energy.
We conclude that despite the large differences in accuracy between individuals, the FFQ used in the present study can be useful to pick up dietary changes in trials if the population is large enough, because systematic errors are only small on the group level. In addition, our FFQ can be applied in epidemiological studies to rank individuals accurately according to their energy intake, but if nutrient intakes are adjusted for energy as reported by FFQ, this may affect the results of these studies in an unknown direction.
Acknowledgements
E. S. and J. H. M. d. V. designed the study; E. S. collected and analysed the data, and wrote the manuscript; A. G. supervised the data analysis, and together with J. H. M. d. V. revised the earlier versions of the manuscript. None of the authors had a personal or financial conflict of interest. The present study received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.