What foods we choose to consume can be one of the most important factors that influence our health. The prevalence of overweight, obesity and CVD has increased markedly in recent years and is strongly related to the type of diet(Reference Martínez-González, Sánchez-Villegas and De Irala1–Reference Psaltopoulou, Naska and Orfanos4). To provide public health recommendations to prevent chronic diseases, we must first assess the food and nutrient intake of the population.
There are many methods to estimate food and nutrient intake, such as dietary records, 24 h dietary recalls, dietary history and FFQ. Some of these, such as the dietary record or the 24 h dietary recall, are complicated and more prone to error when self-reported by participants(Reference Thompson and Byers5). Epidemiological studies have commonly used the FFQ to assess usual food consumption. Although an FFQ does not have the same accuracy as a dietary record or a 24 h dietary recall, the FFQ can reasonably report intake over a long period of time and with limited resources(Reference Sampson6), which is very important when studying a large sample of the population.
In Spain, the Seguimiento Universidad de Navarra (SUN) project is an open-enrollment cohort with currently more than 19 000 university graduates. It studies how dietary behaviour is related to the incidence of chronic disease(Reference Martínez-González7). To evaluate dietary intake, we use an FFQ, which is included in the baseline questionnaire of the SUN project and was previously validated for the Spanish population(Reference Martín-Moreno, Boyle and Gorgojo8). However, the reproducibility of the FFQ has not been reported. Therefore, the aim of our study was to assess the reproducibility of the SUN FFQ.
Material and methods
Study population
The SUN project was designed in collaboration with the Harvard School of Public Health in 1998 and the methodology is similar to that used in large American cohorts such as the Nurses’ Health Study(Reference Liu, Manson and Stampfer9) and the Health Professionals Follow-up Study(Reference Hu, Rimm and Stampfer10). The recruitment of the cohort started in December 1999 and as a dynamic prospective cohort it is permanently open. So far, the cohort consists of >19 000 university graduates. Among others, the main areas of investigation of the SUN cohort are centred on hypertension and other CVD, cancer, obesity, diabetes, depression, fertility and injuries by traffic accidents(Reference Seguí-Gómez, de la Fuente and Vázquez11).
For the present analyses, we assessed 326 participants of the SUN cohort: 115 men (35·3 %) and 211 women (64·7 %). The participants completed a self-administered optically readable FFQ two times, sixty-six of them in less than 1 year and 260 of them in more than 1 year. They were not randomly selected; they were participants who answered the FFQ twice by mistake. Though our participant selection was not randomized, and therefore this represents a study limitation, relevant differences did not exist between the subsample and the whole cohort (Table 1).
SUN, Seguimiento Universidad de Navarra.
*P value from Student’s t test <0·05.
**P value from χ 2 test <0·05.
†Classification according to the WHO.
We wanted to assess FFQ reproducibility and not actual changes in the diet. For this reason, we divided the sample into two groups according to the time elapsed between completion of the first and the second questionnaires. Participants who answered the two questionnaires less than 1 year apart were included in group 1; the mean (sd) interval was 7·06 (4·12) months, ranging from 0 months (both questionnaires answered on the same day) to 11·9 months. In group 2, we included participants with an interval greater than 1 year; the mean (sd) interval was 26·91 (14·84) months, ranging from 12·4 months to 7 years (84·64 months).
Dietary assessment
To assess dietary exposures, we used a semi-quantitative 136-item FFQ. For each food item, a commonly used portion size was specified (slices, glass, teaspoons, etc.), and the participants were asked how often they had consumed that unit on average over the previous year. Emphasis was added to ensure that the answers were related to long-term dietary exposures and not to recent changes in diet. Nine options for frequency of consumption were offered: never or hardly ever, one to three times a month, once a week, two to four times a week, five to six times a week, once a day, two to three times a day, four to six times a day and more than six times a day. All completed questionnaires were checked by a dietitian for accuracy and completeness. In the present study, we selected only completed FFQ. In addition, particular questions regarding oil consumption used in frying, as a spread, or as salad dressing, and the type of fat used in frying were specifically assessed.
After the FFQ were received and processed, a dietitian updated the nutrient databank using the latest available information included in the food composition tables for Spain(Reference Moreiras, Carbajal and Cabrera12, Reference Mataix Verdú13). Nutrient intake scores were computed with an ad hoc computer program that was specifically developed for this purpose; each score was calculated as the sum of the frequency of consumption multiplied by the nutrient composition of the specified portion size(Reference Willett14). The selected frequency item was converted to a daily intake. For example, if a response was 5–6 times a week, it was converted to 0·79 servings per day (5·5 times per week/7 d).
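The conversion and scoring steps described above can be sketched as follows. This is a minimal illustration, not the ad hoc program used in the study: the frequency labels, the midpoint chosen for each range, the value assigned to "more than six times a day" and the nutrient contents are all assumptions made for the example.

```python
# Hypothetical mapping from the nine FFQ frequency options to servings per day.
# Range midpoints follow the worked example in the text (5-6 per week -> 5.5/7);
# the monthly midpoint (2 per 30 d) and the open-ended top category (6.0) are assumptions.
SERVINGS_PER_DAY = {
    "never or hardly ever": 0.0,
    "1-3 per month": 2.0 / 30.0,
    "1 per week": 1.0 / 7.0,
    "2-4 per week": 3.0 / 7.0,
    "5-6 per week": 5.5 / 7.0,
    "1 per day": 1.0,
    "2-3 per day": 2.5,
    "4-6 per day": 5.0,
    ">6 per day": 6.0,
}


def nutrient_score(responses, nutrient_per_portion):
    """Sum over food items of daily frequency x nutrient content of one portion."""
    return sum(
        SERVINGS_PER_DAY[frequency] * nutrient_per_portion[item]
        for item, frequency in responses.items()
    )


# Made-up responses and calcium contents (mg per portion), for illustration only.
responses = {"whole milk": "1 per day", "white bread": "5-6 per week"}
calcium_mg = {"whole milk": 240.0, "white bread": 30.0}
score = nutrient_score(responses, calcium_mg)
print(round(score, 1))
```

A lookup table keeps the frequency-to-servings assumptions in one place, so a different midpoint convention can be substituted without touching the scoring function.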
Food groupings are specified in Table 2.
Statistical analyses
We selected only complete FFQ for the analyses (93·4 % of 347). We compared self-reported variables such as sex, age, weight, BMI, smoking habit, energy and macronutrient intake and alcohol consumption from the SUN cohort and from the reproducibility study subsample, to ensure that there were no relevant differences. Weight and BMI were previously validated in a subsample of our cohort (correlation coefficient was 0·99 for weight and 0·94 for BMI)(Reference Bes-Rastrollo, Pérez Valdivieso and Sánchez-Villegas15).
To evaluate the magnitude of the association and the comparison between the two time periods, Pearson correlation coefficients were computed between both measures in both study groups (95 % CI). Pearson correlation coefficients are presented since we analysed a sufficiently large sample, and for that reason, we assumed that the outcomes were normally distributed(Reference Lumley, Diehr and Emerson16).
All foods, groups of food, drinks and nutritional variables were adjusted for total energy intake through the residual method: total energy intake was used as an independent variable in a regression model with the nutrient intake as a dependent variable. Residuals were added to the expected nutrient intake for the mean energy intake of the sample, giving, as a result, a nutrient score uncorrelated with total energy intake(Reference Martínez-González, Palma and Toledo17–Reference Willett19).
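A minimal sketch of the residual method is given below, assuming a simple least-squares regression of nutrient intake on total energy intake. The function names and the sample values are hypothetical; the study itself used SPSS.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def energy_adjust(nutrient, energy):
    """Residual method: regress nutrient on energy, then add each residual
    to the nutrient intake expected at the mean energy intake of the sample."""
    n = len(nutrient)
    mean_e, mean_x = sum(energy) / n, sum(nutrient) / n
    slope = (
        sum((e - mean_e) * (x - mean_x) for e, x in zip(energy, nutrient))
        / sum((e - mean_e) ** 2 for e in energy)
    )
    intercept = mean_x - slope * mean_e
    expected_at_mean_energy = intercept + slope * mean_e
    return [
        (x - (intercept + slope * e)) + expected_at_mean_energy
        for e, x in zip(energy, nutrient)
    ]


# Made-up energy (kcal/d) and fat (g/d) intakes for five participants.
energy = [1800.0, 2200.0, 2500.0, 3000.0, 2100.0]
fat = [60.0, 85.0, 90.0, 120.0, 70.0]
fat_adjusted = energy_adjust(fat, energy)
print(abs(pearson(fat_adjusted, energy)) < 1e-9)  # adjusted score is uncorrelated with energy
```

Adding the constant (the expected intake at mean energy) does not change any correlation; it only returns the residuals to a scale that resembles an actual intake.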
The presence of intra-individual variation tends to attenuate the correlation between the two FFQ, and for that reason, we calculated the Pearson correlation coefficients deattenuated for within-person variability(Reference Rosner and Willett20, Reference Rimm, Giovannucci and Stampfer21) based on the adjusted values. We corrected for within-person variability using the following formula: $r_c = r_0\sqrt{1 + (\sigma_w^2/\sigma_b^2)/n}$, where $r_c$ is the corrected correlation coefficient, $r_0$ is the observed correlation coefficient for adjusted nutrient intake, $\sigma_w^2$ is the within-person variation, $\sigma_b^2$ is the between-person variation and $n$ is the number of replicate measurements(Reference Willett, Sampson and Stampfer18).
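As an illustration of the correction, the sketch below assumes the commonly used form of the deattenuation formula, in which the observed coefficient is multiplied by the square root of 1 plus the ratio of within-person to between-person variance divided by the number of replicates. The function name and the input values are hypothetical.

```python
import math


def deattenuate(r_observed, within_var, between_var, n_replicates):
    """Correct an observed correlation for within-person variability.

    Assumed form: r_c = r_0 * sqrt(1 + (within_var / between_var) / n_replicates).
    """
    variance_ratio = within_var / between_var
    return r_observed * math.sqrt(1.0 + variance_ratio / n_replicates)


# Made-up example: observed r = 0.50, variance ratio 1.5, two replicate FFQ.
r_corrected = deattenuate(0.50, within_var=1.5, between_var=1.0, n_replicates=2)
print(round(r_corrected, 3))
```

Under this form the corrected coefficient is never smaller than the observed one, and with a large variance ratio it can exceed 1, so corrected values close to 1 should be read with caution.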
The average of Pearson correlation coefficients for foods and drinks and for nutrients was calculated taking coefficients as a continuous variable to give a measurement of central tendency.
To compare the correlation coefficients r between the two groups (<1 year v. ≥ 1 year), an approximate variance-stabilising transformation for r (the Fisher transformation) was used. After this transformation, the variance of the transformed coefficient is approximately constant, which allows hypothesis testing using a conventional approach (unpaired t test)(Reference Rosner22).
To assess gross misclassification, participants were categorised into quintiles of nutrient intake or food consumption according to the measures from the first and second questionnaires, and the percentage of misclassification was estimated. A participant was considered grossly misclassified if classified in the lowest quintile by FFQ1 and in the highest quintile by FFQ2, or in the highest quintile by FFQ1 and in the lowest quintile by FFQ2. We considered classification to be reasonable when a participant was in the same or an adjacent quintile in the first and the second questionnaires(Reference Willett, Sampson and Stampfer18, Reference Friis, Krüger and Stripp23, Reference Deschamps, de Lauzon-Guillain and Lafay24, Reference Messerer, Johansson and Wolk25).
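The between-group comparison can be sketched as below. This is an illustration under stated assumptions rather than the procedure run in SPSS: it uses the standard Fisher transformation z = ½ ln[(1 + r)/(1 − r)], a standard error of √[1/(n1 − 3) + 1/(n2 − 3)] and a normal approximation for the two-sided P value. The correlation values are made up; the group sizes (66 and 260) are those reported for the two groups.

```python
import math


def fisher_z(r):
    """Fisher variance-stabilising transformation of a correlation coefficient."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def compare_correlations(r1, n1, r2, n2):
    """Test statistic and two-sided P value (normal approximation) for the
    difference between two independent correlation coefficients."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    statistic = (fisher_z(r1) - fisher_z(r2)) / standard_error
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


# Hypothetical coefficients for the <1 year (n 66) and >=1 year (n 260) groups.
statistic, p_value = compare_correlations(0.70, 66, 0.50, 260)
print(round(statistic, 2), p_value < 0.05)
```

Because the standard error depends only on the group sizes, the smaller group (n 66) dominates the precision of every between-group comparison.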
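The quintile cross-classification can be sketched as follows. The rank-based quintile assignment (ties broken by input order) and the intake values are assumptions made for the example; gross misclassification is counted as a jump between the lowest and highest quintiles, and agreement as the same or an adjacent quintile.

```python
def quintiles(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank * 5 // len(values) + 1
    return result


def cross_classify(ffq1, ffq2):
    """Return (% grossly misclassified, % in the same or adjacent quintile)."""
    q1, q2 = quintiles(ffq1), quintiles(ffq2)
    n = len(q1)
    gross = sum(abs(a - b) == 4 for a, b in zip(q1, q2))
    same_or_adjacent = sum(abs(a - b) <= 1 for a, b in zip(q1, q2))
    return 100.0 * gross / n, 100.0 * same_or_adjacent / n


# Made-up intakes for ten participants; the first and last swap extremes in FFQ2.
ffq1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ffq2 = [10, 2, 3, 4, 5, 6, 7, 8, 9, 1]
gross_pct, agreement_pct = cross_classify(ffq1, ffq2)
print(gross_pct, agreement_pct)  # 20.0 80.0
```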
All analyses were performed with Statistical Package for the Social Sciences statistical software package version 15·0 for Windows (SPSS Inc., Chicago, IL, USA).
Results
The study subsample was small in comparison with the whole cohort. Subsample selection was not random, which is a limitation of our study, but we considered these results to be applicable to the other participants. Data on sex, age, weight, height and smoking status were also self-reported along with the FFQ. As shown in Table 1, there were no differences between the subsample and the whole cohort except for weight and BMI, for which we observed significant differences. When BMI was classified into categories according to the WHO, we also found a statistically significant difference.
Table 3 shows Pearson correlation coefficients and corrected correlations for foods and food groups between the two FFQ stratified by time between the completion and P values for between-group comparisons. The foods reported less than 1 year apart and that had the highest corrected correlation values were butter and animal fats (r = 0·94), vegetable fats (r = 0·80), French fries (r = 0·72), processed pastries (r = 0·70), high-fat dairy products and meat products (r = 0·66); olive oil (r = 0·99), low-fat dairy products (r = 0·86) and French fries (r = 0·74) were among the foods with the highest corrected correlation reported after 1 year.
*P < 0·05 for adjusted correlation coefficients.
**P < 0·01 for adjusted correlation coefficients.
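†P value between groups was calculated using an approximate variance-stabilising transformation for r (the Fisher transformation): $\lambda = (z_1 - z_2)/\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$, where $z_1 = \tfrac{1}{2}\ln[(1 + r_1)/(1 - r_1)]$ and $z_2 = \tfrac{1}{2}\ln[(1 + r_2)/(1 - r_2)]$(Reference Rosner22).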
‡Items adjusted for total energy intake.
§Corrected for within-person variability using the following formula: $r_c = r_0\sqrt{1 + (\sigma_w^2/\sigma_b^2)/n}$, where $r_c$ is the corrected correlation coefficient, $r_0$ is the observed correlation coefficient for adjusted nutrient intake, $\sigma_w^2$ is the within-person variation, $\sigma_b^2$ is the between-person variation and $n$ is the number of replicate measurements(Reference Willett, Sampson and Stampfer18).
All the correlations for the drink items that were analysed (Table 3) were statistically significant (P < 0·05). The highest corrected correlations were observed in the group with less than a 1-year period between both FFQ (group 1), for beer (r = 0·93), diet soft drinks (r = 0·90), alcoholic drinks (r = 0·89), wine (r = 0·72) and soft drinks (r = 0·66).
To sum up, for foods and drinks, the average corrected correlation for questionnaires answered less than 1 year apart was 0·56, and the values ranged from 0·22 for eggs to 0·94 for animal fats and butter. For questionnaires answered more than 1 year apart, the average corrected correlation was 0·48, and values ranged from 0·17 for distilled liqueurs to 0·99 for olive oil.
Table 4 presents corrected Pearson’s correlation coefficients of nutrients and P values between groups. For the first group, noteworthy results were seen for PUFA (r = 0·99), alcohol (r = 0·85), caffeine (r = 0·80), folic acid (r = 0·78), iron (r = 0·77), vitamin B2, magnesium and vegetable fibre (r = 0·69), potassium (r = 0·64), and vitamin B1 and vitamin C (r = 0·62). Conversely, the lowest correlations for FFQ reported less than 1 year apart were observed for fruit fibre (r = 0·21), selenium (r = 0·24), MUFA (r = 0·25), glycaemic load (r = 0·25) and cereal fibre (r = 0·27). In contrast, all the nutrients had significant correlations when reported more than a year apart. The highest values were observed for cereal fibre (r = 0·89), fruit fibre (r = 0·66), caffeine (r = 0·66), magnesium (r = 0·65) and PUFA (r = 0·64).
*P < 0·05 for adjusted correlation coefficients.
**P < 0·01 for adjusted correlation coefficients.
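†P value between groups was calculated using an approximate variance-stabilising transformation for r (the Fisher transformation): $\lambda = (z_1 - z_2)/\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$, where $z_1 = \tfrac{1}{2}\ln[(1 + r_1)/(1 - r_1)]$ and $z_2 = \tfrac{1}{2}\ln[(1 + r_2)/(1 - r_2)]$(Reference Rosner22).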
‡Items adjusted for total energy intake.
§Corrected for within-person variability using the following formula: $r_c = r_0\sqrt{1 + (\sigma_w^2/\sigma_b^2)/n}$, where $r_c$ is the corrected correlation coefficient, $r_0$ is the observed correlation coefficient for adjusted nutrient intake, $\sigma_w^2$ is the within-person variation, $\sigma_b^2$ is the between-person variation and $n$ is the number of replicate measurements(Reference Willett, Sampson and Stampfer18).
To summarise Table 4, for nutrients, the average correlation was 0·53 with a range from 0·21 for fruit fibre to 0·99 for PUFA, and 0·51 with a value ranging from 0·37 for glycaemic load and vitamin E to 0·89 for cereal fibre, among the questionnaires answered less than 1 year and more than 1 year apart, respectively.
With regard to P values in between-group comparisons, statistically significant differences were observed for PUFA, folic acid, vitamin B2, vitamin E, iron, sodium, fruit fibre, cereal fibre and alcohol.
In the misclassification analysis (Table 5), no gross misclassification was apparent for eggs, vegetables, alcoholic drinks, vitamin B6 and folic acid in group 1 and for fruits in group 2. The highest misclassification in group 1 (<1 year apart) was observed for fish or seafood (7·6 % of misclassification), cereals (7·6 % of misclassification) and n-3 fatty acids (7·6 % of misclassification). The worst values in group 2 (≥1 year apart) were for fat (4·6 % of misclassification) and glycaemic load (4·6 % of misclassification).
In the same or adjacent quintile, we observed the best results for alcoholic drinks (86·4 %), energy (83·3 %), carbohydrates (83·3 %) and eggs (80·3 %) for participants who answered both questionnaires in less than 1 year. For those who answered the questionnaires with more than 1 year in between, the highest proportions of participants in the same or adjacent quintile were observed for fibre (75·4 %), fruits (74·2 %) and folic acid (73·5 %).
Discussion
Our study suggests that FFQ reproducibility might be acceptable for most nutrients and food items, supporting the finding that this FFQ is a valid tool for nutritional epidemiology. Participants were not aware that any reproducibility study was being conducted.
Our results are consistent with findings from previous European studies. An FFQ was self-administered twice to a sample of volunteers of a Mediterranean region of Spain, with a 6-week interval. The correlation values ranged from 0·60 to 0·95 (mean = 0·86) and from 0·52 to 0·94 (mean = 0·83) for Pearson’s and intra-class correlation coefficients, respectively(Reference Schröder, Covas and Marrugat26).
In a questionnaire carried out by the German part of the European Prospective Investigation into Cancer and Nutrition study (EPIC), results on reproducibility and relative validity of measurement of food group intake were reported. The repeated administration of the FFQ to the same study subjects was carried out at a 6-month interval. Spearman test–retest correlations ranged from 0·49 for bread to 0·89 for alcoholic beverages (median = 0·70). In that study, two different versions of their FFQ were administered. Correlations were also improved by correction for attenuation due to within-person error in the reference method(Reference Bohlscheid-Thomas, Hoting and Boeing27).
In another study in middle-aged Danish women, after having completed the FFQ twice at a 1-year interval, the Pearson correlation coefficients between the mean nutrient intakes from the two questionnaires ranged from 0·53 to 0·76(Reference Friis, Krüger and Stripp23).
In Finland, a sample of pregnant women completed the FFQ twice at a 1-month interval. The intra-class correlation coefficients between questionnaires ranged from 0·44 to 0·91 for foods. The correlation coefficients were highest for the items consumed daily, such as coffee (0·91), low-fat milk (0·85) and butter (0·81), and lowest for rarely eaten foods such as ice cream (0·44), oils (0·54) and low-fat spreads (0·55). The intra-class correlation coefficients for nutrients ranged from 0·42 (ethanol) to 0·72 (sucrose, riboflavin and calcium). The average of all correlation coefficients for foods and nutrients was 0·65(Reference Erkkola, Karppinen and Javanainen28).
Similar results were shown in other worldwide studies. In the Nurses’ Health Study, the average correlation coefficients between repeated questionnaires administered at an interval of about 1 year was 0·57. For 23 % of the food items, the correlation coefficient was ≥0·70, and for 73 % was ≥0·50. This level of reproducibility is comparable to that of many biological measurements that are strong predictors of disease in epidemiological studies(Reference Salvini, Hunter and Sampson29).
A study from the University of Toronto suggested that an FFQ is comparable with an interviewer-administered diet history as a predictor of nutrients as estimated from a 7 d food record(Reference Jain, Howe and Rohan30).
Similar results were shown in a North Indian population. There was a good correlation between the nutrient values as calculated by the FFQ and a 5 d diet record. The correlation for energy intake was 0·80, and for other nutrients (after adjusting for calories) varied between 0·45 and 0·68. In general, the FFQ overestimated the energy-adjusted nutrient intake by 6 %–17 %. Regarding reproducibility, after the readministration of the FFQ (3-month interval), a moderate-to-strong correlation (energy-adjusted r = 0·49–0·90) was observed between the two evaluations for various nutrients(Reference Pandey, Bhatia and Boddula31).
A major limitation of the present study is that the sample was not randomly chosen. We unintentionally sent the FFQ twice to potential participants (university graduates, regional associations of physicians, nurses, pharmacists, dentists and engineers). They were supposed to answer once (as specified in the invitation letter), but some participants filled in the questionnaire twice. This kind of selection of a reproducibility subsample could have biased the estimate of the FFQ reproducibility. If participants completed the questionnaire twice because they forgot that they had already completed it, the estimate could be lowered because the memory of these participants was probably worse than that of the whole cohort. Conversely, if these participants intentionally completed the questionnaire twice, they could be more health conscious and recall better, thus leading to an overestimation of the reproducibility(Reference Messerer, Johansson and Wolk25). However, we believe that the second possibility is less likely and that the main explanation is that participants forgot that they had already answered the questionnaire.
Another limitation is the time elapsed between the two FFQ. This issue is controversial, and no perfect method exists. It is unrealistic to administer the FFQ at a very short interval, such as a few days or weeks, as subjects may simply tend to remember their previous responses(Reference Willett19). In contrast, when a longer interval of time is used (more than 1 year), true change in dietary intake as well as variation in response contributes to reducing reproducibility. This explanation closely fits our results, which are better in the group that answered twice in less than 1 year.
As noted in the Results section, BMI in the subsample used for our reproducibility study was significantly lower than that of the whole SUN cohort. The lower rate of obesity in the subsample could lead to an overestimated correlation because, as shown in other studies, under-reporting is positively associated with obesity, special diets, smoking and age(Reference Mendez, Wynter and Wilks32).
Our analysis showed that Pearson’s corrected correlation coefficients were lower for individual foods than for food groups. Results for whole-wheat bread and white bread showed lower values of reproducibility than the cereal group in which they are included. We observed the same tendency for chicken v. the group (chicken, turkey and rabbit), chicken v. meat, olive oil v. vegetable fats, butter v. animal fats, beer v. alcoholic drinks and sugar-sweetened soft drinks v. soft drinks. This might be explained by a compensatory effect. It seemed to be easier for participants to remember their intake as a whole depending on food group than for individual foods. Also, effects of underestimation and overestimation of separated foods could be compensated within the same group of foods.
Despite the results of the present study suggesting that the FFQ is appropriate for use in a particular study, it is important to be aware of the strengths and limitations of the method. To conclude, we would like to emphasise that no dietary method can measure dietary intake without error(Reference Cade, Thompson and Burley33). Although improvement of dietary assessment methods is a worthy pursuit, to abandon the FFQ, which is highly informative in epidemiological applications, before alternatives are shown to be superior would be unwise(Reference Willett and Hu34).
As there are several studies about seasonal influences in diet(Reference Capita and Alonso-Calleja35), we propose more studies to evaluate the influence of the different seasons in which the questionnaires are completed. More studies are needed to test the best way to assess diet among population subgroups.
Conclusion
Despite the fact that our participant selection was not random, relevant differences did not exist between the subsample and the whole SUN cohort and the results of the present study can be applied to the whole cohort. In conclusion, our study suggests that FFQ reproducibility might be acceptable for participants who answered the questionnaires in less than 1 year and we could consider the SUN FFQ as a useful tool for measuring diet(Reference Willett19).
Acknowledgements
All authors have participated in the concept and design, interpretation of data, drafting or revising of the manuscript. The authors do not have any conflict of interest. The SUN Study has received funding from the Spanish Ministry of Health (Grants PI030678, PI040233, PI070240, PI070312 and PI081943), the Navarra Regional Government (PI141/2005) and the University of Navarra. C.F.-A. and Z.V.-R. were responsible for study concept and design. C.F.-A., Z.V.-R. and M.B.-R. were responsible for acquisition and analysis of the data. M.B.-R., L.S. and M.A.M.-G. were responsible for critical revision of the manuscript for important intellectual content. All authors approved the final version of the manuscript. The authors thank all members of the SUN Study Group for administrative, technical and material support (especially S. Benito for her tireless task of checking duplicated questionnaires). The authors also thank participants of the SUN Study for continued cooperation and participation. Finally, the authors thank C.N. Lopez for her editing of the English version of the article.