In the past few decades, researchers have started to rethink how food intake should be assessed and interpreted. Historically, FFQ have been predominantly used in large cohort studies( Reference Shim, Shin and Oh 1 – Reference Kirkpatrick, Vanderlee and Raffoul 3 ). According to Kirkpatrick et al.( Reference Kirkpatrick, Vanderlee and Raffoul 3 ), up to 64% of the previous and ongoing Canadian studies rely on FFQ or dietary screeners while only 14% are using 24 h recalls. However, studies with recovery biomarkers have consistently reported that multiple 24 h recalls describe energy and protein intakes with higher precision than FFQ( Reference Freedman, Commins and Moler 4 , Reference Prentice, Mossavar-Rahmani and Huang 5 ). While multiple 24 h recalls are considered expensive and time-consuming, web technology now opens the way to a new wave of self-administered automatic tools for use in large cohorts( Reference Kohlmeier, Mendez and McDuffie 6 , Reference Taren, Dwyer and Freedman 7 ). However, the validity of these new automated 24 h recalls has to be demonstrated. To be considered valid and reliable, they have to measure what they are meant to measure consistently over time( Reference Margetts and Nelson 8 ). On the one hand, they should provide an adequate estimation of nutrient intakes and identify deficiencies( Reference Gleason, Harris and Sheean 9 ). On the other hand, they are also supposed to capture usual intakes. Therefore, reported energy intake (rEI) should be consistent with energy needs to sustain normal activities.
Under-reporting is usually described as implausibly low rEI. To be categorized as such, energy intake has to be significantly lower than estimated or measured daily energy expenditure( Reference Goldberg and Black 10 ). The use of doubly labelled water is an unbiased way of assessing daily energy expenditure in real-life settings( Reference Schoeller 11 ). In a review published in 2001, Hill and Davies( Reference Hill and Davies 12 ) revealed that compared with doubly labelled water, usual rEI from any food assessment tool was associated with a certain degree of under-reporting. Nevertheless, it has been suggested that repeated 24 h recalls would be one of the food assessment methods with the lowest rate of under-reporting, ranging from 10 to 20%( Reference Freedman, Commins and Moler 4 , Reference Prentice, Mossavar-Rahmani and Huang 5 , Reference Subar, Kipnis and Troiano 13 – Reference Tooze, Subar and Thompson 15 ). In the absence of nutritional biomarkers, under-reporting is usually assessed as the ratio between rEI and BMR below the lower limit of physical activity level considered plausible. Goldberg et al.( Reference Goldberg and Black 10 ) suggested that when rEI:BMR is below 1·35, this would be indicative of under-reporting while over-reporting would correspond to rEI:BMR above 2·5. As described by Willett et al.( Reference Willett, Howe and Kushi 16 ), nutrient intakes can be further adjusted for energy intake to improve diet description and to strengthen the associations with health outcomes.
An ideal gold standard for dietary intake assessment is difficult to find. Some studies use direct observation, but this is possible only in a clinical setting and not representative of usual intakes. Recovery biomarkers such as doubly labelled water for energy intake are also interesting options. However, such biomarkers mirror specific aspects of the diet, but they cannot reflect global dietary patterns( Reference Margetts and Nelson 8 ). Newly developed techniques are therefore usually compared with an established one to determine if they can produce equivalent results within predetermined limits( Reference Hanneman 17 ). This approach refers to relative validity( Reference Gleason, Harris and Sheean 9 ). In studies evaluating relative validity, authors often use similar established statistical approaches( Reference Cade, Burley and Warm 18 , Reference Willett 19 ). Most often, reported macro- and micronutrient intakes from the new tool are compared with reported intakes obtained from a reference tool. It is expected that this reference method demonstrates a good level of validity, although not necessarily providing a perfect assessment of dietary intakes( Reference Gleason, Harris and Sheean 9 , Reference Lombard, Steyn and Charlton 20 ). The food record (FR) has been shown to perform reasonably well compared with biological markers, especially when subjects are asked to weigh their foods and report specific recipes corresponding to what they actually ate( Reference Barrett and Gibson 21 – Reference Ortega, Pérez-Rodrigo and López-Sobaler 24 ). This method has been favoured in two recent web-based 24 h recall validation studies because of its independence with the new tools in terms of assessment bias( Reference De Keyzer, Huybrechts and De Vriendt 25 , Reference Comrie, Masson and McNeill 26 ).
The R24W is a new, automated, self-administered, web-based 24 h recall designed to assess nutritional intakes in the French-Canadian population. This tool uses a data collection approach inspired by the automated multiple-pass method from the US Department of Agriculture( Reference Moshfegh, Rhodes and Baer 27 ). A total of 2568 different food items and 687 recipes are available in the R24W( Reference Jacques, Lemieux and Lamarche 28 ). Respondents are guided to recall their previous day’s intake, meal by meal. Pictures of up to eight portion sizes are proposed for each food item described by unit and/or volume. Its development has been discussed in detail elsewhere( Reference Jacques, Lemieux and Lamarche 28 ). A first validation study was conducted in a context of fully controlled feeding studies, in which we showed that there was no systematic bias in portion size estimation with the R24W( Reference Lafrenière, Lamarche and Laramée 29 ). The aim of the present study was to assess the relative validity of the R24W, for assessment of energy and nutrient intakes among French Canadians, using established statistical validation approaches and intakes from a 3d FR as reference. We hypothesized that the R24W accurately estimates participants’ usual energy and nutrient intakes with fewer than 20% of under-reporters.
Methods
Population
Seventy-five women and seventy-five men between 18 and 65 years of age from the Québec City metropolitan area were recruited through electronic messages sent to the Laval University community as well as via the electronic newsletter of the research institute that reaches individuals outside the university. Exclusion criteria were pregnancy, lactation and digestive problems causing malabsorption, to avoid any interaction in the analysis of blood biomarkers taken for an upcoming analysis. All women and seventy-two men completed all the study requirements. The protocol was in accordance with the Declaration of Helsinki and was certified by the Laval University Ethics Committee. All subjects signed a consent form prior to taking part in the project.
Study protocol and measurements
Participants were invited to an initial visit at the research institute where their body weight, height and body composition were assessed (TANITA body composition analyser BC-418; Tanita Corporation, IL, USA). Then, they received verbal and written instructions by a dietitian on how to fill out the 3d FR, with the intent to reduce social desirability biases( Reference Turgeon O’Brien and Dufour 30 ). They had to complete the record on a weekend day and on two weekdays and they were asked to weigh and measure what they ate as well as to attach recipes or food labels of items consumed to improve accuracy of food assessment. Every FR was revised by a trained dietitian upon return to ensure that the information provided was complete and clear. This was done in order to minimize estimation and reporting errors, since the FR was being used as the reference method in this validation study. Coding was also conducted by trained staff with Nutrific software (Laval University, QC, Canada), which was linked to the Canadian Nutrient File database (Health Canada, 2010).
Afterwards, participants received emails on unannounced days inviting them to complete the R24W four times during a 20d period. If participants did not complete the 24 h recall on the day they received the email, the access was cancelled and another email was sent on another unannounced day. Briefly, the R24W is inspired by the automated multiple-pass method of the US Department of Agriculture( Reference Moshfegh, Rhodes and Baer 27 ) but, unlike that method, the R24W uses a meal-based approach in the first step. When completing the R24W, the respondent can add an unlimited number of meals or snacks per 24 h period. In terms of data management, the R24W allows automatic calculation of different diet quality scores in addition to energy and nutrient intakes. A detailed description of the R24W has been published elsewhere( Reference Jacques, Lemieux and Lamarche 28 ). As there was no schedule imposed for the completion of the R24W, for the purposes of the current analysis, data of subjects who completed two weekdays and one weekend day were gathered for the comparison with the FR (107 participants; fifty-seven women and fifty men). In cases where all four recalls were eligible, we chose the first two weekdays and the first weekend day completed. Mean intakes from the 3d FR and from the three R24W days were used in the analyses. During the testing period, subjects were asked not to make any noticeable changes in their usual diet. Use of diet supplements was not taken into account for this validation analysis. Each participant also had to complete questionnaires to gather information about medical history (including questions about weight stability) and sociodemographic variables.
Statistical approaches
Mean daily intakes and standard deviations for energy and twenty-four nutrients were assessed with the R24W and the FR. More precisely, carbohydrates, proteins, fat, percentage of energy from carbohydrates, percentage of energy from proteins and percentage of energy from fat, fibre, vitamin A, thiamin, riboflavin, niacin, vitamin B6, folic acid, vitamin B12, vitamin C, vitamin D, Mg, Zn, Fe, Ca and K were selected because they are recognized as key nutrients in Canada’s Food Guide( 31 ). SFA, Na and alcohol were also assessed because of their importance in the aetiology of metabolic diseases( Reference Eckel, Jakicic and Ard 32 , Reference Corrao, Rubbiati and Bagnardi 33 ). Student’s paired t test was used to determine whether there was a significant difference between the two methods in the assessment of each selected nutrient. Then, the strength of the association between reported intakes using the R24W and reported intakes with the FR was assessed for each nutrient with the Pearson correlation coefficient. Analyses were conducted on raw and on deattenuated sex- and energy-adjusted data. The adjustment for energy was calculated using the residual method( Reference Willett, Howe and Kushi 16 ). The deattenuation was computed using the ratio of within- to between-person variability of each tool and the number of days of data collection, to adjust for day-to-day variation in intakes( Reference Willett 19 ). Cross-classification (percentage of agreement) and weighted kappa (κ w) were assessed to determine if both methods tended to classify respondents in the same quartile. Then, Bland–Altman plots were used to assess agreement at an individual level across the range of intakes. Bland–Altman plots show the relationship between the difference and the average of two measures. A significant association demonstrates a proportional bias between these two measures( Reference Bland and Altman 34 ). 
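As an illustration only, the regression-based steps described above (energy adjustment by the residual method, deattenuation and the Bland–Altman slope) could be sketched in Python as follows. The analyses in the present study were run in SAS; the function names here are hypothetical, the deattenuation formula is the commonly cited form for two instruments and is assumed rather than taken from the study code, and the log-transformation and the significance test of the slope are omitted.

```python
from math import sqrt
from statistics import mean


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def energy_adjust(nutrient, energy):
    """Residual method: regression residual plus the intake predicted at mean energy."""
    a, b = ols(energy, nutrient)
    expected = a + b * mean(energy)
    return [n - (a + b * e) + expected for n, e in zip(nutrient, energy)]


def deattenuate(r_obs, ratio_a, days_a, ratio_b, days_b):
    """Correct an observed correlation for day-to-day variation in both tools.

    ratio_* is the within- to between-person variance ratio of each tool
    (assumed to have been estimated beforehand); days_* is the number of days.
    """
    return r_obs * sqrt((1 + ratio_a / days_a) * (1 + ratio_b / days_b))


def bland_altman(new, ref):
    """Return (mean difference, slope of the difference regressed on the pairwise mean)."""
    diffs = [n - r for n, r in zip(new, ref)]
    avgs = [(n + r) / 2 for n, r in zip(new, ref)]
    _, slope = ols(avgs, diffs)
    return mean(diffs), slope
```

In this sketch, a slope close to zero would correspond to the absence of proportional bias, whereas a clearly non-zero slope would indicate that the difference between tools changes with the level of intake.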
Lastly, the relative validity outcome of each test was compared with criteria proposed by Lombard et al.( Reference Lombard, Steyn and Charlton 20 ), based on the work of other authors( Reference Masson, McNeill and Tomany 35 – Reference Willett, Sampson and Stampfer 38 ) and categorized as good, acceptable or poor to provide an overview of the relative validity of all nutrients tested. Relative validity was considered good in each of these situations: deattenuated sex- and energy-adjusted correlation coefficient ≥0·50; classification of ≥50% of respondents in the same quartile; classification of <10% of respondents in opposite quartiles; κ w ≥0·61; difference between measures from both methods ≤10·9%; non-significant Student’s t test (P≥0·05); and non-significant slope in the Bland–Altman plot (P≥0·05). Relative validity was judged acceptable when the deattenuated sex- and energy-adjusted correlation coefficient was between 0·20 and 0·49; when κ w was between 0·20 and 0·60; and when the difference between measures from both methods was between 11 and 20%. Finally, relative validity was considered poor when the deattenuated sex- and energy-adjusted correlation coefficient was <0·20; when <50% of respondents were classified in the same quartile; when ≥10% of respondents were classified in opposite quartiles; when κ w was <0·20; when the difference between measures from both methods was ≥20%; and when results from Student’s t test and the slope from the Bland–Altman plot were significant (P<0·05). Agreement between tests and overall relative validity were then evaluated by the total of good, acceptable and poor validity scores obtained for each nutrient.
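The scoring rules above can be expressed as a short Python sketch, given here for illustration only. The function name and its arguments are hypothetical, and the handling of boundary values for the percentage difference (values below 11% treated as good, values of 20% or more treated as poor) is an assumption made to resolve the overlap between categories.

```python
from collections import Counter


def rate_nutrient(r_adj, same_q_pct, opposite_q_pct, kappa_w,
                  pct_diff, p_ttest, p_slope):
    """Count good/acceptable/poor outcomes across the seven criteria for one nutrient."""
    d = abs(pct_diff)
    outcomes = [
        # Deattenuated sex- and energy-adjusted correlation coefficient
        "good" if r_adj >= 0.50 else "acceptable" if r_adj >= 0.20 else "poor",
        # Percentage of respondents classified in the same quartile
        "good" if same_q_pct >= 50 else "poor",
        # Percentage of respondents classified in opposite quartiles
        "good" if opposite_q_pct < 10 else "poor",
        # Weighted kappa
        "good" if kappa_w >= 0.61 else "acceptable" if kappa_w >= 0.20 else "poor",
        # Percentage difference between methods (boundary handling is an assumption)
        "good" if d < 11 else "acceptable" if d < 20 else "poor",
        # Student's paired t test
        "good" if p_ttest >= 0.05 else "poor",
        # Slope of the Bland-Altman plot
        "good" if p_slope >= 0.05 else "poor",
    ]
    return Counter(outcomes)
```

Called with made-up values such as `rate_nutrient(0.60, 40.0, 3.0, 0.35, -5.0, 0.20, 0.30)`, the sketch returns five good, one acceptable and one poor outcome.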
To determine the relative validity of energy intake assessed by the R24W, a comparison between reported intakes and estimated energy needs was conducted to identify under-reporters, adequate reporters and over-reporters. BMR was estimated with the Mifflin–St Jeor equation( Reference Frankenfield, Roth-Yousey and Compher 39 ) and under-reporters were classified as individuals with rEI:BMR <1·35 while over-reporters were classified as those with rEI:BMR >2·5( Reference Goldberg and Black 10 ). Lastly, to determine if a similar number of under-reporters was identified with the new tool and the FR, the McNemar χ² test for paired data was used to compare under-reporters (rEI:BMR <1·35) and non-under-reporters (rEI:BMR ≥1·35) between the two dietary assessment methods.
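The classification of reporters described above can be illustrated with the following Python sketch. It assumes the standard published form of the Mifflin–St Jeor equation (weight in kg, height in cm, age in years, result in kcal/d) and a McNemar χ² statistic without continuity correction; whether a correction or an exact test was applied in the SAS analysis is not stated here, and the function names are hypothetical.

```python
from math import erfc, sqrt


def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Estimate BMR (kcal/d) with the Mifflin-St Jeor equation; sex is 'M' or 'F'."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + 5 if sex == "M" else base - 161


def classify_reporter(rei_kcal, bmr_kcal, low=1.35, high=2.5):
    """Classify a participant from the rEI:BMR ratio using the cut-offs in the text."""
    ratio = rei_kcal / bmr_kcal
    if ratio < low:
        return "under"
    if ratio > high:
        return "over"
    return "adequate"


def mcnemar_chi2(only_tool_a, only_tool_b):
    """McNemar chi-square (1 df, no continuity correction) from discordant pair counts.

    only_tool_a: participants flagged as under-reporters by tool A only.
    only_tool_b: participants flagged as under-reporters by tool B only.
    """
    stat = (only_tool_a - only_tool_b) ** 2 / (only_tool_a + only_tool_b)
    # Upper tail of a chi-square distribution with 1 df
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

For example, a hypothetical 40-year-old man weighing 70 kg and measuring 175 cm would have an estimated BMR of 1598·75 kcal/d, so a reported intake of 2000 kcal/d (rEI:BMR of about 1·25) would be flagged as under-reporting.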
Log-transformed data were used to improve normality for all variables. Statistical analyses were conducted with the statistical software package SAS version 9.4.
Results
The main characteristics of the participants are presented in Table 1. Mean age of the participants was 47·4 (sd 13·3) years and they had a mean BMI of 25·5 (sd 4·4) kg/m². Fifty-seven per cent of them reported being weight stable for the last 3 months. Ninety-six per cent of participants were Caucasian and 63·6% had a university degree.
Table 2 presents percentage differences as well as correlations between R24W and FR for energy and nutrient intakes. Mean values of eighteen out of twenty-five variables assessed with R24W (72%) were within 10% of the mean values obtained with FR. The largest differences were observed for niacin (−54·8%) and alcohol (+40·0%). rEI, fat intake, alcohol intake, percentage of energy from carbohydrates and proteins, SFA intake as well as intakes of eight micronutrients were significantly different between the R24W and the FR (P<0·05). However, all raw correlations (r=0·28–0·61) and all but one (Zn at r=0·02) sex- and energy-adjusted deattenuated correlations (r=0·35–0·72) were significant (P<0·01).
%E, percentage of energy.
*Student’s t test and Pearson correlation with a P value of <0·05.
The cross-classification analysis indicated that, on average, the participants were classified in the same quartile in 40·0% of the cases (range: 29·9–50·5%) and in the adjacent quartile in 40·0% of the cases (range: 27·1–46·7%), while they were grossly misclassified (e.g. classified in quartile 1 with one method and quartile 4 with the other method) in 3·6% of the cases (range: 0·9–6·5%; Table 3). The κ w values ranged from 0·16 to 0·47 with a mean of 0·33 (Table 3). The Bland–Altman analysis showed a proportional bias for some of the nutrients, but with different patterns. For fat, alcohol, vitamin D and Zn, intakes assessed with the R24W were on average higher than intakes from the FR and the degree of overestimation was proportional to levels of intake. For vitamin A and Mg, there was a noticeable difference in intakes assessed by both tools only in those who reported consuming the largest amounts of these nutrients. Intakes of niacin were underestimated by the R24W compared with the FR and this underestimation became more important in those who consumed a larger amount of niacin. Finally, the intake of vitamin C seemed to be overestimated by the R24W in those who consumed a smaller amount and underestimated in those who consumed a larger amount relative to the FR (see plots in the online supplementary material). Next, Table 4 combines the relative validation assessment of the six tests performed. Protein was the nutrient for which assessment with both tools demonstrated the highest agreement, with all tests resulting in good or acceptable relative validity outcomes. Carbohydrate, percentage of energy from fat, folic acid, vitamin C, Fe, K and fibre also received mostly good or acceptable relative validity outcomes and had only one poor outcome, which was related to the proportion of classification in the same quartile (below 50%). However, for niacin, vitamin C and Zn, results for the majority of the tests (4/7) corresponded to poor outcomes.
κ w, weighted kappa; %E, percentage of energy.
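For illustration, the cross-classification percentages reported above could be obtained along the lines of the following Python sketch. The function names are hypothetical, quartiles are assigned by simple rank (ties are not given special treatment) and the weighted kappa is not computed.

```python
def quartiles(values):
    """Assign each value to a quartile (0 to 3) according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 4 // len(values)
    return result


def cross_classify(new, ref):
    """Percentage of participants in the same, adjacent and opposite quartiles."""
    gaps = [abs(a - b) for a, b in zip(quartiles(new), quartiles(ref))]
    n = len(gaps)
    return {
        "same": 100 * gaps.count(0) / n,
        "adjacent": 100 * gaps.count(1) / n,
        # Opposite means quartile 1 with one method and quartile 4 with the other
        "opposite": 100 * gaps.count(3) / n,
    }
```

With eight hypothetical participants whose lowest and highest intakes are swapped between methods, the sketch would classify 75% of them in the same quartile and 25% in opposite quartiles.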
Lastly, based on data from the R24W, 15·0% of participants were characterized as under-reporters, compared with 23·4% with the FR (Table 5). When we classified the participants as under-reporters or non-under-reporters, we observed that the difference in the proportion of under-reporters between methods did not reach statistical significance (P=0·07). Almost three out of every four participants (72·9%) were classified within the same category by both tools, 26·2% were one category apart (e.g. under-reporter with one method and adequate reporter with the other one) while only one participant (0·9%) was grossly misclassified (identified as an under-reporter with the R24W and as an over-reporter with the FR). Lastly, the proportion of participants who reported a recent weight loss was not higher in the under-reporter group (18·8% in under-reporters v. 22·0% in adequate reporters and 0% in over-reporters, P=0·79, as assessed by the R24W; 28·0% in under-reporters v. 18·8% in adequate reporters and 0% in over-reporters, P=0·48, as assessed by the FR).
Discussion
It is of primary importance to test the validity of newly developed food assessment tools. The present study showed an acceptable level of agreement for energy and nutrient intakes between data generated by a newly developed 24 h recall, the R24W, and data from the 3d FR, used as the reference method.
In terms of nutrient intakes, our results are comparable to those of the first Belgian food consumption survey using the EPIC-SOFT program, in which intakes from computer-assisted 24 h recalls were compared with intakes assessed by an FR. In that study, raw correlation coefficients between the two methods for energy and nutrient intakes ranged from 0·16 to 0·62( Reference De Keyzer, Huybrechts and De Vriendt 25 ). Many nutrients for which significant differences were observed in the first Belgian food consumption survey are the same as the ones for which we observed differences in our study. Indeed, in both studies, there was a difference in reported intakes of energy, fat, SFA, vitamin C, thiamin and riboflavin. Furthermore, results from both studies revealed that energy intake was higher when assessed with 24 h recalls than with the FR, and it was associated with a higher reported intake of fat. As stated by the authors( Reference De Keyzer, Huybrechts and De Vriendt 25 ), the higher value of reported fat intake with 24 h recalls than with the FR could be related to the numerous questions included about frequently forgotten food items like added fat, spreads or sauces. This higher value of reported fat intake could indeed reflect a more reliable assessment of fat, a nutrient known from biomarker studies to be often underestimated( Reference Heitmann, Lissner and Osler 40 ).
Our analysis showed that the mean sex- and energy-adjusted deattenuated correlation coefficient was 0·52, which meets the criterion for a good relative validity outcome( Reference Masson, McNeill and Tomany 35 ). Regarding cross-classification, although only one nutrient (protein) reached the criterion for good relative validity, our results are comparable to those of others. Indeed, for all nutrients, an average of 80% of participants were classified in the same or the adjacent quartile. Moreover, fewer than 10% (range: 0·9–6·5%) of the participants were classified in the opposite quartile, showing a very low proportion of extreme misclassification. These results are similar to those of a study in which an FFQ was validated with an FR in a similar population where, on average, 77·0% of participants were classified in the same or the adjacent quartile and 5·0% were grossly misclassified (opposite quartiles)( Reference Labonté, Cyr and Baril-Gravel 41 ). The Bland–Altman analysis revealed that the magnitude of the difference between both tools was not equal through the range of mean intakes for eight nutrients. This means that the mean difference between the two tools increases at larger or smaller intake values. This is not an unusual observation. In a study aiming to evaluate the validity of a new FFQ designed for assessing adolescents’ intakes, Ambrosini et al.( Reference Ambrosini, de Klerk and O’Sullivan 42 ) observed that 19/22 nutrients tested showed a significant proportional bias in either boys or girls as illustrated by the regression line of the Bland–Altman plot.
The relative validity was not the same for all nutrients studied. However, for fibre, SFA and Na, which are nutrients frequently associated with metabolic health( Reference Eckel, Jakicic and Ard 32 , Reference Chiuve, Fung and Rimm 43 ), we mostly obtained results associated with good or acceptable relative validity. This suggests that the R24W would be an adequate tool to assess dietary intakes in nutritional epidemiological studies addressing issues related to metabolic health. It is worth mentioning that reported intakes for SFA, Na and alcohol are higher with the R24W than with the FR. This supports the idea that social desirability bias is reduced with the web-based dietary assessment tool.
Overall, there are three nutrients in our study for which the relative validity is questionable. For niacin and vitamin C, poor validity outcomes are mainly related to criteria of agreement at a group level; while for Zn, associations and agreement at the individual level as well as agreement at a group level seem to be poor. Since each self-reported dietary assessment tool has some limitations, it is not possible to determine based on our results that the R24W would systematically produce erroneous estimation for these specific nutrients. However, it would be wiser to interpret with caution estimation of those nutrients evaluated with the R24W. Our next step will be to identify food items that could explain the large discrepancies between the two methods compared. In a larger perspective, it seems that the tests used to evaluate agreement at a group level (percentage difference, Student’s t test and Bland–Altman) and those evaluating agreement at an individual level (Pearson correlations, cross-classification and κ w) were characterized by an equivalent number of good and poor outcomes for the majority of nutrients tested. This, combined with the small proportion of the cohort characterized as under-reporters, suggests that this new, web-based 24 h recall would be suitable to assess dietary intakes in research projects aiming to evaluate intakes at either a group or an individual level.
Under-reporting of dietary intakes has been identified as a major issue for which dietary assessment tools are often criticized. However, the current study demonstrated that the R24W did not produce a higher prevalence of under-reporters compared with the FR. Prentice et al. conducted a study where they compared reported energy intake using a 4d FR and three 24 h recalls with doubly labelled water as a biomarker of energy intake. Similar to what we found, they noticed only a slight difference between the two methods in the proportion of participants identified as under-reporters( Reference Prentice, Mossavar-Rahmani and Huang 5 ).
It is important to mention that the present study aimed to compare usual intakes as assessed by two different tools and that a perfect agreement was not expected. Indeed, data were collected on different days with two self-reported methods associated with some degree of imprecision. Furthermore, even if the FR is considered a gold standard, it has been widely reported that individuals who fill in FR tend to modify what they eat because they know they are being evaluated. This is called reactivity bias. It could result in an underestimation of some nutrients such as fat and alcohol( Reference Midanik 44 ) and in an apparent overestimation generated by other tools in a comparative context( Reference Willett 19 ). This could explain the discrepancies observed between the FR and the R24W for these two nutrients. The reactivity bias is not a problem with a 24 h recall because participants do not know in advance which days will be assessed. However, if participants experience difficulties with short-term memory, assessment by the 24 h recall would be affected( 45 ). Moreover, the R24W offers a wide selection of food items and mixed dishes( Reference Jacques, Lemieux and Lamarche 28 ) but, contrary to the FR where participants could write virtually any possible item, in the R24W choices are limited, which could force some respondents to use predetermined recipes slightly different from what they actually ate.
We also stress that we conducted the current study with a rather small homogeneous cohort of highly educated adults that is not fully representative of the French-Canadian population. These characteristics of the sample limit the generalizability of the results to different populations. Furthermore, we only used three days of FR and of R24W. For the purposes of the present study, we stipulated that this period represented a good estimation of usual intakes. We decided to do so to limit the burden on participants and also because we wanted to validate the tool in a context suitable for larger studies. It is also of importance to mention that we decided not to exclude the under-reporters from the analysis to keep a representative sample.
Compared with most of the validation studies published so far, we improved the analysis by pooling the results of six validation tests to get an overview of the validity for each nutrient. This approach allowed us to identify for which nutrient the tool was more effective, using the FR as a reference method.
Conclusion
The present paper assessed the relative validity of a new, web-based, self-administered 24 h recall, the R24W, for intakes of energy and twenty-four selected nutrients using six different statistical tests in a cohort of French-Canadian adults. This comparative analysis with the FR suggests that the R24W has an acceptable level of relative validity for most nutrients as well as for energy. However, assessment of niacin, vitamin C and Zn with the R24W should be interpreted with caution considering results obtained in the present study.
Acknowledgements
Acknowledgements: The authors would like to acknowledge the work of Pascale Bélanger, Myriam Landry, Amélie Bergeron and Caroline Trahan in the coding of food records. Financial support: This work was funded by the Canadian Institutes of Health Research (grant no. FHG 129921). The Canadian Institutes of Health Research had no role in the design, analysis or writing of this article. Conflicts of interest: None. Authorship: J.L., J.R., B.L. and S.L. were involved in formulating the research question and designing the study. C.L. was involved in carrying out the study. J.L. was in charge of analysing the data and writing the article. All authors reviewed and approved the final version. Ethics of human subject participation: This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human subjects were approved by the Laval University Ethics Committee. Written informed consent was obtained from all subjects.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1368980018001611