
Reproducibility of data-driven dietary patterns in two groups of adult Spanish women from different studies

Published online by Cambridge University Press:  04 July 2016

Adela Castelló*
Affiliation:
Cancer Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Cancer Epidemiology Research Group, Oncology and Hematology Area, IIS Puerta de Hierro (IDIPHIM), Calle Manuel de Falla, 1, 28222 Majadahonda, Madrid, Spain
Virginia Lope
Affiliation:
Cancer Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Cancer Epidemiology Research Group, Oncology and Hematology Area, IIS Puerta de Hierro (IDIPHIM), Calle Manuel de Falla, 1, 28222 Majadahonda, Madrid, Spain
Jesús Vioque
Affiliation:
Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Universidad Miguel Hernandez, Crta. Nacional 332, s/n, 03550, Sant Joan D’Alacant, Alicante, Spain
Carmen Santamariña
Affiliation:
Galician Breast Cancer Screening Program, Galician Regional Health Authority, C/Gregorio Hernández, 2, 4, 15011, A Coruña, Spain
Carmen Pedraz-Pingarrón
Affiliation:
Castile-León Breast Cancer Screening Program, General Directorate of Public Health, Avenida Sierra de Atapuerca, S/N, 09071, Burgos, Spain
Soledad Abad
Affiliation:
Aragón Breast Cancer Screening Program, Aragon Health Service, C/Ronda de Liberación, 1, 44002, Teruel, Zaragoza, Spain
Maria Ederra
Affiliation:
Navarre Breast Cancer Screening Program, Public Health Institute, C/ Leire, 15, 31003, Pamplona, Spain
Dolores Salas-Trejo
Affiliation:
Valencian Breast Cancer Screening Program, General Directorate of Public Health, C/ Micer Mascó, 31, 46010, Valencia, Spain
Carmen Vidal
Affiliation:
Cancer Prevention and Control Unit, Catalonian Institute of Oncology (ICO), Avenida Gran Vía, S/N, km 2.7, 08907, L´Hospitalet de Llobregat, Barcelona, Spain
Carmen Sánchez-Contador
Affiliation:
Balearic Islands Breast Cancer Screening Program, Regional Authority for Health & Consumer Affairs, C/ Cecilio Metelo, 18, 07012, Palma de Mallorca, Islas Baleares, Spain
Nuria Aragonés
Affiliation:
Cancer Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Cancer Epidemiology Research Group, Oncology and Hematology Area, IIS Puerta de Hierro (IDIPHIM), Calle Manuel de Falla, 1, 28222 Majadahonda, Madrid, Spain
Beatriz Pérez-Gómez
Affiliation:
Cancer Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Cancer Epidemiology Research Group, Oncology and Hematology Area, IIS Puerta de Hierro (IDIPHIM), Calle Manuel de Falla, 1, 28222 Majadahonda, Madrid, Spain
Marina Pollán
Affiliation:
Cancer Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Consortium for Biomedical Research in Epidemiology & Public Health (CIBERESP), Instituto de Salud Carlos III, Avenida Monforte de Lemos, 5, 28029, Madrid, Spain Cancer Epidemiology Research Group, Oncology and Hematology Area, IIS Puerta de Hierro (IDIPHIM), Calle Manuel de Falla, 1, 28222 Majadahonda, Madrid, Spain
*Corresponding author: Dr A. Castelló, fax +34 91 387 7815, email acastello@isciii.es

Abstract

The objective of the present study was to assess the reproducibility of data-driven dietary patterns in different samples extracted from similar populations. Dietary patterns were extracted by applying principal component analyses to the dietary information collected from a sample of 3550 women recruited from seven screening centres belonging to the Spanish breast cancer (BC) screening network (Determinants of Mammographic Density in Spain (DDM-Spain) study). The resulting patterns were compared with three dietary patterns obtained from a previous Spanish case–control study on female BC (Epidemiological study of the Spanish group for breast cancer research (GEICAM: grupo Español de investigación en cáncer de mama)) using the dietary intake data of 973 healthy participants. The level of agreement between patterns was determined using both the congruence coefficient (CC) between the pattern loadings (considering patterns with a CC≥0·85 as fairly similar) and the linear correlation between pattern scores (considering as fairly similar those patterns with a statistically significant correlation). The conclusions reached with both methods were compared. This is the first study exploring the reproducibility of data-driven patterns from two studies and the first using the CC to determine pattern similarity. We were able to reproduce the EpiGEICAM Western pattern in the DDM-Spain sample (CC=0·90). However, the reproducibility of the Prudent (CC=0·76) and Mediterranean (CC=0·77) patterns was not as good. The linear correlation between pattern scores was statistically significant in all cases, highlighting its arbitrariness for determining pattern similarity. We conclude that the reproducibility of widely prevalent dietary patterns is better than the reproducibility of more population-specific patterns. More methodological studies are needed to establish an objective measurement and threshold to determine pattern similarity.

Type
Full Papers
Copyright
Copyright © The Authors 2016 

Diet is a key modifiable risk factor, but the exploration of its role in disease occurrence is complicated because of methodological issues related to the dietary assessment method used( Reference Bingham, Luben and Welch 1 Reference Willett 3 ), food and nutrient interactions( Reference Jacobs and Steffen 4 , Reference Messina, Lampe and Birt 5 ) and differences in food consumption across populations( Reference Irala-Estevez, Groth and Johansson 6 Reference Teufel 8 ). Traditionally, nutritionists and researchers have explored the effect of individual dietary factors in disease occurrence. However, some authors advocate the use of dietary patterns instead of individual foods and nutrients, arguing that they may better capture variability in the population’s diet, while allowing the evaluation of interactions between dietary factors( Reference Barkoukis 9 Reference Jacques and Tucker 11 ).

These patterns can be identified with data-driven methods such as principal component analysis (PCA), factor analysis (FA) and cluster analysis or can be represented by investigator-driven patterns known as dietary quality indices. Investigator-driven patterns assign a set of scores based on individuals’ fulfilment of a set of fixed recommendations. Therefore, they are widely applicable, facilitating the exploration of the reproducibility of their association with different diseases in independent populations( Reference George, Ballard-Barbash and Manson 12 Reference Reedy, Krebs-Smith and Miller 16 ). However, they present the disadvantage of being very disease dependent, given that they are mainly based on existing evidence of the association between diet and CVD( Reference Fung, McCullough and Newby 17 ). On the other hand, data-driven dietary patterns are more representative of the diet of the specific population from which they have been extracted and independent of the diseases, but many authors argue that the patterns obtained are very population-dependent, and therefore difficult to reproduce in other settings( Reference Jacques and Tucker 11 , Reference Martinez, Marshall and Sechrest 18 , Reference Slattery and Boucher 19 ). The reproducibility of data-driven dietary patterns has been assessed previously by various authors using dietary information obtained with common assessment tools at different moments of time within the same sample( Reference Hu, Rimm and Smith-Warner 20 Reference Newby, Weismayer and Akesson 23 ). However, no previous studies have explored the reproducibility of data-driven dietary patterns extracted from different samples.

The objective of this study was to assess the reproducibility of data-driven dietary patterns in different samples extracted from similar populations. We compared the results from a previous case–control study on diet and female breast cancer (BC) in Spain, the Epidemiological study of the Spanish group for breast cancer research (EpiGEICAM; GEICAM: grupo Español de investigación en cáncer de mama)( Reference Castelló, Pollan and Buijsse 24 ), with those obtained from a sample of Spanish women attending BC screening programmes (Determinantes de la Densidad Mamográfica en España – Determinants of Mammographic Density in Spain (DDM-Spain)), by evaluating the correlation between pattern scores and the congruence between the composition of patterns in both populations.

Methods

Study population and data collection

We used information on three dietary patterns obtained from a previous case–control study on female BC (EpiGEICAM study) using the dietary intake data of 973 healthy participants, aged 22–71 years, and recruited from fourteen Spanish provinces during the period 2006–2011( Reference Castelló, Pollan and Buijsse 24 ). These patterns will be used as a reference to explore their reproducibility in a different sample using data from the DDM-Spain participants. DDM-Spain is a cross-sectional, multicentre study carried out in seven screening centres belonging to the Spanish BC screening network and located throughout the Spanish peninsula( Reference Lope, Perez-Gomez and Sanchez-Contador 25 , Reference Pollan, Lope and Miranda-Garcia 26 ). In Spain, all women aged 50–69 years (45–69 years in some regions), regardless of nationality or legal status, are invited to be screened under these government-sponsored programmes every 2 years. Women were randomly selected among all screening attendants and invited to participate on a daily basis until the minimum sample size of 500 for each centre was reached. A total of 3550 women were recruited between 2007 and 2008, with an average participation rate of 74·5 % (range 64·7–84·0 % across centres). Women were interviewed at the screening centres by trained interviewers who collected demographic, anthropometric, physical activity, gynaecologic, obstetric and occupational data, as well as family and personal history (including weight and height at age 18 years). Information on smoking included current status and months since quitting for ex-smokers. Current smokers were defined as women who smoked at the time of mammography or had quit <6 months before. Dietary intake during the preceding year was collected using a validated 117-item FFQ( Reference Vioque, Navarrete-Munoz and Gimenez-Monzo 27 , Reference Willett, Sampson and Stampfer 28 ). 
Postmenopausal status was defined as self-reported absence of menstruation in the previous 12 months. Interviewers measured weight, height and waist and hip circumferences twice using the same protocol and identical balance scales, stadiometers and measuring tapes. A third measure was taken when the first two were not equal.

The DDM-Spain study was conducted according to the guidelines laid down in the Declaration of Helsinki, and all procedures involving human subjects were approved by the bioethics and animal welfare committee at the Carlos III Institute of Health. All participants signed a consent form, including permission to publish results from the current research.

Dietary patterns

The FFQ used in both studies were designed to assess the whole diet, had similar structures and were based on a validated FFQ( Reference Vioque, Navarrete-Munoz and Gimenez-Monzo 27 , Reference Willett, Sampson and Stampfer 28 ). However, the FFQ of the DDM-Spain study included some additional food items that were not contained in the FFQ of the EpiGEICAM study( Reference Lope, Perez-Gomez and Sanchez-Contador 25 , Reference Pollan, Lope and Miranda-Garcia 26 ): the FFQ used in the EpiGEICAM study contained ninety-nine items, of which eighty-six were used to create the food groups (after excluding the non-energetic and alcoholic beverages), whereas the FFQ from DDM-Spain included 117 items (the same ninety-nine from EpiGEICAM plus eighteen additional foods), of which ninety-nine were used to create the food groups (after excluding non-energetic and alcoholic beverages). In both cases, the dietary information collected was grouped into the exact same twenty-six food groups that are summarised in Table 1, where the items only included in the DDM-Spain study are represented in italics.

Table 1 Description of food groups used in principal component analyses

* Log-transformed intake in grams.

Weighted within the high- and low-fat dairy product categories according to the consumption of whole, semi-skimmed and skimmed milk.

w1=whole/(whole+semi-skimmed+skimmed).

w2=(semi-skimmed+skimmed)/(whole+semi-skimmed+skimmed).

w1 and w2 were 0·5 if consumption was 0 g for whole, semi-skimmed and skimmed milk.

The additional items included only in the FFQ from the Determinants of Mammographic Density in Spain study, which were not collected in the FFQ from the EpiGEICAM study, are shown in italics.

§ All the n-3-enriched milk brands that have been consulted are skimmed or semi-skimmed.
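The weighting rule in the footnotes above can be sketched as a small function. This is only an illustration of the two formulas and the zero-consumption rule; the function name and argument names are hypothetical, and how w1 and w2 are then applied to individual dairy items is not spelled out in the footnotes, so the sketch stops at the weights.

```python
def dairy_weights(whole_g, semi_skimmed_g, skimmed_g):
    """Return (w1, w2) following the Table 1 footnotes.

    Arguments are milk intakes in grams; the names are illustrative only.
    """
    total = whole_g + semi_skimmed_g + skimmed_g
    if total == 0:
        # Footnote rule: no milk of any kind reported, so both weights are 0.5.
        return 0.5, 0.5
    w1 = whole_g / total                        # share attributed to high-fat dairy
    w2 = (semi_skimmed_g + skimmed_g) / total   # share attributed to low-fat dairy
    return w1, w2


# Hypothetical intakes: 200 g whole milk, 50 g semi-skimmed, no skimmed.
w1_demo, w2_demo = dairy_weights(200.0, 50.0, 0.0)
print(w1_demo, w2_demo)
```

By construction the two weights sum to 1 whenever any milk is consumed.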

The EpiGEICAM study identified three dietary patterns over twenty-six food groups: a Western pattern characterised by elevated intakes of high-fat dairy products, processed meat, refined grains, sweets, energetic drinks and other convenience foods and sauces and by low intakes of low-fat dairy products and whole grains; a Prudent pattern defined by high intakes of low-fat dairy products, vegetables, fruits, whole grains and juices; and a Mediterranean pattern represented by a high intake of fish, vegetables, legumes, boiled potatoes, fruits, olives and vegetable oil and a low intake of juices. These patterns explained 16, 13 and 8 % of the total variability in food intake, respectively( Reference Castelló, Pollan and Buijsse 24 ). We assessed the reproducibility of these three patterns by comparing them with the patterns extracted by applying the same PCA analysis to the same twenty-six food groups from the DDM-Spain sample.

Statistical analysis

Major existing dietary patterns were identified in the DDM-Spain sample using the same technique applied to the EpiGEICAM data( Reference Castelló, Pollan and Buijsse 24 ): applying PCA without rotation to the variance–covariance matrix over twenty-six inter-correlated food groups that were reduced to a set of principal components (dietary patterns in this case). The first few components with eigenvalues >1 were selected for initial exploration. The PCA reports, for a given pattern, a set of weights associated with each food group (commonly called component/pattern weights) that is used to calculate pattern scores, defined, for each individual, as a weighted sum of the food group consumption. Afterwards, these scores were correlated with the food group consumption to calculate the pattern loadings, which indicate the importance of individual food groups in each pattern. Pattern weights and pattern loadings give similar information, except that they are measured on different scales (weights are standardised into Z score form)( Reference Burt 29 ). As only information on pattern loadings was provided by the EpiGEICAM study, these were used to compare dietary patterns from both studies. For comparison purposes, we considered that food groups with pattern loadings ≥|0·3| were the main contributors to a dietary pattern.
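The steps in this paragraph (unrotated PCA on the variance–covariance matrix, pattern weights, pattern scores, then loadings as correlations between scores and food groups) can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' STATA code; the function name, the synthetic matrix and its dimensions are assumptions made for the example.

```python
import numpy as np


def pca_patterns(X):
    """Unrotated PCA on the covariance matrix of X.

    X has one row per woman and one column per food group.
    Returns eigenvalues, pattern weights, pattern scores and pattern loadings.
    """
    Xc = X - X.mean(axis=0)                    # centre each food group
    cov = np.cov(Xc, rowvar=False)             # variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # largest explained variance first
    eigvals, weights = eigvals[order], eigvecs[:, order]
    scores = Xc @ weights                      # weighted sums of food group intake
    # Loadings: correlation between each food group and each pattern score.
    sd_x = Xc.std(axis=0, ddof=1)
    sd_s = scores.std(axis=0, ddof=1)
    loadings = (Xc.T @ scores) / (len(X) - 1) / np.outer(sd_x, sd_s)
    return eigvals, weights, scores, loadings


# Synthetic, correlated "intake" data: 500 women, 6 food groups (hypothetical).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
eigvals, weights, scores, loadings = pca_patterns(X_demo)

selected = eigvals > 1                          # components kept for initial exploration
main_contributors = np.abs(loadings) >= 0.3     # threshold used in the text
```

The weights and the loadings rank food groups in a similar way but live on different scales, which is why only the loadings, bounded between −1 and 1, are compared across studies.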

To evaluate the level of agreement between the food composition of patterns extracted in the DDM-Spain study and those reported in the EpiGEICAM study, we calculated congruence coefficients (CC)( Reference Burt 29 , Reference Tucker 30 ) between the pattern loadings from both studies. CC represents the correlation between pattern loadings based on their deviations from 0 (instead of being based on the deviations from the mean of the factor loadings as the Pearson’s correlation is) and it is the preferred measure for component/factor similarity extracted with PCA/FA( Reference Haven and Berge 31 ). CC ranges from −1 to 1, and a value in the range 0·85–0·94 corresponds to fair similarity, whereas a value ≥0·95 implies that the two compared components/factors can be considered equivalent( Reference Haven and Berge 31 Reference Nesselroade and Baltes 33 ).

The CC between the pattern loadings of a given pattern from EpiGEICAM ($l_{1j}$) and the pattern loadings of a given pattern from DDM-Spain ($l_{2j}$), for each of the $j=1,\ldots,26$ food groups, was calculated as follows:

$$\mathrm{CC} = \frac{\sum_{j=1}^{26} l_{1j} \times l_{2j}}{\sqrt{\left(\sum_{j=1}^{26} l_{1j}^{2}\right) \times \left(\sum_{j=1}^{26} l_{2j}^{2}\right)}}.$$
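As an illustration, the congruence coefficient and the similarity thresholds quoted in the text can be written in a few lines of Python. The function names and the example loadings are hypothetical; the label for values below 0·85 is our own wording for "not meeting the fair-similarity threshold".

```python
import math


def congruence_coefficient(l1, l2):
    """Congruence coefficient between two equally long vectors of loadings."""
    num = sum(a * b for a, b in zip(l1, l2))
    den = math.sqrt(sum(a * a for a in l1) * sum(b * b for b in l2))
    return num / den


def similarity_label(cc):
    """Map a CC value onto the thresholds used in the text."""
    if cc >= 0.95:
        return "equivalent"
    if cc >= 0.85:
        return "fairly similar"
    return "not similar"


# Hypothetical loadings over four food groups (not taken from either study).
l_ref = [0.6, 0.5, -0.4, 0.1]
l_shifted = [x + 0.3 for x in l_ref]   # same shape, shifted away from zero

cc_same = congruence_coefficient(l_ref, l_ref)
cc_shifted = congruence_coefficient(l_ref, l_shifted)
print(similarity_label(0.90), "/", similarity_label(0.77))
```

Adding a constant to every loading leaves a Pearson correlation at 1 but lowers the CC, because the CC measures deviations from 0 rather than from the mean of the loadings.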

In addition, to follow the same methodology commonly used in studies exploring the reproducibility of dietary patterns, Spearman’s correlation coefficients (Corr) between the EpiGEICAM and the DDM-Spain pattern scores were calculated. For that purpose, pattern scores (which reflect the level of compliance of each woman with each one of the dietary patterns) were calculated as the linear combination of consumption of food groups weighted by the pattern loadings from EpiGEICAM Western, Prudent and Mediterranean patterns and from the set of selected patterns resulting from applying PCA to the DDM-Spain data as follows( Reference Schulze, Hoffmann and Kroke 34 ):

$$P_{ki} = \sum_{j} \left(L_{kj} \cdot C_{ji}\right),$$

where P is the pattern score, L the loading score, C the centred food consumption, k the Western, Prudent and Mediterranean patterns from EpiGEICAM and Western, Prudent and Mediterranean patterns from DDM-Spain, i=1, …, 3550 women and j=1, …, 26 food groups.
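The score formula above amounts to a matrix product between the loadings and the centred food group consumption. The sketch below uses synthetic intake data and made-up loadings for four food groups; all names and numbers are assumptions for illustration. The text mentions both Spearman's and Pearson's coefficients, and the sketch uses Pearson's via `np.corrcoef`.

```python
import numpy as np


def pattern_scores(loadings, intake):
    """Compute P[k, i] = sum_j L[k, j] * C[j, i].

    loadings: array of shape (patterns k, food groups j)
    intake:   array of shape (women i, food groups j)
    """
    centred = intake - intake.mean(axis=0)   # C: centred food group consumption
    return loadings @ centred.T              # shape (k, i)


rng = np.random.default_rng(1)
intake_demo = rng.gamma(shape=2.0, scale=50.0, size=(200, 4))  # hypothetical intakes
L_ref = np.array([[0.6, 0.4, -0.3, 0.1]])   # hypothetical reference-study loadings
L_new = np.array([[0.5, 0.5, -0.2, 0.0]])   # hypothetical new-sample loadings

P_ref = pattern_scores(L_ref, intake_demo)[0]
P_new = pattern_scores(L_new, intake_demo)[0]
corr = np.corrcoef(P_ref, P_new)[0, 1]       # correlation between the two sets of scores
```

Because both sets of scores are computed on the same women, the correlation reflects only how differently the two loading vectors weight the food groups.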

CC is the preferred measure for component/factor similarity extracted with PCA/FA because its validity is supported by methodological research( Reference Haven and Berge 31 Reference Nesselroade and Baltes 33 ). In addition, a recent study has questioned whether Pearson’s correlation coefficient (Corr) alone can be used to assess pattern similarity( Reference Castello, Buijsse and Martin 35 ). However, the majority of studies exploring the reproducibility of dietary patterns base their conclusions on the latter measure, considering any significant correlation as being indicative of pattern similarity regardless of its value( Reference Hu, Rimm and Smith-Warner 20 Reference Newby, Weismayer and Akesson 23 ). In this study, we provide the correlation coefficient for the sake of comparability with published data, but we will base our final conclusion regarding pattern reproducibility on the CC.

To take into account sampling variability in the estimation of pattern loadings using DDM-Spain data, and subsequently in the estimation of the agreement measurements between the patterns identified within the EpiGEICAM and the DDM-Spain studies, we performed a non-parametric bootstrap estimation with 5000 replications. Using sampling with replacement, the bootstrap obtained 5000 replicates of the original DDM-Spain data set. PCA was then applied in each replication, and the three principal components that proved to be more similar to those reported in the EpiGEICAM were selected on the basis of the distance between the pattern loadings (more details are given in the online Supplementary Method 1). The 95 % percentile CI for each parameter was represented by percentiles 2·5 and 97·5 of the distribution of the 5000 bootstrap point estimates.
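A rough sketch of this bootstrap, under simplifying assumptions, is given below. The matching rule used here (keep the component with the largest absolute CC against the reference loadings) is a stand-in for the distance-based selection described in the Supplementary Method, which is not reproduced in this text; the function names, the synthetic data and the reduced number of replications are likewise illustrative.

```python
import numpy as np


def cc(a, b):
    """Congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def loadings_from_pca(X):
    """Loadings (food group vs. score correlations) from unrotated covariance PCA."""
    Xc = X - X.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    scores = Xc @ eigvecs[:, ::-1]             # largest-variance component first
    denom = np.outer(Xc.std(axis=0, ddof=1), scores.std(axis=0, ddof=1))
    return (Xc.T @ scores) / (len(X) - 1) / denom


def bootstrap_cc_ci(X, ref_loadings, n_boot=200, seed=0):
    """Percentile 95 % CI for the CC between a reference pattern and its best match."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, len(X), size=len(X))   # resample women with replacement
        load = loadings_from_pca(X[rows])
        # Component signs are arbitrary, so compare absolute CC values (assumption).
        estimates[b] = max(abs(cc(ref_loadings, load[:, k])) for k in range(load.shape[1]))
    return np.percentile(estimates, [2.5, 97.5])


# Synthetic data: 300 women, 5 food groups; the reference is the first component.
rng = np.random.default_rng(2)
X_boot = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
ref = loadings_from_pca(X_boot)[:, 0]
ci_low, ci_high = bootstrap_cc_ci(X_boot, ref)
```

The study used 5000 replications; the default of 200 here only keeps the example quick to run.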

Similar analyses were carried out by applying the PCA to food groups from the DDM-Spain study, which included the same exact eighty-six items considered in the EpiGEICAM analysis (online Supplementary Table S1 and Fig. S1).

Analyses were performed using STATA/MP 14.0.

Results

The anthropometric, reproductive and socio-demographic characteristics of the EpiGEICAM controls( Reference Castelló, Pollan and Buijsse 24 ) and DDM-Spain women are summarised in Table 2. The DDM-Spain study recruited a higher percentage of older and postmenopausal women (77 v. 47 %), women with higher energy intake (on average 656 kJ/d (157 kcal/d) more in the DDM-Spain group), women with higher BMI and a higher percentage of women who practised physical activity with moderate-to-vigorous intensity (76 v. 63 %). On the other hand, these women reported lower intake of alcohol, lower educational level (34 % with primary school or less in DDM and 16 % in EpiGEICAM), lower percentage of family history of BC (7 v. 20 %), lower age at first delivery (43 % of parous women in the DDM had their first child before 25 years of age, whereas this proportion was 26 % in EpiGEICAM) and there was a lower percentage of nulliparous (9 v. 23 %) women. The distribution of age at menarche and smoking appeared to be fairly similar in both studies.

Table 2 Anthropometric, reproductive and socio-demographic characteristics of EpiGEICAM controls and Determinants of Mammographic Density in Spain (DDM-Spain) women (Mean values and standard deviations; medians and interquartile ranges (IQR); numbers and percentages)

v.e., Total variability in food group intakes explained by the pattern; BC, breast cancer.

* Descriptive data extracted from the scientific article of Castello et al.( Reference Castelló, Pollan and Buijsse 24 ).

As the distribution of the Prudent score was skewed, the median and IQR were used to describe this score.

Figs. 1–3 show the comparison between the original loadings from the EpiGEICAM study and their corresponding values in the DDM-Spain study. Western patterns from both studies were characterised by high intakes of high-fat dairy products, refined grains, energetic drinks and convenience food and sauces and low intakes of low-fat dairy products and whole grains. Correlations with the intake of red and/or processed meat and with sweets were also close to the 0·3 threshold. Moreover, the DDM-Spain Western pattern seemed to be negatively correlated with the consumption of white fish, a result that was not observed in EpiGEICAM. Despite these small differences, the elevated CC between patterns (CC=0·90) indicates a fair similarity between the Western patterns extracted from the EpiGEICAM and the DDM-Spain data (Fig. 1).

Fig. 1 Pattern loadings of the Western dietary pattern extracted from the EpiGEICAM study( Reference Castelló, Pollan and Buijsse 24 ) (left) and pattern loadings and 95 % percentile CI of the Western pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient (CC) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.

Fig. 2 Pattern loadings of the Prudent dietary pattern extracted from the EpiGEICAM study( Reference Castelló, Pollan and Buijsse 24 ) (left) and pattern loadings and 95 % percentile CI of the Prudent pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient (CC) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.

Fig. 3 Pattern loadings of the Mediterranean dietary pattern extracted from the EpiGEICAM study( Reference Castelló, Pollan and Buijsse 24 ) (left) and pattern loadings and 95 % percentile CI of the Mediterranean pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.

We did not identify a pattern among women of the DDM-Spain study that was highly congruent with the EpiGEICAM Prudent pattern. The most similar pattern presented a high consumption of whole grains and juices but failed to correlate with low-fat dairy products, vegetables and fruits (Fig. 2). Something similar was observed with the Mediterranean pattern: several high correlations were observed with some vegetables, legumes, potatoes and nuts. However, the pattern from the DDM-Spain study did not include other typical factors of the Mediterranean diet, such as fish, olive oil and fruits (even if pattern loadings for these food groups were not low), whereas other foods more common in the Prudent diet, such as low-fat dairy products, or in the Western diet, such as sweets, and sugary and convenience foods, were included with high correlations. According to the CC (0·77), the EpiGEICAM and the DDM-Spain Mediterranean patterns cannot be considered similar (Fig. 3).

Finally, had we considered any significant correlation as being indicative of similarity, we would have concluded that all patterns extracted from the EpiGEICAM data were reproducible in the DDM-Spain study.

Discussion

To the best of our knowledge, this is the first study exploring the reproducibility of data-driven patterns in two different samples extracted from similar populations. We were able to reproduce the Western pattern identified in women from the EpiGEICAM study among women attending BC screening programmes who participated in the DDM-Spain study. However, the reproducibility of the Prudent and Mediterranean patterns cannot be considered good.

The association between dietary patterns and BC has been explored in many studies in different settings. Most of these studies identified a Western/Unhealthy pattern, which shares the most important characteristics with the Western patterns identified in EpiGEICAM and DDM-Spain, such as high consumption of fatty dairy products, red/processed meat, refined grains, sweets and convenience foods( Reference Agurs-Collins, Rosenberg and Makambi 36 Reference Wu, Yu and Tseng 41 ). However, the Mediterranean and Prudent patterns have often been mixed under the names of Vegetable, Prudent, Healthy or Mediterranean diet. These patterns are characterised by a high consumption of vegetables and fruits( Reference Agurs-Collins, Rosenberg and Makambi 36 Reference Zhang, Ho and Fu 47 ) that are an important part of the Mediterranean diet, but fail to include other items such as olive oil( Reference Agurs-Collins, Rosenberg and Makambi 36 , Reference Cui, Dai and Tseng 38 Reference Wu, Yu and Tseng 41 , Reference De Stefani, Deneo-Pellegrini and Boffetta 44 Reference Zhang, Ho and Fu 47 ), nuts( Reference Agurs-Collins, Rosenberg and Makambi 36 Reference Wu, Yu and Tseng 41 , Reference Bessaoud, Tretarre and Daures 43 Reference Zhang, Ho and Fu 47 ), legumes( Reference Cottet, Touvier and Fournier 37 , Reference Terry, Suzuki and Hu 39 Reference Wu, Yu and Tseng 41 , Reference De Stefani, Deneo-Pellegrini and Boffetta 44 , Reference Hirose, Matsuo and Iwata 46 , Reference Zhang, Ho and Fu 47 ) or fish( Reference Cui, Dai and Tseng 38 , Reference Wu, Yu and Tseng 41 ), which are key foods to differentiate the so-called Prudent or Healthy patterns from the Mediterranean.

None of the above-mentioned studies have been able to identify both a Prudent and a Mediterranean pattern in the same population, probably reflecting the difficulty in differentiating them in contexts where the Mediterranean diet is not very prevalent. On the other hand, the higher agreement in the definition of a Western pattern across studies is consistent with the greater reproducibility of this pattern observed in our study.

As noted earlier in this study, PCA reduces a set of inter-correlated variables to a group of principal components (dietary patterns in this case) so that the maximum correlation between the variables within components and the minimum correlation among components are obtained( Reference Rencher 48 ). Therefore, the greater the variability in diet, the easier it will be to find clearly differentiated independent patterns. In our study, although EpiGEICAM included women from fourteen Spanish provinces (four of them on the Mediterranean coast), DDM-Spain participants were recruited from screening centres located in seven provinces (three of them located on the Mediterranean coast). Therefore, the greater geographical distribution in the EpiGEICAM study may imply a greater representativeness of all diets across the Spanish territory. In addition, distribution of age among DDM-Spain women was more homogeneous (range=45–69) than that observed in the EpiGEICAM participants (range=22–71). As García-Arenzana et al.( Reference García-Arenzana, Navarrete-Munoz and Peris 49 ) previously described, older women tend to have healthier dietary habits than younger women, which may have produced a more heterogeneous distribution of dietary habits in the EpiGEICAM study. This heterogeneity might have facilitated the identification of more specific patterns, not only limited to the discrimination of two antagonistic patterns (Western v. Healthy/Prudent/Mediterranean) but also allowing the clear differentiation of patterns with subtle differences, such as the Prudent and Mediterranean patterns.

Regarding the pre-established thresholds for the CC that define the similarity of dietary patterns in both studies, we based our decision on three published pieces of research that evaluated concordance coefficients in light of the subjective opinion of several experienced researchers judging the equivalence between different components( Reference Haven and Berge 31 Reference Nesselroade and Baltes 33 ). Haven and Nesselroade( Reference Haven and Berge 31 , Reference Nesselroade and Baltes 33 ) argue that values over 0·80 are enough to assume fair similarity between components, whereas Lorenzo-Seva & Berge( Reference Lorenzo-Seva and Berge 32 ) maintain a more conservative approach setting the cut-off point for fair similarity at 0·85 and preventing a CC below this value from being interpreted as indicative of similarity. All three articles agree on the difficulty in setting up a cut-off point under which patterns should be considered clearly different. Despite the fact that the CC is considered a good measure of agreement between components or factors extracted with PCA or FA( Reference Haven and Berge 31 Reference Nesselroade and Baltes 33 ), the existing bibliography evaluating the reproducibility of data-driven dietary patterns does not use this measure and bases its conclusions only on the correlations between pattern scores, considering any significant correlation as being indicative of similarity regardless of its value( Reference Hu, Rimm and Smith-Warner 20 Reference Newby, Weismayer and Akesson 23 ), which can be as low as 0·27( Reference Newby, Weismayer and Akesson 23 ). In our case, the correlations were significant and high for all three patterns (Figs. 1–3).
However, according to the CC, only the Western pattern can be considered fairly similar between studies, which highlights the arbitrariness of the significance of the linear correlation to define pattern similarity and the need to choose an appropriate measure and a concrete threshold for such a measure to determine the level of congruence between patterns. In this regard, we have recently explored the applicability of previously reported dietary patterns in a different setting and we found that, for CC between pattern loadings ≥0·82 or correlations between pattern scores ≥0·57, patterns not only appear to have a very similar composition but also are similarly associated with BC risk( Reference Castello, Buijsse and Martin 35 ). The same direction of the associations but loss of significance was observed for values of the CC between pattern loadings ≤0·77 and values of the correlation between pattern scores ≤0·52. In the present study, taking into account only the methodological studies published regarding the threshold of the CC for pattern similarity( Reference Haven and Berge 31 Reference Nesselroade and Baltes 33 ), we followed the most conservative approach and considered dietary patterns to be fairly similar if CC values were ≥0·85.
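
To make the similarity criterion concrete, the sketch below computes Tucker's congruence coefficient between two loading vectors, CC = Σxᵢyᵢ / √(Σxᵢ² · Σyᵢ²), and applies the 0·85 cut-off adopted here. The function names and the loading values are hypothetical and chosen only for illustration; they are not taken from either study.

```python
import numpy as np

def congruence_coefficient(a, b):
    """Tucker's congruence coefficient between two loading vectors.

    Unlike Pearson's correlation, the loadings are not mean-centred:
    CC = sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def fairly_similar(cc, threshold=0.85):
    """Apply a cut-off for 'fair similarity' (0.85 is the conservative choice)."""
    return cc >= threshold

# Hypothetical loadings of one pattern on five food groups in two samples.
loadings_study_a = [0.60, 0.50, 0.40, -0.30, -0.20]
loadings_study_b = [0.55, 0.45, 0.50, -0.25, -0.10]

cc = congruence_coefficient(loadings_study_a, loadings_study_b)
similar = fairly_similar(cc)
```

Because the CC compares pattern loadings whereas the correlation compares pattern scores across subjects, the two measures can disagree, which is why a significant score correlation alone does not guarantee that two patterns have the same composition.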

A major limitation of the use of dietary patterns is the potential for subjective interpretations by the investigator to be introduced at various stages of the dietary patterns’ construction. Subjective decisions that might affect the comparability between studies are as follows: which foods should be included in each of the defined groups, the thresholds chosen to determine the contribution of food groups to the identified dietary patterns and the assignation of a label to each of these patterns( Reference Barkoukis 9 – Reference Jacques and Tucker 11 , Reference Martinez, Marshall and Sechrest 18 , Reference Slattery and Boucher 19 ). However, we have demonstrated that this limitation can be overcome by a detailed analysis when comprehensive information on food grouping and loadings is provided( Reference Castello, Buijsse and Martin 35 ). Moreover, the FFQ from EpiGEICAM and DDM-Spain both collected information on ninety-nine identical foods; the only difference was that DDM-Spain included eighteen additional foods that were not included in EpiGEICAM. In addition, the same group of researchers took principal responsibility for the analysis of the data; therefore, food grouping and labelling were very similar in both studies.

Finally, we summarise the main strengths of the present study. As previously mentioned, various studies have assessed the reproducibility of investigator-driven patterns( Reference George, Ballard-Barbash and Manson 12 – Reference Reedy, Krebs-Smith and Miller 16 ). The reproducibility of data-driven dietary patterns extracted from the same sample using dietary information obtained with different assessment tools or at different time points( Reference Hu, Rimm and Smith-Warner 20 – Reference Newby, Weismayer and Akesson 23 ) has also been explored. However, to our knowledge, this is the first study assessing the reproducibility of data-driven dietary patterns in different samples from similar populations and the first using the CC to evaluate their similarity. In addition, most of the published studies on the reproducibility of data-driven dietary patterns based their conclusions on limited sample sizes, ranging from 124 to 498( Reference Hu, Rimm and Smith-Warner 20 – Reference Nanri, Shimazu and Ishihara 22 ). Dietary patterns from EpiGEICAM were extracted from 973 healthy women, and for DDM-Spain the sample size was 3550, a size exceeded only by the Newby et al. study( Reference Newby, Weismayer and Akesson 23 ).

Conclusions

Widely prevalent dietary patterns, such as the Western pattern, are more reproducible than patterns more specific to certain populations, such as the Mediterranean pattern. More methodological studies exploring the reproducibility of dietary patterns are needed to establish a more objective threshold for the CC between pattern loadings, and for the equivalent correlation (Corr) between pattern scores, that defines pattern similarity.

Acknowledgements

The authors thank the DDM-Spain study participants for their contribution to breast cancer research and all collaborator researchers: Pilar Moreo, María Pilar Moreno, María Soledad Abad, Francisca Collado, Francisco Casanova, Jose Antonio Vázquez, Nieves Ascunce, Milagros García, Manuela Alcaraz, María Soledad Laso, Josefa Miranda and Francisco Ruiz Perales.

This study was supported by Carlos III Institute of Health FIS (Spanish Public Health Research Fund: PI060386 FIS; PS09/00790 and PI15CIII/0029 research grants), the Spanish Ministry of Health (EC11-273), the Spanish Ministry of Economy and Competitiveness (IJCI-2014-20900), the Spanish Federation of Breast Cancer Patients (FECMA: EPY 1169-10) and the Association of Women with Breast Cancer from Elche (AMACMEC: EPY 1394/15). None of the funders had any role in the design, analysis or writing of this article.

V. L., N. A., B. P.-G. and M. P. designed the study; A. C., J. V., C. S., C. P.-P., S. A., M. E., D. S.-T., C. V. and C. S.-C. collected the data and/or prepared the database. A. C. performed statistical analysis and wrote the initial version of the manuscript that M. P. revised and corrected in its different versions. All the authors have read and approved the final version of the manuscript.

The authors declare that there are no conflicts of interest.

Supplementary material

For supplementary material/s referred to in this article, please visit http://dx.doi.org/10.1017/S000711451600252X

Footnotes

Membership of the DDM-Spain research group is provided in the Acknowledgements section.

References

1. Bingham, SA, Luben, R, Welch, A, et al. (2003) Are imprecise methods obscuring a relation between fat and breast cancer? Lancet 362, 212–214.
2. Kelemen, LE (2007) GI Epidemiology: nutritional epidemiology. Aliment Pharmacol Ther 25, 401–407.
3. Willett, W (2001) Commentary: dietary diaries versus food frequency questionnaires – a case of undigestible data. Int J Epidemiol 30, 317–319.
4. Jacobs, DR Jr & Steffen, LM (2003) Nutrients, foods, and dietary patterns as exposures in research: a framework for food synergy. Am J Clin Nutr 78, 508S–513S.
5. Messina, M, Lampe, JW, Birt, DF, et al. (2001) Reductionism and the narrowing nutrition perspective: time for reevaluation and emphasis on food synergy. J Am Diet Assoc 101, 1416–1419.
6. Irala-Estevez, JD, Groth, M, Johansson, L, et al. (2000) A systematic review of socio-economic differences in food habits in Europe: consumption of fruit and vegetables. Eur J Clin Nutr 54, 706–714.
7. Sanchez-Villegas, A, Martinez, JA, Prattala, R, et al. (2003) A systematic review of socioeconomic differences in food habits in Europe: consumption of cheese and milk. Eur J Clin Nutr 57, 917–929.
8. Teufel, NI (1997) Development of culturally competent food-frequency questionnaires. Am J Clin Nutr 65, 1173S–1178S.
9. Barkoukis, H (2007) Importance of understanding food consumption patterns. J Am Diet Assoc 107, 234–236.
10. Hu, FB (2002) Dietary pattern analysis: a new direction in nutritional epidemiology. Curr Opin Lipidol 13, 3–9.
11. Jacques, PF & Tucker, KL (2001) Are dietary patterns useful for understanding the role of diet in chronic disease? Am J Clin Nutr 73, 1–2.
12. George, SM, Ballard-Barbash, R, Manson, JE, et al. (2014) Comparing indices of diet quality with chronic disease mortality risk in postmenopausal women in the women’s health initiative observational study: evidence to inform national dietary guidance. Am J Epidemiol 180, 616–625.
13. Harmon, BE, Boushey, CJ, Shvetsov, YB, et al. (2015) Associations of key diet-quality indexes with mortality in the Multiethnic cohort: the dietary patterns methods project. Am J Clin Nutr 101, 587–597.
14. Liese, AD, Krebs-Smith, SM, Subar, AF, et al. (2015) The dietary patterns methods project: synthesis of findings across cohorts and relevance to dietary guidance. J Nutr 145, 393–402.
15. McCullough, ML (2014) Diet patterns and mortality: common threads and consistent results. J Nutr 144, 795–796.
16. Reedy, J, Krebs-Smith, SM, Miller, PE, et al. (2014) Higher diet quality is associated with decreased risk of all-cause, cardiovascular disease, and cancer mortality among older adults. J Nutr 144, 881–889.
17. Fung, TT, McCullough, ML, Newby, PK, et al. (2005) Diet-quality scores and plasma concentrations of markers of inflammation and endothelial dysfunction. Am J Clin Nutr 82, 163–173.
18. Martinez, ME, Marshall, JR & Sechrest, L (1998) Invited commentary: factor analysis and the search for objectivity. Am J Epidemiol 148, 17–19.
19. Slattery, ML & Boucher, KM (1998) The senior authors’ response: factor analysis as a tool for evaluating eating patterns. Am J Epidemiol 148, 20–21.
20. Hu, FB, Rimm, E, Smith-Warner, SA, et al. (1999) Reproducibility and validity of dietary patterns assessed with a food-frequency questionnaire. Am J Clin Nutr 69, 243–249.
21. Khani, BR, Ye, W, Terry, P, et al. (2004) Reproducibility and validity of major dietary patterns among Swedish women assessed with a food-frequency questionnaire. J Nutr 134, 1541–1545.
22. Nanri, A, Shimazu, T, Ishihara, J, et al. (2012) Reproducibility and validity of dietary patterns assessed by a food frequency questionnaire used in the 5-year follow-up survey of the Japan Public Health Center-Based Prospective Study. J Epidemiol 22, 205–215.
23. Newby, PK, Weismayer, C, Akesson, A, et al. (2006) Long-term stability of food patterns identified by use of factor analysis among Swedish women. J Nutr 136, 626–633.
24. Castelló, A, Pollan, M, Buijsse, B, et al. (2014) Spanish Mediterranean diet and other dietary patterns and breast cancer risk: case-control EpiGEICAM study. Br J Cancer 111, 1454–1462.
25. Lope, V, Perez-Gomez, B, Sanchez-Contador, C, et al. (2012) Obstetric history and mammographic density: a population-based cross-sectional study in Spain (DDM-Spain). Breast Cancer Res Treat 132, 1137–1146.
26. Pollan, M, Lope, V, Miranda-Garcia, J, et al. (2012) Adult weight gain, fat distribution and mammographic density in Spanish pre- and post-menopausal women (DDM-Spain). Breast Cancer Res Treat 134, 823–838.
27. Vioque, J, Navarrete-Munoz, EM, Gimenez-Monzo, D, et al. (2013) Reproducibility and validity of a food frequency questionnaire among pregnant women in a Mediterranean area. Nutr J 12, 26.
28. Willett, WC, Sampson, L, Stampfer, MJ, et al. (1985) Reproducibility and validity of a semiquantitative food frequency questionnaire. Am J Epidemiol 122, 51–65.
29. Burt, C (1948) Factor analysis and canonical correlations. Br J Math Stat Psychol 1, 95–106.
30. Tucker, LR (1951) A method for the synthesis of factor analysis studies, Personnel Research Section Report no. 984. Washington, DC: Department of the Army.
31. Haven, S & Berge, J (1977) Tucker’s coefficient congruence as a measure of factorial invariance: an empirical study. Heymans Bulletin no. 290 EX. Groningen, The Netherlands: University of Groningen.
32. Lorenzo-Seva, U & Berge, J (2006) Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology 2, 54–67.
33. Nesselroade, J & Baltes, P (1970) On a dilemma of comparative factor analysis: a study of factor matching based on random data. Educ Psychol Meas 30, 935–948.
34. Schulze, MB, Hoffmann, K, Kroke, A, et al. (2003) An approach to construct simplified measures of dietary patterns from exploratory factor analysis. Br J Nutr 89, 409–419.
35. Castello, A, Buijsse, B, Martin, M, et al. (2016) Evaluating the applicability of data-driven dietary patterns to independent samples with focus on measurement tools for pattern similarity. J Acad Nutr Diet (In the Press).
36. Agurs-Collins, T, Rosenberg, L, Makambi, K, et al. (2009) Dietary patterns and breast cancer risk in women participating in the black women’s health study. Am J Clin Nutr 90, 621–628.
37. Cottet, V, Touvier, M, Fournier, A, et al. (2009) Postmenopausal breast cancer risk and dietary patterns in the E3N-EPIC prospective cohort study. Am J Epidemiol 170, 1257–1267.
38. Cui, X, Dai, Q, Tseng, M, et al. (2007) Dietary patterns and breast cancer risk in the Shanghai breast cancer study. Cancer Epidemiol Biomarkers Prev 16, 1443–1448.
39. Terry, P, Suzuki, R, Hu, FB, et al. (2001) A prospective study of major dietary patterns and the risk of breast cancer. Cancer Epidemiol Biomarkers Prev 10, 1281–1285.
40. Velie, EM, Schairer, C, Flood, A, et al. (2005) Empirically derived dietary patterns and risk of postmenopausal breast cancer in a large prospective cohort study. Am J Clin Nutr 82, 1308–1319.
41. Wu, AH, Yu, MC, Tseng, CC, et al. (2009) Dietary patterns and breast cancer risk in Asian American women. Am J Clin Nutr 89, 1145–1154.
42. Adebamowo, CA, Hu, FB, Cho, E, et al. (2005) Dietary patterns and the risk of breast cancer. Ann Epidemiol 15, 789–795.
43. Bessaoud, F, Tretarre, B, Daures, JP, et al. (2012) Identification of dietary patterns using two statistical approaches and their association with breast cancer risk: a case-control study in Southern France. Ann Epidemiol 22, 499–510.
44. De Stefani, E, Deneo-Pellegrini, H, Boffetta, P, et al. (2009) Dietary patterns and risk of cancer: a factor analysis in Uruguay. Int J Cancer 124, 1391–1397.
45. Demetriou, CA, Hadjisavvas, A, Loizidou, MA, et al. (2012) The Mediterranean dietary pattern and breast cancer risk in Greek-Cypriot women: a case-control study. BMC Cancer 12, 113.
46. Hirose, K, Matsuo, K, Iwata, H, et al. (2007) Dietary patterns and the risk of breast cancer in Japanese women. Cancer Sci 98, 1431–1438.
47. Zhang, CX, Ho, SC, Fu, JH, et al. (2011) Dietary patterns and breast cancer risk among Chinese women. Cancer Causes Control 22, 115–124.
48. Rencher, A (2002) Principal component analysis. In Methods of Multivariate Analysis, pp. 380–407. New York: John Wiley & Sons, Inc.
49. García-Arenzana, N, Navarrete-Munoz, EM, Peris, M, et al. (2012) Diet quality and related factors among Spanish female participants in breast cancer screening programs. Menopause 19, 1121–1129.

Table 1 Description of food groups used in principal component analyses

Table 2 Anthropometric, reproductive and socio-demographic characteristics of EpiGEICAM controls and Determinants of Mammographic Density in Spain (DDM-Spain) women (Mean values and standard deviations; medians and interquartile ranges (IQR); numbers and percentages)

Fig. 1 Pattern loadings of the Western dietary pattern extracted from the EpiGEICAM study(24) (left) and pattern loadings and 95 % percentile CI of the Western pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient (CC) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.

Fig. 2 Pattern loadings of the Prudent dietary pattern extracted from the EpiGEICAM study(24) (left) and pattern loadings and 95 % percentile CI of the Prudent pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient (CC) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.

Fig. 3 Pattern loadings of the Mediterranean dietary pattern extracted from the EpiGEICAM study(24) (left) and pattern loadings and 95 % percentile CI of the Mediterranean pattern extracted from Determinants of Mammographic Density in Spain (DDM-Spain) data (right). * Congruence coefficient (CC) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern loadings. † Correlation coefficient (Corr) and 95 % percentile CI between EpiGEICAM and DDM-Spain pattern scores. All correlations were significant at a 95 % confidence level.
