A multitude of in vitro studies have reported the effects of specific individual dietary components on breast cancer (BC)(Reference Gandini, Merzenich and Robertson1). Nutritional studies have historically focused on specific nutrients or foods in isolation, oversimplifying the complexity of foods(Reference Michels, Mohllajee and Roset-Bahmanyar2,Reference Kerr, Anderson and Lippman3). The high degree of intercorrelation among nutrients and foods makes it difficult to attribute effects to a single independent component, which has limited the interpretation and application of results(Reference Hu4,Reference Tapsell5). Nutritional epidemiology now recognises the concept of food synergy: nutrients coexist in food in a biologically purposeful way, and their combinations reflect biological functionality.
Deriving dietary patterns that inherently account for interactions among micronutrients and estimate overall dietary effects may provide a more robust approach for determining associations between diet and disease(Reference Gleason, Boushey and Harris6,Reference Castello, Buijsse and Martin7). Dietary patterns are typically derived using two main approaches: the ‘a priori’ approach, which scores the data against a predefined dietary index such as the Diet Quality Index(Reference Kant8,Reference Ocke9), and the ‘a posteriori’ approach, which explores dietary patterns with data-driven statistical reduction techniques such as cluster analysis, factor analysis (FA) and principal component analysis(Reference Ocke9,Reference Newby and Tucker10). Both approaches have been widely used to define dietary patterns associated with various health outcomes. For example, the Mediterranean dietary pattern, measured by compliance with ‘a priori’ defined dietary indices, was associated with reduced BC risk(Reference Sofi, Cesari and Abbate11,Reference Bloomfield, Koeller and Greer12), whereas the Western dietary pattern derived by FA was linked with increased BC risk(Reference Albuquerque, Baltar and Marchioni13–Reference Grosso, Bella and Godos15). Although various methods have been developed to explore dietary patterns in different populations, challenges remain in accurately identifying dietary patterns(Reference Edefonti, Randi and La Vecchia16–Reference Uzhova, Woolhead and Timon18), including reducing complex multidimensional nutritional data to meaningful observed dietary patterns, dealing with the heterogeneity of individuals within the studied population and classifying each participant into a specific dietary pattern(Reference Nurius and Macy19,Reference von Eye and Bergman20).
Latent class analysis (LCA) is a person-centred, data-driven analytic approach(Reference Uzhova, Woolhead and Timon18,Reference Lanza and Rhoades21). LCA identifies the patterns of relations among a set of observed variables and classifies similar individuals into specific latent classes, so that subjects are highly similar within a class but distinctly different from the members of other classes(Reference Leech, Worsley and Timperio22). Compared with ‘variable-oriented’ approaches such as FA and principal component analysis, which characterise the overall sample, ‘person-centred’ approaches model distinct configurations of heterogeneity within the sample. They can therefore identify distinct, previously unknown patterns in subtypes of individuals from multidimensional data, capturing heterogeneity within and between groups. Moreover, LCA allows adjustment for covariates, quantification of class membership probabilities and assessment of goodness of fit(Reference Miettunen, Nordstrom and Kaakinen23,Reference DiStefano and Kamphaus24).
Therefore, the study aims to use LCA to identify distinct classes of dietary patterns in Chinese women and to evaluate whether specific dietary patterns are associated with BC risk and whether these associations differ by menopausal status.
Methods
Study design and subjects
Chinese Wuxi Exposure and Breast Cancer Study (2013–2014) is a population-based case–control study. The subjects were all women who lived in Wuxi city, Jiangsu Province, China, for more than 5 years, as previously reported(Reference Lu, Qian and Huang25). According to the cancer registration system, newly diagnosed BC patients within 1 year were selected as the case group. All cases were identified according to the International Classification of Diseases for Oncology (ICD-10, code C50). Patients with secondary or recurrent BC were excluded. For those with multiple incident cancers, we only included those with BC as the first diagnosed original malignancy. Controls were selected from the local population registry system and matched to the cases by the same residence area and age (range of ±5 years), excluding individuals with any cancer history.
From November 2013 to November 2014, a total of 1410 newly diagnosed BC cases were identified; 1072 cases met the inclusion criteria and 818 of them were recruited into the current study, a response rate of 76·3 % (818/1072). In addition, 1072 controls were selected and 935 of them participated, a response rate of 87·2 % (935/1072), giving a frequency-matched case–control study (Fig. 1). The study protocol was approved by the Institutional Review Boards of Jiangsu CDC, and written informed consent was obtained from all subjects.
Data collection
Data on demographic and lifestyle characteristics, menstrual and reproductive events, dietary intake, disease history and physical activity were collected during person-to-person interviews conducted by trained interviewers. Anthropometric measures were obtained by trained personnel following a standard protocol. The usual diet was assessed by a validated, semi-quantitative FFQ, which included 149 items along with the recipes commonly used in China; a detailed description has been given elsewhere(Reference Zhao, Hasegawa and Chen26). Physical activity was measured with reference to the Global Physical Activity Questionnaire(Reference Armstrong and Bull27). Nutrient and energy intakes were calculated using the Chinese Food Composition Database (2018, 6th version).
Dietary patterns derived by latent class analysis
Dietary intake assessment included whether the food was consumed, consumption frequency (times per day/week/month/year) and the average amount of food consumption at each time. The 149 food items in the FFQ were classified into eighteen predefined food groups based on similarities in nutrient profile and culinary usage, including rice/flour, cereals, fried foods, red meat, poultry, aquatic products, eggs, milk, fruits, vegetables, soya foods, nuts, cakes, sugar strengthened beverage, fresh juice, soft drink, pickled foods and coffee.
LCA identifies the number of ‘latent’ classes that describe the association between manifest variables. LCA estimates two key sets of parameters, for categorical outcomes and covariates, respectively(Reference Fahey, Thane and Bramwell28): the conditional probabilities (also called response probabilities) of the observed indicators within a given class (e.g., the probability of red meat consumption among women adhering to the Western class) and the regression coefficients predicting class membership (e.g., to differentiate between two subjects with the same energy intake of 8368 kJ/d (2000 kcal/d) but different types of food consumed per day, such as 250 g/d of fried food and 1000 g/d of rice, respectively). Together, the two sets of parameters were used to predict the posterior probability of belonging to each class for each subject(Reference Wedel and Kamakura29).
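The posterior probability described above can be illustrated with a minimal Python sketch. It assumes local independence of the indicators within a class and ignores covariates such as energy intake, so it is a simplification of the fitted model; the function name and all numeric values are hypothetical and are not estimates from this study.

```python
from math import prod


def posterior_class_probabilities(class_prevalence, response_probs, observed):
    """Posterior P(class | observed responses), assuming local independence.

    class_prevalence: list with P(class = c) for each class c
    response_probs:   response_probs[c][j][k] = P(item j is at level k | class c)
    observed:         observed level index for each item j
    """
    # Joint probability of the class and the observed response pattern.
    joint = [
        prevalence * prod(response_probs[c][j][k] for j, k in enumerate(observed))
        for c, prevalence in enumerate(class_prevalence)
    ]
    total = sum(joint)
    # Bayes' rule: normalise the joint probabilities over classes.
    return [value / total for value in joint]


# Hypothetical example: two classes, two binary food-group indicators (0 = no, 1 = yes).
prevalence = [0.6, 0.4]
probs = [
    [[0.8, 0.2], [0.7, 0.3]],  # class 0: low probability of consuming either group
    [[0.3, 0.7], [0.1, 0.9]],  # class 1: high probability of consuming both groups
]
posterior = posterior_class_probabilities(prevalence, probs, observed=[1, 1])
```

In this toy example a woman who consumes both food groups is assigned most of her posterior probability to the second class, even though that class is less prevalent.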
Since the distribution of food consumption often had a typical spike at 0 for non-consumers and was skewed among consumers, we categorised the derived food group consumption into four levels: no consumption and tertiles of non-zero consumption (cut-points calculated from controls). Because < 20 % of women consumed sugar strengthened beverage, fresh juice, soft drink or coffee, we set the consumption of these foods as binary variables (consumed or not). Because rice/flour was consumed almost ubiquitously, it had only tertiles of consumption and no non-consumption category.
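The four-level coding can be sketched as follows. This is an illustrative Python example under stated assumptions: the cut-points use a simple nearest-rank rule (statistical packages may interpolate differently), and the function names and intake values are hypothetical.

```python
def tertile_cutoffs(control_intakes):
    """Return two cut-points splitting the non-zero control intakes into thirds."""
    nonzero = sorted(x for x in control_intakes if x > 0)
    n = len(nonzero)
    # Nearest-rank cut-points; other quantile definitions would shift these slightly.
    return nonzero[(n - 1) // 3], nonzero[(2 * (n - 1)) // 3]


def intake_level(intake, cutoffs):
    """Code intake as 0 = no consumption, 1-3 = low, middle, high tertile of consumers."""
    if intake <= 0:
        return 0
    low, high = cutoffs
    if intake <= low:
        return 1
    if intake <= high:
        return 2
    return 3


# Hypothetical daily intakes (g/d) of one food group among controls.
controls = [0, 0, 10, 20, 30, 40, 50, 60]
cutoffs = tertile_cutoffs(controls)

# The control-based cut-points are then applied to any participant, case or control.
levels = [intake_level(x, cutoffs) for x in [0, 15, 35, 100]]
```

Deriving the cut-points from controls only, and then applying them to cases, keeps the exposure categories independent of disease status.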
LCA was used to derive dietary patterns of food groups based on the dietary data from controls, adjusted for energy intake (kcal/d). The dietary classes were interpreted and named according to the conditional probabilities of food group intake. The number of classes was determined using the Bayesian information criterion, the Lo-Mendell-Rubin likelihood ratio (LMR) test and the entropy value, to identify the best-fitting model that balanced statistical fit and parsimony. Finally, we predicted the probabilities of class membership for controls and cases and assigned each woman to the latent class (dietary pattern) for which her probability was the highest.
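Two of the quantities used above, the Bayesian information criterion and the entropy value, together with the modal class assignment, can be written down compactly. The Python sketch below uses the usual definitions (BIC = -2 log L + p log n, and relative entropy scaled to lie between 0 and 1); it is an illustration with hypothetical inputs, not the Mplus implementation used in the study, and it does not cover the LMR test.

```python
from math import log


def bic(log_likelihood, n_parameters, n_subjects):
    """Bayesian information criterion; smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_parameters * log(n_subjects)


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate clear class separation."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * log(k))


def modal_class(posterior):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])


# Hypothetical posterior probabilities for three women in a four-class model.
posteriors = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]
assigned = [modal_class(row) for row in posteriors]
entropy = relative_entropy(posteriors)
```

The third woman shows why entropy matters: she is still assigned to a class, but her assignment carries no information, which lowers the overall entropy.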
Statistical analysis
All analyses were stratified by menopausal status at diagnosis for cases or at enrollment for controls. Women were considered postmenopausal if they had not menstruated in the past 12 months.
Associations between exposures (dietary pattern as a nominal predictor) and the outcomes (BC) were estimated in terms of adjusted OR and corresponding 95 % CI using the logistic regression model. All models were adjusted for age at diagnosis for cases or enrollment for controls (by years), area (urban, rural), education (ordered as illiterate and primary, middle and high school, university and above), tobacco smoking (no or yes: including smoking and second-hand smoking ≥ 3 d/week), tea intake (no or yes: ≥ 3 d/week), alcohol intake (no or yes: ≥ 3 d/week), moderate physical activity (min/d), oral contraceptives use (no or yes: current use or ever use), hormone replacement therapy (HRT) (no or yes: current use or ever use), family history of BC (no or yes: in a first-degree relative), history of benign breast disease (no or yes: including lactation mastitis, plasma cell mastitis, cyclomastopathy, fibroadenoma of breast, galactocele), age at menarche (by years), parity (ordered as 0, 1, 2 or ≥ 3), age at first full-term delivery (by years), breast-feeding (no or yes), height (by cm), BMI (kg/m2) and energy intake (kcal/d). Postmenopausal stratification analysis was further adjusted for the menopausal age (by years). Furthermore, to examine whether the association between dietary patterns and BC risk was affected by well-established or suspected non-dietary BC risk factors, we conducted stratified analysis and interaction test by selected covariates which can cause a change in the OR of interest by at least 10 %, including age, education, tobacco smoking, alcohol intake, HRT, age at menarche, age at first full-term delivery, menopausal age, family history of BC, history of benign breast disease, parity, energy intake, BMI and height (stratification based on the median of controls distribution). The P for interaction was calculated by the likelihood ratio test.
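For readers less familiar with logistic regression output, the step from a fitted coefficient to an OR with a 95 % CI is sketched below in Python. The sketch assumes a Wald-type interval on the log-odds scale; the function name, coefficient and standard error are hypothetical values chosen for illustration and are not results from this study.

```python
from math import exp


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to an OR and Wald 95 % CI."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical coefficient for one dietary class v. the reference class.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.40, se=0.20)
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of its confidence limits, which gives a quick consistency check for any reported OR and CI.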
Finally, to assess the impact of classification quality, we also excluded women with low predicted probabilities of the class membership (< (K–1)/K in LCA with K classes) for their assigned dietary class as a sensitivity analysis.
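The exclusion rule above can be expressed as a short Python sketch. The function names and posterior probabilities are hypothetical; only the (K–1)/K cut-off is taken from the text.

```python
def membership_threshold(n_classes):
    """Cut-off (K - 1) / K for the probability of the assigned class."""
    return (n_classes - 1) / n_classes


def keep_well_classified(posteriors):
    """Keep subjects whose highest class probability reaches the cut-off."""
    cutoff = membership_threshold(len(posteriors[0]))
    return [row for row in posteriors if max(row) >= cutoff]


# Hypothetical posterior probabilities from a four-class model.
posteriors = [
    [0.90, 0.05, 0.03, 0.02],  # retained: 0.90 >= 0.75
    [0.50, 0.30, 0.10, 0.10],  # excluded: 0.50 < 0.75
]
retained = keep_well_classified(posteriors)
```

With K = 4 classes the cut-off is 0·75, so a woman is retained only if her assigned class accounts for at least three-quarters of her posterior probability.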
The LCA was conducted using MPLUS (V8.3; Muthén & Muthén), and other statistical analyses were conducted using R version 4.0.0 (The R Project for Statistical Computing, USA; https://www.r-project.org/). All P values quoted were 2-sided, and < 0·05 was considered as statistically significant.
Results
Of the participants interviewed (818 cases, 935 controls), we excluded seventy-seven cases and seventy-five controls because of extreme values of total energy intake (< 500 or > 5000 kcal) and forty-six cases and fifty-six controls because of missing information on adjustment covariates. No significant difference in demographic characteristics was found between excluded and remaining participants. The results presented here were based on 695 cases and 804 controls who had complete information on the FFQ and the covariates included for adjustment. The education level of the cases was lower than that of the controls, whereas the overweight rate, family history of BC and history of benign breast disease were higher among cases than among controls (P < 0·01) (Table 1).
* P-values are calculated based on the χ 2 test or t test.
Dietary patterns derived by latent class analysis
Latent class models were fitted for two to six classes, and four dietary pattern classes were finally chosen. With four classes, the Bayesian information criterion was the smallest, the Lo-Mendell-Rubin likelihood ratio test reached significance and the entropy value (0·836) was ideal. The proportions of women across the four classes were also balanced. Therefore, based on the balance of statistical fit and parsimony, we believe that the four-class model is appropriate (see online supplementary material, Supplemental Table 1). Figure 2 shows the conditional probabilities of Chinese women consuming each food group in each class. A probability close to 1 for high (third tertile) consumption of a food group (Fig. 2a) in a given latent class suggests that women in that class were likely to consume more of that food. A probability close to 1 for no consumption of a food group (Fig. 2b) indicates that women in that class rarely consumed that food (see online supplementary material, Supplemental Table 2). We named the four chosen classes Prudent diet consumers, Western diet consumers, Chinese traditional diet consumers and Picky diet consumers.
Characteristics of dietary patterns
The Prudent class was characterised by a high probability of consuming healthy foods such as cereals, aquatic products, fruits, vegetables, soya foods and nuts. The Chinese traditional class featured a preference for white meat (such as poultry) over red meat and a general willingness to consume soya foods, with the lowest probabilities of non-consumption of the specific foods. The Western class showed the highest probability of consuming high-protein, high-fat and high-sugar foods such as fried meat or eggs, cakes, soft drinks and coffee, as well as soya foods. Compared with other classes, women in the Picky class were characterised by higher extreme probabilities of non-consumption of specific foods. In addition, women in the Picky class showed the highest probabilities of consumption of pickled foods and the lowest probabilities of consumption of cereals, soya foods and nuts (see online supplementary material, Supplemental Table 3). Overall, by conditional probability, 29·0, 31·9, 15·7 and 23·4 % of women were characterised by the Prudent class, Chinese traditional class, Western class and Picky class, respectively. Women in the Western class had a significantly higher energy intake than women in the other three classes (Western: 2226·71 ± 414·84 kcal/d, Chinese traditional: 1630·53 ± 283·57 kcal/d, Prudent: 1860·84 ± 280·99 kcal/d, Picky: 1555·98 ± 272·89 kcal/d). Additionally, we compared the socio-demographic characteristics within the identified latent classes and found significant differences between the classes (see online supplementary material, Supplemental Table 4).
Association between dietary pattern and breast cancer risk
With the Prudent class as the reference group, the Picky class showed an independent risk effect on BC with an OR of 1·44 (95 % CI 1·01, 2·05) among postmenopausal women, whereas the Western class (OR = 0·87, 95 % CI 0·54, 1·43) and the Chinese traditional class (OR = 0·90, 95 % CI 0·63, 1·29) showed no difference; no relevant association was found in premenopausal women (Table 2). Because no associations between BC risk and the LCA-derived dietary patterns were found among premenopausal women, stratified analysis and interaction tests were restricted to postmenopausal women. As shown in Table 3, the associations of dietary patterns with BC risk were affected by some well-established or suspected non-dietary BC risk factors. Specifically, the Picky class was strongly associated with increased risk of BC among women who were aged < 60 years, never drank alcohol, never used HRT, had age at menarche ≥ 16 years, age at first full-term delivery ≥ 25 years, menopausal age ≥ 55 years, a history of benign breast disease, parity < 2 or height < 155 cm.
* OR were derived from the logistic regression model.
† Adjusted for age, area, education, tobacco smoking, tea intake, alcohol intake, moderate physical activity, oral contraceptives use, hormone replacement therapy, family history of breast cancer, history of benign breast disease, age at menarche, parity, age at first full-term delivery, breast-feeding, height, BMI, energy intake, postmenopausal additional adjusted for the menopausal age.
‡ P for interaction between dietary patterns and menopause was derived from a likelihood ratio test.
* Adjusted for age, area, education, tobacco smoking, tea intake, alcohol intake, moderate physical activity, oral contraceptives use, hormone replacement therapy, family history of breast cancer, history of benign breast disease, age at menarche, parity, age at first full-term delivery, breast-feeding, height, BMI, energy intake, menopausal age.
† P for interaction between dietary patterns and non-dietary BC risk factors was derived from a likelihood ratio test.
Note: in this study, not all the food groups had a third tertile level (e.g., sugar strengthened beverage, fresh juice, etc. were binary) as the title states. And for food groups that had four levels, there is one non-consumption category.
Sensitivity analysis
The sensitivity analysis of dietary classification quality was assessed by excluding women (20·98 % of controls, 27·48 % of cases) with a low predicted probability (< 0·75) of the class membership. However, none of the associations between dietary classes and BC risk changed substantially.
Finally, we analysed the impact of selection bias caused by non-responders on the results; the possible selection bias would not change the existing conclusions. Through telephone follow-up, we investigated the reasons for non-response and imputed data for non-respondents by random sampling from existing samples based on their characteristics, using fully conditional specification multivariate imputation by chained equations(Reference Van Buuren, Brand and Groothuis-Oudshoorn30) (see online supplementary material, Supplemental ‘Selection bias analysis’).
Discussion
In the current study, we applied LCA, a novel approach, to identify the generic dietary patterns in the Chinese female population and evaluated their associations with the risk of BC. We found that the Picky class was associated with a higher risk of BC (OR = 1·42, 95 % CI 1·06, 1·90), although the association was present only in postmenopausal (OR = 1·44, 95 % CI 1·01, 2·05) and not in premenopausal women. The Western and Chinese traditional classes were not associated with beneficial or adverse effects on BC risk compared with the Prudent class.
In nutritional epidemiology, data-driven methods such as FA and principal component analysis have been widely used for nutritional data reduction, but challenges in accurately identifying dietary patterns across populations still exist. In these ‘variable-oriented’ methods, food items are grouped according to the degree of association between each other, operating by partitioning variance between measured variables(Reference Santos, Gorgulho and Castro31,Reference Varraso, Garcia-Aymerich and Monier32). However, in most cases, the asymmetric distribution of food consumption variables, caused by the heterogeneity of individual diets, hinders the full capture of generic dietary patterns in the studied population(Reference Edefonti, Randi and La Vecchia16,Reference Jacques and Tucker17). Because this type of method operates on the squares of simple correlation coefficients between variables, a skewed distribution results in the sum of squares of simple correlation coefficients being much smaller than the sum of squares of partial correlation coefficients, and the resulting lower variance contribution makes it challenging to capture information about the relationships between the variables of interest(Reference MacCallum, Zhang and Preacher33).
Therefore, considering that the source of heterogeneity is the individuals of the studied population rather than the diet measurement variables themselves, a ‘person-centred’ rather than ‘variable-oriented’ approach may be more effective(Reference Rabe-Hesketh and Skrondal34). We compared the results of LCA and FA based on the same data set (online supplementary material, Supplemental ‘Comparison between LCA and FA’). We found that the classification of dietary patterns was roughly similar to that of the previous study based on the FA approach, which demonstrates that LCA and FA identified similar dietary patterns when presented with the same data set. However, heterogeneity embedded in the study population makes the FA result unreliable; the low proportion of original variance explained (45·21 %) may affect the findings obtained. In contrast, LCA with an ideal entropy value (0·836) ensures the accuracy of classification (> 90 %). Additionally, we compared the alternate Mediterranean Diet score with the highly data-driven LCA results; as an ‘a priori’ approach, it could better capture specific dietary characteristics under an identified actual dietary pattern. We found that the alternate Mediterranean Diet score seemed similar to the Prudent dietary pattern in terms of its correlation with specific foods (online supplementary material, Supplemental ‘Comparison between LCA and DQI’). The comparison demonstrates that, while LCA is conducive to identifying heterogeneity in different subpopulations, it also performs well in characterising the combination of foods consumed and capturing the dietary characteristics of a specific dietary pattern.
The results on the association between dietary patterns and BC risk have not been consistent in Western populations(Reference Gandini, Merzenich and Robertson1,Reference Albuquerque, Baltar and Marchioni13–Reference Grosso, Bella and Godos15). Inverse associations of the Prudent dietary pattern and positive associations of the Western dietary pattern with BC risk have been found in many studies. However, some studies also reported contradictory findings(Reference Gandini, Merzenich and Robertson1,Reference Albuquerque, Baltar and Marchioni13–Reference Grosso, Bella and Godos15). Similarly, current research on Asian women’s dietary patterns and BC risk has not reached a consistent conclusion(Reference Butler, Wu and Wang35–Reference Cui, Dai and Tseng39). In our study, BC risk did not differ between the Prudent, Western and Chinese traditional classes. Although the characteristics of the three dietary patterns were different, we found some commonalities. Compared with the Picky class, the probability of soya food intake in the Prudent, Chinese traditional and Western classes was relatively higher. Soya isoflavones may reduce the risk of BC through an oestrogen-dependent mechanism, preferentially binding oestrogen receptor-β relative to oestrogen receptor-α(Reference Strom, Hartman and Foster40), as well as through an oestrogen-independent mechanism that inhibits nuclear transcription factor κB DNA-binding activity and the Akt signalling pathway(Reference Gong, Li and Nedeljkovic-Kurepa41). We did not find a positive association between the Western class and BC risk in the current study; a possible reason might be the relatively high consumption of polyphenol-rich foods such as soya foods and nuts among the Chinese women in the Western class. In addition, Chinese women’s average red meat intake (19·32 g/1000 kcal/d) was much lower than that among Asian Americans (34·5 g/1000 kcal/d)(Reference Wu, Yu and Tseng42), so the Western class here differed from a typical Western dietary pattern in Western populations.
What deserves attention in the current study is the Picky class. We found, for the first time, that compliance with this dietary pattern was associated with a higher BC risk among postmenopausal women compared with the Prudent class. Women in the Picky class were characterised by higher extreme probabilities of non-consumption of specific foods, the highest probabilities of consumption of pickled foods and the lowest probabilities of consumption of cereals, soya foods and nuts. Therefore, we suspect that the high BC risk of the Picky class might come from an imbalanced diet that could lead to a lack of certain vital nutrients, together with high consumption of pickled foods, which may promote inflammation.
For BC, the observed heterogeneity of risk by menopausal status was particularly substantial, as it hinted at relative contributions of oestrogens, progesterone, insulin and insulin-like growth factor 1 (IGF-1) in mediating the association(Reference Keum, Greenwood and Lee43). The meaningful findings were concentrated in postmenopausal women, a period in which oestrogens appear to be a dominant driver. A potential biological mechanism may therefore explain the null finding among premenopausal women: the ovaries are the predominant site of oestrogen synthesis in the premenopausal period, so the additional contribution of dietary factors to the circulating pool of oestrogens (i.e., oestrone, oestradiol, oestriol) may be negligible. Not only is the amount of oestrogens from adipocytes far smaller, but the form of these oestrogens (i.e., oestrone rather than oestradiol) is also less biologically potent(Reference Nelson and Bulun44). However, evidence on whether the association between dietary patterns and BC differs by menopausal status is not consistent(Reference van den Brandt and Schulpen45). The null premenopausal association may also reflect the sample size, as a small sample reduces the power of the study and increases the margin of error; more in-depth and cross-validation studies are needed. Furthermore, among postmenopausal women, we found that the association between diet and BC risk could be affected by some well-established or potential non-dietary risk factors. The interaction between these factors and compliance with the Picky dietary pattern appears to be complicated: some factors strengthened the association between the Picky dietary pattern and BC risk, whereas others weakened it.
Notably, we found that the Picky pattern was associated with a higher risk among women who did not drink alcohol and did not use HRT, implying that some strong independent risk factors for BC may mask the association between diet and BC risk. For example, oestrogens appear to be a dominant driver of BC in postmenopausal women. In the absence of excess oestrogens from the ovaries and HRT, variation in oestrogen levels because of different dietary patterns may be sufficient to distinguish the risk of BC. In contrast, among HRT users, exogenous oestrogens from HRT may raise plasma oestrogens to the extent that diet-related variation in oestrogens has little incremental effect. Further work should assess associations between BC risk and the plasma concentrations of these nutrients, which may be more predictive of the in vivo situation and more informative about disease risk and biological mechanisms.
However, several limitations should be noted. First, data were collected from a case–control study, which might be partially influenced by the biases inherent in case–control designs, including selection bias, recall bias, residual confounding and reverse causality. We included only newly diagnosed BC patients and designed the questions carefully to reduce recall bias, and dietary preference was collected based on composite measures, which were less likely to cause information bias on specific foods or food groups. We tried to reduce the influence of residual confounding on the conclusions through stratified analysis. We also analysed the potential effect of selection bias in the current study, and it did not change the conclusions. Another limitation is the lack of information on the receptor status of breast tumours, which might lead to an underestimated impact of diet on BC risk. However, some studies reported that the associations between dietary patterns and BC risk did not change substantially by receptor status(Reference Castello, Boldo and Perez-Gomez46,Reference Harris, Willett and Vaidya47). Lastly, the results obtained through LCA tend to be highly data-driven and require cross-validation in other independent samples in the future.
Conclusions
In conclusion, we found that LCA is a useful approach to capture dietary patterns within complex dietary data of high inter-individual variation and to derive interpretable dietary patterns suitable for associating with health outcomes. Our findings further support the hypothesis that the combinations of specific dietary factors protect against BC and may be involved in the mechanisms of action on breast carcinogenesis.
Acknowledgements
Acknowledgements: We appreciate all study participants for their contributions. The authors thank the entire data collection team. Incident BC cases and controls for the current study were collected by Wuxi Center for Disease Control and Prevention, Jiangsu Center for Disease Control and Prevention. Financial support: The current study was supported by the World Cancer Research Fund (2011/RFA/473) and Wuxi Young Medical Talents (QNRC035). Conflict of interest: None. Authorship: The authors’ responsibilities were as follows: M.W., S.C., S.R.L. and P.M.W.: designed and conducted the study; J.Y.Z., Z.Z., L.W., J.S., H.Y., W.D., L.C., Y.Q.D. and Y.Q.: developed plant-based diet indices and data collection; S.C. and M.W.: performed the statistical analyses and drafted the manuscript; S.C., M.W., P.M.W. and Z.Z.: interpreted the data, critically revised the manuscript and had full responsibility for the analyses and interpretation of the data; S.C.: full access to all study data; and all authors: read and approved the final manuscript. Ethics of human subject participation: The current study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human subjects/patients were approved by the Jiangsu Center for Disease Control and Prevention ethical committee. Written informed consent was obtained from all subjects/patients.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S1368980020004826