
Predicting Unpaid Care Work in India Using Random Forest: An Analysis of Socioeconomic and Demographic Factors

Published online by Cambridge University Press:  28 November 2024

Saumya Tripathi*
Affiliation:
Department of Social Work, Binghamton University, Binghamton, NY, USA

Abstract

Given the complexity of unpaid care work in the Indian context, this study employs advanced machine learning techniques to unveil hidden patterns within the 2019 time-use survey dataset. The study pursues a dual objective: (1) assessing the superior predictive capability of machine learning over traditional statistical methods in estimating unpaid care work time, and (2) unveiling the sociodemographic determinants of extended unpaid care work durations. The results emphasise the exceptional predictive performance of machine learning, notably the random forest analysis, with a noteworthy 9 per cent improvement in forecast accuracy. Key determinants influencing unpaid care work time encompass gender, employment status, marital situation, and age. Findings underscore the heightened vulnerability of young married women without employment, who face amplified unpaid care work demands, exacerbating related challenges and risks. It further highlights the country’s imperative for a comprehensive care framework to mitigate caregiving constraints hindering women’s equitable participation in evolving economic paradigms.

Type
Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Introduction

The global prevalence rate for time spent on unpaid care work (UCW) is estimated to be more than 75 per cent for women across different countries (Charmes, 2019). It is three times greater in high-income countries, five times greater in middle-income countries, and seven times greater in low-income countries, including South Asia (United Nations Development Programme [UNDP], 2015; World Bank, 2019). The gender disparity in time spent on UCW is alarming in India. Approximately 90 per cent of women in India spend six hours per day on UCW compared to 27 per cent of men who engage in similar activities for thirty-six minutes per day (Radhakrishnan et al., 2020).

UCW is defined as any services provided free of monetary cost within a household for its members. It includes household and domestic chores such as cooking, cleaning, laundry, water, fuel collection, child care, and elder care (ILO, 2018). Time spent on UCW subsidises the cost of care that sustains and supports families, communities, and economies. It further fills in the social services and care gaps. According to UN Women (2019), UCW is economically valued at 10 to 39 per cent of the gross domestic product and contributes to the economy more than the manufacturing, commerce, or transportation sectors. Yet, those who contribute to it remain outside the primary economic purview of the Systems of National Account (SNA).

Although empirically untested, sociodemographic factors such as household income, employment status, education level, age, marital status, household size, geographical location, and social group may also determine women’s time spent on UCW. Recent evidence highlights that time spent on UCW is linked with adverse employment and educational outcomes for women in India (Tripathi and Zhai, 2023). However, we lack information about what best predicts their increased time spent on UCW, and to what extent these sociodemographic factors are responsible for it.

Utilising machine learning methods can enhance predictions of UCW time compared to traditional statistical approaches like OLS regression. It uncovers key socio-demographic factors influencing women’s time spent on UCW that traditional approaches may overlook. By analysing large datasets, such as comprehensive national surveys, it can reveal hidden patterns and offer more precise insights. For instance, machine learning can analyse time-use surveys and demographic data to explore why urban women may spend more time on UCW than rural women. It might show that limited formal childcare services in cities, compared to informal care available in rural areas, is a significant factor. These insights can help policymakers improve childcare access or create a national care framework to reduce the unpaid care burden. Additionally, this methodology provides flexibility through customisation and scalability for specific research goals, offering advantages over traditional methods.

However, existing studies have not fully leveraged machine learning to understand the factors behind UCW time, which leaves a gap in the literature. To address part of this gap, the present study examines: (1) whether machine learning methods predict time spent on UCW better than traditional statistical methods (e.g., OLS regression) and (2) which sociodemographic factors best predict time spent on UCW. The study’s implications extend to practice and policy, offering insights for reducing UCW time at a systemic level and informing gender-equitable care initiatives and policies. Ultimately, it aims to ensure universal access to care services without compromising individual capabilities.

Literature review

Hirway (2015) argues that UCW is labeled as ‘women’s work’, and social-cultural norms shape the pattern where women perform the lion’s share of it. Prior research indicates that increased time spent on UCW is correlated with adverse health, mental health, social, and economic outcomes. Time-use surveys from different countries also show that women who invest more time in UCW are more likely to suffer in terms of their health and mental health, as well as experience wage discrimination and parental penalties (Hochschild and Machung, 2003; Sanchez, 2015). Factors such as geographical location, caste, household size, and social norms influence this time commitment, with rural women spending approximately seven hours daily on UCW compared to six hours for urban women (Kamdar, 2020). The Gates Foundation (2020) reports that girls in low- and middle-income countries who spend over four hours daily on household tasks are 28 per cent less likely to be enrolled in school, limiting their educational and future labor force participation.

In recent decades, social policy discourse has significantly advanced, particularly highlighting the importance of UCW within families, discussing work-family balance, and addressing the gendered dynamics of welfare institutions (Reimat, 2019; Zumbyte, 2024). Despite this progress, mainstream perspectives still struggle to fully integrate these insights regarding care, gender power dynamics, and dependency. Policies such as the Maternity Benefit Act of 2017 and the promotion of ‘universal child care’ inadvertently reinforce the societal expectation of women performing UCW (Gopal, 2006), perpetuating women’s subordinate position in Indian society. Welfare approaches adopted by nations offering extensive publicly funded childcare and specific parental leave for fathers have shown the potential to reduce gender gaps in unpaid housework and childcare (Cornwell et al., 2019; Salin et al., 2018). However, implementing such approaches in the Indian context, given its male breadwinner model and low female labor force participation (Klasen and Pieters, 2015), presents challenges and complicates understanding the issue from both structural and capabilities perspectives.

This study applies Sen’s capability framework, which emphasises women’s freedom to achieve well-being by allowing them to make meaningful choices based on available resources and opportunities. Increased time spent on UCW is viewed as a result of capability deprivation, influenced by low employment status, education levels, social ignorance, systemic oppression, lack of material resources, or negative consciousness (Sen, 1980, 2006). It highlights how various disadvantages limit women’s ‘capabilities’ – their ability to make significant decisions and achieve valued outcomes. In India, women’s caregiving responsibilities are shaped by overlapping socio-cultural, economic, and institutional factors (Janiso et al., 2021). The intersection of caste, religion, rural or urban residence, and age intensifies caregiving burdens and limits women’s access to employment, education, and personal development.

Women from disadvantaged sociodemographic backgrounds – such as those belonging to minority castes or religions, with limited education and employment opportunities, or being young and married – are often pushed into caregiving roles due to societal expectations, as shown in Figure 1 (Folbre, 2006). This dual role of caregiving and facing structural barriers, including caste discrimination and limited access to opportunities, restricts their overall well-being.

Figure 1. Exploring disparities in unpaid care work: an intersectional analysis of women’s capabilities.

This figure presents an intersectional analysis of women’s capabilities concerning disparities in UCW. It highlights the interplay of various factors and underscores the complexities of unpaid care work across different demographic contexts.

Theoretical frameworks used to analyse large datasets, such as household or time-use surveys, often reveal inconsistencies. As a result, innovative approaches like machine learning are necessary to identify patterns in previously unexplored variables crucial for addressing social issues. Grimmer et al. (2021) suggest that the wealth of data in social sciences enables a shift from traditional deductive methods to more interactive, inductive approaches. Machine learning, in particular, allows for the analysis of more complex relationships between variables beyond the limitations of linear models (Lundberg et al., 2022).

Scholars examining UCW in low- and middle-income countries, especially India, have noted the limitations of traditional statistical models. While regression analyses have linked factors like education, employment, and marital status to UCW involvement, these models often lack predictive power. Additionally, 40 per cent of these studies use qualitative methods like content analysis or phenomenology (Chopra and Zambelli, 2017; Tasnim, 2020), while 60 per cent rely on secondary data and regression techniques (Janiso et al., 2021; Sinha et al., 2024). Machine learning, with its ability to handle large datasets and uncover nuanced insights, aligns with Sen’s capability approach by providing a deeper understanding of individuals’ functioning and opportunities. This facilitates the development of more effective, equitable policies.

Methods

Data and sample

The National Sample Survey Office (NSSO) conducted the Time Use Survey (TUS) in four rounds (Mospi, 2019). The primary objective of TUS is to measure the participation of men and women in both paid and unpaid activities, including unpaid caregiving and domestic services. The survey spanned one year, starting on January 1, 2019, before the COVID-19 pandemic in India. The year was divided into four sub-rounds of three months each:

Sub-round 1: January – March 2019

Sub-round 2: April – June 2019

Sub-round 3: July – September 2019

Sub-round 4: October – December 2019

A stratified two-stage sampling design was used to recruit a nationally representative sample of 272,117 individuals from rural areas and 173,182 from urban areas. Women constituted 49.6 per cent of the total sample, representing both rural and urban populations. The survey employed a time-diary methodology, where participants recorded their activities in precise time intervals over a designated period. Data were collected for each household member aged six years and older, covering a 24-hour period from 4:00 AM the day before the interview to 4:00 AM on the day of the interview. This 24-hour period was divided into 48 time slots, each lasting 30 minutes (Mospi, 2019).

Measures

Outcome variable

Consistent with the ILO’s definition, time spent on unpaid care work was measured through two different categories: (1) unpaid domestic services for household members, and (2) unpaid caregiving services for household members. Time was measured in hours as a continuous variable. When multiple activities were performed during a given period of time, only the primary activity was taken into account.
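As an illustration of how a continuous outcome in hours can be built from a time diary, the sketch below sums thirty-minute slots by primary activity. It is a minimal example under stated assumptions: the activity labels, the helper names hours_by_activity and ucw_hours, and the example diary are all hypothetical and do not reproduce the survey's actual activity codes or the author's processing code.

```python
from collections import defaultdict

SLOT_MINUTES = 30    # each diary slot lasts thirty minutes
SLOTS_PER_DAY = 48   # 48 slots cover the 24-hour reference period

# Hypothetical labels standing in for the two unpaid care work categories.
UCW_ACTIVITIES = {"unpaid_domestic", "unpaid_caregiving"}


def hours_by_activity(diary):
    """Sum hours per activity from a list of 48 primary-activity labels."""
    if len(diary) != SLOTS_PER_DAY:
        raise ValueError(f"expected {SLOTS_PER_DAY} slots, got {len(diary)}")
    totals = defaultdict(float)
    for activity in diary:
        totals[activity] += SLOT_MINUTES / 60
    return dict(totals)


def ucw_hours(diary):
    """Total hours spent on the two unpaid care work categories."""
    totals = hours_by_activity(diary)
    return sum(totals.get(activity, 0.0) for activity in UCW_ACTIVITIES)


# One invented respondent-day, recorded slot by slot.
example_diary = (
    ["sleep"] * 16
    + ["unpaid_domestic"] * 8
    + ["paid_work"] * 12
    + ["unpaid_caregiving"] * 4
    + ["leisure"] * 8
)

print(ucw_hours(example_diary))
```

Because only one label is stored per slot, the sketch mirrors the rule that the primary activity is counted when several activities overlap.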

Predictors

Socio-demographic characteristics included age, marital status, education level, employment status, monthly household expenditure, household size, geographical location (i.e. rural vs. urban settings), religion, social group, and available infrastructure such as the source of cooking energy and the method of washing clothes.

Data analysis

A variety of machine learning prediction methods, including random forest, were used to evaluate and compare predictive performance. Analyses were performed in Python using the package Scikit-learn (Pedregosa et al., 2011). Along with predictive analytics, this approach also allowed us to determine the importance of each socio-demographic factor in predicting time spent on UCW. According to Hindman (2015), machine learning can outperform standard regression analyses in predictive ability when studying complex social issues. The demand for explainable or interpretable models has been especially pronounced in the social science field (Guidotti et al., 2020). It is not only of great significance to predict the socio-economic outcome of time spent on UCW but also to take features associated with individuals (e.g. age, educational attainment, place of residence, marital and employment status, etc.) into account in an explainable and quantifiable manner.

Although traditionally not applied to unpaid care work, these techniques have been utilised in related fields such as social and family demography (Arpino et al., 2018; Kinza, 2019). Unlike standard regression-based methods, approaches like random forests do not impose a rigid model linking an outcome variable to independent variables. Instead, they allow the algorithm to identify the relationship between excessive time spent on unpaid care work and independent socio-demographic variables, automatically detecting nonlinearities and interactions among predictors. This approach offers flexibility and scalability compared to traditional methods, rendering it suitable for various tasks in social research and policy, including intervention modeling, pilot policy adaptation, and predictive assessment.

In this study we compare different feature importance measures using both linear (ordinary least squares (OLS), Lasso, Ridge, and Elastic Net regression, where regularisation shrinks feature coefficients either exactly to zero or towards smaller weights) and non-linear (random forest) methods. The term ‘feature importance’ pertains to the variables that exhibit a stronger association with the ultimate outcome and make a more significant contribution to the variation of the dependent variable (Saarela and Jauhiainen, 2021). In general, random forest models are trained using as many features as possible, with the algorithm subsequently returning a list of the most salient features for prediction. Nonetheless, the accuracy of target predictions can be substantially improved by prudently selecting appropriate features.
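The linear comparison can be sketched with scikit-learn as follows. This is an illustrative example only: the data are synthetic, the feature names and penalty strengths (alpha, l1_ratio) are assumptions chosen for the demonstration, and nothing here reproduces the study's actual specification or results.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic stand-in data: the first two columns carry signal, the rest are noise.
rng = np.random.default_rng(0)
n = 500
feature_names = ["female", "married", "noise_1", "noise_2", "noise_3"]  # hypothetical
X = rng.normal(size=(n, len(feature_names)))
y = 2.6 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Penalty strengths are assumed values, not tuned and not taken from the study.
models = {
    "OLS": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),                           # L1 penalty
    "Ridge": Ridge(alpha=1.0),                           # L2 penalty
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}

# Fit every model on the same data and collect the coefficient vectors.
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, coef in coefs.items():
    print(name, dict(zip(feature_names, np.round(coef, 2))))
```

Comparing the printed coefficient vectors side by side shows how each penalty shrinks the weights relative to OLS; the L1-based penalties can set weak coefficients exactly to zero, whereas Ridge only reduces their size.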

According to the literature, although simple linear models are easier to understand and interpret, they are often outperformed by non-linear models (Casalicchio et al., 2019; Horn et al., 2020). To identify early risks and evaluate the efficacy of a model in terms of effecting change in an undesired outcome, it is necessary to provide explanations that highlight areas of improvement (Molnar et al., 2020; Zien et al., 2009). In the present investigation, random forests were utilised as a supervised machine learning approach, which leverages the averaging of multiple decision trees and is well-suited for capturing intricate nonlinearities and interactions among input variables (Breiman, 2001). The assessment of predictor importance within the random forest approach is based on the mean decrease impurity (MDI) and the permutation importance metric (Breiman, 2001). MDI, also known as ‘Gini importance’, is the most popular feature importance method for random forests and serves as the default measure in scikit-learn (Friedman et al., 1984). MDI is defined as the total decrease in node impurity, weighted by the probability of reaching that node, and averaged over all trees in the ensemble. The construction of such models involved the application of the MDI metric, whereby the relevance of each input variable was established based on its capacity to minimise the uncertainty encountered during the development of decision trees. The variable that proved most effective in reducing this uncertainty and was employed most frequently in the trees was designated as the most important predictor (Andersson et al., 2021).
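A minimal sketch of reading MDI scores from a fitted forest is given below. It assumes synthetic data with hypothetical feature names and effect sizes; the hyperparameter values are illustrative and are not the study's settings. For a regression forest, scikit-learn measures impurity with squared error by default, and the feature_importances_ attribute holds the normalised MDI scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; names and effect sizes are invented for illustration.
rng = np.random.default_rng(0)
n = 600
feature_names = ["female", "employed", "married", "age", "noise"]
X = np.column_stack([
    rng.integers(0, 2, n),    # female (0/1)
    rng.integers(0, 2, n),    # employed (0/1)
    rng.integers(0, 2, n),    # married (0/1)
    rng.integers(15, 80, n),  # age in years
    rng.normal(size=n),       # pure noise
]).astype(float)
y = 2.5 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

forest = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=5, random_state=0
).fit(X, y)

# feature_importances_ is the impurity-based (MDI) importance, normalised to sum to 1.
mdi = dict(zip(feature_names, forest.feature_importances_))

for name, score in sorted(mdi.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

These scores are computed from the training data alone, so they are best read alongside an importance measure evaluated on held-out data.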

However, impurity-based importances can be biased towards features with high cardinality and are computed using training set statistics, which may not reflect the feature’s generalisability to the test set (Browne et al., 2021). As an alternative, permutation importance is computed on a held-out test set. This method calculates feature importance by shuffling the values of each feature and measuring the resulting decrease in model performance. A feature with a high permutation importance score significantly decreases the model’s performance when shuffled, indicating its importance for the model’s predictions.

In the context of UCW, permutation importance was calculated after the model had been fitted, ensuring that it did not alter the predictions. It entails randomly shuffling a single variable while keeping the target and other variables constant. A substantial decrease in the model’s accuracy when a particular feature is shuffled suggests that the model heavily depends on that feature for predictions. This approach quantifies the reduction in model performance caused by random shuffling. To address variability, the process is repeated multiple times, and the value after the ± symbol represents the variation in performance across different shuffles (Agarwal et al., 2023). The initial random forest fitted the training data well but performed less well on the test data (though still better than the linear models), so we regularised the model by requiring a minimum number of samples in each leaf, which brought training and test accuracy into line.
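The permutation procedure described above corresponds to scikit-learn's permutation_importance function. The sketch below applies it to a held-out test set and reports the mean ± standard deviation across repeated shuffles. The data, feature names, number of repeats, and forest settings are all assumptions made for the example, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; names and effect sizes are invented for illustration.
rng = np.random.default_rng(0)
n = 800
feature_names = ["female", "employed", "married", "age", "noise"]
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.integers(15, 80, n),
    rng.normal(size=n),
]).astype(float)
y = 2.5 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
forest = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the drop in R².
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

for name, mean, std in zip(
    feature_names, result.importances_mean, result.importances_std
):
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

The model is fitted once and never retrained; only the test inputs are shuffled, so the fitted model itself is left unchanged.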

To prevent overfitting and ensure generalisability, the dataset was randomly split into a training set (75 per cent) and a test set (25 per cent). The training set was used for model training, while the test set was reserved for evaluation. Model optimisation was achieved using the min_samples_leaf hyperparameter, which defines the minimum number of samples required in a leaf node. The consistency of performance metrics between the training and test sets confirmed the absence of overfitting.
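The effect of min_samples_leaf on the gap between training and test performance can be illustrated with a small experiment. This sketch uses synthetic, deliberately noisy data, and the two leaf sizes compared (1 and 20) are arbitrary illustrative values; the study does not report the value it selected.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic, noisy stand-in data so that an unregularised forest can overfit.
rng = np.random.default_rng(0)
n = 800
X = np.column_stack([
    rng.integers(0, 2, n),    # hypothetical binary predictor
    rng.integers(0, 2, n),    # hypothetical binary predictor
    rng.integers(0, 2, n),    # hypothetical binary predictor
    rng.integers(15, 80, n),  # hypothetical age in years
    rng.normal(size=n),       # pure noise
]).astype(float)
y = 2.5 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=1.5, size=n)

# 75 per cent training, 25 per cent test, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_score(min_samples_leaf):
    """Return (train R², test R²) for a forest with the given leaf size."""
    forest = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=min_samples_leaf, random_state=0
    )
    forest.fit(X_train, y_train)
    return forest.score(X_train, y_train), forest.score(X_test, y_test)


gaps = {}
for leaf in (1, 20):
    train_r2, test_r2 = fit_and_score(leaf)
    gaps[leaf] = train_r2 - test_r2
    print(f"min_samples_leaf={leaf}: train={train_r2:.2f}, test={test_r2:.2f}")
```

A larger minimum leaf size forces each tree to average over more observations per leaf, which typically narrows the difference between training and test scores.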

A subsequent examination was conducted exclusively on data pertaining to female participants, in which the gender variable was omitted due to its significance as the primary feature in both linear and non-linear models throughout the analysis. As expected, the exclusion of the most influential factor (gender) resulted in a decrease in predictive capacity. However, superior predictions were still obtained compared to linear models, and the findings remain relevant to the female participant dataset, consistent with existing literature.
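The women-only re-analysis amounts to filtering the rows on the gender indicator and then dropping that column before refitting. The sketch below shows the mechanics on synthetic data; the column order, the constant GENDER_COL, the helper held_out_r2, and the forest settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column 0 is a hypothetical female indicator.
rng = np.random.default_rng(0)
n = 2000
GENDER_COL = 0
X = np.column_stack([
    rng.integers(0, 2, n),    # female (0/1)
    rng.integers(0, 2, n),    # employed (0/1)
    rng.integers(0, 2, n),    # married (0/1)
    rng.integers(15, 80, n),  # age in years
]).astype(float)
y = 2.5 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=1.0, size=n)


def held_out_r2(features, target):
    """Fit a forest on 75 per cent of the rows and return R² on the rest."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, target, test_size=0.25, random_state=0
    )
    forest = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=20, random_state=0
    )
    return forest.fit(X_tr, y_tr).score(X_te, y_te)


# Keep only the rows for women, then remove the now-constant gender column.
women = X[:, GENDER_COL] == 1
X_women = np.delete(X[women], GENDER_COL, axis=1)
y_women = y[women]

r2_all = held_out_r2(X, y)
r2_women = held_out_r2(X_women, y_women)
print(f"full sample R²: {r2_all:.2f}, women-only R²: {r2_women:.2f}")
```

In this synthetic setup the women-only score is lower because the variation explained by gender is no longer available to the model, which parallels the drop in predictive capacity described above.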

Results

Male participants had an average age of 38.40 years, with a standard deviation of 16.33 years, while female participants had an average age of 37.47 years, with a standard deviation of 15.73 years. Additionally, there is a notable difference in the allocation of time spent on UCW between men and women. Women, on average, dedicated significantly more time to these activities than men, reflecting a prevalent traditional gendered time-use pattern as depicted in Table 1. Furthermore, men had higher levels of educational attainment and more favourable employment outcomes when compared to women. For instance, there was a significant contrast in educational attainment, with 32.2 per cent of women classified as non-literate, in contrast to only 17.5 per cent of men falling into the same category. Only 10 per cent of women had completed undergraduate studies, whereas 13.2 per cent of men had achieved this milestone. Substantial disparities were identified in the employment rates of men and women, with 76.7 per cent of men being employed, a significantly larger proportion than the 20.8 per cent of women engaged in paid employment.

As presented in Table 2.1, our study employed various models, encompassing both linear and non-linear approaches, each incorporating different levels of regularisation. This choice was made due to the limited prior knowledge available regarding the precise relationship between socio-demographic factors and the time allocated to UCW. The empirical findings, derived from an array of linear models, including OLS, Lasso, Ridge, and Elastic Net, consistently yielded an R-squared (R²) value of 0.54 (approximately 55 per cent) across the entire model spectrum (see Table 2.3). In essence, this R² value signifies that these models explain approximately 55 per cent of the variability in the time dedicated to UCW, thereby providing substantial explanatory power regarding this phenomenon (R² = 0.54).
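For reference, the coefficient of determination is conventionally defined as follows. The notation is introduced here for illustration and does not appear in the original text: y_i is the observed UCW time for individual i, \hat{y}_i is the model's prediction, and \bar{y} is the sample mean.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

An R² of 0.54 therefore means that the residual sum of squares is 46 per cent of the total sum of squares. This is also the quantity returned by the score method of scikit-learn regressors.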

The analysis identified several socio-demographic variables, including gender, marital status, employment status, age, and education, as having significant impacts on increased time devoted to UCW. The coefficients, as presented in Table 2.1, demonstrated stability across all models (β = 2.61, β = 2.60, β = 2.61, β = 2.61), underscoring the pivotal role of these demographics in predicting UCW. Moreover, in terms of model significance, gender (female) emerged as a particularly influential predictor of UCW, with employment and marital status exhibiting notable significance as well. To elucidate further, being female exhibited a robust association with an increased allocation of time to UCW. Comparable outcomes were observed when analysing a sub-cohort composed exclusively of women. It is noteworthy that marital and employment statuses significantly contributed to the duration of UCW hours.

The coefficient (β = 2.61, β = 2.60, β = 2.61, β = 2.61) indicates that being female is linked to spending an additional 2.61 units of time on UCW compared to being male. This positive coefficient quantifies the gender disparity in UCW, highlighting that, on average, females dedicate 2.61 more hours per week (approximately three hours) to UCW than males. This substantial difference underscores the significant influence of gender on time allocation to UCW, with females typically shouldering a greater portion of these responsibilities.

Similarly, being married is associated with spending an additional 2.50 units of time on UCW compared to being never married (β = 2.50, β = 2.50, β = 2.48, β = 2.47). This coefficient quantifies the extent of the disparity, indicating that, on average, married females spend 2.5 more hours per week (approximately three hours) on UCW compared to females who were never married.

Employed individuals (β1 = 2.11), in all models, show a significant increase in UCW time, spending approximately 2.11 additional hours per week on UCW compared to those who are non-employed. Conversely, self-employed individuals (β2 = −0.51, β2 = 0.73, β2 = 0.66, β2 = 0.66) demonstrate a minor decrease in UCW time, with a reduction of approximately 0.51 units compared to the non-employed group in model 1. Despite the negative coefficient, this effect size is relatively small. Pursuing higher education (β3 = 0.74) is linked to a moderate increase in UCW time in model 1, with individuals dedicating an additional 0.74 units of time to UCW compared to their non-employed counterparts. In the other models, the corresponding coefficients (β3 = −0.51, β3 = −0.49, β3 = −0.5) show a minor decrease, with relatively small effect sizes.

While it is evident that being female is associated with an increased engagement in UCW, our objective was to further dissect the contributing factors specific to the female population. Therefore, we conducted a separate analysis using a sample consisting solely of women to identify variables beyond gender that influence the time allocated to UCW. This refined analysis revealed that marital status (β = 3.79, β = 3.79, β = 3.77, β = 3.74) and employment (β = 1.70, β = 1.70, β = 0.71, β = 1.70) emerged as the primary determining factors. As shown in Table 2.2, coefficient values and the ranking of feature importance were consistent across the various models. In summary, our findings underscore that married and non-employed women face a heightened risk of devoting a substantial portion of their time to UCW, potentially at the expense of engaging in productive work activities.

Subsequently, a feature importance analysis was conducted through a random forest approach. Specifically, this method employed the MDI metric, which quantifies the reduction in model impurity (or error). The outcomes of this analysis, as evaluated by MDI, are visually presented in Figure 2. Notably, when assessed through MDI, employment status emerged as the most influential feature by a substantial margin. Following closely in second place is the feature of age. To validate the findings acquired through the MDI evaluation, an alternative approach was also employed. This involved determining feature importance by comparing a model trained with the original dataset to a model trained with a dataset in which one of the inputs was randomly shuffled, referred to as feature permutation (FP).

Figure 2. Random forest permutation importance metric.

This figure presents the permutation importance scores for socioeconomic and demographic factors affecting UCW time derived from a random forest model. The scores indicate the reduction in model accuracy on test data when each feature’s values are permuted, thereby highlighting the contribution of each factor to predicting unpaid care work time.

The random forest analysis yielded a training accuracy score of 0.64 and a test accuracy score of 0.63 (see Figure 3). This demonstrates its superior predictive capability when compared to conventional methods for estimating women’s time allocation to UCW. The fitted random forest models supplied a hierarchical assessment of variable importance, and these assessments were averaged across all models. Among the predictors, gender, employment status, and age emerged as particularly influential factors in determining the time devoted to UCW. Similarly, when we replicated the analysis with a subsample comprising exclusively women, the same variables – employment status, marital status, and age – retained their status as significant predictors.

Figure 3. Regularisation.

This graph illustrates the impact of regularisation techniques on estimating UCW. It demonstrates the relationship between the degree of regularisation applied and the resulting model coefficients, emphasising how regularisation affects predictive accuracy and model stability. Additionally, the graph identifies optimal regularisation parameters that minimise overfitting and improve model generalisation.

Discussion and social policy implication

Using machine learning to study UCW in India has shown how the time spent on care work differs between men and women based on their capabilities and factors such as whether they are married or not, where they live, their education, employment status, and age. The results underscore the superiority of machine learning techniques over traditional regression methods, with the random forest analysis demonstrating an improvement of almost 9 percentage points in predictive accuracy (random forest = 64 per cent vs. linear models = 55 per cent). Remarkably, even after omitting the most influential feature, gender, the predictive performance remained superior to that of linear models, and the study’s conclusions remained consistent. This trend is consistent with existing literature (Janiso et al., 2021; Tripathi et al., 2024), reaffirming the robustness of our findings for women-only data. The study establishes that gender, employment status, marital status, and age stand out as substantial predictors of the time dedicated to UCW. In simpler terms, being a woman inherently increases the burden of time spent on UCW activities. It emphasises the influence of socio-cultural norms and the social positioning of gender within intricate social frameworks. This unequal distribution of UCW responsibilities among women carries various consequences, particularly in terms of employment and economic repercussions for them (Hess et al., 2020).

The current findings support the arguments made by other scholars in the field of care work to some extent. Studies utilising qualitative data have shown that gender stereotypes perpetuate the notion that UCW is inherently women’s responsibility (Hirway, 2015; Singh and Pattanaik, 2020; Chopra and Zambelli, 2017; Zaidi, 2022). This research, using nationally representative quantitative data, further substantiates this by demonstrating that in South Asian countries, women face significant pressure to perform UCW and consequently spend more time on these tasks. Similarly, sociodemographic factors indicate that women’s UCW is influenced by marital status, age, employment status, and household size, as noted in previous studies. For instance, earlier research has shown a correlation between time spent on UCW and women’s employment status (Tripathi et al., 2024). However, a novel finding in our study is that marital status emerged as a stronger predictor of women’s time spent on UCW, even more so than employment. This aspect has not been captured in previous TUS studies. This finding aligns with Allendorf’s (2017) qualitative analysis showing that daughters-in-law in India often bear the majority of care work responsibilities. Thus, our findings align with studies utilising narrative data and add a quantitative perspective with strong predictive results.

Furthermore, additional socio-demographic factors, such as being married and non-employed, exacerbate this burden. The findings also indicate that young married women without employment are especially vulnerable to devoting more time to UCW, compounding the associated risks and challenges. Similarly, when the analysis was restricted to the women-only sample, marital and employment status retained their significance as predictors, with age following closely behind. In alignment with the results from a previous study that employed the same dataset, educational attainment did not exhibit any significant influence on the time allocated to UCW (Tripathi and Zhai, 2023). This holds particularly true for young women, signifying that whether women possess a graduate degree or not, they still dedicate a similar amount of time to UCW. Simultaneously, employment status emerges as a significant predictor of their engagement in UCW. This underscores the crucial role of employment policies in mitigating the gender disparity evident in UCW (Tripathi et al., 2024).

Tackling the burden of UCW necessitates a thorough reassessment of current strategies, with a focus on redistributing policy frameworks, recognising and backing UCW, and incentivising fairer distribution of care duties across genders and households, including single-parent households. Understanding the primary predictors of high UCW can inform policy actions. For instance, if gender is a significant predictor, policymakers can devise initiatives that integrate care policies targeting socio-cultural norms and patriarchy to alleviate the burden of UCW.

Sociocultural norms shaped by patriarchy and the labeling of UCW as women’s work are critical factors requiring further investigation to address the gendered division of labor in India. India’s current TUS data lacks information on these important variables. Future studies should aim to address this through more robust data collection. Early marriage, common in India (Allendorf, 2017), often forces women to dedicate substantial time to UCW from a young age, limiting their capabilities and opportunities for employment, higher education, and entrepreneurship. This cycle starts in youth with household responsibilities that expand post-marriage to include reproductive, childcare, and eldercare tasks (ILO, 2018). These strenuous demands adversely affect women’s health, leading to early health issues and increasing the burden on the healthcare system. Without financial resources and employment opportunities, these women cannot meet their own needs, perpetuating poverty at individual and community levels and hindering the developmental potential of future generations.

Evidently, women who are not employed face a higher likelihood of engaging in UCW, underscoring the importance of identifying target areas, such as employment policies, and integrating them with care policies within the country to enhance productivity. The study underscores the necessity of reforming marital laws and policies to make them more inclusive and flexible, allowing men and individuals of other genders to partake in certain practices traditionally associated with women. The study recommends promoting the ‘dual-earner care model’ prevalent in Nordic countries and increasing women’s employment in India through flexible working arrangements and care initiatives to reduce the gender disparity in time spent on UCW (OECD, 2019). The findings reiterate the argument of Connelly et al. (2018) for building a care system in the country that addresses the caregiving constraints impeding women’s capabilities to benefit equally from the new economic policies and reforms.

In alignment with findings in existing literature that focus on the interpretability of results (Tripathi et al., 2022, 2024), age has emerged as a noteworthy predictor, too. Specifically, our analysis underscores that the younger generation, particularly women, faces an elevated risk of allocating a greater proportion of their time to UCW. The country’s comprehensive social policies, spanning marital regulations and labor laws, should prioritise the protection of our younger population, particularly women, from the obligations of involuntary UCW. By adopting such measures, we can harness their potential to contribute significantly to our nation’s socio-economic progress. To attain this objective, various strategies may be set in motion. One significant step involves the reconsideration of the legal marriage age, which currently stands at eighteen years. This legal framework was established centuries ago, at a time when child marriages were at their peak in India (Dasra, 2021). Modifying this statute could empower women by permitting them to delay their marital obligations and concentrate on their professional ambitions. Furthermore, an alternative approach involves the expansion of employment opportunities across all sectors, including the private, public, and nonprofit sectors. Additionally, enforcing gender parity within organisations can establish a more inclusive and equitable work environment that benefits women (Yoon, 2014).

Public investment is essential in reducing the disproportionate burden of unpaid work that primarily falls on women (Folbre, 2006). However, achieving true gender equity also necessitates strategies for equitable distribution of this work, ensuring it is shared among various stakeholders in the system. This entails collaborative efforts involving the government, healthcare, childcare (including daycare and schools), as well as the business and employment sectors. Elevating women’s engagement in the workforce hinges on the pivotal factor of fair compensation for employed women (Goldin et al., 2022). Our research, alongside other studies, consistently underscores the role of equitable pay in shaping women’s caregiving responsibilities. Consequently, it is imperative not only to promote men’s active engagement but also to foster broader systemic involvement in unpaid caregiving and domestic duties as part of the solution.

Application of this methodology has given us detailed insights that help design targeted interventions for different groups of women. For example, our findings show that rural women with less education face caregiving challenges due to a lack of affordable caregiving options. This suggests introducing targeted policies like subsidised childcare to address the issue. The analysis helps us identify where the caregiving burden is heaviest and which factors are most important, allowing us to allocate resources more effectively and ensure support reaches those who need it most. Through this methodology, we also know how caregiving intersects with other issues like employment and education, leading to more integrated policies that address various aspects of women’s lives. Additionally, current models can be updated with new data to track and improve the effectiveness of interventions over time. Overall, the findings provide data-driven insights that lead to more precise and effective solutions for women’s caregiving challenges.

It is crucial to recognise that caregiving is a continuous and essential aspect of our society. The challenge lies not only in how we address this issue in our policy documents but also in establishing its sustainability for all members involved in caregiving within society. Interventions at the societal level should adopt an inter-sectoral approach. For instance, redistributing caregiving responsibilities can alleviate the disproportionate burden on women. This redistribution should be framed as a collective responsibility of the state, involving various stakeholders such as daycare, healthcare, and the education system at the community level. This collaborative effort can help share the responsibility while simultaneously maximising employment opportunities within these sectors. Consequently, women who were previously unemployed and were engaged in unpaid care labor can enter the workforce. This approach not only benefits societal growth but also enhances the overall well-being of women within the community, resulting in a mutually advantageous situation.

Conclusion

The allocation of time to UCW, especially by women, has been a long-standing topic among development practitioners and scholars. Traditionally, researchers have relied on qualitative data to explore the nuances of UCW and its impact on women’s lives. However, literature on this subject is limited in the South Asian context, with few studies using quantitative data and most focusing on correlations rather than predictive factors. This study addresses this gap by using a nationally representative TUS and predictive modeling to identify the drivers behind the significant amount of time women spend on UCW. By employing machine learning over traditional statistical methods, the study demonstrates a nearly 9 per cent improvement in predictive accuracy with random forest analysis. Gender emerges as a significant factor, highlighting the need to break gender stereotypes and introduce women-friendly policies in caregiving, employment, and education. The analysis, initially conducted on the total sample and later focused on women, also identifies employment status, marital status, and age as key predictors of time spent on UCW. Notably, being married significantly increases the time women dedicate to care work.

The findings show that care work policies alone are insufficient to reduce the UCW burden in India. It is essential to account for the intersectionality of women’s capabilities when assessing caregiving dynamics. For instance, proposals for daycare centres or nursing homes must consider whether they will benefit a small segment of already economically empowered women or those with limited financial mobility, awareness, and decision-making freedom. As scholars, we must adopt a holistic perspective that not only addresses caregiving responsibilities but also seeks to dismantle sociodemographic inequalities limiting women’s agency. This approach involves integrating strategies across sectors – such as education, employment, and care services – while considering how intersecting factors shape women’s caregiving experiences in India. As a pilot initiative, government-sponsored daycare centres and nursing homes in poor rural areas can help increase women’s agency and reduce their caregiving burdens. Caregiving policies need to be integrated with employment, family, and marriage laws and efforts to raise the marriage age for women. Modernising these policies could enhance systemic efficiency. Employment practices should also incorporate care provisions to avoid unintentional discrimination in hiring or compensation decisions. In summary, our analysis reveals key determinants of women’s UCW time, offering insights for systemic improvements to reduce the burden on women.

Funding

This work was supported by WG-USA Small Grant.

Competing interests

No potential conflict of interest was reported by the authors.

Ethics approval

IRB approval was received from Fordham University on 06.24.2021.

Appendix

Table 1. Sociodemographic characteristics (N = 409,371)

This table outlines the distribution of key sociodemographic variables – age, gender, education level, employment status, marital status, and time spent on unpaid care work – offering critical insights into the demographic context essential for understanding the dynamics of unpaid care work.

Table 2.1 Factors predicting time spent on unpaid care work

This table presents the key socioeconomic and demographic factors influencing the time individuals allocate to unpaid care work. It highlights significant predictors such as gender, employment status, education level, and marital status, providing a comprehensive overview of the variables that shape unpaid care work responsibilities.

Table 2.2 Factors predicting women’s time spent on unpaid care work

This table details the key socioeconomic and demographic factors affecting women’s time allocation to unpaid care work. In line with previous models that recognise gender as a primary predictor, it also emphasises other significant intersectional factors, such as marital status, employment status, and education level, that contribute to the feminisation of unpaid care work in the country.

Table 2.3 Comparison of R² values across different regression models

This table presents the R² values for different regression models assessing individual time spent on UCW and women’s allocation of time to UCW. The R² value reflects the proportion of variance in UCW time explained by sociodemographic factors. Higher R² values indicate better model fit and predictive accuracy, facilitating comparisons of model performance across various analytical methods.
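As a reminder of what the R² values in this table measure, the standard definition (one minus the ratio of residual to total sum of squares) can be written out in a few lines. This is a generic sketch using only the Python standard library. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true has zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical minutes of UCW per day for four people and two sets of predictions.
observed = [300, 60, 240, 30]
close_fit = [280, 70, 250, 40]
mean_only = [157.5, 157.5, 157.5, 157.5]  # predicting the mean for everyone

print(round(r_squared(observed, close_fit), 3))
print(r_squared(observed, mean_only))
```

A model that always predicts the sample mean scores 0, and a perfect model scores 1, which is why a higher value in the table indicates that the sociodemographic factors account for more of the variation in UCW time.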

References

Agarwal, A., Kenney, A. M., Tan, Y. S., Tang, T. M. and Yu, B. (2023) ‘MDI+: a flexible random forest-based feature importance framework’, Statistics Methodology, 182. Available at: https://arxiv.org/pdf/2307.01932
Allendorf, K. (2017) ‘Like her own: ideals and experiences of the mother-in-law/daughter-in-law relationship’, Journal of Family Issues, 38, 15, 2102–2127.
Andersson, S., Bathula, D. R., Iliadis, S. I., Walter, M. and Skalkidou, A. (2021) ‘Predicting women with depressive symptoms postpartum with machine learning methods’, Scientific Reports, 11, 1, 7877.
Arpino, B., Gumà, J. and Julià, A. (2018) ‘Family histories and the demography of grandparenthood’, Demographic Research, 39, 42, 1105–1150.
Bill and Melinda Gates Foundation (2020) Global Education Program. [Online] Available at: https://www.gatesfoundation.org/our-work/programs/global-growth-and-opportunity/global-education-program [accessed 12 Oct 2023].
Breiman, L. (2001) ‘Random forests’, Machine Learning, 45, 1, 5–32.
Browne, C., Matteson, D. S., McBride, L., Hu, L., Liu, Y., Sun, Y. and Barrett, C. B. (2021) ‘Multivariate random forest prediction of poverty and malnutrition prevalence’, Plos One, 16, 9, e0255519.
Casalicchio, G., Molnar, C. and Bischl, B. (2019) ‘Visualizing the feature importance for black box models’ in Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N. and Ifrim, G. (eds.), Machine Learning and Knowledge Discovery in Databases, Cham: Springer, 655–670.
Charmes, J. (2019) The unpaid care work and the labour market. An analysis of time use data based on the latest World Compilation of Time-use Surveys. International Labour Office - ILO.
Chopra, D. and Zambelli, E. (2017) No Time to Rest: Women’s Lived Experiences of Balancing Paid Work and Unpaid Care Work, Brighton: Institute of Development Studies.
Connelly, R., Dong, X.-Y., Jacobsen, J. and Zhao, Y. (2018) ‘The care economy in post-reform China: feminist research on unpaid and paid work and well-being’, Feminist Economics, 24, 2, 1–30.
Cornwell, B., Gershuny, J. and Sullivan, O. (2019) ‘The social structure of time: emerging trends and new directions’, The Annual Review of Sociology, 45, 1, 301–20.
Dasra (2021) Marry Me Later: Preventing Child Marriage and Early Pregnancy in India. Retrieved from https://www.dasra.org/assets/uploads/resources/CHECK_Marry%20Me%20Later%20-%20Delaying%20Marriage%20and%20Pregnancy%20in%20India.pdf
Folbre, N. (2006) ‘Measuring care: gender, empowerment and the care economy’, Journal of Human Development, 7, 2, 183–199.
Friedman, J., Breiman, L., Olshen, R. and Stone, C. J. (1984) Classification and Regression Trees. New York: Chapman and Hall/CRC.
Goldin, C., Kerr, S. P. and Olivetti, C. (2022) The other side of the mountain: women’s employment and earnings over the family cycle. IFS Deaton Review of Inequalities, 1–25.
Gopal, M. (2006) ‘Gender, ageing and social security’, Economic and Political Weekly, 41, 41, 4477–4486.
Grimmer, J., Roberts, M. E. and Stewart, B. M. (2021) ‘Machine learning for social science: an agnostic approach’, Annual Review of Political Science, 24, 1, 395–419.
Guidotti, R., Monreale, A., Matwin, S. and Pedreschi, D. (2020) ‘Black box explanation by learning image exemplars in the latent feature space’ in Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M. and Robardet, C. (eds.), Machine Learning and Knowledge Discovery in Databases (Vol. 11906). Cham: Springer, 189–205.
Hess, C., Ahmed, T. and Hayes, J. (2020) Providing Unpaid Household and Care Work in the United States: Uncovering Inequality. Institute for Women’s Policy Research.
Hindman, M. (2015) ‘Building better models: prediction, replication, and machine learning in the social sciences’, The ANNALS of the American Academy of Political and Social Science, 659, 1, 48–62.
Hirway, I. (2015) ‘Unpaid work and the economy: linkages and their implications’, Indian Journal of Labour Economics, 58, 1, 1–21.
Hochschild, A. and Machung, A. (2003) The Second Shift. Penguin Books.
Horn, F., Pack, R. and Rieger, M. (2020) ‘The autofeat python library for automated feature engineering and selection’ in Cellier, P. and Driessens, K. (eds.), Communications in Computer and Information Science (Vol. 1167), Cham: Springer, 111–120.
ILO (International Labour Organisation) (2018) Care work and care jobs for the future of decent work. International Labour Organisation.
Janiso, A., Shukla, P. K. and Reddy, B. (2021) What explains gender gap in unpaid household and care work in India? Retrieved from General Economics, Cornell University: https://arxiv.org/abs/2106.15376
Kamdar, B. (2020) India’s Women Bear the Burden of Unpaid Work – With Costs to Themselves and the Economy. [Online] Available at: https://thediplomat.com/2020/11/indias-women-bear-the-burden-of-unpaid-work-with-costs-to-themselves-and-the-economy/ [accessed 30 Oct 2023].
Kinza, N. (2019) Comparing Machine Learning Techniques and Classical Approach for child’s Education and Alternative Activities in case of Pakistan. Islamabad, Pakistan, Pakistan Institute of Development Economics - Registration No: PIDE2016FMPHILETS13.
Klasen, S. and Pieters, J. (2015) ‘What explains the stagnation of female labor force participation in urban India?’, The World Bank Economic Review, 29, 3, 449–478.
Lundberg, I., Brand, J. E. and Jeon, N. (2022) ‘Researcher reasoning meets computational capacity: machine learning for social science’, Social Science Research, 108, 1, 102807.
Ministry of Statistics and Programme Implementation (MOSPI) National Statistical Office, Government of India. (2019). Time Use in India-2019. http://mospi.nic.in/sites/default/files/publication_reports/Report_TUS_2019_0.pdf
Molnar, C., Casalicchio, G. and Bischl, B. (2020) Interpretable Machine Learning – A Brief History, State-of-the-Art and Challenges. ECML PKDD 2020 Workshops (pp. 417–431). Joint European Conference on Machine Learning and Knowledge Discovery in Databases.
Organisation for Economic Co-operation and Development (OECD) (2019) Gender Institutions and Development Database (GID-DB). http://oecd.stat.org [accessed March 2020].
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O. and Duchesnay, E. (2011) ‘Scikit-learn: machine learning in Python’, Journal of Machine Learning Research, 12, 1, 2825–2830.
Radhakrishnan, V., Sen, S. and Singaravelu, N. (2020, September 30) Data | 92% Indian women take part in unpaid domestic work; only 27% men do so. The Hindu. Available at: https://www.thehindu.com/data/92pc-indian-women-take-part-in-unpaid-domestic-work-only-27pc-men-do-so/article32729100.ece [accessed 09 Dec 2020].
Reimat, A. (2019) ‘Gendered welfare regimes, work–family patterns and women’s employment’ in: Diebolt, C., et al. (eds.), Cliometrics of the Family. Studies in Economic History. Cham: Springer, 277–303.
Saarela, M. and Jauhiainen, S. (2021) ‘Comparison of feature importance measures as explanations for classification models’, SN Applied Sciences, 3, 272, DOI: 10.1007/s42452-021-04148-9
Salin, M., Ylikännö, M. and Hakovirta, M. (2018) ‘How to divide paid work and unpaid care between parents? Comparison of attitudes in 22 Western Countries’, Social Sciences Work-Family Balance and Gender (In)equalities in Europe: Policies, Processes and Practices, 7, 10, 188.
Sanchez, M. (2015) The third shift: Paid work, care work and education [Master’s, University of Washington]. In ProQuest Dissertations and Theses. http://search.proquest.com/genderwatch/docview/1732168411/abstract/D88BB7B8A2164833PQ/34
Sen, A. (1980) ‘Equality of what?’ in McMurrin, Tanner Lectures on Human Values. Cambridge University Press.
Sen, A. (2006) ‘Human rights and capabilities’, Journal of Human Development, 6, 2, 151–166.
Singh, P. and Pattanaik, F. (2020) ‘Unfolding unpaid domestic work in India: women’s constraints, choices, and career’, Humanities and Social Sciences Communications, 6, 111, DOI: 10.1057/s41599-020-0488-2
Sinha, A., Sedai, A. K., Rahut, D. B. and Sonobe, T. (2024) ‘Well-Being costs of unpaid care: gendered evidence from a contextualized time-use survey in India’, World Development, 173, 106419. DOI: 10.1016/j.worlddev.2023.106419
Tasnim, G. (2020) ‘Making women’s unpaid care work visible in India: importance and challenges’, Journal of International Women’s Studies, 21, 2, Article 4. Available at: https://vc.bridgew.edu/jiws/vol21/iss2/4
The World Bank (2019) New country classifications by income level: 2019–2020. World Bank Blogs. https://blogs.worldbank.org/opendata/new-country-classifications-income-level-2019-2020
Tripathi, S., Azhar, S. and Zhai, F. (2022) ‘Unpaid care work among women in South Asia: A systematic review’, Asian Social Work and Policy Review, DOI: 10.1111/aswp.12268
Tripathi, S. and Zhai, F. (2023) ‘The association between couples’ education and gender gap in unpaid care work in India’, Social Policy & Administration, 58, 5, DOI: 10.1111/spol.12975
Tripathi, S., Zhai, F. and Azhar, S. (2024) ‘Unpaid care work time and women’s employment status: evidence from India’, The British Journal of Social Work, bcae108. DOI: 10.1093/bjsw/bcae108
UN Women. (2019) World survey on the role of women in development: Report of the Secretary-General (2019): Why addressing women’s income and time poverty matters for sustainable development. United Nations.
UNDP (United Nations Development Programme) (2015) Unpaid care work policy brief - Human Development Report 2015: Work for Human Development. New York, NY: UNDP.
Yoon, J. (2014) ‘Counting care work in social policy: valuing unpaid child and elder care in Korea’, Feminist Economics, 20, 2, 65–89.
Zaidi, M. (2022) ‘Work and women’s economic empowerment in tribal Rajasthan, India’ in: Samantroy & Ellina, N. S., eds. Gender, Unpaid Work and Care in India. London: Routledge, 18.
Zien, A., Krämer, N., Sonnenburg, S. and Rätsch, G. (2009) ‘The feature importance ranking measure’ in Buntine, W., Grobelnik, M., Mladenić, D., and Shawe-Taylor, J. (eds.), Machine Learning and Knowledge Discovery in Databases (Vol. 5782), Berlin, Heidelberg: Springer, 694–709.
Zumbyte, I. (2024) ‘Support matters: how formal and informal institutions shape young Indians’ work-family preferences’, Journal of Marriage and Family, 86, 3, 593–613.

Figure 1. Exploring disparities in unpaid care work: an intersectional analysis of women’s capabilities. This figure presents an intersectional analysis of women’s capabilities concerning disparities in UCW. It highlights the interplay of various factors and underscores the complexities of unpaid care work across different demographic contexts.

Figure 2. Random forest permutation importance metric. This figure presents the permutation importance scores for socioeconomic and demographic factors affecting UCW time derived from a random forest model. The scores indicate the reduction in model accuracy on test data when each feature’s values are permuted, thereby highlighting the contribution of each factor to predicting unpaid care work time.
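The permutation procedure described in this caption can be sketched without any machine learning library, since it only needs a fitted prediction function. The example below is an illustration under stated assumptions: `toy_model` is a hypothetical fixed rule standing in for a trained random forest, the rows are randomly generated rather than drawn from the TUS, and R² is used as the score whose drop is measured.

```python
import random


def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this one feature replaced.
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            score = r_squared(y, [predict(r) for r in permuted])
            drops.append(baseline - score)
        importances[name] = sum(drops) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model; it ignores education."""
    return 30 + 200 * row["female"] + 60 * row["married"]


data_rng = random.Random(1)
rows = [
    {
        "female": data_rng.randint(0, 1),
        "married": data_rng.randint(0, 1),
        "education": data_rng.randint(0, 3),
    }
    for _ in range(500)
]
# Outcomes follow the same rule plus noise, so the rows act as held-out data.
y = [toy_model(r) + data_rng.gauss(0, 20) for r in rows]

importances = permutation_importance(toy_model, rows, y)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A feature the model never uses (education in this toy rule) has an importance of exactly zero, because shuffling it leaves every prediction unchanged. Features the model relies on heavily show the largest drops.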

Figure 3. Regularisation. This graph illustrates the impact of regularisation techniques on estimating UCW. It demonstrates the relationship between the degree of regularisation applied and the resulting model coefficients, emphasising how regularisation affects predictive accuracy and model stability. Additionally, the graph identifies optimal regularisation parameters that minimise overfitting and improve model generalisation.
