1. Introduction
As the cost and prevalence of obesity continue to soar, it has become more important than ever to produce accurate estimates of the lifetime medical care cost externality. While an abundance of estimates can be found in the literature, the vast majority assume that a person remains at the same body mass index (BMI) for their entire life (Thompson et al., 1999; Yang & Hall, 2007; Finkelstein et al., 2008). This assumption is contrary to fact. Recent work by Fallah-Fini et al. (2017) and Schell et al. (2020) has found that age-related weight gain, the tendency of a person’s BMI to increase significantly as they age, accounts for the majority of costs associated with obesity. Numerous other factors, including the selection of an appropriate cost model and accounting for differential life expectancy, also explain the divergence of cost estimates between studies and are essential to producing accurate and consistent estimates. Additionally, while all of the current lifetime cost studies are associational, recent developments in the use of instrumental variables in obesity research could allow future models to produce credibly identified causal estimates.
For the purpose of cost-benefit analysis, it is important to consider the factors policymakers need to judge the merits of any anti-obesity intervention and to incorporate the best available information. Specifically, a policy-relevant estimate should (i) cover obesity’s costs over the life course, (ii) focus on third-party costs (the externality imposed on others), and (iii) account for changes in BMI over time. Each of these points is developed in turn below.
While most studies on the costs associated with obesity are cross-sectional, policymakers deciding whether a specific obesity intervention provides sufficient benefit at an acceptable cost need to understand the full scope of obesity’s costs over the course of a person’s life. Many of obesity’s sequelae are latent, with higher medical care costs sometimes not appearing for decades (Must & Strauss, 1999). For this reason, an estimate of obesity’s costs at one particular time point provides little insight regarding its true burden, which evolves over a lifetime. Often these costs are lumpy and concentrated around the end of life. Despite the data limitations discussed later in this paper, researchers should focus on producing cost estimates over the longest time possible in order to quantify the complete monetary benefit of interventions.
People with obesity bear only 15 % of their own medical costs, with public and private insurers covering the rest (Wang et al., 2015). Whether this represents a genuine externality on private insurance is debated, with some researchers arguing that people with obesity pay for their condition in the form of higher premiums and lower wages (Bhattacharya & Sood, 2011). However, the impact of obesity on public insurance, which does not allow premiums to vary with medical risk and is not tied to employment, remains a substantial externality. Obesity has an immense impact on public insurance: the costs associated with obesity represent at least 9.5 % of total Medicare expenditures (Finkelstein et al., 2003; Bhattacharya & Sood, 2011). Because obesity’s medical costs place a substantial burden on society through public and private insurance, policymakers and researchers should emphasize the third-party, or external, cost of obesity when weighing the costs and benefits of anti-obesity interventions.
Age-related weight gain is a persistent phenomenon in the USA, where people tend to gain weight, with a concomitant rise in adiposity, as they age (Williams & Wood, 2006). Despite this fact, relatively few lifetime cost estimates of obesity factor age-related weight gain into their models (Finkelstein et al., 2008; Fallah-Fini et al., 2017; Schell et al., 2020). To understand the sheer complexity of modeling BMI development over time, it is instructive to think about what causes a person to develop obesity in their lifetime.
Obesity results from a gradual process during which daily caloric surpluses compound over time. The static model of weight gain, which assumes that 3500 surplus calories result in a pound of weight gain, fails to reflect the metabolic adaptations to weight loss and weight gain that create a nonlinear relationship between caloric surplus and weight gain over time (Hall, 2007). Feeding such minute and dynamic information into a model to predict whether an adolescent will become an adult with obesity is clearly infeasible. The problem is compounded by the fact that no US dataset follows a representative sample of Americans’ weight gain trajectories from birth to death, so multiple datasets must be combined to track BMI state transitions. As a result, researchers must rely on simplifying assumptions to model BMI growth curves, and most have chosen a Markov model (Tucker et al., 2006; Ma & Frick, 2011; Sonntag et al., 2015; Fallah-Fini et al., 2017). While Markov models have been widely used to estimate BMI growth curves, we found that most articles explain relatively little of their methodology or why the Markov model is well suited to the problem. This manuscript seeks to demystify the recent literature on the lifetime social costs of obesity by detailing the advantages and pitfalls of applying a Markov model to measure age-related weight gain, exploring possibilities for causal inference in future models, discussing methodological considerations in adopting an appropriate cost model, and demonstrating how to account for differences in life expectancy between people with obesity and people of normal weight.
We conclude by discussing data requirements, how age-related weight gain affects cost estimates, and limits to existing estimates and data availability.
2. Literature review
Tucker et al. (2006) first accounted for age-related weight gain, relying on data from Burton et al. (1998) to project costs from age 20 until death. This work drew cost data from a multitude of sources, which forces heavy reliance on the background assumptions of these prior cost estimates. Nonetheless, Tucker et al. (2006) pioneered the use of a semi-Markov state-transition model, simulating cohorts with BMIs ranging from 24 to 45 and ages from 20 to 65, all while accounting for differences in life expectancy. They relied on Heo et al.’s (2003) estimates of age-related weight gain to control for its effect on cost. This curve was estimated using a hierarchical linear model that pieced together a variety of older data sources to predict BMI by age and sex, creating a generic, though perhaps outdated, age-related weight gain curve.
Wang et al. (2010) estimated lifetime costs for a cohort aged 16–17. They relied on the life expectancy estimates of Finkelstein et al. (2008) and applied a two-part model (2PM), similar to a double-hurdle model, using a logit model to describe the probability of falling into a BMI range and a generalized linear model (GLM) with a log link to estimate medical costs. They used only data from the 2000 Medical Expenditure Panel Survey (MEPS) for costs after age 40 in the second-stage estimation. To estimate the weight gain curve, they employed the 1979 National Longitudinal Survey of Youth (NLSY79), focusing specifically on the older cohort in this survey. Given the timing, it is questionable whether their data represent well the weight gain trajectory faced by children today. The methodology was simple: BMI was observed at only two ages, and a basic linear regression was used to derive coefficient estimates. The most apparent limitations of this study are the age of the data, the use of only two age points to track BMI trajectories, and the restriction to costs after age 40, which rests on the assumption that BMI remains constant after that age. Nonetheless, Wang et al. (2010) provided the first age-related weight gain curve produced from the study’s own data and assumptions.
Fallah-Fini et al. (2017) produced a cost estimate with an age-related weight gain curve covering a simulated cohort from young adulthood to death. Using a Markov model, they found that lifetime third-party costs for a person with obesity occur mostly later in life. Their age-related weight gain estimate relied on the Coronary Artery Risk Development in Young Adults (CARDIA) study for weight gain below age 45 and the Atherosclerosis Risk in Communities (ARIC) study for weight gain above 45. Their Markov model tracked both comorbidity state transitions and BMI state transitions, with 15 states covering the most common BMI classifications and the predominant comorbidities of obesity. Costs were derived from both the distribution of costs by type in MEPS and published cost data. Unfortunately, the study considers only the comorbidity with the highest cost even if a subject has multiple comorbidities, which likely substantially understates true costs. Additionally, the study accounts for only four obesity-related outcomes and relies heavily on an assumption of disease independence. Using the most recent data available, they demonstrated that state transitions placed the largest burden on third-party payers, perhaps providing a welfare-economic justification for regulating externalities.
3. Theoretical exposition
3.1 Estimating the age-related weight gain curve
A dependent variable conditioned only on its own previous state exhibits the “Markov property,” which makes observations earlier than one period prior irrelevant to estimation. A Markov process is “memoryless” in the sense that the only information used to form an estimate comes from the previous state, which means the process can be represented by the first-order difference equation below (Hamilton, 1994; Lay et al., 2016):
$$ {\varepsilon}_{t+1}={P}_{t+1}^{\varepsilon_t}{\varepsilon}_t+{v}_{t+1} $$
where, in the case of BMI, $ {\varepsilon}_{t+1} $ is BMI at age $ t+1 $, $ {\varepsilon}_t $ is BMI at age $ t $, and $ {v}_{t+1} $ is a random component, which should have a mean of zero and some finite variance (Hamilton, 1994). Here, $ {P}_{t+1}^{\varepsilon_t} $ is a coefficient to be estimated, representing systematic weight gain between ages $ t $ and $ t+1 $ for an individual who is in BMI class $ {\varepsilon}_t $ at age $ t $.
This modeling technique has a history in the study of BMI transitions (Tucker et al., 2006; Ma & Frick, 2011; Sonntag et al., 2015; Fallah-Fini et al., 2017). However, these studies generally do not discuss at length how the Markov model functions. Before exploring the model’s use over a lifetime, we provide a working example to show why the Markov modeling approach is well suited to estimating age-related weight gain (Fosler-Lussier, 1998). We rely on five broadly agreed-upon BMI categories: underweight (under 18 kg/m²), normal weight (18 to 25 kg/m²), overweight (25 to 30 kg/m²), obese (30 to 35 kg/m²), and morbidly obese (35 kg/m² or greater) (Bhaskaran et al., 2014). Although these categories simply reflect clinical standards, they appear informative: clinical interventions generally use them as treatment thresholds, and previous research has demonstrated discontinuities in the cost of weight gain across categories (Cawley & Meyerhoefer, 2011; Heymsfield et al., 2018). However, we also note the shortcomings of BMI, and of these categories more generally, as a measure of adiposity, since they misclassify people with high lean body mass and people with high adiposity but lower weights (Prentice & Jebb, 2001).
The lack of available alternatives in national surveys means that these imperfect proxies are perhaps the best option. With five categories, an individual can occupy one of five BMI states in any period. Without applying the Markov property, the effect of all previous states on the current state is represented by the conditional probability $ P\left({State}_n|{State}_{n-1},\dots, {State}_1\right) $.
Even over a relatively short period, this approach of using all available data quickly becomes intractable. For instance, if we considered each year of life separately, then after only 5 years we would need $ {5}^5 $, or 3125, possible histories to compute current BMI from the past data. Clearly, over a lifetime a more parsimonious method is required. Applying the Markov property, under which only the most recent state is needed to predict the current state, we could treat the 5 years as a single interval; this requires information on only $ {5}^2 $, or 25, state pairs (one previous state and one current state), regardless of the length of time measured. This more manageable model is concerned only with $ P\left({State}_n|{State}_{n-1}\right) $.
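The counting argument above can be checked directly. The short Python sketch below simply evaluates the two quantities in the text for five BMI categories; the function names are illustrative and not drawn from any library.

```python
def full_history_count(n_states: int, n_periods: int) -> int:
    """Distinct BMI-state histories when every past period is conditioned on."""
    return n_states ** n_periods


def markov_pair_count(n_states: int) -> int:
    """(Previous state, current state) pairs in one first-order Markov transition matrix."""
    return n_states ** 2


# Five BMI categories, as in the text.
for years in (5, 10, 20):
    print(f"{years} years: {full_history_count(5, years):,} histories "
          f"vs {markov_pair_count(5)} state pairs per transition matrix")
```

The full-history count grows exponentially with the number of periods, while each Markov transition matrix stays at 25 cells no matter how long the horizon.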
We now consider whether it is reasonable to group BMI transitions over the course of a few years. One of the most remarkable features of the current obesity epidemic is its insidious and persistent nature: over a few years, people generally do not gain much weight, but weight accumulates over long periods of time through a process called “age-related weight gain” (Burke et al., 1996; Gokee LaRose et al., 2010). Because age-related weight gain is so consistent and so difficult to reverse, a subject’s weight in the last period likely correlates very strongly with their current weight (Must & Strauss, 1999). An adult with obesity or overweight rarely returns to a lower BMI category and almost never sustains such weight loss, which makes their weight history before the latest period largely irrelevant (Daviglus et al., 2004).
It is desirable to keep the interval between observations as short as possible to avoid multiple long-term weight transitions occurring between observations. Intervals of 1 to 3 years should be short enough to avoid multiple state transitions at once (Burke et al., 1996). This recommended range rests on the assumption that permanent BMI changes only gradually from one period to the next, so multiple state transitions within such a short period seem unlikely. These state transitions allow the researcher to form a stochastic state transition probability matrix, from which the probability of moving from one BMI category to another, given current BMI and age, can be derived over the course of every subject’s life. As a result, intervals of only a few years should capture the vast majority of BMI state transitions while easing data requirements considerably.
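To illustrate how a transition matrix is used once estimated, the following Python sketch propagates a cohort’s distribution across the five BMI categories through successive intervals. The matrix and starting shares are invented for illustration only, and the helper names (`step`, `project`) are hypothetical; in practice one would supply a different estimated matrix for each age band.

```python
from typing import List, Sequence

Matrix = List[List[float]]

STATES = ["underweight", "normal", "overweight", "obese", "morbidly obese"]


def check_stochastic(P: Matrix) -> None:
    """Each row must be a probability distribution over next-period states."""
    for row in P:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("rows must be non-negative and sum to 1")


def step(dist: Sequence[float], P: Matrix) -> List[float]:
    """One interval: share in state j next period = sum_i share_i * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def project(dist0: Sequence[float], matrices: Sequence[Matrix]) -> List[List[float]]:
    """Apply one (age-band-specific) transition matrix per interval, in age order."""
    path = [list(dist0)]
    for P in matrices:
        check_stochastic(P)
        path.append(step(path[-1], P))
    return path


# Hypothetical matrix with mild upward drift; not estimated from any data.
P_adult = [
    [0.70, 0.30, 0.00, 0.00, 0.00],
    [0.02, 0.85, 0.13, 0.00, 0.00],
    [0.00, 0.05, 0.85, 0.10, 0.00],
    [0.00, 0.00, 0.05, 0.85, 0.10],
    [0.00, 0.00, 0.00, 0.05, 0.95],
]
start = [0.05, 0.60, 0.25, 0.07, 0.03]  # hypothetical cohort shares at entry
path = project(start, [P_adult] * 10)   # ten intervals; real use would vary P by age band
for name, share in zip(STATES, path[-1]):
    print(f"{name:>15}: {share:.3f}")
```

Because each row of a transition matrix sums to one, the projected shares always sum to one; age-related weight gain shows up as probability mass drifting toward the higher categories over successive intervals.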
In contrast to much of the published literature, we recommend a nonparametric approach that allows the data to dictate the state transition probability matrices for each age group. Alternative approaches must rely on previous estimates or on a rigid functional form for the BMI-age curve. Imposing a rigid functional form can introduce misspecification bias, while relying on others’ estimates exposes one to bias from their imperfect study designs and to unverifiable assumptions that often make less sense today than when those studies were conducted (Fernandes, 2010; Sonntag et al., 2015; Fallah-Fini et al., 2017). Some have argued that the country’s recent secular trend in weight gain, wherein people of every age weigh more than they did previously, could bias Markov model estimates because they may pick up aggregate rather than individual effects (Massachusetts Medical Society et al., 2017). However, because these curves are constructed from observations of real people over time and hence control for individual-invariant fixed effects, such estimates serve the purpose of accurately modeling real-world weight gain trajectories.
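A minimal sketch of this nonparametric approach, assuming a person-level panel with BMI measured at equally spaced waves: each consecutive pair of a person’s observations counts as one transition in the age band where it starts, and each row of counts is divided by its total (the maximum-likelihood estimate for a Markov chain). The panel below is hypothetical, and the BMI cutoffs follow the five categories listed earlier, with lower bounds treated as inclusive (an assumption).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

N_STATES = 5  # underweight, normal, overweight, obese, morbidly obese


def bmi_class(bmi: float) -> int:
    """Map BMI (kg/m^2) to the five categories in the text.
    Treating each lower cutoff as inclusive is an assumption."""
    for k, upper in enumerate((18.0, 25.0, 30.0, 35.0)):
        if bmi < upper:
            return k
    return 4


def estimate_transition_matrices(
    panel: Iterable[Tuple[str, float, float]],
    age_bands: Sequence[Tuple[float, float]],
) -> Dict[Tuple[float, float], List[List[float]]]:
    """panel: (person_id, age, bmi) rows from equally spaced waves (assumed).
    Each consecutive pair of a person's observations is one transition, assigned
    to the age band [lo, hi) containing the starting age. Rows with no observed
    starts are left as zeros rather than filled in by assumption."""
    by_person = defaultdict(list)
    for pid, age, bmi in panel:
        by_person[pid].append((age, bmi_class(bmi)))

    counts = {band: [[0] * N_STATES for _ in range(N_STATES)] for band in age_bands}
    for obs in by_person.values():
        obs.sort()  # order each person's observations by age
        for (age0, s0), (_, s1) in zip(obs, obs[1:]):
            for lo, hi in age_bands:
                if lo <= age0 < hi:
                    counts[(lo, hi)][s0][s1] += 1
                    break

    return {
        band: [[n / sum(row) for n in row] if sum(row) else [0.0] * N_STATES for row in c]
        for band, c in counts.items()
    }


# Hypothetical three-person panel observed every three years.
panel = [
    ("a", 20, 23.0), ("a", 23, 24.6), ("a", 26, 26.1),
    ("b", 20, 27.5), ("b", 23, 29.0), ("b", 26, 31.2),
    ("c", 20, 24.0), ("c", 23, 25.3), ("c", 26, 25.9),
]
matrices = estimate_transition_matrices(panel, [(20, 23), (23, 26)])
print(matrices[(20, 23)][1])  # transitions out of "normal", ages 20-22: [0.0, 0.5, 0.5, 0.0, 0.0]
```

No functional form links the age bands; each band’s matrix is just the observed frequencies, so how to handle sparse rows (for example, pooling adjacent bands) remains a separate modeling decision.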
3.2 Limitations of the Markov model in estimating age-related weight gain
There are several shortcomings of the Markov model that are important to consider. The use of the model is motivated by the relative lack of datasets combining lifetime cost and BMI information available in the US. If a dataset were to exist that allowed for an estimate of age-related weight gain over the entire lifespan of Americans as well as the associated medical costs, a curve created from the entire history of their weight gain trajectories would be more appropriate. The necessity for the Markov property to hold is the primary limitation of the model, as an individual’s past beyond the last few years could provide important information regarding their propensity to gain or lose weight over time. Additionally, the current iteration of the Markov model employed in the literature relies on data from a different nationally representative sample than the one from which the cost of obesity is estimated, which could result in a misstatement of the relationship between medical cost and weight gain over time.
3.3 Identifiability in estimating obesity’s lifetime costs
One of the main limitations of the current literature on obesity’s lifetime costs is that all of the studies rely on observational data, yielding associations rather than causal estimates. Cawley and Meyerhoefer (2011) implement an instrumental variable approach to isolate exogenous variation in BMI in a cross-sectional estimate, an approach that could also be applied to longitudinal data. Causal approaches to estimating obesity’s costs will undoubtedly prove vital given the tendency of poorer people and minorities to use healthcare at lower rates. Both of these groups tend to have a higher-than-average risk of obesity, so their lower utilization pulls down the observed costs associated with obesity. This means that endogeneity in the estimation of obesity’s costs likely produces a severe underestimate of the true value. To address this, Cawley and Meyerhoefer (2011) use an instrument, the weight of an adult’s oldest biological child, to isolate plausibly exogenous variation in weight. While an instrumental variable approach estimates only a local average treatment effect (LATE) and so may lack generalizability to a larger population, it could prove a vital innovation for future causal estimates of obesity’s lifetime costs. The burgeoning literature on Mendelian randomization, in which germline genetic variation acts as an instrumental variable, could also provide a valuable new tool for producing models that account for confounding and selection bias (Dixon et al., 2020; Kurz & Laxy, 2020).
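The direction of the bias described above can be illustrated with simulated data. In the sketch below, an unobserved trait raises BMI but lowers healthcare use, so the OLS slope of spending on BMI is biased toward zero, while a valid instrument recovers the true effect. All variables are simulated and the instrument is generic; it is not a model of the oldest-child instrument or of any real dataset.

```python
import random


def cov(a, b):
    """Sample covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimator; equals 2SLS with one instrument."""
    return cov(z, y) / cov(z, x)


random.seed(0)
n, true_effect = 20_000, 2.0
z = [random.gauss(0, 1) for _ in range(n)]  # instrument: shifts BMI, unrelated to u
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved trait: raises BMI, lowers care use
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]                      # "BMI"
y = [true_effect * xi - 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]  # "spending"

print(f"OLS slope: {ols_slope(x, y):.2f}")  # biased toward zero by u
print(f"IV slope:  {iv_slope(z, x, y):.2f}")  # close to true_effect
```

With these settings the OLS slope should come out near 1 and the IV slope near 2. The sketch builds in exactly the conditions IV requires (relevance and exclusion), which must be argued for rather than assumed in real applications.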
3.4 Estimating differential life expectancies
Naturally, a lifetime cost estimate must account for the possibility that people with obesity do not live as long as their normal-weight counterparts (Flegal et al., 2005; Finkelstein et al., 2010; Abdelaal et al., 2017). Censoring and the skewed distribution of survival times make survival analysis, and therefore quantifying life expectancy, a difficult statistical problem that requires specific techniques (Clark et al., 2003). Unfortunately, the existing literature on the life expectancy penalty of obesity relies largely on older data and has produced equivocal results. Some studies have even found an “obesity paradox” among Black males, wherein people with obesity outlive those of normal weight (Tucker et al., 2006). This likely reflects some form of endogeneity affecting both BMI status and life expectancy, an issue that would be difficult to fully account for in any study. Nevertheless, to provide an authoritative and more recent view of the subject, we propose using the most recent life expectancies available, with proportional hazards generated from National Health Interview Survey (NHIS) data and official life tables.
While other common approaches to survival analysis exist (such as Kaplan-Meier curves), we recommend the Cox proportional hazards model both because of its ubiquity in medical research and because of its ability to account for other covariates, in particular smoking and age at entry into the dataset (Clark et al., 2003). Unlike parametric proportional hazards models (such as Weibull models), the Cox model leaves the baseline hazard unspecified, which reduces the risk of misspecification. Its only major assumption, proportional hazards (the ratio of hazards between groups remains constant over time, conditional on covariates), is verifiable. Even if this assumption is not met, one can interact the offending covariate with age to allow for the time-dependent effects that cause nonproportionality (Bradburn et al., 2003).
Nevertheless, the proportional hazards approach has come under recent scrutiny for its difficulty in estimating small risks accurately and for the improbability of its assumption holding in sufficiently large samples (Moolgavkar et al., 2018; Stensrud & Hernán, 2020). We argue that in this context such concerns are relatively minor because the relative risk of mortality associated with obesity at younger ages is considerable. Specifically, the relative risk of mortality for people with obesity compared with people of normal weight at younger ages tends to exceed 2, which is well beyond the “small risks” discussed by Moolgavkar et al. (2018).
The official U.S. life tables published by the National Center for Health Statistics (NCHS) provide age-specific death probabilities by sex (Arias & Xu, 2015). Unfortunately, these data do not account for BMI category or smoking status, the largest behavior-based potential confounder of life expectancy. As a result, one can estimate hazards from NHIS data linked to the corresponding linked mortality files (LMFs) from the National Death Index. The Cox proportional hazards model can be written as
$$ h\left(t|X\right)={h}_0(t)\exp \left({X}^{\prime}\beta \right) $$
where $ h\left(t|X\right) $ is the hazard of death at time $ t $, $ {h}_0(t) $ is a baseline hazard curve scaled by the exponential factor, and $ X $ is a set of covariate controls with coefficients $ \beta $. This presumes that hazards at any particular age are proportional given a specific set of covariates. The output of a Cox proportional hazards model, a hazard ratio, describes how the hazard of an outcome changes with a change in characteristics. For instance, a hazard ratio of three for a person with obesity would suggest that they face triple the hazard of death, or roughly triple the chance of dying in a given year when annual death probabilities are small, compared with a reference normal-weight person with otherwise identical attributes. Applying the Cox proportional hazards model to a life table thus allows a researcher both to control for potential confounders and to address directly the impact of obesity on life expectancy. To determine BMI’s effects on life expectancy by age, one applies the hazard ratios for each BMI and smoking category (and any other important confounder) to the unadjusted probability of death at each age given by the life tables.
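The final step above, applying a hazard ratio to life-table death probabilities, can be sketched as follows. The baseline table is a hypothetical Gompertz-like schedule rather than an NCHS table, and the hazard ratio of 2 is illustrative; `adjust_qx` uses the simple product approach described in the text, capped at a probability of 1.

```python
from typing import List, Sequence


def adjust_qx(qx: Sequence[float], hazard_ratio: float) -> List[float]:
    """Product approach from the text: scale each annual death probability, capped at 1."""
    return [min(1.0, q * hazard_ratio) for q in qx]


def survival_curve(qx: Sequence[float]) -> List[float]:
    """Probability of surviving to the start of each year of the table."""
    s, curve = 1.0, [1.0]
    for q in qx:
        s *= 1.0 - q
        curve.append(s)
    return curve


def curtate_life_expectancy(qx: Sequence[float]) -> float:
    """Expected number of complete future years lived (sum of survival probabilities)."""
    return sum(survival_curve(qx)[1:])


# Hypothetical Gompertz-like table starting at age 20, closed with a final q of 1.
baseline_qx = [min(1.0, 0.001 * 1.09 ** k) for k in range(80)] + [1.0]
e_normal = curtate_life_expectancy(baseline_qx)
e_obese = curtate_life_expectancy(adjust_qx(baseline_qx, 2.0))  # hypothetical HR of 2
print(f"{e_normal:.1f} vs {e_obese:.1f} remaining years")
```

An alternative that treats the hazard ratio as scaling a constant within-year hazard is `1 - (1 - q) ** hazard_ratio`; the two agree closely when annual death probabilities are small.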
Depending on the data source, one could find points at which the proportional hazards assumption would likely be violated (Fontaine et al., 2003; Finkelstein et al., 2008). There are two common methods for handling nonproportionality: an interaction between the violating variable and time, and stratification by the violating variable (Grambsch & Therneau, 1994). Because stratification does not yield a parameter estimate for the stratifying variable (and we need such a value to assess life expectancy effects accurately) and reduces efficiency by artificially constraining the information available to the researcher, we propose adding interaction terms between age and each violating variable. This adjustment also makes intuitive sense: BMI’s impact differs markedly with age and time spent in each state, so age interactions should more precisely estimate its impact on survival probability over time. To confirm this intuition, we propose conducting likelihood ratio tests of the interaction model, Wald tests of the joint significance of the interaction terms, and comparisons of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to assess goodness of fit. After fitting this model, one averages the association between obesity and mortality across smoking statuses. Estimating median life expectancy then requires multiplying each year’s baseline probability of death by the relative risk for people with obesity and cumulating the resulting annual survival probabilities to find the age by which half of the cohort has died.
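A minimal sketch of the age-interaction approach and the median calculation described above, assuming the fitted model reduces to a log hazard ratio that is linear in age. The coefficients, the helper names, and the baseline table are hypothetical placeholders, not estimates.

```python
import math
from typing import Callable, Optional, Sequence


def hr_with_age_interaction(beta: float, gamma: float, center_age: float) -> Callable[[float], float]:
    """Hazard ratio from a Cox model with a BMI x age interaction:
    log HR(age) = beta + gamma * (age - center_age)."""
    return lambda age: math.exp(beta + gamma * (age - center_age))


def median_age_at_death(start_age: int, qx: Sequence[float],
                        hr: Callable[[float], float]) -> Optional[int]:
    """Multiply each year's baseline death probability by the age-specific HR
    (capped at 1), cumulate survival, and return the age by which half have died."""
    s = 1.0
    for k, q in enumerate(qx):
        age = start_age + k
        s *= 1.0 - min(1.0, q * hr(age))
        if s <= 0.5:
            return age + 1
    return None  # table ended before survival reached one half


# Hypothetical inputs: baseline table from age 20 and an HR of 2.5 at age 20 that
# attenuates with age; neither is an estimate from real data.
baseline_qx = [min(1.0, 0.001 * 1.09 ** k) for k in range(80)] + [1.0]
obese_hr = hr_with_age_interaction(beta=math.log(2.5), gamma=-0.01, center_age=20)
print(median_age_at_death(20, baseline_qx, lambda age: 1.0),
      median_age_at_death(20, baseline_qx, obese_hr))
```

Because the hazard ratio enters year by year, an attenuating interaction (negative `gamma`) shrinks the obesity penalty at older ages, which is where most deaths in the baseline table occur.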
3.5 Limitations of the Cox proportional hazards model for life expectancy
There are several important limitations to applying the Cox proportional hazards model. First, its central assumption, that the hazard ratio between individuals remains constant over time, is likely to fail in practice. While there are remedies, the assumption is an imperfect parameterization of an individual’s survival and can bias the model’s estimates. Future researchers could explore newer nonparametric survival methods that relax this simplifying assumption. Perhaps the most important problem plaguing any survival model is endogeneity, for which no simple solution exists. The wide variety of estimated survival differences between people with obesity and normal-weight people could be driven in part by differences between these individuals not attributable to obesity. Lastly, the impact of obesity on survival could be contingent on other medical conditions, such as pre-existing heart disease, which means the true effect could exhibit significant heterogeneity.
3.6 Estimating obesity-related costs
One of the persistent concerns when modeling healthcare data is the large number of subjects with no medical expenditures in a given year (Buntin & Zaslavsky, 2004). Additional methodological issues include the data’s strict non-negativity and their highly skewed distribution, driven by some patients with exceptionally high medical costs. The linear conditional expectation and normality assumptions typically specified in ordinary least squares regressions are severely violated, and the literature suggests a wide variety of potential solutions, including log-transformed models, 2PMs, and GLMs. We propose a 2PM that estimates first the probability of any medical expenditure and then total medical expenditures conditional on having any. We propose this 2PM both because it relies on more reasonable assumptions and because its history of use in the existing literature allows for comparability (Thorpe et al., 2004; Wee et al., 2005; Yang & Hall, 2007; Bell et al., 2011; Cawley & Meyerhoefer, 2011; Trogdon et al., 2012). Without separately modeling the possibility of zero expenditure, the large number of subjects with no medical expenditures in any given year could severely bias the results.
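The logic of the 2PM can be shown in the simplest possible case, a single categorical BMI regressor. In that saturated case the fitted probabilities from the logit part are just the group shares with any spending, and a log-link GLM for positive spending reproduces the group means, so the model’s prediction is their product. The records below are hypothetical, and real applications would add the covariates discussed later.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def two_part_by_group(records: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float, float]]:
    """records: (group, annual_spending) pairs.

    With a single categorical regressor (a saturated model), the logit part's fitted
    probabilities equal each group's share with positive spending, and a log-link GLM
    fitted to positive spending reproduces each group's mean positive spending.
    Returns group -> (P(any spending), E[spending | any], unconditional E[spending])."""
    n, n_pos, total_pos = defaultdict(int), defaultdict(int), defaultdict(float)
    for group, y in records:
        n[group] += 1
        if y > 0:
            n_pos[group] += 1
            total_pos[group] += y
    out = {}
    for group in n:
        p_any = n_pos[group] / n[group]
        mean_pos = total_pos[group] / n_pos[group] if n_pos[group] else 0.0
        out[group] = (p_any, mean_pos, p_any * mean_pos)  # 2PM: E[y] = P(y>0) * E[y | y>0]
    return out


# Hypothetical spending records in dollars.
records = [("normal", 0), ("normal", 0), ("normal", 1000), ("normal", 3000),
           ("obese", 0), ("obese", 2000), ("obese", 4000), ("obese", 6000)]
fit = two_part_by_group(records)
print(fit["obese"][2] - fit["normal"][2])  # incremental expected spending: 2000.0
```

The zeros enter only the first part, so a large share of non-users lowers the unconditional prediction without distorting the conditional mean of positive spending.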
For the first (discrete) part of the model, the dependent variable, whether a subject reports any medical expenditure, is binary and follows a Bernoulli distribution (Nelson et al., 2008). To regress a binary variable, one can use a generalized linear model, in which a link function maps the probability of the outcome, bounded between 0 and 1, onto an unbounded linear predictor. The two most popular versions of this model are the logit and probit models. There are relatively few practical differences between them: the logistic distribution has heavier tails (higher kurtosis) than the normal distribution, the coefficients are interpreted differently (logit coefficients are log odds ratios), and model fit differs slightly. Because of their similarity, we propose using a logit model for its marginally superior fit and comparative ease of interpretation.
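The near-equivalence of the two link functions can be checked numerically. The sketch below compares the logistic and standard normal CDFs after rescaling the index by 1.702, a commonly used scaling constant, and shows the odds-ratio reading of a logit coefficient; the coefficient value is hypothetical.

```python
import math


def logit_prob(xb: float) -> float:
    """Logistic CDF: P(any spending) under a logit model with linear index xb."""
    return 1.0 / (1.0 + math.exp(-xb))


def probit_prob(xb: float) -> float:
    """Standard normal CDF: P(any spending) under a probit model."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))


# The logistic distribution has heavier tails, but after rescaling the index by
# about 1.7 the two CDFs nearly coincide, so fitted probabilities agree closely.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(logit_prob(1.702 * x) - probit_prob(x)) for x in grid)
print(f"max |logit(1.702x) - probit(x)| on [-4, 4]: {max_gap:.4f}")

# Interpretation: exp(beta) of a logit coefficient is an odds ratio.
beta_obese = 0.4  # hypothetical coefficient on an obesity indicator
print(f"odds ratio: {math.exp(beta_obese):.2f}")
```

The same rescaling is why logit coefficients are typically around 1.6 to 1.8 times the corresponding probit coefficients, so coefficients from the two models should not be compared directly.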
Determining an appropriate functional form for the second part of the model is more nuanced. The two most commonly used approaches are a GLM with a gamma distribution and log link and an ordinary least squares (OLS) regression on logged costs. Discussions of this choice can often feel murky. A helpful analogy for the distinction between a GLM and an OLS model is the difference between a rectangle and a square. An OLS model is a GLM that requires a normal (or, for logged costs, lognormal) distribution, just as a square is a rectangle that requires sides of equal length. In the general case, a GLM, like a rectangle that need not satisfy a square’s constraints, can fit both normal and non-normal distributions. Thus, the GLM requires no retransformation from a normal distribution and has more relaxed assumptions than an OLS model, but it provides less statistical efficiency as a result of its fewer assumptions regarding functional form (Manning & Mullahy, 2001; Buntin & Zaslavsky, 2004).
The gamma distribution allows non-negative, right-skewed data to be modeled without a smearing retransformation. Logged models need this retransformation because the mean of logged costs is not the log of mean costs. Simply exponentiating predictions from a log-scale model therefore understates expected costs on the original dollar scale. A logged OLS model, meanwhile, compresses the upper tail of the distribution, accommodates to some extent the extreme range of healthcare data, and focuses solely on positive values. However, it still requires a smearing retransformation, which can introduce bias in the presence of heteroskedasticity (Buntin & Zaslavsky, 2004). The preferred distribution has varied across the literature and by cost source. We therefore recommend that researchers inspect a histogram of the expenditure data and run Park tests to determine which distribution best suits their data.
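The Park test regresses the log of squared raw-scale residuals on the log of fitted values. The slope estimates λ in Var(y | x) ∝ E(y | x)^λ. Conventionally, λ ≈ 0 points to a Gaussian variance, 1 to Poisson, 2 to gamma, and 3 to inverse Gaussian. A minimal sketch of the classic OLS version follows. It assumes the fitted values `y_hat` come from an initial model, such as a gamma GLM with log link, that has already been estimated. The data are constructed so that the variance is exactly proportional to the squared mean, which should point to the gamma family.

```python
import math


def park_test_slope(y, y_hat):
    """OLS slope from regressing ln(residual^2) on ln(fitted value).

    Assumes every fitted value is positive and no residual is exactly zero.
    """
    xs = [math.log(f) for f in y_hat]
    zs = [math.log((obs - f) ** 2) for obs, f in zip(y, y_hat)]
    x_bar, z_bar = sum(xs) / len(xs), sum(zs) / len(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    return sxz / sxx


def suggest_family(lam):
    """Map the estimated slope to the nearest conventional GLM variance family."""
    families = {0: "Gaussian", 1: "Poisson", 2: "gamma", 3: "inverse Gaussian"}
    return families[min(families, key=lambda k: abs(k - lam))]


# Hypothetical fitted values and outcomes with residuals equal to +/- 0.5 * y_hat.
y_hat = [100.0, 200.0, 400.0, 800.0]
y = [f * (1.5 if i % 2 == 0 else 0.5) for i, f in enumerate(y_hat)]
lam = park_test_slope(y, y_hat)
print(round(lam, 6), suggest_family(lam))
```

Manning and Mullahy's modified Park test fits the same relationship with a GLM rather than OLS. This simpler version is shown only to convey the idea.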
The choice of control variables depends on whether one applies an instrumental variables approach or a typical associational approach. The controls usually include education, smoking status, marital status, geographical region, insurance status, and age, which is often modeled nonlinearly. All costs should be inflated to the most recent year using the medical care component of the Consumer Price Index (CPI), and discounted so that estimates are comparable with previous work.
3.7 Limitations of the 2PM for healthcare costs
While it remains the most widely used method for modeling healthcare costs, the 2PM has several methodological shortcomings. First, as Deb and Trivedi (2002) note, dividing people into non-users and users of the healthcare system poorly reflects how healthcare is actually used over time. A division between "frequent" and "infrequent" users may be more meaningful. This implies that the 2PM's estimates depend partly on how much of a given disease episode the dataset's observation window covers. Second, the 2PM produces consistent estimates because it is fit to the empirical distribution. Even so, it relies on researchers to specify a parametric probability distribution, which creates a risk of misspecification (Mullahy, 1998).
4. Data requirements
4.1 Criteria for datasets in the age-related weight gain curve
No recent, robust estimate of age-related weight gain exists, primarily because of the limitations of available longitudinal studies in the USA. Ideally, such a dataset would be nationally representative and recent, and would follow subjects from early adolescence until death. No such dataset exists in this country, so we created a list of criteria to determine whether a dataset merits inclusion in an age-related weight gain curve despite its shortcomings. Specifically, the dataset should:
- cover more than 10 years of subjects' lives;
- be relatively recent;
- have sufficient follow-up and low attrition;
- have short intervals between observations;
- measure height and weight objectively;
- be nationally representative.
Several datasets are potentially suited to the task, including the Framingham Heart Study (FHS), the Coronary Artery Risk Development in Young Adults study (CARDIA), the Health and Retirement Study (HRS), the Medicare Current Beneficiary Survey (MCBS), and the Atherosclerosis Risk in Communities Study (ARIC).
4.2 The Framingham Heart Study
The FHS is perhaps the most famous of the five major longitudinal population health studies in the USA. It began in 1948 with a predominantly white cohort in Framingham, Massachusetts, and continues to this day (Splansky et al., 2007). Although it observes subjects biennially, FHS has several drawbacks: it includes too few minority subjects, represents only one city, and has recent observations only for middle-aged and older subjects. Nevertheless, FHS is a robust dataset, with over 5000 subjects in the initial cohort alone, and it is well suited to modeling BMI transitions across all age groups (Oster et al., 1999).
4.3 Coronary Artery Risk Development in Young Adults (CARDIA)
CARDIA was designed to create a more representative sample. It began in 1985, ended in 2005 with its fifth and final examination, and observes the progression of coronary artery disease in four population centers: Birmingham, Chicago, Minneapolis, and Oakland (Friedman et al., 1988). The study enrolled over 5000 Black and white men and women from a variety of regional and sociodemographic backgrounds, and 72 % of the cohort remained in the study until 2005. As one of the few nationally representative datasets available, CARDIA is practically obligatory in any estimate of an age-related weight gain curve. CARDIA focuses predominantly on the period from age 18 to middle age.
4.4 Health and Retirement Study (HRS)
The HRS is a longitudinal panel study of over 37,000 Americans aged 50 and older from 23,000 households, conducted biennially since 1992. It contains a range of information on health insurance, health, and employment, as well as genetic data and Medicare cost files (Sonnega et al., 2014). It provides objectively measured height and weight as well as costs for Medicare recipients. These features make it particularly useful for modeling the BMI trajectory and costs beyond age 65.
4.5 Medicare Current Beneficiary Survey (MCBS)
The MCBS has been conducted for over 25 years and contains longitudinal BMI data for Medicare recipients. It can be linked to Medicare Fee-for-Service Beneficiary Claims Data, which provide detailed information on medical costs and the specifics of an individual's healthcare utilization. Much like the HRS, the MCBS allows cost and BMI data to be drawn from a single dataset, but it covers only Americans aged 65 and older (Adler, 1994).
4.6 Atherosclerosis Risk in Communities Study (ARIC)
Like CARDIA, ARIC draws subjects from four population centers: Minneapolis, Minnesota; Hagerstown, Maryland; Forsyth County, North Carolina; and Jackson, Mississippi (Chambless et al., 1997). It is one of the largest longitudinal population health datasets in American history. ARIC began in 1987, has conducted five examinations to date, and includes over 15,000 subjects drawn relatively evenly from the four centers. Because of its size, recency, and national representation, ARIC should also be included in the estimation of the curve. The dataset focuses primarily on subjects aged 45–64. Unfortunately, ARIC switched to phone interviews in 1998, at which point height and weight became self-reported and no longer met the inclusion criteria outlined above. Still, ARIC covers a unique age range, provides over a decade of objectively measured BMI, and has a more nationally representative sample than other datasets. Its observations before the 1998 switch therefore warrant inclusion.
4.7 Differential life expectancies dataset
An ideal dataset for measuring differences in life expectancy between people with obesity and their normal weight peers would meet four conditions. It would be nationally representative, contain the covariates outlined in Equation (6), rely on objectively measured height and weight, and account for time spent with obesity. No dataset in the USA comes close to meeting all these criteria. However, the National Center for Health Statistics (NCHS) has created files that link the National Death Index to National Health Interview Survey (NHIS) interview files, which contain all the covariates of interest. This linked dataset is nationally representative and contains the variables necessary for estimation. Its weaknesses are that height and weight are self-reported and that it does not capture how time spent with obesity affects life expectancy.
4.8 Public use NHIS and corresponding LMFs
The NHIS is conducted by the NCHS, with interviews fielded by the U.S. Census Bureau, and it studies a range of health behaviors and characteristics. Recently, the NCHS has made public-use linked mortality files (LMFs) available through the year 2014; these files link NHIS data to the comprehensive National Death Index (Lochner et al., 2008; Center for Health Statistics, 2015). As a result, one can use data from 1997 to 2014 to update recent work on life expectancy that relied primarily on data from the 1990s. Public-use data are subject to perturbation for anomalous causes of death and to location censoring. Because the analysis focuses predominantly on vital status alone, these limitations are bearable. A researcher's primary concern in using these data is self-reported height and weight. That limitation, however, is a stumbling block for any study of obesity's effect on life expectancy.
4.9 National lifetables
To create BMI-specific life expectancies, one should start from the official 2015 U.S. life tables from the NCHS (the most recent available as of this writing), which are separated by gender. Applying the hazard ratios associated with different BMI levels to these tables then estimates the impact of BMI on life expectancy separately for each gender (Arias & Xu, 2015).
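Applying a hazard ratio to a life table can be sketched under the usual proportional-hazards reading. If the hazard is constant within each year of age, scaling it by HR turns a death probability q into 1 − (1 − q)^HR. The sketch below treats the life table as the reference group's mortality, which is a simplifying assumption. It uses deliberately extreme, hypothetical death probabilities, not values from the 2015 U.S. life tables, and the function names are illustrative.

```python
def adjusted_survival(qx, hazard_ratio):
    """Survivorship l(x) after scaling annual death probabilities by a hazard ratio.

    Assumes a constant hazard within each year of age, so that
    q_adjusted = 1 - (1 - q) ** hazard_ratio.
    """
    survival = [1.0]
    for q in qx:
        q_adj = 1.0 - (1.0 - q) ** hazard_ratio
        survival.append(survival[-1] * (1.0 - q_adj))
    return survival


def median_survival_age(start_age, survival):
    """First age at which survivorship falls to 0.5 or below (None if never reached)."""
    for years, s in enumerate(survival):
        if s <= 0.5:
            return start_age + years
    return None


def life_expectancy(survival):
    """Person-years lived per starting person, assuming deaths occur mid-year.

    Truncated at the end of the table, so it understates life expectancy
    unless the table runs to (near) zero survivorship.
    """
    return sum((a + b) / 2.0 for a, b in zip(survival, survival[1:]))


# Hypothetical, deliberately extreme death probabilities for ages 20-39.
qx = [0.1] * 20
reference = adjusted_survival(qx, hazard_ratio=1.0)
higher_risk = adjusted_survival(qx, hazard_ratio=2.0)
print(median_survival_age(20, reference), median_survival_age(20, higher_risk))
```

In practice, the hazard ratios would come from the sex-specific Cox models fit to the NHIS linked mortality files. The median survival age produced this way is the kind of horizon at which lifetime costs are truncated later in the article.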
The complex survey design and clustering of the NHIS require complex survey design commands, which are intuitive to use in Stata (or similar software packages). The NHIS sample design changed in 2006, so one must also recode the strata and primary sampling units (PSUs) so that those from the two sample designs are treated as distinct and statistically independent. Additionally, to pool nationally representative data across years, one must divide the weighting variable by the number of years in the pool. The result is a sample of 500,121 respondents, of whom 61,552 had died by 2015.
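The two pooling adjustments described above are mechanical recodes that a minimal sketch can illustrate. It assumes records stored as dictionaries with hypothetical keys (`year`, `stratum`, `psu`, `weight`); real NHIS variable names differ by file and year. In Stata, the recoded variables would then be passed to the survey design commands.

```python
def pool_survey_years(records, redesign_year=2006):
    """Recode design variables and rescale weights for a pooled multi-year sample.

    Each record is a dict with hypothetical keys 'year', 'stratum', 'psu',
    and 'weight' (actual NHIS variable names differ by file and year).
    """
    n_years = len({r["year"] for r in records})
    pooled = []
    for r in records:
        design = "post" if r["year"] >= redesign_year else "pre"
        pooled.append({
            **r,
            # Prefixing with the design period keeps strata from the two sample
            # designs distinct even when their numeric codes coincide.
            "stratum": f"{design}-{r['stratum']}",
            # PSU codes are often reused across strata, so nest them in the stratum label.
            "psu": f"{design}-{r['stratum']}-{r['psu']}",
            # Each year's weights sum to the national population; dividing by the
            # number of pooled years keeps the pooled total at one population.
            "weight": r["weight"] / n_years,
        })
    return pooled


records = [
    {"year": 2004, "stratum": 5, "psu": 1, "weight": 1000.0},
    {"year": 2010, "stratum": 5, "psu": 1, "weight": 1000.0},
]
pooled = pool_survey_years(records)
print([(r["stratum"], r["psu"], r["weight"]) for r in pooled])
```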
5. Obesity cost model dataset
Because obesity is chronic and latent, the dataset used for expenditures would ideally capture time spent with the disease and the corresponding costs (Fallah-Fini et al., 2017). It should also differentiate between third-party payer and out-of-pocket costs. Obesity's cost to society is reflected most accurately by the strain it places on outside payers in the healthcare system, not by its effect on an individual's budget. The dataset should also be nationally representative and have objectively measured height and weight. The Medical Expenditure Panel Survey (MEPS) uses self-reported height and weight and follows subjects for only 2 years. Even so, it comes closest to these ideals: it separates costs by payer, is nationally representative, includes a large number of subjects, and offers some of the most detailed cost data in the country. We therefore recommend it for lifetime cost estimates. For studies that focus exclusively on older Americans or on Medicare expenses, however, the HRS and MCBS are better options, because they actually track BMI and costs over time.
Fielded since 1996, MEPS is the most detailed source of data on healthcare costs and utilization among noninstitutionalized Americans presently available. It is also far and away the most commonly used dataset for U.S. medical expenditure studies. MEPS uses a 2-year panel design in which subjects report on diseases, healthcare costs, payment methods, and hundreds of other topics. Because height and weight are self-reported, we suggest eliminating biologically implausible BMIs, which the WHO defines as BMI z-scores above +4 or below −4 (World Health Organization, 1995).
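The implausible-BMI screen can be sketched as a simple filter on |z| > 4. An important assumption in this sketch is that z-scores are computed against the sample's own mean and standard deviation. A reference-based standardization, for example by sex and age, may be preferable and is not shown. The BMI values are hypothetical.

```python
from statistics import mean, pstdev


def drop_implausible_bmi(bmis, z_cutoff=4.0):
    """Keep BMI values whose z-score lies within +/- z_cutoff.

    Assumption: z-scores are computed against this sample's own mean and SD.
    """
    mu, sigma = mean(bmis), pstdev(bmis)
    if sigma == 0:
        return list(bmis)  # no variation, so nothing can be flagged
    return [b for b in bmis if abs(b - mu) / sigma <= z_cutoff]


# Hypothetical self-reported BMIs: 19 plausible values and one likely entry error.
bmis = [23.0, 25.0, 27.0] * 6 + [25.0, 80.0]
cleaned = drop_implausible_bmi(bmis)
print(len(bmis), len(cleaned))
```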
Researchers should use the most recent data available and inflate all costs to the most recent year using the medical care component of the CPI, which is presently 2016 dollars (Consumer Price Index, 2019). External medical care costs are borne by third-party payers, so one must next subtract out-of-pocket costs from total expenditures. MEPS uses a stratified multistage probability design and assigns weights that make the sample nationally representative (Lichtenberg, 2001). Observations within the same sampling cluster are correlated. This design therefore violates the assumption of independent and identically distributed observations that underlies the conventional calculation of standard errors, and ignoring it leads to biased statistical inference (Dohoo et al., 2003).
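The two preprocessing steps in this paragraph, removing out-of-pocket spending and inflating with the medical care CPI, reduce to simple arithmetic per person-year. A minimal sketch follows. It uses hypothetical index values, not published CPI figures, and an illustrative function name. Because the inflation step is a linear rescaling, subtracting before or after inflating gives the same result.

```python
def third_party_cost(total, out_of_pocket, cpi_cost_year, cpi_target_year):
    """Convert one person-year of spending to third-party cost in target-year dollars."""
    if out_of_pocket > total:
        raise ValueError("out-of-pocket spending cannot exceed total spending")
    # Ratio of the medical care CPI in the target year to the year costs were incurred.
    inflation_factor = cpi_target_year / cpi_cost_year
    return (total - out_of_pocket) * inflation_factor


# Hypothetical person-year: $1,200 total, $200 out of pocket, index rising from 400 to 500.
print(third_party_cost(1200.0, 200.0, cpi_cost_year=400.0, cpi_target_year=500.0))
```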
To account for this design, one can use Stata's complex survey design tools, which correct for the sampling plan. Subsetting the data can leave strata with only a single primary sampling unit (singleton PSUs). In that case, we suggest centering those units at the overall sample mean so that variances can still be estimated (NHIS – Singleton PSU Reference Information, 2019); in Stata, this corresponds to the singleunit(centered) option of svyset. Additionally, because the data are pooled from 2014 and 2016, one can apply a standard correction: dividing the weights by the number of years pooled. Both the CDC and William G. Cochran's seminal book Sampling Techniques recommend this correction (Cochran, 1977). The approach also makes intuitive sense. Each year's survey represents the entire nation, so adding a second year without adjustment would make the weighted observations sum to double the country's population.
6. An empirical example
We provide an example of how strongly accounting for age-related weight gain affects a lifetime estimate of the third-party medical care cost of having obesity as a young adult. Specifically, we estimate the external cost over the life course of having obesity at age 20 relative to being normal weight at age 20, accounting for age-related weight gain and differential life expectancies. This requires specifying age-related weight gain, cost, and life expectancy models, which are described below. The full cost of early-life obesity depends in part on its effect on future BMI trajectory and mortality. The full scope of external costs over a lifetime is also what matters most to policymakers. This approach therefore provides a realistic and practical way to produce an actionable obesity cost estimate.
We use the CARDIA, ARIC, and FHS datasets to estimate age-related weight gain from ages 20 to 75. Because of data sparsity, we were unable to separate the obese category from the morbidly obese category. This is a significant limitation, given the far higher costs borne by people with morbid obesity in the literature, and researchers should explore ways to address it. We thus estimate the first-order Markov process represented by Equation (1).
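Equation (1) is not reproduced in this section. The general mechanics of a first-order Markov model of BMI categories can still be sketched. First, estimate transition probabilities by row-normalizing observed transition counts between consecutive exams. Then project the distribution of BMI categories forward. This sketch assumes a single time-homogeneous matrix, whereas an age-specific specification would estimate separate matrices by age band. The transitions are hypothetical, and the function names are illustrative.

```python
STATES = ("underweight", "normal", "overweight", "obese")


def estimate_transition_matrix(pairs, states=STATES):
    """Row-normalized transition counts: the maximum-likelihood estimate of a
    time-homogeneous first-order Markov chain from (state_t, state_t+1) pairs."""
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for origin, destination in pairs:
        counts[index[origin]][index[destination]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No observed exits from this state: assume people stay put.
            matrix.append([1.0 if j == i else 0.0 for j in range(len(states))])
        else:
            matrix.append([c / total for c in row])
    return matrix


def propagate(distribution, matrix, steps):
    """Project state shares forward `steps` periods: p_{t+1} = p_t P."""
    path = [list(distribution)]
    for _ in range(steps):
        current = path[-1]
        path.append([
            sum(current[i] * matrix[i][j] for i in range(len(current)))
            for j in range(len(current))
        ])
    return path


# Hypothetical observed transitions between consecutive exams.
pairs = (
    [("normal", "normal")] * 3 + [("normal", "overweight")]
    + [("overweight", "overweight")] * 2 + [("overweight", "obese")] * 2
    + [("obese", "obese")] * 4
)
P = estimate_transition_matrix(pairs)
path = propagate([0.0, 1.0, 0.0, 0.0], P, steps=2)  # everyone normal weight at start
print(path[-1])
```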
To estimate the external medical care cost of obesity, we use the MEPS dataset. We rely on publicly available MEPS data, which provide neither sibling nor genetic information, so the instrumental variable approaches discussed earlier are not possible. As a result, this estimate faces endogeneity concerns and should be considered only the best possible associational estimate. The first part of the 2PM used to estimate costs is given by
$$ \Pr(Y_i > 0 \mid X) = \frac{\exp(X^{\prime}\beta)}{1 + \exp(X^{\prime}\beta)}, $$
where $ Y_i $ is external medical care costs and $ X^{\prime}\beta = \beta_0 + \sum_{j=1}^{3}\beta_j BMICat_{ji} + \beta_4 Education_i + \beta_5 Rural_i + \beta_6 Smoker_i + \beta_7 InsurStat_i + \sum_{k=1}^{3}\beta_{7+k} MaritalStat_{ki} + \beta_{11} Region_i + \beta_{12} Age_i + \sum_{j=1}^{3}\beta_{12+j} BMICat_{ji} \times Age_i + \beta_{16} Age_i^2 $. BMI categories consist of underweight, normal weight, and overweight (obesity is the excluded category), and marital status consists of single, married, divorced, and widowed. This is a logit model predicting the probability of any medical care expenditure. The second part of the model, Equation (3), estimates medical care costs conditional on costs being greater than zero. It is a logged OLS specification with a smearing retransformation and can be represented as
$$ \ln(Y_i) = X^{\prime}\gamma + \varepsilon_i, \qquad Y_i > 0, $$
where $ \gamma $ is the vector of second-part coefficients.
Both parts of the model rely on the same set of covariates described above, which is standard practice for 2PMs.
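The two parts are combined in a particular way. The expected cost is the first-part probability times the exponentiated log-scale prediction, scaled by a smearing factor. A minimal sketch using Duan's homoskedastic smearing estimate follows: the factor is the mean of the exponentiated log-scale residuals, and it exceeds 1 whenever the residuals vary. The residuals and predictions are hypothetical. Under heteroskedasticity, one common workaround is to compute separate smearing factors within subgroups, for example by BMI category; that variant is not shown.

```python
import math


def smearing_factor(log_residuals):
    """Duan's smearing estimate: the mean of exponentiated log-scale OLS residuals."""
    return sum(math.exp(e) for e in log_residuals) / len(log_residuals)


def expected_cost(p_any, log_prediction, smear):
    """Two-part prediction: Pr(Y > 0 | X) * exp(log-scale prediction) * smearing factor."""
    return p_any * math.exp(log_prediction) * smear


# Hypothetical log-scale residuals (they sum to zero, as OLS residuals with an intercept do).
residuals = [-0.8, -0.2, 0.1, 0.9]
phi = smearing_factor(residuals)
naive = math.exp(math.log(2_000.0))                      # exponentiating alone
retransformed = expected_cost(1.0, math.log(2_000.0), phi)
print(round(phi, 4), round(naive, 2), round(retransformed, 2))
```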
Lastly, to account for potential differences in life expectancy by BMI status, we use the LMFs provided by the NCHS and the most recent U.S. life table. We estimate a Cox proportional hazards model separately by gender, stratified by smoking status so that each stratum $ s $ has its own baseline hazard $ h_{0s}(t) $:
$$ h_s(t \mid Z) = h_{0s}(t)\exp(Z^{\prime}\beta), $$
where $ {Z}^{\prime}\beta $ includes age, BMI category, and smoking status as covariates.
This empirical example demonstrates best practices for creating a lifetime associational estimate of obesity's cost. Other techniques and datasets are available for researchers with different aims. Those who favor better identification, wish to focus exclusively on Medicare enrollees, or are interested only in costs at a particular point in time can obtain more robust estimates that are unaffected by the Markov assumption or persistent endogeneity.
As Table 1 shows, the annual external costs predicted by the 2PM are discounted at a 3 % rate and summed until the subgroup reaches its median life expectancy. After accounting for age-related weight gain and differential life expectancy, a man with obesity at age 20 faces $16,091.25 in excess lifetime external medical care costs compared with being normal weight, ceteris paribus. The 95 % confidence interval, derived from nonparametric bootstrapping, ranges from $13,987.09 to $18,195.37. An average woman with obesity at age 20 faces excess lifetime external costs of $27,181.24 ($22,357.71–$31,801.53) relative to her normal weight peers. Clearly, reducing obesity in early life, and maintaining normal weight thereafter, could produce substantial cost savings (Tables 2–4).
Note: 95 % confidence intervals in parentheses. Abbreviation: BMI, body mass index.
Note: Standard errors in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
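The lifetime figures above come from discounting annual external costs at 3 % and truncating each group's cost stream at its own median life expectancy. A minimal sketch of that final aggregation step follows. It assumes that year 0 (age 20) is undiscounted and that each group's annual cost stream has already been built, for example from projected BMI-category shares and 2PM predicted costs. The streams and horizons here are hypothetical.

```python
def present_value(annual_costs, rate=0.03):
    """Discounted sum of a yearly cost stream; year 0 (age 20 here) is undiscounted."""
    return sum(c / (1.0 + rate) ** t for t, c in enumerate(annual_costs))


def excess_lifetime_cost(costs_obese, costs_normal, years_obese, years_normal, rate=0.03):
    """Difference in discounted costs, each stream truncated at that group's
    median survival (in years from the starting age)."""
    return (present_value(costs_obese[:years_obese], rate)
            - present_value(costs_normal[:years_normal], rate))


# Hypothetical annual external costs (USD) and median survival horizons (years).
costs_obese = [900.0] * 60
costs_normal = [600.0] * 60
print(round(excess_lifetime_cost(costs_obese, costs_normal, 55, 58), 2))
```

Because each group's stream is truncated at its own median survival, shorter survival among people with obesity partly offsets their higher annual costs. This is why differential life expectancy has to be modeled explicitly.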
Disease prevention measures, even when cost-effective, only rarely result in actual cost savings (Cohen et al., 2008). For instance, the CDC explains that an intervention is generally regarded as “cost-effective” if it costs less than $50,000–$100,000 per quality-adjusted life year (QALY) saved. QALYs quantify health benefits by combining life expectancy and subjective quality of life. A cost-saving intervention, by contrast, actually costs less than the status quo. Because obesity results in significant costs and disability, interventions targeting it have the opportunity to be either cost-effective or cost-saving.
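These definitions can be written as a small decision rule. Net cost is the program cost minus the downstream medical costs it averts. An intervention with negative net cost is cost-saving. Otherwise, it is cost-effective if its cost per QALY gained falls below the threshold, here taken as the $50,000 lower end of the range quoted above. A minimal sketch follows; the program costs and QALY gains are hypothetical. The averted-cost figure of $16,091.25 is borrowed from the estimate for men purely for illustration. Using it assumes that an intervention fully and permanently averts obesity, which is a strong assumption.

```python
def classify_intervention(program_cost, averted_cost, qalys_gained, threshold=50_000.0):
    """Label an intervention using net cost and cost per QALY gained.

    Assumes qalys_gained >= 0 (the intervention does not worsen health).
    """
    net_cost = program_cost - averted_cost  # negative means the program pays for itself
    if net_cost < 0:
        return "cost-saving"
    if net_cost == 0:
        return "budget-neutral"
    if qalys_gained > 0 and net_cost / qalys_gained <= threshold:
        return "cost-effective"
    return "not cost-effective at this threshold"


# Hypothetical per-person program costs and QALY gains.
print(classify_intervention(10_000.0, 16_091.25, qalys_gained=0.1))
print(classify_intervention(30_000.0, 16_091.25, qalys_gained=0.5))
```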
This approach assigns a dollar figure to the excess external medical care costs of obesity over a person's entire life, rather than a single-year snapshot. It also accounts for the biological realities of weight gain and premature death from obesity. Together, these features let us discern when interventions are cost-saving, cost-effective, or budget neutral. Efficacious interventions that are cost-saving for the US's already overburdened healthcare system should be given priority, particularly given growing dissatisfaction with rising healthcare costs (Hempstead, 2012). However, obesity also imposes numerous “indirect social costs” not discussed in this analysis. Several studies have attempted to quantify these, with little agreement on methods or results.
Obesity imposes a variety of social costs beyond medical care, including increased disability, excess mortality, and lost productivity from absenteeism and presenteeism at work. Studies of these “indirect” costs generally exhibit even greater heterogeneity than the medical care cost literature. Few of them are longitudinal or well identified, and they produce an enormous range of estimates (Tremmel et al., 2017). These shortcomings stem in part from data limitations, but the studies also differ methodologically. Their approaches range from population attributable fractions to instrumental variables to microsimulation models (Goettler et al., 2017). Additionally, only six studies in the current literature cover more than 1 year, so the long-term indirect costs of obesity remain largely unexplored. The range of estimates is so large as to be uninformative. Annual absenteeism costs, for example, range from $8 to $1,586, and estimates of excess mortality and its associated costs also diverge substantially (Neovius et al., 2012; Goettler et al., 2017). These costs are not borne directly by the healthcare system, but they are important for understanding the full extent of the externalities created by obesity.
7. Conclusion
As more comprehensive and recent datasets become available for the estimation of age-related weight gain and obesity’s lifetime costs, researchers should be able to use this article as a guide on how to create these models and what data limitations to consider. While currently only a few articles on obesity’s costs make any attempt to account for age-related weight gain, we hope that this explication of the first-order Markov model will inspire further work in a field in desperate need of definitive answers.
Pre-specification and transparency are becoming more common in the selection and implementation of econometric models. As they do, economists will be better able to replicate estimates found in other work and assess their robustness, in a process that should increasingly mirror the one that has existed for decades in the physical sciences. Methodological articles such as this one can therefore be a crucial step toward complete transparency among reviewers, researchers, and stakeholders in the medical field. The prevalence of obesity and overweight is at a record high in the USA and is rising sharply in many low- and middle-income countries. Understanding the evolution of weight gain and the underpinnings of the obesity epidemic therefore remains as important as ever. Estimating obesity's impact on costs, and stating the assumptions behind that estimate, is a crucial first step in two tasks: understanding how obesity burdens the healthcare system, and designing efficacious, cost-saving solutions.
Financial Support
Robert Schell received NIH funding from the NIA as a T32 training grant: T32-AG000246.
Disclosure
The authors declare no conflict of interest.