Economic evaluations such as cost-utility and cost-effectiveness analyses are used to provide evidence on the value for money of new interventions for relevant decision-makers.Reference Glick, Doshi, Sonnad and Polsky1 One of the largest guideline development bodies in the UK is the National Institute for Health and Care Excellence (NICE). NICE provides national guidance and advice to improve health and social care2 and uses economic evidence of cost-effectiveness as well as effectiveness evidence in the development of guidelines. NICE provide methodological guidelines for organisations considering submitting evidence to NICE, outlining the principles and methods of health technology assessment and appraisal within the context of the NICE appraisal process. These guidelines describe the NICE ‘reference case’ for economic evaluation, which includes a preference for outcomes to be measured in terms of quality-adjusted life-years using the EuroQoL-5 dimension (EQ-5D) measure of health-related quality of life.3
The EQ-5D dimension and Short Form-6 dimension (SF-6D) are generic preference-based patient-reported outcome measures that are used to derive health-related quality of life. Research suggests that the EQ-5D and SF-6D have equivalent psychometric properties when examined in people with depressionReference Peasgood, Brazier and Papaioannou4,Reference Mulhern, Mukuria, Barkham, Knapp, Byford and Brazier5 and are therefore both equally as good for producing utility values for economic evaluation. However, there is some evidence that the EQ-5D may lack responsiveness in certain populations with depression (for example in elderly populationsReference Peasgood, Brazier and Papaioannou4) and no work has been conducted on the psychometric properties of the EQ-5D and SF-6D in pregnant women with depression. We therefore aimed to explore the psychometric properties of the five-level EQ-5D (EQ-5D-5L) and SF-6D measures of health-related quality of life in a representative sample of pregnant women with depression, through the psychometric assessment of validity, responsiveness and acceptability.
Method
Data
Data were taken from the Wellbeing in pregnancy: identification and prevalence of common mental health problems (WENDY) cohort study.Reference Howard, Ryan, Trevillion, Anderson, Bick and Bye6 WENDY was a cohort study of pregnant women identified around the maternity booking appointment (approximately 10 weeks of pregnancy) and followed-up to 3 months postpartum. The purpose of the WENDY study was to determine the prevalence of antenatal common mental disorders and to investigate the effectiveness and cost-effectiveness of the Whooley questions and the Edinburgh Postnatal Depression Scale (EPDS) in identifying antenatal depression.Reference Howard, Ryan, Trevillion, Anderson, Bick and Bye6
Ethical approval for the research was granted by the National Research Ethics Service, London Committee – Camberwell St Giles (ref no 14/LO/0075). Written informed consent was obtained.
Participants
Pregnant women attending an antenatal booking clinic in a South East London maternity service between 10 November 2014 and 30 June 2016, aged 16 years or older were recruited into the WENDY study. Data from all women including those with and without depression were included.
Outcome measures
Outcomes for the purpose of the current study were assessed at baseline (booking appointment) and 3 months post-delivery. The Structured Clinical Interview DSM-IV (SCID)Reference First, Spitzer, Gibbon and Williams7 was used to determine who met criteria for a DSM-IV-TRReference First, Spitzer, Gibbon and Williams7 diagnosis of current depression (mild, moderate or severe major depressive disorder, or mixed anxiety and depressive disorder).
Health-related quality of life was measured using both the EQ-5D-5L and the SF-6D. The EQ-5D-5L is measured on five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each with five levels (no problems, slight problems, moderate problems, severe problems and extreme problems).Reference Herdman, Gudex, Lloyd, Janssen, Kind and Parkin8 This allows participants to be classified into one of 3125 health states. Appropriate utility weights can then be attached to these health states.Reference Devlin, Shah, Feng, Mulhern and van Hout9
The SF-6D is derived from the Short Form-36 item survey of heath.Reference Ware, Kosinski, Dewey and Gandek10 It measures health on eight dimensions: (a) limitations in physical activities because of health problems; (b) limitations in social activities because of physical or emotional problems; (c) limitations in usual role activities because of physical health problems; (d) bodily pain; (e) general mental health (psychological distress and well-being); (f) limitations in usual role activities because of emotional problems; (g) vitality (energy and fatigue); and (h) general health perceptions.Reference Ware, Kosinski, Dewey and Gandek10 This allows participants to be classified into one of 18 000 health states. Appropriate utility weights can then be attached to these health states.Reference Brazier, Roberts and Deverill11
The EPDS is used to measure depression symptoms. It is a ten-item screening tool for perinatal depression.Reference Cox, Holden and Sagovsky12 The 10 items correspond with various clinical depression symptoms, for example guilt, low energy and suicidal ideation. Studies have shown that it is sensitive to changes in severity of depression over time.Reference Cox, Holden and Sagovsky12
Analysis
Analysis included: acceptability; concurrent validity; convergent validity; known-group validity; and responsiveness. Acceptability was assessed descriptively in terms of completion rates at baseline and follow-up at 3 months post-delivery. Concurrent validity refers to the extent an outcome of interest (for example SF-6D utility scores) shows an expected association with other measures of the same target construct (for example EQ-5D utility scores). The association between EQ-5D and SF-6D was examined using the intraclass correlation coefficient.Reference Shrout and Fleiss13 Bland–Altman plotsReference Bland and Altman14 were also used to display the limits of agreement between EQ-5D and SF-6D measurements. The plots in this study were generated using the Stata module provided by Mander.Reference Mander15
Convergent validity refers to the extent an outcome of interest (such as utility scores) shows an expected association with other logical outcomes (such as depression scores) measured at the same time point. Convergent validity was assessed by examining the correlation between baseline EQ-5D-5L or SF-6D scores and baseline scores on the EPDS using Spearman's rank correlation coefficient or Pearson's correlation coefficient as appropriate. A coefficient greater than 0.5 or less than −0.5 is considered strong, values between 0.3 and 0.49 or −0.3 and −0.49 are considered moderate and values between −0.3 and 0.3 are considered weak.Reference Fleiss16
Known-group validity refers to the extent an outcome measure of interest helps distinguish between groups that are theoretically expected to differ. For example, people with depression would be expected to have lower levels of quality of life than people without depression. Using the SCID and EPDS, we grouped participants in the following ways before comparing their utility scores.
(a) SCID: non-depressed versus any depression (mild, moderate or severe major depressive disorder, or mixed anxiety and depressive disorder).
(b) SCID: mild depression versus moderate/severe depression.
(d) EPDS: non-depressed (indicated by a score of 14 or less on the EPDSReference Matthey, Henshaw, Elliott and Barnett17) versus any depression (indicated by a score of 15 or more on the EPDSReference Matthey, Henshaw, Elliott and Barnett17).
(e) EPDS: no/mild depressive symptoms (indicated by the EPDS cut-offs of 0–13 for no/mild depressionReference McCabe-Beane, Segre, Perkhounkova, Stuart and O'Hara18) versus moderate/severe depressive symptoms (indicated by the EPDS cut-offs of 14–30 for moderate/severe depressive symptomsReference McCabe-Beane, Segre, Perkhounkova, Stuart and O'Hara18).
In each case the baseline mean EQ-5D-5L and SF-6D scores were calculated for each group and tested for differences using t-tests (or non-parametric equivalent as appropriate).
Responsiveness refers to the ability of an outcome of interest to distinguish clinically important changes and was explored in a number of ways. Floor (lowest possible) and ceiling (highest possible) scores were examined at baseline and follow-up for the EQ-5D-5L and SF-6D. These affect a measure's ability to detect deterioration or improvements in health states and large numbers at the ceiling or floor would suggest that the measure may not be able to adequately capture an improvement or deterioration in health status, respectively. The magnitude of change in EQ-5D-5L and SF-6D scores were examined and compared using the standardised response mean statistic, which is calculated by dividing the mean change on the measure by the standard deviation of the change, for those who recovered (defined as those who were above the EPDS threshold for probable depression at baseline using the cut-off of 15 or moreReference Matthey, Henshaw, Elliott and Barnett17 but then below the cut-off at follow-up) versus those who did not recover (defined as those remaining above the EPDS threshold for probable depression at baseline and follow-up using the cut-off of 15 or moreReference Matthey, Henshaw, Elliott and Barnett17).
Results
Participants
A total of 545 participants were recruited into the WENDY study. Of these, 27% (147/545) had a diagnosis of mild, moderate or severe major depressive disorder, or mixed anxiety and depressive disorder according to the SCID, and 73% (398/545) did not. Mild depression was the most common (14%, 74/545) followed by moderate depression (11%, 59/545), mixed anxiety and depression (2%, 11/545) and finally severe depression (1%, 3/545).
The baseline sociodemographic and clinical variables for the participants are described in Table 1. The mean age of the sample was 33 years (s.d. = 5.75). Of the 545 participants, 34% (n = 184) were White British, 25% were Black non-British (n = 138), and 18% (n = 100) were White other. In total, 48% (n = 262) were born in the UK. The majority were married or cohabiting (72%, n = 392), had a bachelor's degree or higher (60%, n = 326), and were in paid employment (64%, n = 349). The mean EQ-5D-5L utility score at baseline was 0.87 (s.d. = 0.16). The mean SF-6D utility score at baseline was 0.67 (s.d. = 0.12).
EQ-5D-5L, five-level EuroQoL-5 dimension; SF-6D, Short Form-6 dimension.
Of the 545 included participants, 77% (421/545) had full data on the measures required for the current analyses. Table 2 describes the sociodemographic and clinical variables at baseline for those with and without full data. There were no differences in the follow-up of those with and without depression. Participants without full data appeared to be more likely to be of Black non-British ethnicity, from outside the UK, have qualification lower than a degree and be unemployed. However, in terms of baseline EQ-5D-5L and SF-6D utility, the groups were very similar.
SCID, Structured Clinical Interview DSM-IV; EQ-5D, EuroQoL-5 dimension; SF-6D, Short Form-6 dimension.
Utility scores
At baseline, mean EQ-5D-5L utility was 0.87 (s.d. = 0.16), ranging from −0.10 to 1, and mean SF-6D utility was 0.67 (s.d. = 0.12), ranging from 0.32 to 1. At 3-month follow-up, the mean EQ-5D-5L utility was 0.91 (s.d. = 0.12), ranging from 0.07 to 1 and mean SF-6D utility was 0.76 (s.d. = 0.13), ranging from 0.38 to 1.
Acceptability
The EQ-5D-5L was fully completed by 95.78% of all participants (522/545) and 93.88% (138/147) for those with mild, moderate or severe major depressive disorder, or mixed anxiety and depressive disorder. This compares with 92.48% (504/545) of all participants for the SF-6D and 91.16% (134/147) for those with mild, moderate or severe major depressive disorder, or mixed anxiety and depressive disorder. This suggests interview fatigue or lack of acceptability in only a very small proportion of respondents and little difference for those with and without depression.
Concurrent validity
Figure 1 presents the Bland-Altman plots depicting the agreement between the EQ-5D-5L and SF-6D utility scores at baseline. Under 5% of values lay outside of the 95% agreement limits. The mean difference in utility values was 0.195 with a 95% limit of agreement of −0.054 to –0.445. The figure shows that, in general, the EQ-5D-5L overestimates utility in the higher utility range, and the SF-6D overestimates utility in the lower utility range. For lower utility values, there is a larger discrepancy between EQ-5D-5L and the SF-6D.
Convergent validity
Both the EQ-5D-5L utility score and the SF-6D utility score were significantly and negatively associated with EPDS scores meaning that as EDPS scores decreased (depression symptoms were improving), utility scores increased (indicating improvements in quality of life). The EQ-5D-5L's correlation was strong according to Fleiss'sReference Fleiss16 threshold of 0.5/−0.5 (Spearman's rho = −0.558, P<0.001) as was correlation for the SF-6D's (Spearman's rho = −0.553, P<0.001). Both EQ-5D-5L and SF-6D show a similar magnitude in association with the EPDS.
Known-group validity
Tests of known-group validity are reported in Table 3. There was a significant difference in EQ-5D-5L utility between those who had a SCID diagnosis of depression and those who did not (z = 9.362, P<0.001) with those with depression having a lower utility value. The same was found for the SF-6D utility (z = 10.372, P<0.001). The mean difference for the EQ-5D-5L was 0.16 compared with 0.14 for the SF-6D with an effect size of 1 and 1.17 respectively. Similarly, there was a significant difference in EQ-5D-5L utility (z = 8.995, P<0.001) and SF-6D utility (z = 8.725, P<0.001) for those who had a diagnosis of depression according to the EPDS. The mean difference for the EQ-5D-5L and SF-6D for this was 0.20 and 0.13, respectively, with effect sizes of 1.25 and 1.08, respectively. There was no difference in EQ-5D-5L utility (z = 1.040, P = 0.2982) or SF-6D utility (z = 1.927, P = 0.0552) between those with mild versus moderate/severe depression according to the SCID, but there was a difference between those with mild versus moderate/severe depression according to the EPDS (EQ-5D-5L: z = 6.237, P<0.001; SF-6D: z = 5.996, P<0.001) with mean differences of 0.16 for the EQ-5D-5L and 0.09 for the SF-6D and effect sizes of 1 and 0.75, respectively.
EQ-5D-5L, five-level EuroQoL-5 dimension; SF-6D, Short Form-6 dimension; SCID, Structured Clinical Interview DSM-IV; EPDS, Edinburgh Postnatal Depression Scale.
*P<0.05; **P<0.01; ***P<0.001.
Responsiveness
At baseline and follow-up, there were no participants reporting the lowest possible score on the EQ-5D-5L or SF-6D. At baseline 28.50% of participants (120/421) reported having the highest possible score on the EQ-5D and at follow-up this rose to 42.28% (178/421). This compared with 0.24% (1/421) for the SF-6D at baseline and 1.43% (6/421) at follow-up. Figure 2 shows the distribution of EQ-5D-5L and SF-6D utility at baseline and follow-up. Figure 3 shows the EQ-5D-5L and SF-6D utility plotted between baseline and follow-up. Replicating these figures for the subsample of participants with a SCID diagnosis of depression found the same patterns (supplementary Figs S1 and S2 available at https://doi.org/10.1192/bjo.2019.71).
At baseline there were 72 people who had a diagnosis of depression according to the EPDS cut-off of 15 and above.Reference Matthey, Henshaw, Elliott and Barnett17 At follow-up, 60 of these had recovered (EPDS score fell below 15) and 12 had not recovered (retained an EPDS score of 15 or above). The change in EQ-5D-5L utility for those who had recovered was 0.14 (s.d. = 0.21) compared with 0.24 (s.d. = 0.24) for those who did not. The standardised response mean was 0.67 for those who recovered versus 1.03 in those who did not. The change in SF-6D utility for those who had recovered was 0.14 (s.d. = 0.13) compared with 0.03 (s.d. = 0.14). The standardised response mean was 1.07 for those who recovered versus 0.22 in those who did not.
Discussion
Main findings
This is the first comparison of the relevance of the EQ-5D-5L and SF-6D utility measures of health-related quality of life in a population of pregnant women with depression. With five and six items respectively, acceptability rates were high and comparable for both measures. However, the EQ-5D-5L tended to show higher utility than SF-6D for the same individual. This means that study conclusions may differ depending on the choice of utility measurement used. Further, the EQ-5D-5L showed a substantial ceiling effect that was far less with the SF-6D. Despite this, we found that both measures show associations with depression symptomatology in the expected direction and at similar magnitudes. Both measures also show logical utility differences that we would expect to find between groups with and without depression, and between groups with increasing depression severity. However, such group differences tended to be more apparent with EQ-5D-5L utility. When examined alongside longitudinal changes in depression symptomatology, SF-6D utility showed a much larger increase among those who recovered than among those who did not. This was not the case for the EQ-5D-5L. In fact, EQ-5D-5L utility showed a much larger increase among those who did not recover. However, the difference in EQ-5D-5L utility for those who did and did not recover was not significantly different from each other, and those who did not recover, started with lower utilities and therefore, had more room for an increase in utilities. This was not the case for the SF-6D, which showed a lower increase in utility among those who did not recover compared with those who did, and the baseline utility of those who did and did not recover were similar. Many EQ-5D-5L utility values were at the maximum or near maximum at baseline (almost 30%) meaning that there was little room for utility to improve over time. This was still the case even when only those with depression were examined but was not the case for the SF-6D utility. It is possible that the EQ-5D-5L is failing to pick up the mental health issues to the same extent as the SF-6D. However, it is not clear whether the lower scores on the SF-6D are the result of the mental health issues or the effect of pregnancy.
Various studies have examined the psychometric properties of the EQ-5D and SF-6D in common mental disorders such as depression. However, the majority of these studies have examined each measure in separate populations,Reference Mulhern, Mukuria, Barkham, Knapp, Byford and Brazier5 making direct comparisons of the psychometric properties difficult. Only one study has examined both measures in the same sample,Reference Lamers, Bouwmans, van Straten, Donker and Hakkaart19 only a limited comparison was undertaken (included only examination of known-group validity and responsiveness) and none have been undertaken in a population of pregnant women with depression.
Strengths and limitations
The major strength of this study is that the psychometric properties of the EQ-5D-5L and SF-6D could be examined and compared against each other as they were collected at the same time in the same cohort. This collection of data at the same time allowed not only direct comparisons between the EQ-5D-5L and SF-6D in terms of known-group validity, convergent validity, responsiveness and acceptability, but it also allowed for the examination of concurrent validity, which cannot be done when data on different measures are collected from separate sources.
The main limitation of this study was the collection of data from a single site in inner-city London, meaning results may not be applicable to the rest of the UK. There was also a low response rate, with only 33% of those eligible for study inclusion agreeing to take part. However, the sample was still found to be representative of women in the catchment area including women from very diverse backgrounds and those who did not speak English.Reference Howard, Ryan, Trevillion, Anderson, Bick and Bye6 Further, the non-depressed group included women with other mental disorders. However, this is a realistic sample of women without depression – this will include women with and without other disorders. Finally, the known-group validity by SCID severity result did not attained statistical significance. This could be explained by small subgroup sizes. While sample size is a study limitation, our examination of longitudinal changes in utility values has been anchored on clinically significant changes in depression scores.
Implications
In conclusion, there may be an advantage of using the SF-6D rather than EQ-5D-5L in economic evaluations with pregnant women who are depressed because of the substantial ceiling effect shown by the EQ-5D-5L, but this finding needs to be cross-validated in more studies.
Funding
This paper summarises independent research funded by the National Institute for Health Research (NIHR) under the Programme Grants for Applied Research programme (ESMI Programme: grant reference number RP-PG-1210–12002) and the National Institute for Health Research (NIHR)/Wellcome Trust Kings Clinical Research Facility and the NIHR Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and Kings College London. L.M.H. is also supported by a National Institute for Health Research (NIHR) Research Professorship (NIHR-RP-R32–011). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. The study team acknowledges the study delivery support given by the South London Clinical Research Network.
Acknowledgements
We gratefully acknowledge the advice received from our Patient and Public Advisory Group (Clare Dolman, Sarah Spring, Ceri Rose, Liberty Mosse, Amanda Grey, Henry Fay, Kathryn Grant, Maria Bavetta, Eleanor O'Sullivan, Jesse Hunt, Diana Rose, chair), our Programme Steering Committee (Professor Rona McCandlish (Chair), Dr Heather O'Mahen, Dr Pauline Slade, Ceri Rose, Sarah Spring and Rosemary Jones) and our Data Monitoring and Ethics Committee (Roch Cantwell (chair), Liz McDonald-Clifford, Marian Knight, Stephen Bremner). We also want to take the opportunity to thank the women who participated in this study. The data are not publicly available due to being of a sensitive nature but are available from the corresponding author on reasonable request.
Supplementary material
Supplementary material is available online at https://doi.org/10.1192/bjo.2019.71.
eLetters
No eLetters have been published for this article.