1. Introduction
In any contemporary health system, the population's demand for health care is likely to exceed the system's capacity to provide it. Policymakers must therefore decide which interventions to fund and which to exclude from coverage (Ubel, 2000; Fleck, 2002; Alexander et al., 2004; Scheunemann and White, 2011). Given the implications of such decisions for those who contribute to and benefit from national health systems, centralised processes are often put in place to ensure that they are seen to be made fairly (Kenny and Joffres, 2008). In the UK, these fall under the remit of the National Institute for Health and Care Excellence (NICE), a public body which has come to be seen as a world leader in health care priority-setting (Smith, 2004; Timmons et al., 2016; Schaefer and Schlander, 2018; Littlejohns et al., 2019; Catchpole and Barrett, 2020).
In making recommendations to the National Health Service (NHS), NICE is legally required to ‘have regard to the broad balance between the benefits and costs’ of the technologies that it considers (NICE, 2013). This aligns with the NHS's mandate to provide ‘best value for taxpayers’ money’ by maximising the amount of health that can be delivered from its budget (Department of Health, 2015). A technology's cost-effectiveness – understood as the number of quality-adjusted life-years (QALYs) delivered per unit cost, compared with existing treatment – is therefore NICE's key substantive consideration. However, maximising health is not the only relevant objective: the NHS also promises to employ resources in a way that is ‘fair and sustainable’ and fulfils ‘a wider social duty to promote equality’ (Department of Health, 2015). NICE therefore takes the position that its advice should also ‘take into account other factors’ which might on occasion justify the recommendation of technologies that likely displace more QALYs than they deliver (NICE, 2008b, 2020b).
Given the real-world implications of NICE's advice and the organisation's reputation as an authority on health care priority-setting, the normative grounds for its recommendations have long been the subject of public and academic interest. However, absent from the current literature is any comprehensive summary of what factors have been substantively employed by NICE's independent appraisal committees in practice. This review aims to address this gap by bringing together studies that have empirically examined NICE decision-making from a range of disciplinary perspectives, using a variety of quantitative and qualitative methods. In doing so, it provides a foundation for further in-depth research into the grounds for NICE decision-making and the basis on which NICE's evolving approach can be ethically justified. It is hoped that this contribution will prove particularly timely at present given NICE's recent review of its processes and methods.Footnote 1
2. Methods
2.1. Approach
The study takes the form of a narrative literature review (Grant and Booth, 2009; O'Connor and Sargeant, 2015; Paré et al., 2015), a methodology well-suited to the aim of obtaining insight into a specific research question from studies that are broad in scope, use a wide range of different methods and are informed by an array of disciplinary perspectives. In synthesising evidence from these studies, narrative summary (Dixon-Woods et al., 2005) has been adopted as the most appropriate way of integrating diverse quantitative and qualitative findings into a readily digestible review suitable for a multi-disciplinary audience.
2.2. Data sources and searches
A comprehensive literature search was conducted across five databases: ProQuest, PubMed, Scopus, Web of Knowledge and Lexis Library. These were selected to provide coverage across multiple disciplines including medicine (PubMed), social science and humanities (ProQuest and Scopus), biomedical and natural science (Web of Knowledge) and law (Lexis). Coverage was tested by searching for five articles known in advance to be relevant to the review, deriving from health economics (Devlin and Parkin, 2004; Dakin et al., 2006), health policy (Mauskopf et al., 2013), sociology (Milewa and Barry, 2005) and ethics (Charlton and Rid, 2019). Each of these articles was successfully identified through at least two of the five databases.
The main search (search A) was carried out in October 2019 and used two sets of terms, linked by an AND operator. The first set comprised NICE's current and past institutional titles, including common misspellings. The second set included various terms used to describe factors considered during NICE decision-making, identified through key articulations of NICE's approach (Rawlins and Culyer, 2004; NICE, 2005, 2008b, 2020b) and experience gained from conducting related reviews (Charlton and Rid, 2019; Charlton, 2020). Several broad terms were also included to increase search sensitivity (e.g. ‘other factors’, ‘criteria’, ‘equity’). During the initial search, it became evident that a small number of articles use the acronym ‘NICE’ without further definition. A supplementary search (search B) was therefore carried out in November 2019 to identify additional articles whose titles contain only this acronym (see Appendix 1).
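The two-set structure of search A can be sketched as follows. This is an illustration only: the exact strings used in the review are those reported in its Appendix 1, which is not reproduced here. The institutional names listed are NICE's current and former titles, the three factor terms are the examples quoted above, and the function names `or_group` and `build_query` are hypothetical.

```python
# Illustrative sketch only; the review's actual search strings are in its Appendix 1.
# INSTITUTION_TERMS: NICE's current and former institutional titles.
# FACTOR_TERMS: only the three example terms quoted in the text.
INSTITUTION_TERMS = [
    "National Institute for Health and Care Excellence",
    "National Institute for Health and Clinical Excellence",
    "National Institute for Clinical Excellence",
]
FACTOR_TERMS = ["other factors", "criteria", "equity"]


def or_group(terms):
    """Quote each phrase, join the phrases with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(set_one, set_two):
    """Link two OR-groups with a single AND operator (hypothetical helper)."""
    return or_group(set_one) + " AND " + or_group(set_two)


query_a = build_query(INSTITUTION_TERMS, FACTOR_TERMS)
print(query_a)
```

A record is returned only if it matches at least one term from each set, which is why adding broad terms to the second set increases sensitivity without loosening the requirement that NICE be named.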
Studies were retained if they met the following inclusion criteria:
• an academic article, published in a peer-reviewed journal;
• presents empirical data relating to NICE technology appraisal; and
• includes findings that describe what factors appear to substantively influence NICE decision-making, as conducted by its appraisal committees.
No limits were specified regarding date or language.
A parallel search of the grey literature was also conducted which adapted the above strategy for use in Google Scholar. In addition, research articles published by NICE's decision support unit and the UK Office for Health Economics were manually screened for inclusion.
The review was updated in November 2020 according to the same protocol.
2.3. Study selection
Searches A and B together identified 5419 articles, which were compiled in EndNote X9 for desktop. Exact duplicates were automatically removed, with further duplicates removed manually, leaving 2881 potentially eligible articles. These were categorised as either eligible or ineligible based on title, with articles only excluded if the reviewer was confident that they would not satisfy the inclusion criteria. This eliminated a further 2359 articles. The reviewer read the abstracts of the remaining 522 articles and categorised them in the same way. Articles not containing an abstract were automatically retained. After this process, 108 potentially eligible articles remained and the full text of each was read to determine final inclusion. This left 20 articles which were deemed to meet the inclusion criteria; however, two of these duplicated data already presented in other articles and were therefore excluded. The references of each of the remaining 18 articles were then hand-searched, identifying a further five eligible articles. The review of the grey literature did not yield any further results, giving 23 eligible articles in total.
On updating the search in November 2020, an additional six articles were identified. In total, therefore, 29 articles were included in the review.
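The selection flow described above can be checked with a few lines of arithmetic. The counts are those reported in this section; the variable names are ours and are introduced purely for illustration.

```python
# Counts as reported in Section 2.3; variable names are illustrative.
identified = 5419            # searches A and B combined
after_deduplication = 2881   # after automatic and manual duplicate removal
excluded_on_title = 2359
after_abstract_screen = 108
met_criteria = 20            # after full-text review
duplicated_data = 2          # articles repeating data presented elsewhere
from_reference_lists = 5     # found by hand-searching references
from_grey_literature = 0
from_2020_update = 6

duplicates_removed = identified - after_deduplication
after_title_screen = after_deduplication - excluded_on_title
initial_total = (met_criteria - duplicated_data
                 + from_reference_lists + from_grey_literature)
final_total = initial_total + from_2020_update

print(duplicates_removed, after_title_screen, initial_total, final_total)
```

The derived figures agree with the text: 522 abstracts screened, 23 articles after the initial search and 29 after the November 2020 update.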
2.4. Data extraction and analysis
Following study selection, full texts of the included articles were re-read and key data extracted to an Excel spreadsheet. This recorded: (1) basic bibliographic information (title, author, journal, year); (2) study aim, scope and date range; (3) study methods; (4) a narrative summary of the main findings; and (5) the factors observed by the study to have influenced NICE decision-making. Once data extraction was complete, articles identifying particular factors (e.g. cost-effectiveness, uncertainty and innovation) were collated and read for a third time to facilitate narrative synthesis.
3. Results
The search identified 29 eligible studies, details of which are provided in Table 1.
a Where possible, descriptions of the methods used have been taken from the original article.
b List of authors includes an individual who was an employee of NICE or held another formal role with the organisation (e.g. Chair, Vice-Chair, Appraisal Committee Chair) at the time of writing.
c This analysis excluded non-drug appraisals, terminated appraisals and appraisals that have since been updated or withdrawn and are therefore no longer publicly available.
The included studies were published between 2001 and 2020 and cover NICE appraisals conducted between 1999 and 2019. Across these studies, 11 distinct factors were each observed by at least two studies to have substantively influenced NICE decision-making. These are indexed in Appendix 2.
During the period spanned by this review, NICE's formal approach has evolved considerably and differences in how some factors have been treated over time may reflect these changes. To aid interpretation of the results, key changes in NICE's processes and methods are summarised for reference purposes in Table 2. It should also be noted that aspects of NICE's approach have recently undergone further revision as part of a major update of NICE's processes and methods (see previous footnote). The implications of this update for future research are briefly considered in the discussion.
The following sections summarise the evidence relating to each of the 11 identified factors, starting with NICE's primary substantive consideration: cost-effectiveness.
3.1. Influence of cost-effectiveness on NICE decision-making
Unsurprisingly, research has shown cost-effectiveness to exert considerable influence on NICE decision-making. However, studies also illustrate that the role played by economic evaluation varies for different appraisal committee members and that a technology's incremental cost-effectiveness ratio (its ICER, or cost per QALY) is far from determinative of appraisal outcome.
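For readers less familiar with the measure, the ICER is conventionally defined as the difference in cost between a technology and its comparator divided by the difference in health effect. The notation below is ours rather than NICE's: C denotes expected cost and E expected health effect in QALYs.

```latex
\mathrm{ICER}
  = \frac{C_{\text{new}} - C_{\text{comparator}}}{E_{\text{new}} - E_{\text{comparator}}}
  = \frac{\Delta C}{\Delta E}
  \quad \text{(cost per QALY gained)}
```

Under this definition, a technology that costs more and delivers more health than its comparator is judged by comparing its ICER with a cost-effectiveness threshold; a lower ICER indicates better value.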
The earliest study of NICE decision-making, conducted by Raftery in 2001, emphasises the importance of clinical- rather than cost-effectiveness. Of the 19 recommendations by then issued by NICE, all cited clinical benefit as a reason for the technology's adoption; in contrast, only half of the completed appraisals (11/22) reported an ICER, with committees often finding it ‘very difficult’ or ‘impossible’ to estimate cost-effectiveness (Raftery, 2001). Raftery concludes from this evidence that economics has a ‘lesser role’ to play in NICE decision-making than evidence of clinical benefit (ibid.). Other work from the same period offers a more nuanced interpretation. Drawing on documentary analysis and interviews relating to seven appraisals conducted during 2002 and 2003, Williams et al. (2007) propose that appraisal committee members draw on cost-effectiveness analysis in two distinct ways: either as a general structure for considering and discussing key issues (the ‘framework approach’), or as a factor to be considered only once clinical value has been demonstrated (the ‘ordinal approach’). Under the latter approach, calculation of a technology's ICER may sometimes be unnecessary: in the words of one committee member, ‘If it doesn't get through the clinical effectiveness hurdle then I'm not that interested in the economics’ (ibid.). Other evidence supports the hypothesis that appraisal committees are often able to reach a decision without calculating a technology's ICER, with studies indicating that between 5 and 48% of NICE's decisions were made in this way in the years up to 2011 (Devlin and Parkin, 2004; Dakin et al., 2006; Cerri et al., 2014; Dakin et al., 2015) (Table 3).
a Multiple technology appraisals comprise multiple decisions (about different technologies and/or indications) within the same appraisal. Hence, the number of distinct decisions analysed in many studies exceeds the number of appraisals included.
b In the original paper (Table 2), this figure represents the total proportion of decisions for which a cost–utility analysis was present. Appraisals were considered to have a cost–utility analysis if they gave an estimate of the cost per QALY gained.
c As in Dakin et al. (2006), this figure represents the total proportion of decisions for which a cost–utility analysis was present. However, the authors do not specify on what basis a cost–utility analysis was considered to have been performed.
d As per Figure 1 of the original paper: ‘Seventy decisions were “no” as a result of clinical evidence […] Sixty-three decisions were “yes” on clinical grounds (e.g. because all alternative technologies were contraindicated or not tolerated), while 28 decisions were “no” on clinical grounds (e.g. because treatment was “clinically inappropriate” in that patient group)’.
e The original paper states: ‘We estimate that, in practice, the ICER at which the probability switches from more-likely-to accept to more-likely-to-reject is between £39,000 and 44,000: well above the stated £20,000–30,000 range’ (Dakin et al., 2015).
When ICERs are calculated as part of the decision-making process, quantitative research has repeatedly demonstrated a strong correlation with decision outcome. Dakin et al.'s study of appraisals completed by December 2011 – the largest and most recent retrospective analysis of all NICE decisions – estimates that technologies costing £40,000/QALY have a 50% chance of recommendation, compared with 75% at £27,000/QALY and 25% at £52,000/QALY (Dakin et al., 2015). Other studies similarly demonstrate a clear correlation between a technology's estimated cost-effectiveness and its likelihood of recommendation, with the average ICER of recommended technologies found to be substantially lower than that of technologies that are rejected, and the ICERs of technologies in which recommendation is restricted to a subgroup of patients typically coming somewhere between the two (Dakin et al., 2006; Cerri et al., 2014; Griffiths et al., 2015; Schaefer and Schlander, 2018).
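To give a feel for how the probability of recommendation falls as the ICER rises, the three estimates quoted above can be joined by straight lines on the log-odds scale. This is purely an illustrative interpolation between the reported points and is not the regression model fitted by Dakin et al.; the function name `interpolated_probability` is hypothetical, and the sketch should not be used outside the £27,000–52,000/QALY range.

```python
import math

# (ICER in GBP per QALY, probability of recommendation) as quoted in the text
REPORTED_POINTS = [(27_000, 0.75), (40_000, 0.50), (52_000, 0.25)]


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-x))


def interpolated_probability(icer):
    """Piecewise-linear interpolation on the log-odds scale (illustration only)."""
    lowest, highest = REPORTED_POINTS[0][0], REPORTED_POINTS[-1][0]
    if not lowest <= icer <= highest:
        raise ValueError("ICER is outside the range of the reported estimates")
    for (x0, p0), (x1, p1) in zip(REPORTED_POINTS, REPORTED_POINTS[1:]):
        if x0 <= icer <= x1:
            fraction = (icer - x0) / (x1 - x0)
            return inv_logit(logit(p0) + fraction * (logit(p1) - logit(p0)))


for icer in (27_000, 30_000, 40_000, 52_000):
    print(icer, round(interpolated_probability(icer), 2))
```

By construction the curve passes through the three reported estimates; values in between are assumptions of the interpolation rather than findings of the study.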
It therefore seems reasonable to conclude from the available evidence that cost-effectiveness has historically played a major, if not always essential, role in NICE decision-making. However, it does not follow that cost-effectiveness is the principal determinant of all NICE decisions, or that it is the only factor taken into consideration in most cases. Work by Dakin et al. and others implies a cost-effectiveness threshold somewhat higher than the £20,000–30,000/QALY range generally suggested by NICE policy, indicating committees' willingness to recommend seemingly cost-ineffective technologies when this is justified by other considerations (Tappenden et al., 2007; Rawlins et al., 2010; Dakin et al., 2015; Griffiths et al., 2015) (Table 3). We now turn to these other considerations.
3.2. Other factors shown to substantively influence NICE decision-making
3.2.1. Uncertainty
Studies from the mid-2000s suggest that appraisal committees have long been sensitive to uncertainty about a technology's expected impacts and have historically shown a reluctance to fully recommend technologies that pose a substantial risk to the NHS. More recent work illustrates the continued ubiquity of uncertainty in NICE decision-making but is unable to address emerging questions about how changes to NICE's approach have modified committees' response to risk.
In 2005, Tappenden et al. conducted a binary choice experiment involving 37 past and present appraisal committee members to explore their preferences in deciding which technologies to recommend. This found that members were 69% less likely to recommend a technology when uncertainty about its cost-effectiveness was ‘high’ compared with when it was ‘low’ – an effect that was particularly pronounced when the hypothetical technology's ICER exceeded £25,000/QALY (Tappenden et al., 2007).Footnote 2 This finding suggests that committees at the time were minded to follow NICE's advice, first issued in 2004, that when considering technologies at or beyond the cost-effectiveness threshold, special consideration should be given to ‘the degree of uncertainty surrounding the calculation of ICERs’ (NICE, 2004; Rawlins and Culyer, 2004). Further evidence of early committees' sensitivity to risk is provided by Raftery (2006), who found that around two-thirds of the rejections issued by NICE up to 2005 were on the grounds of insufficient evidence, and by Devlin and Parkin (2004), who observed that even technologies with relatively low ICERs were, on occasion, rejected where uncertainty was high. The role played by clinical vs financial risk in these decisions, and the extent to which the ‘ordinal’ approach might be understood as a way of assessing these different types of risk in turn, are outstanding questions.
Later studies have indicated a greater willingness by committees to recommend technologies about which there is significant uncertainty. An analysis by Clement et al. (2009) found that nearly half of decisions (46%) made up to December 2008 were subject to ‘considerable’ uncertainty about cost-effectiveness, but that committees nevertheless chose to recommend the technology in 87% of cases. The authors suggest that this apparent tolerance of uncertainty may reflect an approach in which appraisal committees seek to mitigate risk by restricting recommendations to patient subgroups, rather than fully rejecting technologies for which the evidence base is relatively weak (ibid.). Further evidence for this hypothesis is provided by Cerri et al. (2014), who found that technologies recommended for routine use between 2004 and 2009 were supported by substantially more robust clinical evidence – in terms of the number of randomised clinical trials conducted, their size, duration and design, and the size of the observed effect – than technologies that were recommended for restricted use, or those that were rejected. More recently still, qualitative research by Kieslich (2020), based on appraisals conducted in 2011 and 2012, has demonstrated NICE's willingness to rely on anecdotal evidence from clinical experts where ‘gold standard’ clinical trial evidence is lacking.
An apparently consistent feature throughout NICE's work has been the continual need to acknowledge and respond to uncertainty. An automated text analysis of appraisal documents published between 2007 and 2016 observed that terms relating to uncertainty arose in association with almost all of the 125 ‘decision factors’ found to feature in committee discussions, demonstrating its pervasiveness across nearly all aspects of NICE decision-making (de Folter et al., 2018).Footnote 3 More in-depth research by Calnan et al. (2017) confirms this finding, with the authors observing that committees' difficulties in dealing with different types of uncertainty across three 2012–2014 appraisals rendered straightforward decision-making ‘problematic’. Calnan et al. identify several pragmatic strategies adopted by committees in trying to address such difficulties. These include explicit attempts to measure uncertainty and focus attention on areas about which there can be more confidence, as well as implicit approaches based, for example, on ‘gut feeling’ and the collective bypassing of certain uncertainties in order to reach a decision (employing the ‘fudge factor’). According to one committee member, ‘if there feels like there's a lot of unresolved uncertainty, then we're more conservative in our estimate of what we think the ICER is going to be’, suggesting that considerations about uncertainty and cost-effectiveness interact in ways that are difficult to unravel and may exaggerate the apparent role of cost-effectiveness (ibid.). NICE's increasing use of ‘managed access’ – an arrangement in which a technology's recommendation is made conditional on additional data collection – adds a layer of complexity to this relationship between uncertainty and cost-effectiveness and raises further unanswered questions about appraisal committees' response to different types of risk.
3.2.2. Budget impact
NICE has long been clear that it considers affordability to be a concern primarily for politicians rather than itself (Timmons et al., 2016) and, since 2008, it has stated as policy that a technology's potential budget impact ‘does not determine’ whether or not it will be recommended (NICE, 2008a, 2013). Nevertheless, this policy also advises appraisal committees that they should be ‘increasingly certain’ of a technology's ICER as its impact on NHS resources increases, and evidence suggests that committees have tended to follow this advice (NICE, 2008a, 2013, 2017b).
In their retrospective analysis of decisions made up to December 2003, Dakin et al. (2006) found budget impact to be secondary only to cost-effectiveness and clinical uncertainty in its ability to predict decision outcome, with the total potential cost to the NHS observed to be significantly higher for technologies eventually recommended for restricted use than for those recommended for routine use. A similar effect has also been observed by other studies (Mauskopf et al., 2013; Cerri et al., 2014), suggesting that appraisal committees may use restricted recommendations as a way of reducing total cost when a technology's potential impact on NHS resources is high. According to an analysis by Mauskopf et al. (2013), after controlling for clinical- and cost-effectiveness, the average potential budget impact for drugs appraised up to April 2011 ranged from £20.3 million for fully recommended drugs, to £49.8 million for drugs that were recommended with restrictions, to £71.1 million for drugs that were wholly rejected.
Although these quantitative studies strongly suggest that budget impact plays a substantive role in NICE decision-making, they are not able to provide any insight into where the normative basis for this role lies. For example, it is unclear whether committees' prudence in recommending technologies with high budget impact primarily reflects a concern for affordability (i.e. net cost to the NHS), or for the risk that such technologies pose to the system (a function of both net cost and uncertainty about their likely effects). Similarly, it is unclear whether committees' willingness to fully recommend technologies with a relatively low budget impact reflects a concern with affordability/risk or is evidence of an allocative preference for small population size in itself (i.e. rarity). The current evidence base is also unable to provide any insight into the impact on NICE decision-making of the ‘budget impact test’ introduced in 2017: a measure intended to identify technologies whose high net cost might necessitate further commercial negotiation and, in some cases, delayed adoption, however cost-effective they may be (Charlton et al., 2017; NICE, 2018).
3.2.3. Clinical need
Research has explored the role of several considerations associated with the clinical need addressed by an appraised technology, including disease severity, life expectancy, the availability of alternative treatments, baseline quality-of-life and therapeutic area. However, NICE's advice to its appraisal committees regarding these types of consideration has evolved substantially over time, contributing to a mixed and incomplete picture of this factor's influence on NICE decision-making.
In its 1999 directions from the Government, NICE was advised that its recommendations should have regard to ‘the degree of clinical need of the patients with the condition under consideration’ (NICE, 1999). Early studies highlight several appraisals in which committees demonstrably followed this advice, recommending technologies that addressed significant clinical need despite relatively high ICERs.Footnote 4 Appraisal committee members' willingness to prioritise technologies based on related considerations is also illustrated by the binary choice experiment reported by Tappenden et al. (2007), which found that members were more likely to recommend hypothetical technologies when baseline health-related quality-of-life was low and alternative treatment options were unavailable.
Quantitative studies from this period, however, provide little evidence for NICE's systematic prioritisation of technologies based on clinical need. Three large retrospective studies that specifically explored the relationship between the availability of alternative treatments and decision outcome failed to find any significant association between the two (Devlin and Parkin, 2004; Dakin et al., 2006; Cerri et al., 2014). Dakin et al.'s more up-to-date examination of decisions reached prior to 2012 identified a correlation between outcome and therapeutic area, but this relationship does not straightforwardly map onto clinical need; while indication for cancer was associated with increased odds of recommendation, treatments for musculoskeletal disease received even more favourable treatment (Dakin et al., 2015).
In 2009, NICE's policy that its committees should give general consideration to clinical need was supplemented by the more specific advice that, when appraising potentially life-extending treatments for terminal diseases, committees should consider giving greater weight to QALYs achieved at the end of life (NICE, 2009). In effect, this increased the cost-effectiveness threshold for technologies meeting the ‘end-of-life’ criteria to £50,000/QALY: a figure that was formalised in NICE's methods in 2016 (NICE, 2016; Charlton, 2020). Also in 2016, NICE became responsible for the operation of the new Cancer Drugs Fund (CDF), an instrument that enables patients to access cancer drugs that have failed to meet NICE's cost-effectiveness requirements (NICE, 2016).
Two studies have examined in detail the events that preceded the introduction of the end-of-life criteria, concluding that they came about in large part because of NICE's inability to recommend an emerging cohort of expensive oncology drugs under its standard methods (Chalkidou, 2012; Chang, 2020). Support for this version of events is provided by Mason and Drummond (2009), whose research shows an increase in the rejection rate for cancer drugs in the years leading up to the change, from 11% between 1999 and June 2006, to 26% between June 2006 and October 2008. However, evidence on the actual effect of the end-of-life criteria is limited. In evaluating a subset of appraisals completed between January 2009 and December 2011, Dakin et al. found that technologies assessed under the new criteria were 3.4 times more likely to be recommended than those that were not. However, the overall rate of cancer drug recommendation actually fell during this period compared with appraisals conducted prior to 2009, suggesting that the end-of-life criteria may simply have formalised something that appraisal committees were already considering (Dakin et al., 2015). More recent work does, however, demonstrate the regularity with which the criteria are now used to facilitate the recommendation of cancer drugs. According to Wood and Hughes, between April 2016 and March 2018, around half of all routine recommendations of cancer drugs (32/70, 46%) and a third of recommendations made through the CDF (14/42, 33%) applied the enhanced threshold permitted by the end-of-life criteria (Wood and Hughes, 2020).
Evidence suggests that the new CDF has also played a significant role in facilitating the recommendation of cancer drugs. According to NICE's own figures, since the CDF's introduction in April 2016, the rate of approval for cancer drugs has increased from 59 to 74%: a relative increase of 25% (NICE, 2020c). However, it is unclear how appraisal committees exercise judgement either in their application of the end-of-life criteria or in their recommendation of drugs as part of the CDF, or the extent to which these formal instruments represent the totality of committees' concern with clinical need. Given the increasingly dominant position of cancer drugs in NICE's programme of work (NICE, 2020c), such questions represent a potentially significant line of future research.
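Because the approval figures above are expressed as a relative increase, it may help to set them beside the corresponding absolute change. The short calculation below uses only the percentages and counts reported in this section; the variable names are ours.

```python
# Approval rate for cancer drugs before and after the CDF's introduction (NICE, 2020c)
rate_before = 0.59
rate_after = 0.74

absolute_increase = rate_after - rate_before        # in percentage points
relative_increase = absolute_increase / rate_before

# Shares of recommendations applying the end-of-life threshold (Wood and Hughes, 2020)
routine_share = 32 / 70
cdf_share = 14 / 42

print(f"absolute increase: {absolute_increase * 100:.0f} percentage points")
print(f"relative increase: {relative_increase * 100:.0f}%")
print(f"routine: {routine_share * 100:.0f}%, CDF: {cdf_share * 100:.0f}%")
```

The 25% figure therefore corresponds to a rise of 15 percentage points in the approval rate.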
3.2.4. Innovation
Since 2008, NICE has advised its appraisal committees to take special account of a technology's ‘innovative nature’ in deciding whether it warrants recommendation beyond the usual cost-effectiveness threshold (NICE, 2008a, 2013) and research indicates that consideration of a technology's innovativeness does regularly enter into NICE decision-making. Questions remain, however, about how appraisal committees define innovation and the extent to which their concern for this factor overlaps with – and is potentially derived from – concern for other factors such as uncertainty, clinical need and rarity.
The first study to consider the influence of innovation was Dakin et al.'s retrospective study of appraisals completed by December 2011, which classed technologies as either innovative or non-innovative based on the time since their commercial launch, the drug class to which they belonged and whether or not they were pharmaceuticals (Dakin et al., 2015).Footnote 5 This found ‘innovation’ to be one of several factors weakly correlated with appraisal outcome, but highlighted the challenge of exploring the influence of factors that are undefined by NICE and difficult to measure empirically (ibid.).
Stronger evidence for the influence of innovation is provided by Charlton and Rid (2019), who establish through both qualitative and quantitative analysis of appraisal documentation that considerations about a technology's ‘innovativeness’ played a meaningful role in almost half of the drug appraisals completed between 2000 and mid-2018 (151/320, 47%). In 26/320 instances (8%), this role extended to innovation being explicitly invoked by committees – alongside other factors – to support a technology's recommendation beyond £20,000/QALY (ibid.). This study also identifies significant inconsistencies in how committees define and value innovation and highlights a substantial increase in committees' concern for this factor since 2008 (ibid.). This suggests that committees' consideration of innovation, unlike that of clinical need, has been prompted in large part by NICE's advice to do so, which was issued that year. The influence of innovation has also been highlighted by other recent studies (Kieslich, 2020; Yuasa et al., 2021) and by de Folter et al.'s automated text analysis, which found that references to innovation were made in around 80% of all appraisals published between January 2007 and December 2016 (de Folter et al., 2018). Additional work is needed to further explore appraisal committees' understanding of this concept, its relationship with other normative considerations and the actual substantive role that it plays in decision-making.
3.2.5. Rarity
NICE has historically advised its appraisal committees that they should ‘evaluate drugs to treat rare conditions […] in the same way as any other treatment’ (NICE, 2008b, 2020b). However, since 2013, ‘ultra-orphan’ drugs for very rare diseases have been systematically prioritised through the operation of NICE's highly specialised technologies (HST) programme (NICE, 2017a), raising questions about the role played by rarity in NICE's current approach and the normative basis for NICE's prioritisation of ultra-orphan drugs.
Two quantitative studies of appraisals completed before 2013, which specifically explored the relationship between a technology's ‘orphan’ status and its likelihood of recommendation, failed to find any statistically significant association (Cerri et al., 2014; Dakin et al., 2015). However, qualitative evidence indicates that, even prior to the establishment of the HST programme, considerations about a condition's rarity occasionally proved influential. In their study of 10 orphan drugs appraised by NICE between 2006 and 2012, Nicod and Kanavos (2016) identify five cases in which they consider rarity to have acted as a ‘pivotal factor’ in decision-making. In four of these five cases, concern for rarity was a function of the end-of-life criteria, which at the time required that the technology be indicated for a small patient population (NICE, 2009). However, this and other studies identify several other cases – all cancer drugs – in which rarity appears to have acted as a standalone basis for special treatment. [Footnote 6] Indeed, de Folter et al.'s analysis suggests that consideration of a condition's rarity may be a fairly regular aspect of committee discussions, featuring in around 20% of appraisals (de Folter et al., 2018).
Notably, the HST programme had not, at the time of the review, been the subject of any published empirical research, despite having been in operation for over 8 years. [Footnote 7] The substantive role played by rarity in recent NICE decision-making (both in relation to HSTs and other technologies), the normative basis for this role, and rarity's relationship with other factors such as uncertainty, budget impact, clinical need, innovation and age, are therefore matters about which significant questions remain.
3.2.6. Age
NICE's methods do not formally vary based on the age of those who will benefit from a technology's adoption and, given that age is a protected characteristic under the 2010 Equality Act, it is not clear that it would be legal for them to do so (Government, 2010). However, since 2011 an amendment to NICE's formal methods has allowed technologies that offer very substantial health benefits over a period of at least 30 years to be assessed using a lower than usual discount rate. [Footnote 8] This generally has the effect of lowering the ICER, typically of technologies indicated for severely ill young people.
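The effect of the discount rate on the ICER can be illustrated with a minimal sketch. The figures below are hypothetical and are not drawn from any appraisal: the two rates (3.5% and 1.5%), the up-front cost and the stream of QALY gains are assumptions chosen only to show the direction of the effect, and the function names are illustrative.

```python
def discounted_total(values, rate):
    """Present value of a yearly stream, with the first year undiscounted."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))


def icer(incremental_cost, incremental_qalys):
    """Incremental cost per QALY gained versus the comparator."""
    return incremental_cost / incremental_qalys


# Hypothetical technology: a one-off up-front cost (incurred in year 0,
# so unaffected by discounting) and a constant gain of 0.5 QALYs per
# year sustained for 40 years.
upfront_cost = 300_000.0
qaly_gains = [0.5] * 40

qalys_standard = discounted_total(qaly_gains, 0.035)  # assumed usual rate
qalys_lower = discounted_total(qaly_gains, 0.015)     # assumed lower rate

icer_standard = icer(upfront_cost, qalys_standard)
icer_lower = icer(upfront_cost, qalys_lower)

print(f"ICER at 3.5%: £{icer_standard:,.0f}/QALY")
print(f"ICER at 1.5%: £{icer_lower:,.0f}/QALY")
```

Because benefits that accrue far in the future are shrunk less by the lower rate, the discounted QALY total rises and the cost per QALY falls. The effect is largest for technologies whose benefits extend over many years, which is consistent with the observation that the rule tends to favour treatments for young patients.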
The two studies to have quantitatively explored the relationship between patient age and decision outcome did not find any statistically significant correlation (Tappenden et al., 2007; Dakin et al., 2015). However, an article published by three senior NICE members in 2010 identified paediatric indication as one of the six ‘special circumstances’ in which committees might be willing to exceed the usual cost-effectiveness threshold (Rawlins et al., 2010). Rawlins et al. highlight two specific cases [Footnote 9] in which such circumstances had been recognised, explaining that NICE ‘understands that society would generally favour “the benefit of the doubt” being afforded to sick children’ (ibid.). Quotes from committee members interviewed during other studies indicate a similar tendency, with members stating that they are inclined to ‘give […] more weight’, or be ‘softer at the edges’, when considering the value of paediatric treatments (Bryan et al., 2007; Calnan et al., 2017). More recently, de Folter et al.'s analysis found that consideration of ‘children’ entered into committee discussions in around 20% of the appraisals completed between 2007 and 2016 (de Folter et al., 2018). The application of the special discounting rules introduced in 2011, however, has not been the subject of any empirical research and, given the current legal landscape, appraisal committees may be reluctant to acknowledge any consideration of patient age in their decision-making. As such, further in-depth qualitative research would likely be needed to ascertain what role – if any – considerations of patient age play in NICE technology appraisal.
3.2.7. Cause of disease
As in the case of age, consideration of cause of disease is not formally incorporated into NICE's approach. Specifically, NICE's current principles prohibit it from ‘alter[ing] its normal approach because a condition may have been caused by the person's behaviour’ (NICE, 2020b). Nevertheless, there is evidence to indicate that cause of disease has been considered where fault can be attributed to a third party. Three studies highlight the 2006 case of pemetrexed (Alimta) for the treatment of malignant pleural mesothelioma (MPM), which was recommended beyond the usual cost-effectiveness threshold partly due to the well-established link between MPM and occupational exposure to asbestos (Rawlins et al., 2010; Chalkidou, 2012; Shah et al., 2013). [Footnote 10] Two studies also cite the 2002 case of imatinib, which was recommended at an ICER of £49,000/QALY for patients in the blast phase of chronic myeloid leukaemia because these patients would have been offered the drug earlier in disease progression were it not for ‘failings in the healthcare system’ (Rawlins and Culyer, 2004; Chalkidou, 2012). No further cases are identified in the current literature. However, given the small number of appraisals covered by the type of in-depth qualitative studies capable of identifying such occasional considerations, further research could feasibly identify additional cases.
3.2.8. Wider societal impacts
Although NICE's general approach is to take a relatively narrow ‘health-only’ perspective in assessing the costs and benefits of a technology's adoption (NICE, 2013), wider societal impacts can be taken into account on an exceptional basis and several studies identify instances in which committees have chosen to do so. Charlton and Rid (2019), for example, highlight the 2015 case of ledipasvir–sofosbuvir, in which the committee appears to have given weight to the ‘improved earning capacity’ of treated vs untreated hepatitis C patients. [Footnote 11] Nicod and Kanavos (2016) similarly identify patients' ‘ability to contribute to society’ as a key consideration in NICE's 2011 appraisal of mifamurtide, a drug for the treatment of osteosarcoma (bone cancer) in children and young people. The societal value of equality appears to be a particularly common consideration for appraisal committees, with de Folter et al.'s analysis indicating that equality is discussed in all appraisals and other studies highlighting specific cases in which consideration of socioeconomic or other forms of disadvantage has played a role in the decision to recommend a particular technology (Rawlins et al., 2010; Yuasa et al., 2021). Given NICE's ambition to promote health equality (NICE, 2008b, 2020b), further exploration of appraisal committees' understanding of this aim and its relationship with NICE's general approach appears warranted.
3.2.9. Stakeholder influence
Another of the six factors identified by Rawlins et al. (2010) as occasionally justifying a technology's recommendation beyond the usual threshold is ‘stakeholder persuasion’, with patients and their advocates playing ‘an important role in shaping the views of NICE's advisory committees’. Quantitative evidence on the impact of stakeholder input is mixed (Dakin et al., 2006; Cerri et al., 2014; Dakin et al., 2015; Yuasa et al., 2021), but several qualitative studies have demonstrated stakeholders' ability to influence decision-making (Milewa and Barry, 2005; Milewa, 2006; Chang, 2020; Kieslich, 2020). According to Milewa and Barry (2005), this effect is mediated through four main strategies: (i) stakeholders' production of ‘new’ evidence; (ii) their accentuation of evidence that might not otherwise be considered relevant; (iii) alliance building across the stakeholder group and (iv) direct lobbying of NICE and the government. A case for which there is strong evidence of the last of these is that of interferon beta, which, according to one committee member interviewed as part of Milewa and Barry's study, was ‘one of the least cost-effective drugs there is’ but was nevertheless recommended due to ‘the pressure that was put on the government’ by patient advocacy groups, ‘back[ed] up’ by treating clinicians (ibid.). [Footnote 12] Stakeholders have also been shown to be influential in shaping NICE policy, specifically the end-of-life criteria (Chalkidou, 2012; Chang, 2020).
3.2.10. Process factors
Finally, studies noted a range of process factors that appear to be correlated with appraisal outcome. Dakin et al. (2006) and Cerri et al. (2014) both identify an apparent association between the year of appraisal and its likelihood of recommendation, with later appraisals (up to 2009) more likely to result in either restriction or rejection than those conducted very early in NICE's lifetime. Dakin et al. (2006) also found that until 2004, pharmaceuticals were less likely to be rejected than other types of intervention (such as medical devices), although this finding was not replicated in a later study (Dakin et al., 2015). Additionally, Cerri et al. (2014) found that an increase in the number of technologies considered within the same appraisal increased the odds of a restriction relative to a recommendation, indicating that committees may attempt to ‘pick a winner’ in such situations rather than fully recommending several similar technologies. More recent evidence on these types of process factors – and the drivers behind these trends – is lacking.
4. Discussion
This review identifies and provides insight into 11 factors that have been observed by multiple studies to play a substantive role in NICE decision-making. NICE's consideration of some of these factors – such as cost-effectiveness, uncertainty and clinical need – is well known and shaped in part by NICE's formal methods. However, the role played by other factors – including innovation, budget impact, rarity, age and cause of disease – is less well established and potentially in tension with the approach that NICE publicly articulates (NICE, 2013, 2020b). Across each of these factors, questions remain about the relationships that exist between them and their role in an evolving NICE policy landscape.
Unsurprisingly, evidence suggests that concern for cost-effectiveness is central to NICE decision-making. However, there are reasons to believe that the available literature may over-estimate the importance of allocative efficiency to NICE's approach while failing to fully recognise the influence of other normative considerations. Several authors have pointed out the emergence in recent years of various ‘decision rules’ or ‘modifiers’ that codify exceptions to NICE's usual decision-making criteria, such that technologies are recommended that likely displace more QALYs from the NHS than they deliver (O'Mahony and Paulden, 2014; Paulden et al., 2014; Paulden, 2017; Charlton, 2020). Examples include the end-of-life criteria, the selective use of differential discount rates and the exceptional treatment of HSTs (Table 2). Much of the available literature, however, is based on appraisals conducted early in NICE's life, before these decision-rules were introduced: of the 29 articles included in the review, 15 (52%) are based entirely on appraisals completed before the introduction of the end-of-life criteria in 2009, with several others also drawing substantially on appraisals completed in the first decade of NICE's work (Table 1). This overrepresentation of early appraisals is made more pronounced by the increasing scale of NICE's core programme, which, as of 1 November 2021, has produced 731 appraisals, 563 of which (77%) have been completed since 2009 (NICE, 2020c). Where studies have included appraisals that span this period, the results are often not interpreted in the context of NICE's changing processes and methods, making it difficult to isolate the effects of such changes.
The review therefore identifies a considerable need to better understand the current (as opposed to the historical) grounds for NICE's decisions. Some outstanding research questions relating to individual factors have been highlighted in the previous section. However, a future programme of research might also focus on the following broad areas of investigation.
First, there is a need to unravel the roles played by different normative considerations within individual appraisals in order to better understand the ethical judgements that drive NICE decision-making. For example, in recommending a highly innovative but uncertain treatment for a rare and debilitating disease, an appraisal committee may cite each of these factors in explaining its decision. But while a committee driven primarily by concern for efficiency may justify its recommendation with reference to the future health benefits likely to be gained from supporting innovation and making research into rare diseases commercially viable, a committee driven by concern for equality might give greater weight to the importance of addressing current unmet clinical need. To date, little research has been conducted with the ability to identify such distinctions, explore their normative basis and evaluate the moral (and perhaps social and political) rationales that appraisal committees draw on in justifying their decisions. A similar knowledge gap also exists at the policy level, with further research needed to explore the normative basis and justification for, for instance, the prioritisation of cancer technologies and ultra-orphan drugs through the CDF and HST programme.
A second area of focus concerns the impact of recent policy changes on NICE decision-making. As previously highlighted, several major amendments to NICE's processes and methods have not yet been the subject of empirical research, making it difficult to establish their effects on the NHS and its users. If, as some have argued, such changes have increased NICE's tolerance for allocative inefficiency, then additional ethically oriented questions arise about the extent to which this evolving approach is morally coherent and consistent with NICE's stated principles (NICE, 2020b). Answering such questions will likely require further in-depth exploration of the normative factors embedded in NICE policy and the discretionary judgements made by appraisal committees across a range of recent cases.
A third area of potential focus relates to NICE's evolving approach to uncertainty and risk. In recent years, NICE has increasingly used ‘managed access’ as a way of mitigating risk to the NHS while providing patients with accelerated access to technologies whose benefits remain uncertain. This suggests that neither the ‘framework approach’ (in which cost-effectiveness analysis provides a general structure for considering and discussing key issues) nor the ‘ordinal approach’ (in which the dual hurdles of clinical- and cost-effectiveness are considered in turn) remains a suitable way of conceptualising the evolving role of economic analysis in NICE appraisal (Williams et al., 2007). Rather, it is plausible that such analysis might today be better characterised as contributing to a process of risk assessment, in which the potential impacts of a technology's adoption are identified primarily to determine how they might be managed and (where necessary) mitigated. Further research exploring how appraisal committees think about both clinical and financial risk, and how they balance these risks against cost-effectiveness and other normative factors, would be of significant value in understanding NICE's evolving approach. Also of value would be research that explores the quality of the data collected through managed access agreements and NICE's response when technologies made available through such arrangements are found to be either more or less effective than anticipated: an outcome with challenging ethical and political implications.
A significant obstacle to answering these and other questions is the increasing number of appraisals for which confidential commercial arrangements, often made in the context of managed access, mean that key factors in decision-making – including the technology's ICER – are not publicly reported. Another necessary consideration for those with an interest in NICE's work is the recent update to its processes and methods, implemented in February 2022 (NICE, 2020a). While the former change acts as a potential barrier to the conduct of robust empirical research, the latter highlights the need for such research to continue if the grounds for health care priority-setting in the UK are to remain transparent and well understood.
5. Study limitations
In considering what can be learnt from this study and how it can inform future research, it is necessary to acknowledge its limitations. The review was conducted by a single researcher and did not involve any formal quality assurance due to the difficulty in applying consistent evaluation criteria across widely differing methodologies. For this reason, it has been described as a narrative review, despite the systematic approach generally adopted. This systematic approach extended to the selection of search terms, which were deliberately broad and identified in part from previous work. However, the wide range of terms used to describe the subject of interest (i.e. factors considered during NICE technology appraisal) and their often non-specific nature (‘other factors’, ‘criteria’, ‘judgements’ and so on) make it possible that other relevant articles were missed. This risk was mitigated by conducting supplementary hand-searches of all included articles.
It should also be noted that the findings draw heavily on the results of several large retrospective analyses of NICE decisions, which are designed to demonstrate correlation rather than causation. As such, they are unable to conclusively prove that any given factor has influenced NICE decision-making.
6. Conclusions
In conclusion, this review demonstrates that though NICE decision-making has historically been strongly influenced by concern for cost-effectiveness, this is by no means the only consideration. Many other factors have also been observed to play a substantive role in decision-making, interacting with each other in ways that appear complex and are yet to be fully understood.
The review also highlights an over-representation in the literature of appraisals conducted early on in NICE's life, under methods that have since been superseded, offering a potentially misleading view of the importance of allocative efficiency to NICE's current approach. NICE's recent update to its processes and methods represents the next stage in the evolution of this approach and will likely further reduce the relevance of much of the existing literature. Given the consequences of NICE's advice for the way that resources are allocated within the NHS, and the organisation's status as a global authority on health care priority-setting, further research that provides an empirical basis for scrutiny of NICE's approach, now and in the future, should be considered a priority.
Acknowledgements
My thanks go to Dr Courtney Davis and Dr Annette Rid, for their input to the design and preparation of this article and their support to the wider project of which it forms part. Thanks also to Professor John Abraham, who was kind enough to provide feedback on an earlier version of this article. This research was funded by the Wellcome Trust (Grant number 203351/Z/16/Z). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Conflict of interest
The author declares none.
Appendix 1: Search strategy
Search of academic literature
Search A: conducted on 19 October 2019
Title/abstract/key (‘National Institute of Health and Care Excellence’ OR ‘National Institute of Health and Clinical Excellence’ OR ‘National Institute of Clinical Excellence’ OR ‘National Institute for Health and Care Excellence’ OR ‘National Institute for Health and Clinical Excellence’ OR ‘National Institute for Clinical Excellence’)
AND
Title/abstract/key (‘social value*’ OR ‘social norm*’ OR ‘societal value*’ OR ‘societal norm*’ OR ‘moral value*’ OR ‘core value*’ OR ‘other value*’ OR ‘other factor*’ OR ‘value judgement*’ OR ‘value judgment*’ OR criterion OR criteria OR modifier* OR equity OR fair* OR justice OR ‘trade off’ OR trade-off OR tradeoff OR ethic* OR substantive OR normative)
Search B: conducted on 5 November 2019
Title (NICE)
AND
Title/abstract/key (‘social value*’ OR ‘social norm*’ OR ‘societal value*’ OR ‘societal norm*’ OR ‘moral value*’ OR ‘core value*’ OR ‘other value*’ OR ‘other factor*’ OR ‘value judgement*’ OR ‘value judgment*’ OR criterion OR criteria OR modifier* OR equity OR fair* OR justice OR ‘trade off’ OR trade-off OR tradeoff OR ethic* OR substantive OR normative)
Both searches were repeated on 16 November 2020, using the date range: ‘1 October 2019 to present’.
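For readers wishing to reproduce or adapt the searches, the sketch below shows one way of assembling the two Boolean strings from lists of terms. It mirrors the notation used in this appendix rather than the syntax of any particular database, which is an assumption on our part: the field labels, the straight quotation marks and the helper names `or_group` and `build_query` are illustrative and would need adjusting for the search interface actually used.

```python
NICE_NAMES = [
    "National Institute of Health and Care Excellence",
    "National Institute of Health and Clinical Excellence",
    "National Institute of Clinical Excellence",
    "National Institute for Health and Care Excellence",
    "National Institute for Health and Clinical Excellence",
    "National Institute for Clinical Excellence",
]

TOPIC_TERMS = [
    "social value*", "social norm*", "societal value*", "societal norm*",
    "moral value*", "core value*", "other value*", "other factor*",
    "value judgement*", "value judgment*", "criterion", "criteria",
    "modifier*", "equity", "fair*", "justice", "trade off", "trade-off",
    "tradeoff", "ethic*", "substantive", "normative",
]


def or_group(terms):
    """Join terms with OR, quoting any phrase that contains a space."""
    quoted = [f"'{t}'" if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(first_field, first_terms, topic_terms):
    """Combine an organisation clause and a topic clause with AND."""
    return (
        f"{first_field} {or_group(first_terms)} AND "
        f"Title/abstract/key {or_group(topic_terms)}"
    )


# Search A: organisation names in title, abstract or keywords.
query_a = build_query("Title/abstract/key", NICE_NAMES, TOPIC_TERMS)
# Search B: the acronym in the title only.
query_b = build_query("Title", ["NICE"], TOPIC_TERMS)

print(query_a)
print(query_b)
```

Holding the topic terms in a single list ensures that Searches A and B use an identical second clause, and makes it straightforward to rerun both with an added date restriction.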
Search of grey literature
Multiple simple searches were conducted via Google Scholar on 19 October 2019 and were repeated on 5 November 2020. In each case, the first 10 pages of results, ordered by relevance (i.e. the first 100 results returned), were screened. Patents and citations were excluded and a date range of 1999–present was applied.
Search i: ‘National Institute for Clinical Excellence’, ‘social value’
Search ii: ‘National Institute for Care Excellence’, ‘social value’
Search iii: ‘National Institute for Health and Care Excellence’, ‘social value’
Search iv: NICE, health, ‘social value’
Search v: NICE, health, justice
Search vi: NICE, health, ‘decision factor’
Search vii: NICE, health, ‘other factor’
Appendix 2: Index of observed factors