1. Introduction
The role of evidence in healthcare, and the way evidence is used to inform decisions on technology introduction and adoption, differs from other industries (Barlow, Reference Barlow2020). This is because new technologies are taken up within a complex environment made up of intertwined policies and regulations at both institutional and organisational level, multiple professional cultures, and different stakeholders who take part in the decision-making process (Herzlinger, Reference Herzlinger2006; Fung et al., Reference Fung, Lim, Mattke, Damberg and Shekelle2008; Roth, Reference Roth2019; Barlow, Reference Barlow2020). The importance of evidence is partly related to how decisions on adoption and diffusion of technology innovations are made within the healthcare ecosystem, and partly to the methods used for assessing innovations, an aspect intrinsically linked to the basis of that evidence (Haynes, Reference Haynes1990; Lomas, Reference Lomas2007; Barlow, Reference Barlow2020). Evidence generation, interpretation, and validity are complex and controversial, as each step needs to be judged both pertinent and sufficient by a variety of professional groups operating within the healthcare ecosystem, including representatives of a wide range of organisations and institutions (Davies et al., Reference Davies, Nutley and Smith2000; Webster and Wyatt, Reference Webster and Wyatt2020; Mackenzie et al., Reference Mackenzie, O'Donnell, Halliday, Sridharan and Platt2010; Savory and Fortune, Reference Savory and Fortune2013). In this complex system, each decision about the introduction and spread of an innovation needs to engage and persuade all active stakeholders.
Nevertheless, (i) the techniques for evidence generation may be underdeveloped, and positivist scientific methods, such as randomised controlled trials (RCTs), may not be appropriate when assessing complex innovations like medical devices (MDs); (ii) different stakeholders may have different expectations of what constitutes evidence and the evidence base, and may contest its interpretation; (iii) there may be no agreed criteria to assess evidence validity (Barlow, Reference Barlow2020). Health Technology Assessment (HTA) has been adopted worldwide as a cross-disciplinary and multidimensional measurement framework for evaluating the performance of a medical technology at different time points of its lifecycle (O'Rourke et al., Reference O'Rourke, Oortwijn and Schuller2020). More broadly, HTA is the systematic evaluation of the clinical, health economic, societal, legal, and ethical issues related to the introduction, dissemination, and use of a medical technology (Banta, Reference Banta2003). It aims to generate and synthesise multi-disciplinary evidence to inform health policy, resource allocation, and clinical decision-making (Manetti et al., Reference Manetti, Turchetti and Fusco2020a, Reference Manetti, Vainieri, Guidotti, Zuccarino, Ferré, Morelli and Emdin2020b). An international joint task group co-led by the International Network of Agencies for Health Technology Assessment (INAHTA) and Health Technology Assessment International (HTAi) has developed a new and internationally accepted definition of HTA: ‘HTA is a multidisciplinary process that uses explicit methods to determine the value of a health technology at different points in its lifecycle. The purpose is to inform decision-making in order to promote an equitable, efficient, and high-quality health system’ (INHTA, 2024).
The concept of HTA is intrinsically embedded in the evidence-based approach to medicine and management, being an integral component of healthcare governance used to set guidelines and standards, provide feedback and feedforward actions on delivery of care, and improve quality and performance of health services alongside clinical practice (Madden, Reference Madden2012; Barbazza and Tello, Reference Barbazza and Tello2014; Nuti, Reference Nuti2016; Tarricone et al., Reference Tarricone, Torbica and Drummond2017; Barlow, Reference Barlow2020). Historically, the object of HTA has been restricted to pharmaceuticals, rather than MDs, and evidence on clinical efficacy/effectiveness (i.e., can it work/does it work) and/or cost-effectiveness (i.e., is it worth it) has formed a key part of the formal assessment, taking precedence over other relevant evidence types (e.g., human factors) when assessing the impact of a healthcare innovation (Schmitz et al., Reference Schmitz, McCullagh, Adams, Barry and Walsh2016; Barlow, Reference Barlow2020; Manetti et al., Reference Manetti, Turchetti and Fusco2020a, Reference Manetti, Vainieri, Guidotti, Zuccarino, Ferré, Morelli and Emdin2020b). In terms of evidence generation, HTA has traditionally been based on positivist scientific methods, such as systematic reviews (SR) and RCTs, which have been preferred due to their lower risk of bias by design compared to real-world studies (Kent et al., Reference Kent, Salcher-Konrad, Boccia, Bouvy, de Waure, Espin, Facey, Nguyen, Rejon and Jonsson2021).
However, traditional methods for evidence generation, such as RCTs, raise general concerns about generalisability and external validity (Maggioni et al., Reference Maggioni, Orso, Calabria, Rossi, Cinconze, Baldasseroni and Martini2016; Fuchs et al., Reference Fuchs, Olberg, Panteli, Perleth and Busse2017; Makady et al., Reference Makady, de Boer, Hillege, Klungel and Goettsch2017; Tarricone et al., Reference Tarricone, Torbica and Drummond2017; Torbica et al., Reference Torbica, Tarricone and Drummond2017). Moreover, such methods assume or imply that useful data on an innovation can be gathered according to study design, and this assumption is no longer appropriate when assessing complex healthcare innovations, like MDs, whose key features differ from other medical technologies and demand ‘a more pluralist approach to gather evidence on their impact’ (Tarricone et al., Reference Tarricone, Torbica and Drummond2017; Barlow, Reference Barlow2020). In terms of efficacy/effectiveness, MD performance depends on user skills and training and is subject to a learning curve; MDs may also be used to treat different conditions in different clinical settings and have a faster product lifecycle (Ciani et al., Reference Ciani, Wilcher, van Giessen and Taylor2017; Fuchs et al., Reference Fuchs, Olberg, Panteli, Perleth and Busse2017; Tarricone et al., Reference Tarricone, Torbica and Drummond2017). In this sense, MDs represent a ‘dynamic’ innovation, whose attributes are not well-defined and specified, making trial results difficult to compare and quickly outdated (Crispi et al., Reference Crispi, Naci, Barkauskaite, Osipenko and Mossialos2019; Goring et al., Reference Goring, Taylor, Müller, Li, Korol, Levy and Freemantle2019). This aspect, in turn, creates a disincentive for clinical evidence generation, which is usually limited at each stage of an MD lifecycle and subject to less stringent market approval requirements than for pharmaceuticals.
Finally, MDs may bring together elements of new technology (i.e., physical innovation) and organisational process changes (i.e., service, staff, professional role) and, in this sense, are more complex to assess than traditional innovations (Crispi et al., Reference Crispi, Naci, Barkauskaite, Osipenko and Mossialos2019; Barlow, Reference Barlow2020). In recent years, there has been a growing interest in the use of non-randomised studies, which are becoming the main source of evidence for assessing MDs (Crispi et al., Reference Crispi, Naci, Barkauskaite, Osipenko and Mossialos2019; Kent et al., Reference Kent, Salcher-Konrad, Boccia, Bouvy, de Waure, Espin, Facey, Nguyen, Rejon and Jonsson2021). Real-world data (RWD) are data related to patient health and/or the delivery of routine clinical practice collected from multiple sources, such as registries, observational studies, health surveys, claims and administrative datasets, electronic health records (EHR), social media, and mobile and wearable technologies to which MDs are connected (Garrison et al., Reference Garrison, Neumann, Erickson, Marshall and Mullins2007; Berger et al., Reference Berger, Sox, Willke, Brixner, Eichler, Goettsch, Madigan, Makady, Schneeweiss, Tarricone, Wang, Watkins and Daniel Mullins2017; U.S. Food and Drug Administration, 2017; Johnston et al., Reference Johnston, Chitnis, Gagne and Ernst2019). The related concept of real-world evidence (RWE), i.e., evidence obtained from the analysis of RWD, and the increasing number of studies using RWE/RWD might satisfy the urgent need for data sharing and traceability, and help to understand the risks and benefits derived from medium- and long-term use of MDs in routine clinical practice and current applications (Pane et al., Reference Pane, Francisca, Verhamme, Orozco, Viroux, Rebollo and Sturkenboom2019).
Uncertainties and limitations concerning evidence on safety, efficacy/effectiveness, and cost-effectiveness, as well as rate of innovation uptake, are intrinsically linked to the special characteristics of MDs. Only limited and fragmented information is available on real-world performance of new or novel MDs, making it challenging to understand what happens in real life at different time points of the post-market phase (i.e., adoption, diffusion/monitoring, and obsolescence). Recent public scandals involving MDs after their successful introduction into routine clinical practice have raised medium-term safety concerns about public health, showing the urgent need for evidence generation and monitoring (Pane et al., Reference Pane, Francisca, Verhamme, Orozco, Viroux, Rebollo and Sturkenboom2019). Indeed, postlaunch RWE-generation studies can enhance long-term safety monitoring, planning and investment decisions on MDs (Serrano-Aguilar et al., Reference Serrano-Aguilar, Gutierrez-Ibarluzea, Díaz, Imaz-Iglesia, González-Enríquez, Castro, Espallargues, García-Armesto, Arriola-Bolado, Rivero-Santana, Perestelo-Pérez, González-Pacheco, Álvarez-Pérez, Faraldo-Vallés, Puñal-Riobóo, Ramallo-Fariña, Sánchez-Gómez, Asua-Batarrita, Reviriego-Rodrigo, Moreno-Rodríguez, Juárez-Rojo, Vicente-Saiz, Orejas-Pérez, Knabe-Guerra, Prieto-Yerro and González Del Yerro-Valdés2021). Examples are the market withdrawal of the MAGEC spinal implant devices as a result of long-term RWE collection (Hothi, Reference Hothi2022) and the Food and Drug Administration (FDA) approval of additional indications for transcatheter heart valves using data collected in a post-market valve registry (Dhruva et al., Reference Dhruva, Ross and Desai2018). Finally, several RWE frameworks have been developed in recent years to expand the incorporation of RWE in HTA.
These include, among others, the French Haute Autorité de santé (HAS) 2021 guidelines for the utilisation of RWE in HTA, the National Institute for Health and Care Excellence (NICE) 2022 RWE framework as a reference for best practices, and the joint 2022 national framework of Canada's Drug and Health Technology Agency (CADTH), the Institut National d'Excellence en Santé et en Services Sociaux (INESSS), and the Canadian regulatory authority (Health Canada) for the collaborative application of RWE in HTA (Claire et al., Reference Claire, Elvidge, Hanif, Goovaerts, Rijnbeek, Jónsson, Facey and Dawoud2024). The aim of this study was thus to provide a detailed overview of published and peer-reviewed practice in post-market assessment of MDs using RWE/RWD. Specifically, we conducted a SR and set the following objectives: (i) to select application papers reporting on RWE/RWD when assessing post-market MDs; (ii) to map the use of RWE/RWD (i.e., evidence type, source, observation time horizon, and aggregation level) throughout MD maturity and type.
2. Materials and methods
2.1 Literature search
A SR was performed using Ovid MEDLINE (1946–2024, February Week 4), EMBASE (1974–2024, February Week 4), and Scopus (2004–2024, February Week 4) databases. We supplemented this search by performing (i) a check on the reference list of the included studies; (ii) a search on Google and Google Scholar (July 2020–February 2024).
Initial searches were carried out in May 2019, updated in July 2020 and in February 2024 to identify the most up-to-date published research. Studies published as of February 2024 were included in the analysis.
A search strategy was developed using both subject headings and free-text terms to capture three main concepts: (i) RWE/RWD; (ii) MD or biomedical technology; (iii) post-market assessment strategy, with special attention to HTA and health economic analyses. Full details of the search strategy, which was developed in consultation with an expert medical librarian at Oxford University, are provided in the Supplementary file.
2.2 Inclusion and exclusion criteria
For inclusion, studies were required to be full-text publications of peer-reviewed original research published in English and incorporating RWE/RWD into any type of post-market assessment strategy for an MD working in real-life conditions. Non-English language articles were not included in the analysis due to financial considerations and time constraints related to document translation. Given the exploratory aim of this research, the authors did not apply restrictions in terms of MD types, clinical specialties, comparators, outcomes, or assessment dimensions. Moreover, studies were retained for further analyses irrespective of whether they assessed post-market performance using one-dimensional or multidimensional evaluation strategies, and of whether they incorporated RWE/RWD alone or in combination with other data sources. For the purposes of this SR, one-dimensional evaluation studies were defined as studies focusing on a single assessment dimension (e.g., clinical) among those traditionally included in an HTA strategy. Additionally, RWE/RWD were defined as data collected outside the traditional RCT setting (Garrison et al., Reference Garrison, Neumann, Erickson, Marshall and Mullins2007; Goettsch and Makady, Reference Goettsch and Makady2016; Makady et al., Reference Makady, van Veelen, Jonsson, Moseley, D'Anton, de Boer, Hillege, Klungel and Goettsch2018). We subsequently excluded studies designed as RCTs and/or controlled clinical trials, which were categorised as ‘non-RWE’ studies. A pilot screening of the first 604 articles was independently undertaken by two pairs of authors (EG and FV; MV and SM) to develop a common assessment strategy. A first-round screening of titles and abstracts was followed by a second-round screening of full-text articles.
The two rounds of screening were independently conducted by two reviewers (SM and EG), and possible discrepancies over the eligibility were resolved by consensus or through discussions with the senior reviewer (MV) until consensus was reached.
Data extraction was undertaken using a pre-designed data extraction form developed in Microsoft Excel (Excel 2016 for Windows, Microsoft Corporation, Redmond, WA) and iteratively refined to capture the key features of the retrieved publications. More specifically, data extracted from each article included: country, MD type and maturity (i.e., adoption/monitoring), MD risk class, innovation complexity, clinical specialty, funding, evidence aggregation level (i.e., monocentric/multicentric study), comparator (if any), population to be treated, patient sample size, time horizon, evidence generation (i.e., source), methodology, and evidence type(s) incorporated into the evaluation strategy. To classify MDs retrieved from literature, the authors followed (i) the updated European risk classification (The European Parliament and the Council of the European Union, 2007); (ii) the classification of healthcare innovations according to their complexity into ‘discrete or simple innovations’, which may not require new training or redesign of organisational processes to be used straightaway, and ‘fuzzy or complex innovations’, which bring together elements of new technology and organisational (or service) model changes (Atun et al., Reference Atun, Kyratsis, Jelic, Rados-Malicbegovic and Gurol-Urganci2007; Barlow, Reference Barlow2020). Moreover, the parameters of benefit (param) for which the post-market assessment exercise was undertaken, as well as strengths, limitations, key findings, and study outcomes were extracted and categorised.
More specifically, the item ‘study outcome’ was codified according to the following algorithm: (i) positive, i.e., statement identifying recommendations to use (or continue to use) the target MD (e.g., cost-effectiveness achieved); (ii) neutral, i.e., statement identifying recommendations to use (or not to use) the target MD, as equal benefits are achieved (e.g., equal costs) versus comparator (e.g., usual care); (iii) negative, i.e., statement identifying recommendations to prefer not to use (or stop the use of) the target MD; (iv) unknown, if recommendations could not be clearly identified as positive/neutral/negative; (v) not identified, if no statement regarding recommendations could be found.
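The five-level coding scheme above can be written down as a small lookup, which may help when reproducing the extraction form. The following Python sketch is illustrative only: in the review the coding was performed by the reviewers, and the names `StudyOutcome`, `code_study_outcome`, and the direction labels `"use"`, `"equal"`, and `"do_not_use"` are assumptions introduced here, not part of the original method.

```python
from enum import Enum
from typing import Optional


class StudyOutcome(Enum):
    """The five codes used for the item 'study outcome'."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"
    NOT_IDENTIFIED = "not identified"


def code_study_outcome(has_statement: bool, direction: Optional[str]) -> StudyOutcome:
    """Map a reviewer's reading of a paper to one of the five codes.

    has_statement: whether the paper contains any statement on recommendations.
    direction: hypothetical reviewer label ("use", "equal", "do_not_use"),
        or None when the statement cannot be clearly classified.
    """
    if not has_statement:
        return StudyOutcome.NOT_IDENTIFIED
    mapping = {
        "use": StudyOutcome.POSITIVE,         # recommend using or continuing to use the MD
        "equal": StudyOutcome.NEUTRAL,        # equal benefits versus comparator
        "do_not_use": StudyOutcome.NEGATIVE,  # recommend not using or stopping use
    }
    # Any statement that does not fit the three directions is coded 'unknown'.
    return mapping.get(direction, StudyOutcome.UNKNOWN)


print(code_study_outcome(True, "use").value)
print(code_study_outcome(True, None).value)
print(code_study_outcome(False, None).value)
```

Keeping ‘unknown’ (a statement exists but is ambiguous) separate from ‘not identified’ (no statement at all) mirrors the distinction drawn between codes (iv) and (v).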
2.3 Data analysis
We employed narrative synthesis to illustrate evidence retrieved from literature. Narrative synthesis, which relies on words and text to summarise and explain literature findings, is particularly suitable in cases where a high level of heterogeneity across multiple studies prevents the use of meta-analysis to synthesise evidence (Campbell et al., Reference Campbell, Katikireddi, Sowden, McKenzie and Thomson2018). This SR was conducted in line with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines in an effort to limit any risk of bias and error (Moher et al., Reference Moher, Liberati, Tetzlaff and Altman2009). Information about internal validity and study quality of the included studies was extracted and assessed using the Quality Appraisal Checklist (QAC) (Excellence, 2018) developed by NICE to review HTA evidence on innovative MDs. The QAC comprises 14 items measured on a 3-point Likert scale. Moreover, a reduced version of the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) guidelines (Husereau et al., Reference Husereau, Drummond, Petrou, Carswell, Moher, Greenberg, Augustovski, Briggs, Mauskopf and Loder2013) was employed to extract and appraise economic and/or health economic evidence. Finally, an overall score (i.e., ‘plus plus’, ‘plus’, ‘minus’) was recorded for each study considering the fulfilment of the checklist criteria.
3. Results
The literature search identified a total of 9775 hits, of which 2146 were duplicates (Figure 1). A further 7025 hits were excluded at title and abstract review stage for specific reasons, as outlined in Figure 1. Overall, 604 articles were assessed for eligibility. A detailed review at the full article review stage further excluded 459 articles, primarily because the articles employed non-RWE sources, assessed non-medical device interventions, and focused on different phases of an MD lifecycle than post-market. Finally, a total of 145 primary research articles were included in the SR.
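The screening counts reported above follow from simple subtraction at each stage. The Python sketch below is only a worked check of that arithmetic; the function name `screening_flow` and the dictionary keys are assumptions introduced for illustration.

```python
def screening_flow(hits, duplicates, excluded_title_abstract, excluded_full_text):
    """Return the number of records remaining after each screening stage."""
    after_deduplication = hits - duplicates
    assessed_full_text = after_deduplication - excluded_title_abstract
    included = assessed_full_text - excluded_full_text
    return {
        "after_deduplication": after_deduplication,
        "assessed_full_text": assessed_full_text,
        "included": included,
    }


# Counts as reported in the text for this review.
flow = screening_flow(
    hits=9775,
    duplicates=2146,
    excluded_title_abstract=7025,
    excluded_full_text=459,
)
print(flow)
# {'after_deduplication': 7629, 'assessed_full_text': 604, 'included': 145}
```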
The temporal trend of the included articles (Figure 2) highlights the scientific community's growing interest, in recent years, in measuring the post-market performance of new or novel MDs using RWE/RWD sources. Indeed, the number of research articles incorporating RWE sources seems to have kept growing since 2013, with a noteworthy increase after the COVID-19 pandemic.
3.1 Types of post-market assessment methods
Quantitative methodologies (n = 119/145 [82%]) were the most common methods used to assess the evidence gathered. Only a limited number of quantitative studies reported decision-analytic models (n = 9/145 [6%]) with any form of sensitivity analysis (i.e., probabilistic sensitivity analysis [PSA] vs deterministic). The remaining selected articles employed mixed (n = 13/145 [9%]) or qualitative methods alone (n = 13/145 [9%]).
3.2 Study recommendations
Only nine publications (Webb and Holman, Reference Webb and Holman1990; Tsilimparis et al., Reference Tsilimparis, Dayama and Ricotta2015; Gregori et al., Reference Gregori, Callaway, Hoeppner, Yuan, Rachitskaya, Feuer, Ameri, Arevalo, Augustin, Birch, Dagnelie, Grisanti, Davis, Hahn, Handa, Ho, Huang, Humayun, Iezzi and Zacks2018; Lambers et al., Reference Lambers, Rieger, Kop, D'Alessandro and Yates2019; Bowen, Reference Bowen2020; Tomaiuolo et al., Reference Tomaiuolo, Derrico, Ritrovato, Locatelli, Milella, Restelli, Lago, Giuliani and Banfi2021; Goodney et al., Reference Goodney, Mao, Columbo, Suckow, Schermerhorn, Malas, Brooke, Hoel, Scali, Arya, Spangler, Alabi, Beck, Gladders, Moore, Zheng, Eldrup-Jorgensen and Sedrakyan2022; Diedisheim et al., Reference Diedisheim, Pecquet, Julla, Carlier, Potier, Hartemann, Jacqueminet, Vidal-Trecan, Gautier, Dubois Laforgue, Fagherazzi, Roussel, Larger, Sola-Gazagnes and Riveline2023; Ding et al., Reference Ding, Morris, Messina and Fairman2023) reported no statement concerning study recommendations, and were classified as ‘not identified’ (n = 9/145 [6%]). The majority of the selected application papers reported a clear positive recommendation to use or continue to use the MD in routine clinical practice (n = 93/145 [64%]), whereas 22 articles included a study outcome categorised as ‘unknown’, which could not be clearly identified as a positive, neutral, or negative recommendation (n = 22/145 [15%]). The remaining studies reported negative outcomes (i.e., not to use or to stop using the device) (n = 13/145 [9%]) or neutral recommendations on the use of the MD in clinical practice (n = 8/145 [6%]).
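As a worked check, the n/145 values given in parentheses above can be tallied to confirm that the five outcome categories cover all included studies and reproduce the rounded percentages. This is a minimal Python sketch; the names `outcome_counts` and `share` are assumptions introduced for illustration.

```python
TOTAL_STUDIES = 145

# Counts taken from the n/145 values reported in the text.
outcome_counts = {
    "positive": 93,
    "unknown": 22,
    "negative": 13,
    "not identified": 9,
    "neutral": 8,
}


def share(n, total=TOTAL_STUDIES):
    """Percentage of studies in a category, rounded to the nearest integer."""
    return round(100 * n / total)


# The five categories should account for every included study.
assert sum(outcome_counts.values()) == TOTAL_STUDIES

for label, n in outcome_counts.items():
    print(f"{label}: n = {n}/{TOTAL_STUDIES} [{share(n)}%]")
```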
3.3 Study characteristics
The use of RWE/RWD (i.e., evidence type, source, time of observation, and aggregation level) was mapped throughout MD maturity and types in each of the included studies (Table 1).
Evidence generated by each study was grouped into (i) clinical (124/145 [85%]), which was the most frequently reported; (ii) economic (36/145 [25%]); (iii) social (36/145 [25%]); (iv) organisational (14/145 [10%]); (v) human factor (10/145 [7%]); (vi) ethical (8/145 [6%]). Frequency of reporting performance indicators specifically for each type of evidence (e.g., efficacy/effectiveness for clinical evidence) is shown in Figure 3.
The frequencies of using RWE/RWD sources among the selected studies (Figure 4) showed that observational prospective/retrospective studies were the most frequently reported (n = 79/145 [54%]), followed by claim/administrative databases (n = 34/145 [23%]).
Among the selected publications, the total use of each RWE/RWD source increased over time; for instance, the use of claims/administrative databases tripled from 2010 to 2024, with a marked increase after 2020. Studies were also grouped into those including a time horizon less than or equal to 1 year (n = 49/145 [33%]), between 1 and 5 years (n = 34/145 [24%]), and greater than or equal to 5 years (n = 46/145 [32%]). The remaining studies did not clearly specify the time horizon (n = 16/145 [11%]). In terms of aggregation level, studies mostly reported evidence aggregated at hospital (n = 72/145 [50%]) or national (n = 54/145 [37%]) level. Only a few studies were conducted at international (n = 12/145 [8%]) or regional (n = 7/145 [5%]) level. Among the studies that reported patient samples, samples greater than 300 patients (n = 42/145 [29%]) and samples ranging from 100 to 300 patients (n = 36/145 [25%]) were the most common. Publications informed by registries (n = 26/145 [18%]) reported more detailed information on the populations to be treated (e.g., age, gender, comorbidities, habits); however, among these, only a few studies (Abizaid et al., Reference Abizaid, Costa, Banning, Bartorelli, Dzavik, Ellis, Gao, Holmes, Jeong, Legrand, Neumann, Nyakern, Orlick, Spaulding, Worthley and Urban2012; Seth et al., Reference Seth, Hiremath, Dani, Kapoor, Jain, Abhaichand, Trivedi, Kaul, Patil, Khemnar and Rangnekar2013; Good et al., Reference Good, Cakulev, Orlov, Hirsh, Simeles, Mohr, Moll and Bloom2016; Tasca et al., Reference Tasca, Lindner, Barandon, Santavy, Antona, Burkert and Gamba2019; Dake et al., Reference Dake, Ansel, Bosiers, Holden, Iida, Jaff, Lottes, O'Leary, Saunders, Schermerhorn, Yokoi and Zeller2020) focused on clinically complex populations and elderly patients.
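The time-horizon grouping described above can be sketched as a simple binning rule. This Python example is an illustration under stated assumptions rather than the procedure actually used: the function name `time_horizon_group` is hypothetical, and because the reported categories share the 5-year boundary, the sketch assumes that a horizon of exactly 5 years falls in the longest group.

```python
from typing import Optional


def time_horizon_group(years: Optional[float]) -> str:
    """Assign a study's observation time horizon (in years) to a group.

    Assumption: exactly 1 year belongs to the shortest group and
    exactly 5 years belongs to the longest group.
    """
    if years is None:
        return "not specified"
    if years <= 1:
        return "<= 1 year"
    if years < 5:
        return "1 to 5 years"
    return ">= 5 years"


# Hypothetical horizons, not taken from the included studies.
for horizon in (0.5, 1, 3, 5, 10, None):
    print(horizon, "->", time_horizon_group(horizon))
```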
Studies were grouped according to the MD type into (i) therapeutic, mainly implantable devices (n = 74/145 [51%]); (ii) diagnostic (n = 39/145 [27%]); (iii) surgical (n = 22/145 [15%]); and (iv) monitoring (n = 10/145 [7%]). Of the 22 studies assessing surgical devices, 19 reported general surgical procedures involving the specific MD. Furthermore, a noteworthy increase in articles assessing diagnostic devices was observed after 2020 (n = 18/39 [46%]). The most frequently reported clinical specialty was cardiovascular (n = 53/145 [36%]), while only a small number of studies were identified for the other specialties (e.g., orthopaedics n = 10/145 [7%]). In terms of MD maturity, the monitoring stage was the most frequently reported (n = 106/145 [73%]). A single MD intervention was assessed by the majority of the selected studies (n = 103/145 [71%]), whereas the remaining studies evaluated two (n = 29/145 [20%]) or three MDs (n = 5/145 [3%]). Only about half of the studies were comparative analyses (n = 76/145 [52%]), most of which used a non-MD intervention (i.e., clinical procedures) as comparator. Only 28 studies employed another MD as comparator, five papers used no intervention, and one publication used a pharmaceutical intervention. Table 1 shows a narrative synthesis of the included studies.
3.4 Appraisal of the included studies
Quality assessment conducted according to the Quality Appraisal Checklist (Excellence, 2018) revealed considerable heterogeneity. More than half of the selected studies (n = 97/145 [67%]) were rated ‘minus’ (low study quality) because few or no checklist criteria were fulfilled. Some of the checklist criteria were fulfilled by a further group of the selected publications (n = 41/145 [28%]), which were classified as good quality studies (‘plus’). Only eight of the included studies met all or most of the checklist criteria (Briggs et al., Reference Briggs, Sculpher, Dawson, Fitzpatrick, Murray and Malchau2004; Close et al., Reference Close, Robertson, Rushton, Shirley, Vale, Ramsay and Pickard2013; Hympánová et al., Reference Hympánová, Rynkevic, Román, Mori da Cunha, Mazza, Zündel, Urbánková, Gallego, Vange, Callewaert, Chapple, MacNeil and Deprest2020; Tsai et al., Reference Tsai, Lin, Chang, Chang and Lee2020; Bizzi et al., Reference Bizzi, Pascuzzo, Blevins, Moscatelli, Grisoli, Lodi, Doniselli, Castelli, Cohen, Stamm, Schonberger, Appleby and Gambetti2021; Campbell et al., Reference Campbell, Lasocki, Oon, Bressel, Goroncy, Dwyer, Wiltshire, Seymour, Mason, Tange, Xu and Wheeler2021; Bougma et al., Reference Bougma, Mei, Palmieri, Onyango, Liu, Mesarina, Akelo, Mwando, Zhou, Meng and Jefferds2022; Goodney et al., Reference Goodney, Mao, Columbo, Suckow, Schermerhorn, Malas, Brooke, Hoel, Scali, Arya, Spangler, Alabi, Beck, Gladders, Moore, Zheng, Eldrup-Jorgensen and Sedrakyan2022) and were classified as excellent quality studies (‘plus plus’). A synthesis of the quality assessment is shown in the Supplementary file.
4. Discussion
In this review, we described the incorporation of RWE/RWD into post-market assessment of MDs, and we identified limits, opportunities, and implications of current practices for RWE/RWD generation to guide future research.
Multisource, non-randomised evidence is increasingly being utilised to inform decisions on the introduction and use of healthcare innovations (Hatswell et al., Reference Hatswell, Freemantle and Baio2017; Kent et al., Reference Kent, Salcher-Konrad, Boccia, Bouvy, de Waure, Espin, Facey, Nguyen, Rejon and Jonsson2021), and the COVID-19 pandemic seems to have further accelerated this phenomenon (Schad and Thronicke, Reference Schad and Thronicke2022). The review confirmed the increasing reporting of RWE/RWD as the main source of evidence for MDs while highlighting differences in non-randomised evidence generation over time. Claims and/or administrative databases were mostly utilised in observational studies associated with multidimensional post-market assessment strategies, and their use tripled between 2010 and 2024, whereas registries were mostly reported by one-dimensional clinical studies, and their use remained limited although it has kept growing since 2013. The review also revealed that all publications were ‘one shot’ and ‘ad hoc’ studies, as no study was part of a continuous or periodic post-market monitoring strategy. Moreover, the key limitations identified across all the retrieved publications included: (i) adoption of a narrow approach to the post-market assessment with a focus on a limited number of evidence types, i.e., two dimensions at maximum (n = 122/145 [84%]); (ii) stress on clinical and/or economic evidence gathered in a short/medium time horizon (between 1 and 5 years); (iii) little attention to other relevant evidence dimensions for an MD working in real-life conditions, such as contextual influence and organisational impact; (iv) very limited incorporation of patient perspectives and preferences; (v) focus on MDs with a relatively low innovation complexity.
Even though in recent years there has been an increasing understanding of the need to seek a broader approach by considering parameters of benefit additional to the traditional ones (i.e., clinical and/or economic), only a few publications assessed organisational requirements and/or human factors, and the latter were reduced to usability and/or acceptability, thereby excluding a considerable contribution to the assessment in terms of human efficacy and effectiveness (UK Department of Health, 2010; Kelley et al., Reference Kelley, Egan, Stockley and Johnson2018; Anderson et al., Reference Anderson, Naci, Morrison, Osipenko and Mossialos2019; Manetti et al., Reference Manetti, Turchetti and Fusco2020a, Reference Manetti, Vainieri, Guidotti, Zuccarino, Ferré, Morelli and Emdin2020b). Moreover, the majority of the retrieved studies investigated short-term (up to 1 year) and/or medium-term (between 1 and 5 years) impact, which may be insufficient to observe longer-term events related to MD usage alongside current applications and clinical pathways. Recent public scandals involving MDs after their successful introduction into routine clinical practice raised concerns about public health, and hopes now rest on the new European Directive on MDs, which should lead to the inclusion of a continuous and systematic life cycle assessment of the devices to overcome the limitations of ‘one-shot’ and short-term studies (U.S. Department of Health and Human Services Food and Drug Administration, 2013; Fraser et al., Reference Fraser, Byrne, Kautzner, Butchart, Szymanski, Leggeri, de Boer, Caiani, Van de Werf, Vardas and Badimon2020). This review showed that there has been an overemphasis on researching and assessing well-defined, clearly bounded innovations (i.e., relatively ‘discrete’ or simple MDs) being adopted by a single organisational unit (i.e., single hospital or team) rather than complex innovations, which bring together technology and organisational or service changes.
This SR further revealed substantial heterogeneity in terms of study quality. First, the 63 publications classified as observational studies usually did not present clear statements of the type(s) of RWE source employed to conduct the study. This may lead to confusion between two separate concepts: data source (e.g., registry) and study design (e.g., observational study), as previously highlighted by Makady and colleagues (Makady et al., Reference Makady, de Boer, Hillege, Klungel and Goettsch2017). Second, of the 102 retrieved publications including the health economic dimension, only 54 studies specified the decision analytic model used and/or conducted sensitivity analysis. Third, 20 publications did not report the patient sample size. Overall, we documented a general lack of conformity with good practices and little attention to managing decision-making uncertainty. A general lack of inclusion of patient characteristics, preferences, and other relevant user perspectives should also be stressed. Only 22% of the retrieved publications included Patient Reported Outcome Measures (PROMs) in the form of quality-of-life and/or pain assessment data, mainly assessed using standardised generic questionnaires, such as EuroQoL five dimensions (EQ-5D). Few publications (9%) included Patient Reported Experience Measures (PREMs), mostly evaluated through open interviews. However, the use of such sources of evidence increased after the COVID-19 pandemic. To our knowledge, this review is one of the first attempts to systematise key features, empirical uses, and quality of RWE/RWD studies across time in response to the increasing attention paid by the scientific community to the assessment of post-market MDs. A strength of this study is that it is consistent with the PRISMA guidelines and followed the PRISMA checklist to guide the reporting of features extracted from the included studies.
This SR also revealed that all publications were ‘one-shot’ studies and that there was huge heterogeneity in terms of evidence generation, MD type and clinical application, as well as study quality. Furthermore, this study provided the chance to gain a snapshot of the effect of the COVID-19 pandemic on the use of RWE and RWD to assess post-market MDs. The pandemic appears to have boosted the adoption of RWE for MD assessment, as the number of publications increased significantly after 2020. This probably occurred as a response to the limited time and resources available for conducting RCTs ‘ex-novo’ during the pandemic. An upsurge in studies focused on diagnostic technologies was observed, together with an increase in hospital-level papers. Hospital-level studies were probably preferred because national and international exchanges were largely interrupted during the COVID-19 pandemic. Two articles dedicated to COVID-19 specific technologies were included in this SR (Tomaiuolo et al., Reference Tomaiuolo, Derrico, Ritrovato, Locatelli, Milella, Restelli, Lago, Giuliani and Banfi2021; Mafi et al., Reference Mafi, Rogez, Darreye, Alain and Hantz2023). Further studies should be carried out to gain a full picture of the effect of the pandemic on the topic.
Potential limitations include the restriction to English-language publications, which affected the geographical distribution of the results, as most of the included studies come from English-speaking countries (e.g., UK and Canada). The experiences of non-English-speaking countries should be further explored, since different patterns of post-market HTA RWE adoption could emerge. For example, Asian countries recognise the value of and already use RWD/RWE in HTA, given the lack of relevant pre-marketing data from clinical trials for the region. These countries rely on RWE to adjust prices and reassess funded technologies or make reimbursement decisions based on RWE several years after market entry (Lou et al., Reference Lou, KC, Toh, Dabak, Adler, Ahn, Bayani, Chan, Choiphel, Chua, Genuino, Guerrero, Kearney, Lin, Liu, Nakamura, Pearce, Prinja, Pwu, Shafie, Sui, Suwantika, Teerawattananon, Tunis, Wu, Zalcberg, Zalcberg, Zhao, Isaranuwatchai and Wee2020). The grey literature encompassing non-peer-reviewed publications, such as HTA reports, was also excluded, which may limit the comprehensiveness of the picture of how RWE is used in practice. Indeed, for some innovations, regional and/or national bodies act as ‘gatekeepers’ to the health system by gathering evidence and producing HTA reports written in local languages (e.g., French, German etc.). In addition, the authors faced substantial publication bias. The review confirmed that almost all post-market studies funded by private bodies reported a clear positive outcome. Health economic analyses are generally not reported by HTA bodies and come from private funders (Turchetti et al., Reference Turchetti, Pierotti, Palla, Manetti, Freschi, Ferrari and Cuschieri2017). The authors expect that such analyses are only published when the outcome is positive (i.e., publication bias). These limitations prevent the authors from drawing a definitive picture of current practices in the post-market assessment of MDs and from making comparisons across regions.
After the pandemic, interest in and publications on Machine Learning Models and Artificial Intelligence as components of existing or new MDs increased, but findings on the post-market assessment of these types of MDs are still at an early stage.
5. Conclusions
The use of non-randomised evidence is growing steadily when assessing post-market MDs (Crispi et al., Reference Crispi, Naci, Barkauskaite, Osipenko and Mossialos2019; Kent et al., Reference Kent, Salcher-Konrad, Boccia, Bouvy, de Waure, Espin, Facey, Nguyen, Rejon and Jonsson2021). Indeed, RWE/RWD are particularly relevant for MDs because of the peculiarities of these devices, such as user-dependency. In fact, uncertainties and limitations concerning evidence on safety, efficacy/effectiveness, and cost-effectiveness, as well as the rate of innovation uptake, are intrinsically linked to the unique challenges of MDs compared to traditional health technologies, such as drugs and pharmaceuticals. Although non-randomised evidence is widely used when assessing MDs, and empirical studies and reviews focusing on a specific device and/or a target clinical area have already been published, to the best of our knowledge a comprehensive picture of current practice and of the implications of using RWE to inform policy decisions is still lacking. In this sense, our study is the first of its kind to provide a holistic picture of how non-randomised evidence has been used when assessing MDs working in real-life conditions. This review seeks to provide an empirically based foundation for the use of RWE/RWD in adopting, monitoring, and assessing the post-market performance of new or novel MDs alongside routine clinical practice. Our findings lead to some policy implications for governments and healthcare organisations.
Firstly, the review highlighted the need for a shift from ‘ad hoc’ and ‘one-shot’ studies to monitoring systems that allow the continuous performance assessment of post-market MDs. Indeed, the variability in the quality of care, access, equity, and the financial aspects related to the use of MDs across countries, regions or hospitals and health organisations can be observed and reduced if the monitoring system is continuous and systematic and uses a benchmarking approach. This can lead to continuous RWE/RWD generation alongside routine clinical practice, prevent public safety scandals, and ensure a fairer allocation of health resources. Hence, we recommend including MD performance indicators with a population-based perspective in wider performance evaluation systems at healthcare pathway level (see, for instance, the Italian experience of measuring the performance path (Gunn et al., Reference Gunn, Bertelsen, Regeer and Schuitmaker-Warnaar2021)). Indeed, HTA and performance evaluation systems can be considered naturally linked, as both are data-driven systems that adopt indicators to report on the performance of healthcare organisations across several dimensions and both inform policy-makers' decisions (Nuti et al., Reference Nuti, Noto, Vola and Vainieri2018). Specifically, Figure 3 shows that commonality exists between performance evaluation systems and HTA dimensions. However, further studies should be conducted to explore the relationship between HTA and performance management in depth. Secondly, the review highlighted that few of the included studies deal with a medium-to-long time horizon (i.e., greater than 5 years). It should be stressed that a short time window may be insufficient to observe longer-term events related to MD usage alongside current applications, especially for implantable devices, whose effects on safety and effectiveness are little known at the time of adoption.
For specific types of MDs (e.g., TAVI), healthcare organisations have activated device registries and traceability systems; however, no evidence of iterative or periodic assessment was found in the retrieved literature.
Thirdly, health managers and policy-makers might finance more multidimensional assessment studies, focus on more innovative MDs that require significant organisational changes to current frameworks, and promote more publicly funded RWE/RWD studies. Among the dimensions of interest, environmental assessment is gaining particular attention, since healthcare systems account for a substantial proportion of global carbon emissions and contribute to wider environmental degradation (Pinho-Gomes et al., Reference Pinho-Gomes, Yoo, Allen, Maiden, Shah and Toolan2022). This applies specifically to MDs, given their well-known environmental impact (Sousa et al., Reference Sousa, Veiga, Maurício, Lopes, Santos and Neto2021). Encouraging public research on post-market assessment/monitoring is desirable not only to increase knowledge of MDs' routine use and applications but also to generate independent evidence that ensures greater transparency of the results obtained. Indeed, studies funded by public bodies can contribute to generating evidence for MDs' ‘non-use’, which is currently lacking. Furthermore, a research agenda has been identified for scholars aiming to increase the efficacy and quality of evidence generation in post-market phases with a population-based approach. Future research is needed to close the gaps highlighted by this review.
In particular, scholars are asked to (i) close the evidence gap between RCTs and the real world by continuing to conduct real-life assessment studies; (ii) shift their research efforts towards more complex innovations or those with fuzzy boundaries, which involve multiple changes to healthcare practices and are targeted at service and/or professional role redesign, also considering the arrival of particularly innovative technologies based on Artificial Intelligence and Machine Learning Models; (iii) incorporate personal value in future RWE/RWD studies, i.e., the value determined by the fit between the study outcome and the individual user, including patient value (Gray and El Turabi, Reference Gray and El Turabi2012; Gray et al., Reference Gray, Pitini, Kelley and Bacon2017; Li et al., Reference Li, Basu, Bennette, Veenstra and Garrison2019; Gunn et al., Reference Gunn, Bertelsen, Regeer and Schuitmaker-Warnaar2021); (iv) generate more multidimensional evidence on both MDs' use and ‘non-use’ and also consider novel relevant MD dimensions (e.g., the environmental and sustainability dimension); (v) provide evidence on the last stages of MD maturity, such as obsolescence and replacement, which are under-investigated in the scientific literature and can be highly relevant in terms of disinvestment; (vi) examine the relationship between RWE post-market HTA and other systems or theories that present common and innovative elements to learn from and that can support the further development of HTA for MDs.
Although it might be an issue covered by grey literature and reports, it could be relevant to have an overview of the MDs that are disinvested, replaced, or re-adopted/re-allocated in other clinical settings.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1744133124000148.
Acknowledgements
The authors would like to thank the Management and Healthcare Laboratory and Prof. Sabina Nuti for their constant supervision and support.
Ethical standards
All the methods were in accordance with the Declaration of Helsinki/relevant national/institutional guidelines.
Consent for publication
Not required.
Availability of data and materials
The datasets analysed are available from the corresponding author upon reasonable request.
Financial support
This study was funded by the Italian Ministry of Health through the 2018 National Grant ‘INTEGRATE-HEALTH-GOV’ (NET-2018-12368077).
Competing interests
None.
Authors' contributions
Stefania Manetti (SM) and Milena Vainieri (MV) conceptualised and designed the work. SM and Elisa Guidotti (EG) performed data selection and screening, with the support of Federico Vola (FV) and MV. EG performed all the analyses. EG and SM drafted the manuscript. All the authors contributed to the interpretation of results. FV and MV critically revised the whole work. All the authors gave the final approval of the version to be published.