Introduction
Healthcare-associated infections (HAIs) remain a major threat to patient safety and quality of care worldwide (WHO, 2022). 1 Conventional infection prevention and control (IPC) measures, such as active surveillance, outbreak investigations, and transmission-based precautions, are essential but often reactive and resource-intensive. 1
Artificial intelligence (AI) has emerged as a promising tool to enhance IPC by linking diverse data sources, including laboratory results, resistance profiles, patient movement, and compliance behaviors. Reference Ellahham2,Reference Maddox, Rumsfeld and Payne3,Reference Hanna and Medford4 Applications include predictive analytics, infection detection, hand hygiene monitoring, and compliance auditing. While evidence demonstrates technical feasibility, translation into everyday practice remains limited due to fragmented data, poor integration, and organizational or ethical concerns. Reference Topol5,6
This review synthesizes evidence on real-world AI applications for IPC, identifies barriers and risks, and develops an evidence-informed readiness checklist to support easier and safer adoption. The checklist aligns empirical findings with international governance frameworks, such as the WHO ethics guidance on AI in health 6 and the EU Artificial Intelligence Act. 7
Methods
This review followed the Joanna Briggs Institute (JBI) methodology for scoping reviews Reference Aromataris, Lockwood, Porritt, Pilla and Jordan8 and is reported in line with PRISMA-ScR guidelines. Reference Peters, Godfrey, McInerney, Munn, Tricco and Khalil9 We addressed two questions: (1) What are the documented applications, barriers, and real-world outcomes of AI in IPC? and (2) How can these findings inform a structured checklist for feasibility, risk, and organizational readiness?
Eligible studies were primary research (prospective cohorts, retrospective analyses, pilots, multicenter trials) describing AI applications in healthcare IPC with practical outcomes (eg, effectiveness, feasibility, workflow integration, risks). We excluded theoretical models, algorithm-only studies without real-world application, and non-peer-reviewed material such as editorials or conference abstracts. Digital health tools or Internet of Things (IoT) solutions were included only if they incorporated an AI component relevant to IPC functions. Searches covered PubMed, Scopus, and Web of Science (period Jan 2014–Dec 2024; last search Feb 2025) using a Population–Concept–Context framework. Reference Aromataris, Lockwood, Porritt, Pilla and Jordan8 No language restrictions were applied. Search terms are detailed in Supplementary Appendix A.
Records were screened in the Rayyan tool Reference Ouzzani, Hammady, Fedorowicz and Elmagarmid10 by two reviewers (SG, ET) with adjudication by a third (GS). The PRISMA flow diagram (Figure 1) summarizes the selection. Data were extracted into a piloted template and charted in Excel. Variables included study characteristics, AI type, IPC application, implementation features, outcomes, barriers, and risks. Two reviewers (SG, ET) cross-validated entries.

Figure 1. PRISMA 2020 flow diagram for the scoping review.
Data synthesis combined descriptive and thematic analysis. Descriptive synthesis grouped studies by AI type and IPC function; thematic synthesis identified implementation barriers, risks, and facilitators. Building on this analysis, we developed a 41-item readiness checklist, mapped to six domains: governance and policy, data quality and interoperability, technical and infrastructure, human and workflow, economic and resource, and risk and compliance. The full data set, including structured classifications by AI method, IPC focus, implementation features, and risks, is available in Supplementary Appendix B. Each study in the following tables retains the same numeric identifier as in Appendix B, allowing readers to cross-reference study characteristics with the full citation.
Results
Out of 2,143 identified records, 382 duplicates were removed. Of the remaining 1,761 titles and abstracts screened, 272 full-text articles were reviewed. A total of 100 studies met all inclusion criteria and were included in the final synthesis.
Of the 100 studies included in this review, only one (1%) was published before 2017. In contrast, 31% were published between 2017 and 2021, while more than two-thirds (68%) appeared between 2022 and 2024, indicating an acceleration in research activity in recent years.
Geographically, the evidence was mostly generated in high-income settings. The United States accounted for 35% of studies, primarily focused on predictive analytics. China contributed 17%, often exploring deep-learning approaches. European countries represented 19%, with emphasis on workflow integration and HAI surveillance. The Asia-Pacific region outside China added another 14%, and Latin America contributed 5%. A small number of studies (6%) originated from sub-Saharan Africa and the Middle East, reflecting emerging research capacity but limited scale.
Using the four-tier scheme (Experimental, Interventional/Implementation, Retrospective, and Validation) applied to the 100 eligible papers (Table 1), almost half of the evidence base remained anchored in retrospective work (49%), which relied on existing health or laboratory data to build or test models.
Table 1. Distribution of standardized study design classifications

Experimental designs accounted for 30%, focusing on technical feasibility in controlled settings. Far fewer studies examined real-world use: 17% reported interventional or implementation pilots, while only 4% carried out independent validation before wider deployment.
Types of AI technology
A full breakdown of AI techniques and corresponding references is available in Table 2 and detailed in Supplementary Appendix B.
Table 2. Distribution of AI technology categories

Machine learning (ML) was the dominant technique, used in 75% of the 100 studies. These approaches often relied on supervised algorithms such as logistic regression, random forests, or gradient boosting to support tasks like infection risk prediction and HAI surveillance based on electronic health records (EHRs) or sensor data. For more details, see the next section on IPC applications.
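To illustrate the supervised-learning pattern described above, the sketch below trains a gradient-boosting classifier on tabular EHR-style features to flag patients at elevated infection risk. It is a minimal sketch only: the file path, feature names, and outcome label are hypothetical placeholders, not variables from any included study.

```python
# Illustrative sketch only: supervised infection-risk prediction on tabular
# EHR-style data. The CSV path and column names ("age", "device_days",
# "prior_antibiotics", "albumin", "hai_label") are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("ehr_extract.csv")                      # hypothetical extract
features = ["age", "device_days", "prior_antibiotics", "albumin"]
X, y = df[features], df["hai_label"]                     # 1 = HAI within 30 days

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]                 # predicted HAI probability
print(f"AUROC: {roc_auc_score(y_test, risk):.2f}")
```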
Deep learning approaches appeared in 21 studies (21%) and included, for example, convolutional neural networks (CNNs) for surface contamination detection, Reference Abubeker60 video-based hand hygiene monitoring, Reference Kim, Choi, Jo, Kim, Kim and Kim63 surgical site infections (SSI), Reference Asif, Xu and Zhao96 and early detection of multidrug-resistant infections. Reference Gouareb, Bornet, Proios, Pereira and Teodoro52
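For the video-based applications mentioned above, a convolutional network typically classifies individual frames or short clips. The toy PyTorch model below is a schematic sketch, not the architecture of any cited system; the class labels and input size are assumptions.

```python
# Schematic sketch: a small CNN that labels video frames as "hand rub" vs
# "no hand rub". Architecture and class labels are illustrative assumptions.
import torch
import torch.nn as nn

class HandHygieneCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of RGB frames with shape (N, 3, H, W)
        return self.head(self.features(x))

model = HandHygieneCNN()
scores = model(torch.randn(4, 3, 224, 224))   # four dummy frames
print(scores.shape)                           # torch.Size([4, 2])
```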
Generative AI was used in just four studies (4%). These projects piloted large language models (LLMs) for IPC education, Reference Jawanpuria, Behera, Dash and Rahman11 policy summarization evaluated through expert consensus, Reference Chester and Mandler31 HAI surveillance, Reference Wiemken and Carrico41 and hand hygiene. Reference Simioli, Annunziata and Coppola49
IPC applications supported by AI
Among the 100 primary studies, nine distinct IPC application areas were identified (Table 3 and Figure 2).

Figure 2. Distribution of IPC applications.
Table 3. Distribution of IPC applications supported by AI

Predictive-analytics systems accounted for 53% of the studies. These papers typically trained supervised models such as logistic regression, gradient boosting, or recurrent networks on electronic health record data to forecast patient-level risks such as central-line-associated bacteremia, Reference Montella, Ferraro, Sperlì, Triassi, Santini and Improta56,Reference Beeler, Dbeibo and Kelley39 multidrug resistance (MDR), Reference Kamruzzaman, Heavey and Song35,Reference Gouareb, Bornet, Proios, Pereira and Teodoro52 or SSI. Reference Gutierrez-Naranjo12,Reference Bartz-Kurycki, Green and Anderson38,Reference Chen, Lu, You, Zhou, Xu and Chen74,Reference Fletcher, Schneider and Hedt-Gauthier107 In some cases, ward-level early-warning scores were generated by combining laboratory, vital-sign, and admission data streams. Reference Montella, Ferraro, Sperlì, Triassi, Santini and Improta56
Hand hygiene compliance monitoring was reported in 13 studies (13%), with applications ranging from video analytics using CNNs and badge-based proximity sensors Reference Wang, Xia, Wu, Ni, Long, Li, Zhao, Chen, Wang, Xu, Huang and Lin17,Reference Asif, Xu and Zhao96 to alcohol-dispenser sensors linked to compliance dashboards Reference Xu, Yang and Chen109 and optical microscopy with ML classification of hand contamination. Reference Claudinon, Steltenkamp and Fink40
Thirteen studies (13%) focused on HAI detection, applying classification algorithms to microbiology, imaging, or pharmacy records to flag active infections such as SSI, Reference Wu, Tsai, Ho, Lai, Tai and Lin36,Reference Kiser, Shi and Bucher64,Reference Bucher, Shi and Ferraro77,Reference Zhu, Simon and Wick76,Reference Colborn, Zhuang and Dyas79 central-line–associated bloodstream infection, Reference Roimi, Neuberger and Shrot94 or urinary tract infections (UTIs). Reference van der Werff, Thiman and Tanushi86 Many of these tools operated continuously in clinical workflows. Reference Abubeker60,Reference Verberk and van der Werff98
Eight studies (8%) targeted HAI surveillance by aggregating time-series data for incidence-trend analysis. Reference Rennert-May, Leal, MacDonald, Cannon, Smith, Exner, Larios, Bush and Chew25,Reference Lukasewicz Ferreira, Franco Meneses and Vaz54,Reference Flores-Balado, Méndez and González108 Automated case-finding rules were often calibrated against conventional manual surveillance, Reference Cho, Kim and Chung45 with several studies also applying natural language processing (NLP) Reference Shi, Liu and Pruitt29,Reference Tvardika, Kergourlay, Bittar, Segond, Darmoni and Metzger34,Reference dos Santos, Silva and Menezes72,Reference Flores-Balado, Méndez and González108 and generative AI. Reference Wiemken and Carrico41 Outbreak-detection models were described in four studies (4%). Approaches ranged from supervised clustering of ward alerts Reference Atkinson, Ellenberger and Piezzi101 to whole-genome-sequencing pipelines that feed resistance-related single-nucleotide-polymorphism data into transmission-mapping algorithms. Reference Sundermann, Chen and Miller42,Reference Sundermann, Chen and Miller43 One study integrated environmental IoT sensors to identify spatiotemporal hotspots. Reference Bhatia, Sood and Kaur15
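To make the automated case-finding idea concrete, the sketch below combines a structured rule (a positive culture plus a device-days criterion) with a simple keyword flag from free-text notes. The field names, thresholds, and keywords are illustrative assumptions and are far simpler than the NLP pipelines used in the cited studies.

```python
# Illustrative sketch of automated HAI case-finding: one structured rule plus
# a free-text keyword flag. Fields, thresholds, and keywords are hypothetical.
import re
from dataclasses import dataclass

@dataclass
class Episode:
    positive_blood_culture: bool
    central_line_days: int
    progress_note: str

SUSPICION_TERMS = re.compile(r"\b(sepsis|bacteremia|line infection)\b", re.IGNORECASE)

def flag_possible_clabsi(ep: Episode) -> bool:
    """Flag an episode for IPC review when structured and text criteria both fire."""
    structured_rule = ep.positive_blood_culture and ep.central_line_days >= 2
    text_rule = bool(SUSPICION_TERMS.search(ep.progress_note))
    return structured_rule and text_rule

episode = Episode(True, 5, "Febrile overnight; concern for line infection.")
print(flag_possible_clabsi(episode))   # True -> route to manual IPC confirmation
```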
Less common themes included education and training platforms (3%), in which generative or retrieval-augmented language models generated scenario-based learning modules Reference Jawanpuria, Behera, Dash and Rahman11 or computer vision founded on human-AI collaboration supported personal protective equipment (PPE) monitoring; Reference Kim, Park and Sippel44,Reference Segal, Bradley and Williams55 environmental monitoring (3%), employing image recognition or air-quality sensors for surface-cleanliness verification; Reference Hattori, Sekido, Leong, Tsutsui, Arima, Tanaka, Yokota, Washio, Kawai and Okochi27,Reference Hu, Zhong, Li, Tan and He69,Reference Lee, Shim and Lim106 and antimicrobial resistance (AMR) prediction (1%), including the use of plasmonic nanosensors. Reference Yu, Fu and He110 Finally, decision-support dashboards (2%), evaluating ChatGPT's reliability in agreeing with expert statements Reference Chester and Mandler31 or risk stratification for UTIs, Reference Jakobsen59 were described in two studies.
Barrier profile
A total of four base barrier categories were coded across the 100 primary studies, clustering into nine composite categories (Table 4 and Figure 3).

Figure 3. Distribution of barrier categories.
Table 4. Barrier categories

Data-related barriers were the most common, reported in 45 studies (45%), with issues such as incomplete records, non-standard terminologies, and delayed feeds. Reference Cho, Lee, Kim, Yoo, Choi, Lee and Choi22,Reference Chester and Mandler31,Reference Bucher, Shi and Ferraro77,Reference Chen, Joisa and Stem78 Barriers combining technical and economic elements (16%) were also frequent. Examples included high setup and maintenance costs for AI surveillance tools, Reference Kamruzzaman, Heavey and Song35 computational demands and complexity of real-time systems, Reference Bartz-Kurycki, Green and Anderson38 and substantial infrastructure requirements for whole-genome sequencing pipelines. Reference Sundermann, Chen and Miller42,Reference Sundermann, Chen and Miller43 A further 16% of studies reported barriers blending integration challenges with data shortcomings. Purely technical problems, including outages, high computational requirements, and sensor failures, appeared in 9%. Human factors coupled with data gaps, such as low trust or alert fatigue, were noted in 5%. Less frequent were mixed technical-human (3%), economic-data (3%), stand-alone economic (2%), and economic-human (1%) barriers.
Risk taxonomy
A total of four base risk categories were coded across the 100 primary studies, yielding eight composite categories (Table 5 and Figure 4). Risk reporting was highly granular, and many papers logged more than one category.

Figure 4. Distribution of risk categories.
Table 5. Risk categories

Operational issues were the most frequently documented risk type, recorded in 35 of 100 studies (35%). These papers described, for example, over-reliance on predictions without clinical confirmation Reference Gutierrez-Naranjo12,Reference Liang, Zhao, Xu, Zhou and Huang16 or potential misinterpretation if model outputs are not validated. Reference Rabhi, Jakubowicz and Metzger18
Technical failures followed closely, appearing in 33 studies (33%). Examples included server outages, risk of model overfitting, Reference Sun, Zuccarelli and Zerhouni71,Reference Marra, Alzunitan and Abosi73 and risk of under-detection in systems with incomplete digital documentation. Reference Bucher, Shi and Ferraro77,Reference van der Werff, Thiman and Tanushi86
Data-security concerns were reported in 12 studies (12%), typically focusing on patient or staff privacy regulations. Reference Bhatia, Sood and Kaur15,Reference Zhang, White, Schmidt and Dennis37,Reference Xu, Yang and Chen109
Combined technical-and-operational risks featured in nine studies (9%), while paired operational-and-data-security risks were noted in four studies (4%).
Pure human-factor risks, such as over-reliance on AI outputs Reference Yu, Fu and He110 or diminished vigilance during downtimes, Reference Balczewski and Keidan102 were documented in four studies (4%). Two papers (2%) combined technical and data-security risks, Reference Shi, Liu and Pruitt29,Reference Shrimali and Teuscher32 and one study (1%) reported a mixed human-and-data-security category. Reference Simioli, Annunziata and Coppola49
Digital-integration status
Of the 100 studies reviewed, only 15 papers (15%) described AI tools already integrated into a broader digital layer of hospital or public health information systems. The main integration pathways were:
• Automated surveillance shells, documented in 4 studies (4%), which connected AI analytics to existing IPC dashboards. Reference Bucher, Shi and Ferraro77,Reference Chen, Joisa and Stem78,Reference van der Werff, Thiman and Tanushi86,Reference Verberk and van der Werff98
• IoT data streams, featured in 2 studies (2%), linking badge or environmental-sensor feeds to analytic engines.15,60
• Whole-genome sequencing pipelines, reported in 2 studies (2%), integrating AI outbreak detection into genomic workflows. Reference Sundermann, Chen and Miller42,Reference Sundermann, Chen and Miller43
• EHR-embedded models, described in 2 studies (2%), integrating real-time AI tools directly into electronic health record systems to enable dynamic risk prediction and decision support within clinical workflows. Reference Cotia, Scorsato and Prado58,Reference Colborn, Zhuang and Dyas79
Furthermore, single studies (1% each) detailed integration via mHealth apps, Reference Ke13 computer-vision systems, Reference Kim, Park and Sippel44 computational-fluid-dynamics models, Reference Lee, Shim and Lim106 wearable devices, Reference Xu, Yang and Chen109 and plasmonic nanosensors. Reference Yu, Fu and He110
The remaining 85 studies (85%) reported prototypes without wider digital connectivity.
Checklist refinement and practical use
To help IPC teams assess readiness for adopting AI tools, we translated the findings of this review into a structured, evidence-informed checklist (Table 6). The checklist contains 41 items grouped into six domains reflecting the most common barrier clusters identified across the literature: Governance and Policy, Data Quality and Interoperability, Technical and Infrastructure, Human and Workflow, Risk and Compliance, and Economic and Resource.
Table 6. Structured readiness checklist for AI implementation in IPC

Each item was derived inductively from recurrent challenges and risks documented in the 100 included studies, ensuring that the checklist reflects real-world implementation experiences rather than theoretical considerations. To support practical use, two additional ratings were applied: maturity and priority.
By combining these two layers, IPC teams can gauge both how close their systems are to readiness and where to focus resources first. The maturity-priority framework thus transforms the checklist into a practical roadmap for planning, implementation, and risk mitigation.
The development of these scales followed a four-step, evidence-informed process:
1. Thematic synthesis: reviewers inductively coded implementation barriers and risks from the included studies and organized them into the six checklist domains.
2. Item drafting: recurring, actionable challenges within each domain were translated into checklist items.
3. Maturity anchors: the 4-level maturity scale was defined to reflect observable progression from non-existent capacity to embedded practice. This structure aligns with established digital health maturity concepts such as the Healthcare Information and Management Systems Society’s (HIMSS) Electronic Medical Record Adoption Model (EMRAM) 111 and with implementation science frameworks on organizational readiness such as the Consolidated Framework for Implementation Research (CFIR). Reference Damschroder, Aron and Keith112
4. Priority criteria: item priority was determined using risk-management principles (safety and regulatory criticality, dependency and sequencing, feasibility and resource burden), aligned with two normative governance frameworks central to this review—the WHO guidance on ethics and governance of AI in health 6 and the EU Artificial Intelligence Act, 7 which classifies AI applications by risk. This ensures that high-priority items correspond to safeguards required for high-risk use cases.
The maturity scale was designed as a four-point ordinal measure:
• 0—Absent: no sign that an AI solution has been considered.
• 1—Emergent: early steps are present (eg, draft policy, small pilot, budget line), but the system is not yet operational.
• 2—Functional: the system is operational but limited in scope or reliability.
• 3—Mature: the element is fully embedded, routinely monitored, and continuously improved.
This progression mirrors patterns observed in the literature, moving from complete absence of capacity (eg, no data stewardship), through pilots and partial functionality, to fully institutionalized, monitored systems. A neutral midpoint was deliberately excluded to encourage decisive appraisal of whether an element is operational or still requires attention.
The priority scale distinguishes between:
• High Priority items, which are essential for safe and effective AI use, such as complete data, model accuracy, cybersecurity, and legal compliance.
• Medium Priority items, which facilitate adoption and sustainability over time, including vendor support, return-on-investment data, or stress-testing of alert volumes.
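As a purely illustrative aid, the sketch below shows one way an IPC team might tabulate a self-assessment against the checklist, grouping maturity ratings (0 to 3) by priority tier to surface high-priority gaps first. The item names, example ratings, and the use of a mean score are assumptions introduced here for illustration, not part of the published checklist.

```python
# Illustrative sketch only: summarizing checklist self-assessment results.
# Item names, ratings, and the averaging approach are hypothetical; the
# checklist itself defines only maturity (0-3) and priority (high/medium).
from statistics import mean

ratings = [
    # (item, priority, maturity 0-3)
    ("Data completeness audit in place", "high", 1),
    ("Cybersecurity review of AI vendor", "high", 0),
    ("Model accuracy monitored locally", "high", 2),
    ("Vendor support agreement", "medium", 2),
    ("Alert-volume stress test", "medium", 1),
]

for tier in ("high", "medium"):
    scores = [m for _, p, m in ratings if p == tier]
    gaps = [item for item, p, m in ratings if p == tier and m <= 1]
    print(f"{tier}-priority items: mean maturity {mean(scores):.1f}; "
          f"gaps to address first: {gaps}")
```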
Discussion
This review maps how AI has been applied in IPC across 100 studies and distills common barriers and risks into a readiness checklist. Applications ranged from prediction to surveillance and compliance monitoring, but most systems remain at early stages. Success depends not only on algorithmic accuracy but on the conditions enabling reliable use: high-quality data, seamless integration, and organizational readiness. Three key messages emerge: data quality drives performance; integration is the main bottleneck; and barriers span technical, economic, and organizational domains.
Data quality drives AI performance
According to our analysis, data completeness and quality influence AI system performance and applicability. Across nearly every IPC application area, the most effective systems were built on complete, high-fidelity data. When microbiology codes, admission timestamps, and vital signs were reliably captured, predictive models for HAI prevention, such as SSI and Clostridioides difficile risk prediction, routinely achieved high accuracy. Reference Gutierrez-Naranjo12,Reference Ke13,Reference Chen, Joisa and Stem78 Conversely, nearly half of all studies reported stalled or degraded performance due to missing fields, non-standard coding, or delayed data streams. Reference Bucher, Shi and Ferraro77,Reference Kim and Canovas-Segura100 From a practical standpoint, this suggests that data readiness must come before model adoption. Installing even the most sophisticated AI algorithm on top of fragmented or inconsistent inputs is unlikely to yield clinical value and may even introduce misleading noise.
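One practical way to operationalize "data readiness before model adoption" is a simple completeness audit of the fields a candidate model would consume. The pandas sketch below is illustrative only; the field names, file path, and 95% completeness threshold are assumptions rather than a validated standard.

```python
# Illustrative data-readiness check: report completeness of the fields a
# candidate model would rely on. Field names and the 95% threshold are
# hypothetical assumptions.
import pandas as pd

REQUIRED_FIELDS = ["microbiology_code", "admission_timestamp", "heart_rate", "temperature"]
THRESHOLD = 0.95   # arbitrary minimum share of non-missing values

df = pd.read_csv("ehr_extract.csv")            # hypothetical EHR extract
completeness = df[REQUIRED_FIELDS].notna().mean()

for field, share in completeness.items():
    status = "OK" if share >= THRESHOLD else "REVIEW BEFORE MODELLING"
    print(f"{field}: {share:.1%} complete -> {status}")
```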
Digital integration is the real bottleneck
The study design distribution highlighted a pronounced translation gap: while experimental (30%) and retrospective investigations (49%) dominate, fewer than one-fifth of studies evaluated real-world implementation (17%), and rigorous external validation remains exceptional (4%). External validation or device-level verification is the step that exposes hidden overfitting, data-mapping errors, and workflow frictions before large-scale rollout. The scarcity of such studies therefore highlights a critical translational gap: most AI-for-IPC tools remain unproven outside their development sandbox. Moving the field beyond proof-of-concept will require prospective, implementation-science designs that capture workflow integration, user acceptance, and downstream infection-control impact. The vast majority of tools remained confined to isolated research platforms, often lacking connectivity with EHR interfaces, IPC dashboards, or real-time data streams from environmental or wearable sensors. However, by curating case definitions into formats interpretable by AI agents, even complex surveillance tasks, such as the detection of HAIs, can be substantially automated. Reference Rennert-May, Leal, MacDonald, Cannon, Smith, Exner, Larios, Bush and Chew25,Reference Shi, Liu and Pruitt29,Reference Tvardika, Kergourlay, Bittar, Segond, Darmoni and Metzger34,Reference Wiemken and Carrico41,Reference Cho, Kim and Chung45,Reference Lukasewicz Ferreira, Franco Meneses and Vaz54,Reference dos Santos, Silva and Menezes72,Reference Flores-Balado, Méndez and González108
Such frameworks could also be extended to support automated case-to-definition matching in outbreak investigations, reportable disease tracking, and clinical decision-making, particularly when embedded directly within electronic health records (EHR). Reference Bhatia, Sood and Kaur15,Reference Sundermann, Chen and Miller42,Reference Sundermann, Chen and Miller43,Reference Atkinson, Ellenberger and Piezzi101
Where digital integration did occur, for example, monitoring of PPE adherence or linking hand hygiene systems to IPC dashboards, teams were more likely to act on the outputs, leading to improved compliance and earlier intervention. Reference Kim, Park and Sippel44,Reference Xu, Yang and Chen109
Failures were often linked to technical incompatibilities (eg, software version mismatches after EHR upgrades), infrastructure gaps such as the absence of real-time IoT data pipelines, and high setup costs that complicated integration with existing systems and workflows. Reference Haghpanah, Vali and Torkamani66,Reference Van Lissa, Stroebe and vanDellen67,Reference Hu, Zhong, Li, Tan and He69 This highlights a key insight: accuracy is not enough; usefulness depends on connectivity.
Barriers compound across domains
Barriers rarely occur in isolation. Data gaps, fragile digital infrastructure, budget constraints, and user skepticism often interact, reinforcing one another. For instance, dependency on EHR structure and data quality, combined with high computational cost, contributed to model drift, which in turn produced irrelevant alerts that clinicians learned to ignore. Reference Kamruzzaman, Heavey and Song35,Reference Ferrari, Arina and Edgeworth85
Human factors, like alert fatigue or distrust, became more acute when paired with poor feedback loops or missing context. Reference Balczewski and Keidan102 A simple AI model that functions transparently and within a familiar dashboard may be more sustainable than a black-box tool that overloads staff with unexplained alerts.
Risk is not just technical, it’s operational
The risk landscape described across these studies suggests that operational vulnerabilities are at least as common as, and potentially more disruptive than, algorithmic failures. In some studies, challenges were reported with the explainability of ML models and their integration into daily routines, owing to system complexity Reference Jakobsen59 or misaligned response protocols. Reference Chester and Mandler31 These risks affect patient care directly and erode staff trust.
Operational and technical risks frequently co-occurred, particularly in complex systems that depend on high-quality data and computational power. Reference Jakobsen59 Although relatively few papers discussed it (12%), data security, including patient privacy, remains critical, especially as cloud-based IPC solutions, often paired with wireless devices, scale across institutions.
Maturity levels vary across IPC application areas
Some IPC application areas appear closer to routine readiness. Predictive analytics (53%) and hand hygiene monitoring (13%) stand out: models often performed well, and compliance tools that provided immediate feedback were linked to tangible improvements. Reference Gutierrez-Naranjo12,Reference Ke13,Reference Wang, Xia, Wu, Ni, Long, Li, Zhao, Chen, Wang, Xu, Huang and Lin17 One pilot study demonstrated that a human-AI collaboration system accurately monitored PPE donning and doffing procedures in a simulated setting, suggesting that such systems could serve as a substitute for, or enhancement of, in-person observers. Reference Kim, Park and Sippel44,Reference Segal, Bradley and Williams55 These systems are beginning to move beyond pilot status in select settings. One study reported that despite some usability concerns, particularly related to AI system design and EHR integration, most users expressed overall satisfaction with AI-based hand hygiene monitoring. Reference Lintz68 Importantly, the system was perceived to reduce HAIs and positively influence provider well-being, with younger and more experienced staff reporting greater satisfaction with AI use in direct patient care. Reference Lintz68
By contrast, AMR prediction through whole-genome sequencing (WGS) Reference Sundermann, Chen and Miller42,Reference Sundermann, Chen and Miller43 and sensor-based environmental monitoring Reference Hattori, Sekido, Leong, Tsutsui, Arima, Tanaka, Yokota, Washio, Kawai and Okochi27,Reference Hu, Zhong, Li, Tan and He69,Reference Lee, Shim and Lim106 remain largely exploratory. Their value is conceptually clear, but technical complexity and cost remain barriers to widespread use.
For decision-makers, this suggests a tiered approach: focus first on AI tools with strong implementation evidence and realistic integration paths, while continuing to evaluate and pilot newer innovations with longer lead times.
AI tools already match or surpass conventional IPC approaches in prediction and compliance monitoring (eg, hand hygiene or PPE compliance) when high-quality data and robust integration are in place. However, data fragmentation, fragile interfaces, and unfunded maintenance obligations consistently limit scale-up. We also illustrate how these findings informed a readiness checklist that offers IPC leaders a concise, evidence-based instrument to assess feasibility, allocate resources, and mitigate risk before AI deployment.
Future directions
Few studies addressed explainability, equity, or sustainability. Post hoc explanation methods (eg, SHAP) were rare, Reference Li, Liu and Wu95 and almost none reported energy or carbon footprints, even though the Wash Ring project Reference Xu, Yang and Chen109 showed that efficiency gains are possible. Future research should combine performance metrics with explainability, sustainability reporting, and validation in diverse, low-resource settings.
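Where a tree-based model of the kind sketched earlier is used, post hoc explanations of the type this review found to be rare can be produced with the SHAP library. The snippet below is a minimal illustration only; it assumes a fitted scikit-learn model and feature table named model and X_test from a prior, hypothetical training step.

```python
# Minimal illustration of post hoc explainability with SHAP for a fitted
# tree-based model; `model` and `X_test` are assumed to come from a prior
# training step such as the risk-prediction sketch above.
import shap

explainer = shap.TreeExplainer(model)          # works for tree ensembles
shap_values = explainer.shap_values(X_test)    # per-feature contributions

# Global view: which features drive predicted infection risk overall
shap.summary_plot(shap_values, X_test)
```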
Limitations
Most included studies were single-center pilots from high-income countries (85%), limiting generalizability. Evidence on cost-effectiveness and patient outcomes was scarce. The proposed checklist is an initial synthesis that requires prospective validation and iterative refinement through consensus methods such as Delphi.
Conclusion
Artificial intelligence offers real opportunities to strengthen infection prevention and control, from improving prediction and surveillance to supporting compliance monitoring. Yet most applications remain at an early stage, with progress slowed by gaps in data quality, weak system integration, and uneven readiness across healthcare settings. The evidence-informed checklist developed in this review is intended as a practical guide for IPC teams, helping them assess maturity, set priorities, and align implementation with international governance frameworks. 6,7 Moving AI from promising pilots to routine practice will require reliable data, robust integration, and above all, clear accountability to ensure safe and effective use.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/ash.2025.10191.
Data availability statement
All data supporting the findings of this review are contained within the manuscript and its supplementary materials.
Acknowledgments
This review was made possible through the voluntary efforts and collaborative dedication of a multidisciplinary team committed to advancing infection prevention and control. We thank all contributing reviewers and technical advisors who supported the identification, screening, and synthesis of evidence. Special thanks to those who participated in the refinement of the readiness checklist and provided critical feedback on its practical relevance and usability.
Author contribution
SG conceptualized the study, coordinated the review process, led the data extraction and analysis and wrote the first draft. GS contributed to the development of the search strategy, supervised the evidence screening process, and supported synthesis and interpretation of findings. ET provided methodological guidance and critically revised the manuscript for intellectual content. BA offered strategic oversight, validated key findings, and provided substantial input to the paper draft. All authors contributed to the drafting and revision of the manuscript, approved the final version, and take responsibility for the accuracy and integrity of the work.
Financial support
No external funding was received for this work, and all contributions were provided on a voluntary, non-commercial basis.
Competing interests
All authors report no conflicts of interest relevant to this article.
Disclaimer
The opinions expressed in this article are those of the authors and do not reflect the official position of WHO or Istituto Superiore di Sanità (ISS). WHO and ISS take no responsibility for the information provided or the views expressed in this article.





