Impact statement
Growth in the need for rigorous implementation science in global mental health research has outpaced the development and validation of pragmatic tools to measure implementation processes and outcomes in diverse global settings. Of the few implementation measures that are currently in use, essentially all were developed for use in high-income settings, and few have been psychometrically assessed or validated. Our objectives were to (1) bring together a panel of experts and build consensus around best practices for implementation measurement in diverse global settings and (2) survey investigators applying these measures to identify strengths and opportunities in current practice. The results will support guidance for use by investigators planning to quantitatively measure implementation process and outcomes in diverse global settings. This guidance could facilitate novel, rigorous and replicable implementation research in areas of high need.
Introduction
Mental, neurological and substance-use (MNS) disorders are the leading causes of disability globally, yet most people in need of treatment for MNS disorders never receive care (Thornicroft et al., 2017; Pathare et al., 2018; Vos et al., 2020). Effective, affordable, scalable and sustainable services are needed to bridge this global gap (Lancet Global Mental Health Group et al., 2007). A broad range of preventive and treatment interventions for high-burden MNS conditions have demonstrated promising cost-effectiveness in both high- and low-resource settings (Patel et al., 2016); in response, researchers and funders alike have called for an increased scientific focus on strengthening intervention implementation and scale-up, particularly in low- and middle-income countries (LMICs), through the application of the methods of implementation science (Betancourt and Chambers, 2016). The primary aim of implementation science is to design and test ways to promote and sustain the delivery of evidence-based practices in routine healthcare (Eccles and Mittman, 2006). These implementation strategies target specific aspects of the environment of service delivery, or of the intervention providers or of the intervention itself, all with the goal of improving uptake and sustainment. Implementation success is assessed through a range of implementation outcomes, including acceptability, adoption, appropriateness, cost, feasibility, fidelity, penetration and sustainability (Proctor et al., 2011). For example, if unhelpful attitudes or beliefs among clinic staff are thought to be hindering implementation of evidence-based mental health care, the use of peer influencers or opinion leaders might be considered as an implementation strategy to improve provider acceptance of mental health services.
Application of implementation science methods to the field of global mental health has grown rapidly in recent years (Wagenaar et al., 2020).
This growth has outpaced the development and validation of pragmatic tools for implementation measurement in diverse global settings. As with any science, valid measurement is critical to the utility and reproducibility of implementation research (Lewis et al., 2015). For example, many implementation studies begin with an assessment of the multi-level contextual determinants of implementation effectiveness (Damschroder et al., 2009). These determinants can inform the choice of implementation strategies; they are also useful for understanding the process of implementation and they may moderate or mediate intervention effects (Waltz et al., 2019). Measurement of implementation outcomes is also critical to judging the effectiveness of implementation strategies. While some implementation constructs may be manifest, or measured through observable indicators (e.g., rate of provider service delivery as an indicator of penetration) (Willmeroth et al., 2019), many are latent, implying some level of self-report (e.g., provider acceptability). Many quantitative measures of latent implementation constructs exist and have been identified and catalogued through systematic review; relatively few, however, have been assessed for validity or have documented strong psychometric properties, though the number of measures with strong psychometric properties is increasing (Khadjesari et al., 2020; Mettert et al., 2020).
Even fewer measures have been assessed for their pragmatic qualities, including burden, length, reliability and sensitivity to change (Hull et al., 2022). Importantly, almost all extant, validated, pragmatic, quantitative implementation measures were developed for use in high-income countries (Lewis et al., 2015). These implementation measures – and their corresponding theories, models and frameworks – may need to be appropriately translated, adapted and validated for use in diverse global contexts (Means et al., 2020).
To date, most implementation studies by global mental health researchers have relied exclusively on qualitative assessment, with relatively few using quantitative implementation measures (Wagenaar et al., 2020). Though qualitative methods are a crucial part of implementation science, valid quantitative measurement allows for larger studies and improves study rigor and reproducibility (Palinkas et al., 2011; Palinkas, 2014). Investigators have several factors to consider when choosing quantitative measures for use – in addition to whether an appropriate measure exists – including different aspects of measure validity and reliability, as well as each measure’s pragmatic qualities (e.g., length, cost) (Powell et al., 2017). Given that almost all existing implementation measures were developed for use in high-resource settings, global mental health researchers must carefully consider the validity and appropriateness of each measure in their setting. There are several distinct approaches available for establishing validity and other measure characteristics in novel settings (Boateng et al., 2018). Table 1 describes these characteristics and approaches in detail and notes which approaches are designed to assess which characteristics. For example, cross-cultural validity can be established using translation, back-translation, expert advice and pre-testing.
Limited guidance exists to support global mental health services investigators in the choice and use of quantitative implementation measures – or the choice and use of approaches to adapt and validate those measures. Our objectives in this project were to (1) bring together a panel of experts to better understand and develop consensus on best practices for implementation measurement, with a particular focus on mental health implementation research in LMICs, and (2) survey investigators applying these measures to identify strengths and opportunities in current practice.
Methods
Expert panel
Participants
We used purposive sampling to select and invite a panel of experts at the intersection of implementation science, psychometrics and global mental health, starting from a list generated by members of the study team. Specifically, we approached experts in our extended professional networks who we knew had experience with developing, adapting or validating implementation measures for use in global mental health research. We recruited eight panel members (see Supplementary Material for a full list of panel participants). One panel member withdrew between the first and second panel discussions.
Delphi process
The goal of our modified Delphi process was to develop consensus among the panel members on: (1) prioritization of different types of measure validity, reliability and pragmatic qualities for assessment and confirmation when using measures under different circumstances and in different settings (see Table 1 for definitions of each quality); (2) feasibility and utility of different measure validation approaches (see Table 1 for definitions of each approach) and (3) a minimal set of validation approaches for use when applying implementation measures in new contexts and settings. We followed the steps of a conventional Delphi process, including an exploratory phase, a first round of quantitative questionnaires, analysis/summation and results discussion (Avella, 2016). A preliminary discussion was held in March 2020 to orient panelists to the Delphi process. Questionnaires were then distributed and completed electronically. Questionnaire responses were aggregated and anonymized, and summary statistics of responses were presented to the panel. Following the distribution of the questionnaire analysis, available panel members were convened virtually to review the results and, if possible, achieve consensus on recommendations.
Questionnaires included three sections (see Supplementary Material). In the first section, panel members were given different measurement scenarios (e.g., use of an implementation measure developed in a US context to assess the same construct in a novel, lower-resource context) and were asked which types of measurement characteristics (e.g., different types of validity, reliability or pragmatic qualities; Table 1) need to be established prior to measure use in a novel context. In the second section, panel members rated distinct validation strategies (e.g., informal expert elicitation, pilot survey with subsequent real-world outcomes; Table 1) on nine dimensions of rigor, feasibility and resource intensiveness. Finally, in the third section panel members proposed a minimal set of validation strategies that researchers could use under most circumstances when applying an implementation measure in a diverse new setting.
One author (KD) had access to the questionnaire responses and interview data and completed all analyses (Linstone and Turoff, 1975). To maintain confidentiality and promote the rigor of the process, no identifying information was shared with other members of the research team or expert panel. Results draw from all questionnaire and interview responses as well as discussion during the second-round call. CK moderated and LA attended, but did not contribute to, both rounds of panel discussion.
The aim was to achieve a reasonable degree of consensus among panel members. No a priori target for degree of consensus was set for this study, and a full consensus-based approach was not pursued. This was done for reasons of appropriateness and feasibility; in particular, there are only a small number of experts at the intersection of global mental health and implementation measurement worldwide, and ongoing travel restrictions and social distancing measures related to the COVID-19 pandemic meant in-person consensus-building activities were impossible at the time. Though we did not use a quantitative threshold (e.g., calculating an agreement statistic or a formal vote) to assess consensus, we did bring the expert panel together for a Zoom-based discussion of the summary of their questionnaire results, with a particular focus on areas of divergence. Panel members agreed with the synthesis of results and concluded that the rankings of results within each subsection were acceptable and reflected their judgement.
Investigator survey
Participants
We also conducted a survey of global mental health researchers to understand current practice in implementation measurement. We searched NIH RePORTER and the Grand Challenges Canada website on May 18, 2020, for descriptions of funded implementation research studies related to mental health services in LMIC settings (see Supplementary Material for the NIH RePORTER search strategy). The names and contact information for the lead principal investigator for each study, as well as study descriptions, were abstracted into a sampling frame. One of three authors (C.G.K., K.D., L.A.) screened each study and associated principal investigator for inclusion; studies were excluded if they were not conducted in an LMIC or were not related to mental health. We contacted all remaining principal investigators and invited them to participate in a structured online survey related to the measurement of implementation processes and outcomes in their study. Principal investigators could also nominate a study team member or collaborator – someone who was directly involved in the implementation measurement component of the study – to participate in their place. Between NIH RePORTER, Grand Challenges Canada and this snowball sampling approach, we anticipated reaching most investigators with experience leading formal global mental health implementation research. Contacted investigators were sent a reminder email if they did not initially respond to the online questionnaire within a 2-week period, and a final reminder was sent 2 weeks later. Survey recruitment and data collection occurred from July to November 2020.
Survey measures
We designed the survey to assess: (1) the scope and nature of global mental health implementation research conducted by each investigator, (2) the range of implementation process and outcome measures used by investigators across any of their implementation studies and (3) the study setting, population, sample size, types of measure adaptation or validation used if any, assessment of measure performance and any recommendations for measure improvement.
Analysis
Categorical responses were summarized using simple descriptive statistics at the level of the respondent. Open-text responses were reviewed for recurring themes or approaches to adaptation and validation.
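The respondent-level summary described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name `summarize_categorical`, the variable `institution_type` and the example responses are all hypothetical, and missing answers are assumed to be excluded from the denominator.

```python
from collections import Counter


def summarize_categorical(responses):
    """Return {category: (count, percent)} for one categorical survey item.

    Missing answers (None) are dropped before percentages are computed,
    so the denominator is the number of respondents who answered.
    """
    answered = [r for r in responses if r is not None]
    counts = Counter(answered)
    total = len(answered)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.most_common()}


# Hypothetical responses to a single item (one entry per respondent).
institution_type = ["academic", "academic", "ngo", "academic", None, "government"]

summary = summarize_categorical(institution_type)
for category, (n, pct) in summary.items():
    print(f"{category}: n = {n} ({pct}%)")
```

Reporting both the count and the percentage matters in a small sample, where a single respondent can shift a percentage by several points.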
Research ethics
The Human Subjects Division of the University of Washington determined that both components of this study qualified for exemption status under 45 CFR 46.101(b).
Results
Expert panel
Section 1: Measure characteristics
There was substantial concordance across panel members indicating it was reasonable to rely on evidence of most measure characteristics that had been established in similar contexts (e.g., another low-resource setting) without needing to establish those characteristics in every new setting (Supplementary Material, Section 1). This was true for all types of measure validity, reliability and dimensionality, except for cross-cultural validity (i.e., adequate adaptation for and performance in a new context), which panel members judged important to establish in each new setting. In contrast, there was limited agreement on the need to establish the pragmatic qualities of measures in each new setting. Though panel members judged that qualities like measure cost, length, ease of completion and assessor burden need not be re-established in new settings if already established in similar settings, they felt that qualities related to how the measure would be used (e.g., whether it would inform decision-making, whether it fit with organizational activities) were important to establish in each new setting.
Panel members were then asked whether it was ever possible to rely on evidence of measure characteristics that had been established in other settings, even settings that were substantially different (e.g., high-income country). Respondents indicated that if investigators established the face validity of an implementation measure in a new setting – for example, through informal expert review and a small pilot use with confirmatory factor analysis – it would not then be necessary to conduct an intensive validation process. Respondents suggested that because implementation measures were not used directly to guide patient care, the stakes were lower than for other measures (e.g., diagnostic or screening tools), and correspondingly the bar for validation was lower.
Panel members were also asked about how they would choose between different hypothetical implementation measures based on their pragmatic qualities, assuming the hypothetical measures were equally valid. Respondents scored nearly all pragmatic qualities as important in making this decision, though acceptability, ease of completion, cost and language accessibility were rated as the most important qualities that would be considered (Table 2). In follow-up conversations with panel members, nearly all highlighted measure length as a key issue with current implementation measures, raising concerns related to respondent fatigue, assessor fatigue and artificial inflation of internal consistency. Respondents also felt that the results from most currently available measures were difficult to interpret, and that this was holding back their use and applicability. They suggested that the inclusion of quantitative thresholds and other guidance on how to judge what measure scores “mean” would be beneficial.
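The concern about artificial inflation of internal consistency can be illustrated with the standardized form of Cronbach's alpha, which depends only on the number of items k and the mean inter-item correlation r: alpha = k·r / (1 + (k − 1)·r). The sketch below is a generic illustration, not an analysis from this study; the function name and the assumed mean correlation of 0.30 are hypothetical choices.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Hold the mean inter-item correlation fixed (assumed value) and vary only length.
alpha_by_length = {k: round(standardized_alpha(k, 0.30), 2) for k in (5, 10, 20, 40)}

for k, alpha in alpha_by_length.items():
    print(f"{k} items: alpha = {alpha}")
```

With the item quality held constant, alpha rises steadily as items are added, so a long measure can report a high alpha without its items being any more coherent than those of a shorter one.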
Section 2: Validation strategies
Respondents identified a trade-off between the rigor of different validation approaches and their resource-intensiveness (Supplementary Material, Section 2). The two survey-based validation strategies, one using other established measures and the other using subsequent real-world outcomes for validation, were judged to be the most rigorous as well as the most expensive and time-consuming. Respondents rated the two forms of expert elicitation (informal and formal) as moderately or highly feasible and inexpensive, but there was no agreement on the assumed rigor of the results. Translation/back-translation scored consistently and moderately on all dimensions. Respondents disagreed most about the vignette-based strategy; they did not agree on the amount of time and resources required, nor whether it was feasible to develop vignettes that could provide high-confidence results in diverse low-resource settings. One respondent cautioned that developing good vignettes for community mental health programs could be hampered by the fact that these services are often uncommon in low-resource settings, and thus there is no “gold standard” program to which one can refer. Instead, vignettes must use hypothetical examples that take longer to explain and may produce unreliable results.
Section 3: Package of validation strategies
Translation/back-translation was the most frequently recommended strategy followed by informal expert elicitation. No other strategy was recommended by more than two respondents. Several respondents struggled with the tension between cost and rigor and wondered whether a minimal set of validation strategies might be feasible in most situations but ultimately insufficient for establishing validity. Most respondents suggested using a combination of validation strategies was the most appropriate approach; nearly all respondents argued that strategies should be “fit for purpose” and only as rigorous and complex as necessary. Respondents also debated the most appropriate approach to disseminate guidance on implementation measurement to mental health services researchers across diverse global settings. One respondent argued for the provision of step-by-step guidance, while another cautioned against offering overly prescriptive guidance to LMIC-based investigators.
Complete Delphi panel results are presented in the Supplementary Material.
Investigator survey
We invited 107 investigators to participate in the survey or suggest other investigators for participation. Sixty-two investigators responded. We sent survey links to 45 investigators who indicated interest in participation. Thirty-eight investigators started the survey. Table 3 presents the characteristics of the 28 investigators who completed the survey. The majority (61%) were based in the United States, most (82%) were at universities or other academic institutions and almost all (96%) were focused on research as opposed to clinical service delivery or program implementation. Investigators had been involved in a mean of 2.2 implementation studies related to mental health.
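The recruitment counts above can be restated as a simple funnel. The sketch below uses only the counts reported in this paragraph; the stage labels, variable names and the choice of invited investigators as the denominator are assumptions made here for illustration, and the resulting percentages are derived arithmetic rather than figures reported by the study.

```python
# Counts taken from the paragraph above; stage labels are illustrative.
funnel = [
    ("invited", 107),
    ("responded", 62),
    ("sent survey link", 45),
    ("started survey", 38),
    ("completed survey", 28),
]

invited = funnel[0][1]

# Percentage of all invited investigators remaining at each stage.
rates = {stage: round(100 * n / invited, 1) for stage, n in funnel}

for stage, n in funnel:
    print(f"{stage}: n = {n} ({rates[stage]}% of invited)")
```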
a ≥1 response per participant possible.
Table 4 describes the usage of implementation measures reported by at least two investigators in LMIC settings. The most used implementation measures included the Consolidated Framework for Implementation Research Inner Setting measures (n = 7) (Fernandez et al., 2018), the Program Sustainability Assessment Tool (n = 5) (Luke et al., 2014) and the Acceptability of Intervention Measure, Intervention Appropriateness Measure and Feasibility of Intervention Measure (n = 5) (Weiner et al., 2017). Measures were most commonly used prior to intervention implementation (n = 18) or mid-implementation (n = 18) as opposed to post-implementation (n = 7) and were most often used to assess contextual determinants of implementation effectiveness (n = 20) rather than to assess implementation outcomes (n = 9). Providers were the most common group sampled (n = 25), followed by clients (n = 9). Measures were used in a diverse range of contexts across Latin America, Sub-Saharan Africa, Eastern Europe and South/Southeast Asia. Adaptation approaches were generally limited to translation and back-translation (n = 23) and stakeholder feedback (n = 16), and only one investigator reported conducting any measure validation prior to use (pilot testing). Limited response variability, positive response bias, measure length and item relevance were the most common challenges reported.
Note: Measures reported as used by only one investigator, or used only in a high-income country setting, are not included in Table 4. Responses related to the Acceptability of Intervention, Intervention Appropriateness, and Feasibility of Intervention Measures were collapsed across the scales as there was complete overlap within respondents for these measures. Responses related to the Applied Mental Health Research implementation measures, which include client-, provider-, organizational- and policy-level scales for several implementation outcomes and contextual determinants, were collapsed for the same reason.
AIM, Acceptability of Intervention Measure; AMHR/mhIST, Applied Mental Health Research/Mental Health Implementation Science Tool; CFIR, Consolidated Framework for Implementation Research; EBPAS, Evidence-Based Practice Attitude Scale; FIM, Feasibility of Intervention Measure; IAM, Intervention Appropriateness Measure; ORIC, Organizational Readiness for Implementing Change; PSAT, Program Sustainability Assessment Tool.
Other measures reported as used by individual investigators included the Implementation Leadership Scale (Aarons et al., Reference Aarons, Ehrhart and Farahnak2014), the Theory of Planned Behavior measures (Ajzen, Reference Ajzen2011), the Feelings Thermometer (Alwin, Reference Alwin1997), the System Usability Scale (Lewis, Reference Lewis2018), the Organizational Social Context scale (Glisson et al., Reference Glisson, Landsverk, Schoenwald, Kelleher, Hoagwood, Mayberg and Green2008), several intervention-specific fidelity scales and several measures newly developed for individual studies.
Discussion
This study sought to improve quantitative implementation measurement in the field of global mental health by generating consensus recommendations on best practices for measure choice and validation and by surveying the field to understand current practice. Our expert panel concluded that pragmatic concerns are key to choosing between measures and validation approaches. They noted that many quantitative implementation measures are lengthy and identified a trade-off between resources and rigor in the various approaches available for adapting and validating implementation measures in diverse global settings. However, they concluded that in many cases, it is sufficient for investigators to establish the face validity of an implementation measure in a new setting through some combination of reviewing the use of that measure in a similar setting, convening an informal expert and stakeholder panel, conducting translation and back-translation and piloting the measure to confirm its dimensionality and internal reliability. Though confirming the predictive validity of a measure by correlating it with subsequent real-world outcomes would be the gold standard for measure validation, panel members felt this was unnecessary prior to using most implementation measures. Survey results suggested that though several implementation measures have been used or are in use in global mental health studies across a variety of levels and study phases, almost none have been formally validated as part of those studies.
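As a rough illustration of the piloting step described above, internal reliability is often summarized with Cronbach's alpha. The panel did not prescribe a particular statistic, so the choice of alpha here is an assumption, as are the `cronbach_alpha` function name and the pilot data, which are invented for the example.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical pilot data: five providers, four 5-point Likert items.
pilot = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(pilot)
print(f"Cronbach's alpha in the pilot sample: {alpha:.2f}")
```

Alpha alone does not establish dimensionality. With a pilot sample this small, the estimate is very imprecise and would at most flag items for review.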
Quantitative measures must be reliable, valid and practical to be useful for implementation research or practice, though comprehensive reviews of published implementation measures have noted that the field faces several major issues. These include the poor distribution of quantitative measures across implementation constructs and analytic levels; a lack of measures with strong psychometric qualities; measure synonymy (the same measure items are sometimes used to measure different constructs), homonymy (different measure items are used to measure the same construct) and instability (measure items are often changed with each use); and the reality that many implementation measures exhibit poor pragmatic qualities (Lewis et al., Reference Lewis, Mettert, Dorsey, Martinez, Weiner, Nolen, Stanick, Halko and Powell2018). Nevertheless, a growing number of strong implementation measures do exist: the challenge for investigators in diverse global settings lies in choosing and adapting these – or developing new ones – and ensuring that they perform well. Notably, the Psychometric and Pragmatic Evidence Rating Scale has been developed through stakeholder consensus to provide clear criteria for measure quality, to inform both measure development and measure choice (Stanick et al., Reference Stanick, Halko, Nolen, Powell, Dorsey, Mettert, Weiner, Barwick, Wolfenden, Damschroder and Lewis2019). In addition, domain-specific resources are increasingly available to support investigators in choosing between manifest and latent indicators of implementation process and outcomes, including the HIV Implementation Outcomes Crosswalk (Li et al., Reference Li, Audet and Schwartz2020).
Several key limitations should be noted. Our expert panel consisted of only seven members, reflecting the relatively small number of individuals with intersecting expertise in global mental health, implementation science and psychometrics. In response, we opted for depth over breadth and sought to reach panel consensus across a wide range of issues related to measure use and validation, rather than for one or two key questions. Our Delphi panel size is considered acceptable for non-statistical analysis (Rowe and Wright, Reference Rowe and Wright1999). All panel procedures were carried out during the first 6 months of the COVID-19 pandemic, meaning procedures were remote and sometimes asynchronous. For our survey, we sampled investigators from NIH RePORTER and Grand Challenges Canada; these are two of the most prolific funders of global mental health implementation research, though this approach likely biased our sample toward investigators based in North America. To mitigate this risk, we used snowball sampling to attempt to identify and recruit other investigators who would have been missed with this approach. Our overall response rate was low (28 completed surveys from 107 invitations), which again may reflect the small number of individuals actively using quantitative measures in their global mental health implementation studies; many investigators we contacted declined to participate because they were not using quantitative implementation measures.
Despite these limitations, our findings may directly support the growing field of global mental health implementation research. We have used our results to compile a set of guidance documents for investigators planning to quantitatively measure latent implementation processes and outcomes in diverse global settings. These include a compendium of available measures across implementation constructs and detailed descriptions of common adaptation and validation approaches. This guidance should facilitate rigorous and replicable implementation research in an area of high need, though it is not intended to be prescriptive, and local investigators are encouraged to adapt and apply the guidance only where it is useful. Moving forward, as the quantity and quality of implementation measures designed for use in diverse global contexts increase (Aldridge et al., Reference Aldridge, Kemp, Bass, Danforth, Kane, Hamdani, Marsch, Uribe-Restrepo, Nguyen and Bolton2022), the standards for measure adaptation and validation may also shift. Less emphasis may be placed on establishing measure validity for the sake of scientific rigor, with a corresponding increased emphasis on measure pragmatic qualities and capacity to inform real-world health service delivery.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/gmh.2023.63.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/gmh.2023.63.
Data availability statement
Study data are not publicly available as they contain information that could compromise the privacy of research participants.
Acknowledgments
The study team would like to thank our fantastic panel and the survey participants for their valuable contributions to this research.
Author contribution
All listed authors qualify for authorship based on making one or more substantial contributions to the manuscript. C.G.K., K.D., L.A., L.K.M. and E.E.H. contributed to the conceptualization of this study. C.G.K., K.D. and L.A. contributed to the formal analysis. C.G.K. wrote the original draft of the manuscript; K.D., L.A., L.K.M. and E.E.H. contributed to reviewing and editing subsequent drafts of the manuscript. All authors read and approved the final manuscript.
Financial support
This study was funded by a grant from the National Institute of Mental Health (#R01MH115495-02S1; PIs: Laura Murray, Izukanji Sikazwe). L.A. was supported by the National Institute of Mental Health T32 training grants in Global Mental Health (#T32MH103210; PI: Judith K. Bass) during study conceptualization and analysis and in Mental Health Services and Systems (#T32MH109436; PIs: Emma Elizabeth McGinty, Elizabeth A. Stuart) during manuscript preparation. E.E.H. was supported by a Mentored Career Development Award from the National Institute of Mental Health (#K01MH116335).
Competing interest
None declared.
Comments
Dear Drs. Bass and Chibanda,
We wish to submit a new manuscript entitled “Implementation measurement in global mental health: results from a modified Delphi panel and investigator survey” for consideration by Cambridge Prisms: Global Mental Health.
We confirm that this work is original and has not been published elsewhere nor is it currently under consideration for publication elsewhere. We also confirm that we have no competing interests, and that all authors have approved the manuscript for submission.
In this paper, we bring together a panel of experts and build consensus around best practices for implementation measurement in diverse global settings, and survey investigators applying these measures to identify strengths and opportunities in current practice. We hope the results will facilitate novel, rigorous, and replicable implementation research in areas of high need. This manuscript should be of relevance to readers with an interest in implementation science.
Please address all correspondence concerning this manuscript to ckemp11@jhu.edu.
Thank you for your consideration of this manuscript.
Sincerely,
Christopher Kemp, PhD MPH