Introduction
Employment is key to improve community functioning and mental health in people with mental illnesses (Drake and Wallach, Reference Drake and Wallach2020). Work fosters a sense of pride and self-esteem, offers financial independence, provides coping strategies for psychiatric symptoms and ultimately facilitates the process of recovery (Dunn et al., Reference Dunn, Wewiorski and Rogers2008). However, depending on the diagnosis, only between 14 and 33% of the working-age adults (18–65 years old) with mental illnesses are employed, which is substantially lower than the general population (Marwaha et al., Reference Marwaha, Johnson, Bebbington, Stafford, Angermeyer, Brugha, Kilian, Hansen and Toumi2007; Kozma et al., Reference Kozma, Dirani, Canuso and Mao2010; Hakulinen et al., Reference Hakulinen, Elovainio, Arffman, Lumme, Suokas, Pirkola, Keskimäki, Manderbacka and Böckerman2020). Therefore, ongoing support in obtaining and sustaining competitive employment is needed for people with mental illnesses to create a strong and inclusive labour market. Individual placement and support (IPS) is the most effective rehabilitation programme to help people with mental illnesses into competitive employment (Modini et al., Reference Modini, Tan, Brinchmann, Wang, Killackey, Glozier, Mykletun and Harvey2016; Metcalfe et al., Reference Metcalfe, Drake and Bond2018a).
IPS was originally developed to support people with severe mental illness (SMI) in achieving competitive employment. IPS is based on eight basic principles (Becker and Drake, Reference Becker and Drake2003; See Box 1). The overall effectiveness of IPS is well-established for people with SMI (Modini et al., Reference Modini, Tan, Brinchmann, Wang, Killackey, Glozier, Mykletun and Harvey2016; Metcalfe et al., Reference Metcalfe, Drake and Bond2018a). Because of its success, IPS is also increasingly offered to people with other diagnoses, such as common mental disorders (CMD), affective disorders, post-traumatic stress disorders (PTSD) and substance use disorders (SUD) (Bond et al., Reference Bond, Drake and Pogue2019). Results indicated the beneficial effects of IPS for people with PTSD and SUD (Bond et al., Reference Bond, Drake and Pogue2019). However, mixed indications of the effectiveness of IPS for people with CMD (including affective disorders) were found (Hellström et al., Reference Hellström, Pedersen, Christensen, Wallstroem, Bojesen, Stenager, Bejerholm, Van Busschbach, Michon and Eplov2021; Probyn et al., Reference Probyn, Engedahl, Rajendran, Pincus, Naeem, Mistry, Underwood and Froud2021). The reasons for the diminished effectiveness of IPS for CMD are unclear, because a consistent definition of CMD is lacking. Diagnostic criteria for labelling CMD differ between studies, but most studies define CMD to include affective and/or anxiety disorders, of varying duration of illness (Vollebergh et al., Reference Vollebergh, Iedema, Bijl, de Graaf, Smit and Ormel2001; Steel et al., Reference Steel, Marnane, Iranpour, Chey, Jackson, Patel and Silove2014; De Vries et al., Reference De Vries, Ten Have, van Dorsselaer, van Wezep, Hermans and de Graaf2016).
Employment outcomes in IPS programmes also vary between service users with different clinical, functional and personal characteristics, such as symptom severity, substance use, involuntary hospitalisation, social functioning, work experience, education level, duration of illness, age and age of onset of psychiatric disorder (Catty et al., Reference Catty, Lissouba, White, Becker, Drake, Fioritti, Knapp, Lauber, Rössler, Tomov, Van Busschbach and Wiersma2008; Marwaha et al., Reference Marwaha, Johnson, Bebbington, Angermeyer, Brugha, Azorin, Kilian, Hansen and Toumi2009; Campbell et al., Reference Campbell, Bond, Drake, McHugo and Xie2010; Luciano et al., Reference Luciano, Bond and Drake2014; Fyhn et al., Reference Fyhn, Øverland and Reme2020; Christensen et al., Reference Christensen, Wallstrøm, Bojesen, Nordentoft and Eplov2021). However, these moderating effects have been inconsistent across studies, resulting in ambiguity about the effectiveness of IPS in different subgroups. As IPS has been increasingly expanded to different populations, it is timely to investigate how well the effectiveness of IPS generalises to new target groups.
Therefore, in this meta-analysis we analysed the relative effectiveness of IPS for different subgroups of service users as reported in randomised controlled trials of IPS. We assessed the relative effectiveness of IPS by examining study-level outcomes for subgroups of studies with different diagnostic, clinical, functional and personal characteristics using sensitivity analyses of subgroup differences (Borenstein and Higgins, Reference Borenstein and Higgins2013). This is the first meta-analysis that specifically focused on the relative effectiveness of IPS in different target groups with a focus on both target groups with and without SMI. This gives some unique insights into the relative effectiveness of IPS, and valuable addition to the recent contributions about this topic in comparable reviews (i.e. Bond et al., Reference Bond, Drake and Pogue2019; Probyn et al., Reference Probyn, Engedahl, Rajendran, Pincus, Naeem, Mistry, Underwood and Froud2021). The meta-analysis addressed the following research questions:
1. How does the effectiveness of IPS differ between subgroups of service users with distinct clinical, functional and personal characteristics?
2. What is the relative effectiveness of IPS for specific diagnostic subgroups of service users with CMD, SMI, schizophrenia spectrum disorders and mood disorders?
Materials and methods
Our meta-analysis followed the latest PRISMA guidelines (Page et al., Reference Page, McKenzie, Bossuyt, Boutron, Hoffmann, Mulrow, Shamseer, Tetzlaff, Akl, Brennan, Chou, Glanville, Grimshaw, Hróbjartsson, Lalu, Li, Loder, Mayo-Wilson, McDonald, McGuinness, Stewart, Thomas, Tricco, Welch, Whiting and Moher2021). Our protocol was preregistered in PROSPERO (CRD42020220080).
Search strategy
We identified records through searches in PubMed, PsycInfo and Cochrane of peer-reviewed journals until July 2019. The search was based on terms related to specific primary diagnoses (e.g., schizophrenia, mood disorder, anxiety disorder, but also CMD and SMI), IPS and other vocational rehabilitation programmes and competitive employment (see online Supplementary materials 1). We found additional references through reference lists of identified studies and systematic reviews.
Study selection process
The included studies meet the following criteria:
Participant population
We included studies that investigated people who were diagnosed with any mental disorder, as determined by DSM-III to DSM-5 (American Psychiatric Association, Reference Bond, Becker, Drake and Vogler1997, 2000, 2013) or ICD 10–11 criteria (World Health Organization, 2016, 2019). Participants without mental disorders or at risk of developing mental health problems were excluded from the meta-analysis.
Study design
We included all randomised controlled trials that evaluated the effectiveness of IPS compared to at least one control condition in the meta-analysis.
Intervention
We included studies investigating a treatment arm comprised of IPS as a stand-alone intervention, not augmented with another active intervention, such as cognitive remediation or social skills training, confirmed by an IPS fidelity assessment, receiving at least ‘fair’ fidelity. For studies investigating the effectiveness of both IPS and IPS augmented with another intervention, we only included the IPS-only arm in the analyses and excluded the IPS augmented with another intervention- arm from the analysis.
Comparison
The control group could be any other vocational service or a passive control group (i.e., service as usual or waiting list).
Outcomes
The study reported competitive employment outcomes.
Two authors (LdW & CC) independently executed study selection, including both title and abstract screening and full-text screening. Disagreements of the full-text selection process were resolved by consensus.
Data extraction
We extracted study details, participant characteristics, treatment variable, outcomes and study design data from all studies in this meta-analysis. Author LdW executed data extraction and discussed and resolved uncertainties with CC. Details about the data-extraction are presented in online Supplementary materials 2.
Data synthesis
Assessment of outcomes
The included studies reported a variety of competitive employment outcomes. Therefore, we focused on three outcome measures of competitive employment that were reported by at least ten studies, the minimum number to provide outcomes with sufficient statistical power in meta-analyses (Jackson and Turner, Reference Jackson and Turner2017; Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2021): (1) Competitive employment rate (i.e., the proportion of participants competitively employed for at least one day during the study period); (2) Job duration (i.e., days, weeks or months competitive employed during the study period); (3) Wages (i.e., total earnings from competitive employment during the study period).
Assessment of study design and region
The included studies also differed in study design, which might affect outcomes: studies compared IPS with a variety of control groups and outcomes were analysed over different follow-up periods. Furthermore, previous research also indicated regional differences (i.e., European versus non-European studies) in the effectiveness of IPS (Drake et al., Reference Drake, Becker and Bond2019). Therefore, we analysed the study outcomes within specific subgroups based on these three confounding factors as follows: (1) type of control group: studies with an active control group encompassing treatment as usual combined with any other vocational services versus studies with a passive control group with treatment as usual and no primary focus on improvement of vocational functioning; (2) follow-up duration: an assessment period of 12 months or less versus more than 12 months; (3) Region: European studies versus non-European studies.
Assessment of moderators of outcomes
In order to answer our two research questions, we selected moderators of study outcomes from the included studies. The selection was based on the identification of relevant moderators analysed in previous studies and the availability of extractable raw data of these moderators in at least ten of our included studies (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2021).
Assessment of diagnostic subgroups . For this study, we assessed subgroups of SMI or CMD based on diagnosis, duration of illness and inclusion criteria of studies (see Table 1). These three criteria were partly based on previous literature (i.e., Steel et al., Reference Steel, Marnane, Iranpour, Chey, Jackson, Patel and Silove2014; De Vries et al., Reference De Vries, Ten Have, van Dorsselaer, van Wezep, Hermans and de Graaf2016). However, due to the lack of a consistent definition of CMD, we pragmatically translated these criteria based on the availability of data in the included studies. If studies met none of the three criteria we labelled them as ‘unclear’ and did not include these studies in the analysis. We were also able to include SSD and major depressive disorder as separate moderators. We divided these moderators into subgroups of studies in which the majority (i.e., >50%) of the study sample was diagnosed with the specific diagnosis and subgroups of studies in which the minority was diagnosed with the specific disorder (see Table 1).
a BPRS, Brief Psychiatric Rating Scale; CAPS-IV, Administered PTSD Scale for DSM-IV; DSM-IV or −5, Diagnostic and Statistical Manual 4 or 5; DTS, Davidson Trauma Scale; GAF, Global Assessment of Functioning; GAS, Global Assessment Scale; HADS, Hamilton Anxiety and Depression Scale; HDRS, Hamilton Depression Rating Scale; ICD-10, International Classification of Diseases – 10; MADRS, Montgomery-Åsberg Depression Self Rating Scale; MHI-5, Mental Health Inventory – 5; MINI, Mini-International Neuropsychiatric Interview; PANSS, Positive and Negative Symptom Scale; PCL-5, PTSD checklist for DSM-5; SANS, Scale for the Assessment of Negative Symptoms; SAS-II, Simpson Angus Scale – II; SCAN, Structured Clinical Assessment in Neuropsychiatry; SCID, The Structured Clinical Interview for DSM-5; SOFAS, Social and Occupational Functioning Assessment Scale; SF-12, Short Form Health Survey 12; UPSA, UCSD performance-based skills assessment; WHO-DAS 2.0, World Health Organization Disability Assessment Scale 2.0.
Clinical, functional and personal characteristics . We identified eight other moderators of outcomes: duration of illness, the severity of symptoms, level of functioning, age, comorbid alcohol and substance use, education level and employment history. We assessed these moderators at baseline and operationalised those into subgroups. Subgroups were generally assessed based on the available data in the included studies in this meta-analysis, in order to achieve equally distributed subgroups. Criteria for the operationalisation into subgroups of each moderator are described in Table 1.
Risk of bias assessment
We assessed the risk of bias for each study through the Cochrane Collaboration risk of the bias assessment tool (Higgins and Green, Reference Higgins and Green2008). Potential bias (i.e., high, low or unclear) is assessed as a judgment for individual elements from five domains (selection, performance, attrition, reporting and other bias). Author LdW rated all studies and CC independently rated the risk of bias of 50% of all studies. The inter-rater reliability (Cohen's kappa; McHugh, Reference McHugh2012) was substantial (κ = 0.61; Landis and Koch, Reference Landis and Koch1977) and disagreements were resolved through consensus.
Statistical analysis
Meta-analytic procedure
Meta-analyses were conducted using RevMan 5.3 (The Nordic Cochrane Centre, 2014). We assessed the effectiveness of IPS by analysing differences between IPS and the control group over the study period by calculating the standardised mean difference (d) for continuous outcomes (i.e., job duration and wages) and the odds ratio (OR) for categorical outcomes (i.e., employment rate). For studies reporting multiple outcome assessments for related outcome measures, we pooled the effect sizes into an overall effect size. We used random-effects models, weighted by the method of inverse variance (Higgins and Green, Reference Higgins and Green2008). The magnitude of effect sizes was assessed based on the criteria described by Chinn (Reference Chinn2000). Statistical heterogeneity was assessed by calculating the I2 statistic (Higgins and Thompson, Reference Higgins and Thompson2002). We performed the overall meta-analysis within separate subgroups based on the type of control group, follow-up duration and geographical region. We controlled for the potential influence of these factors using an analysis of subgroup differences (Borenstein and Higgins, Reference Borenstein and Higgins2013). One study had both an active and passive control group (Mueser et al., Reference Mueser, Clark, Haines, Drake, McHugo, Bond, Essock, Becker, Wolfe and Swain2004), and multiple studies both followed service users after ⩽ 12 months and >12 months of follow-up. For the overall meta-analysis we pooled the effect sizes for all control groups or follow-up assessments within the study into one overall effect size, but we analysed both effect sizes separately during the analysis of subgroup differences, controlling for methodological confounders.
Calculating moderating effects
We analysed moderating effects through a sensitivity analysis of subgroup differences (Borenstein and Higgins, Reference Borenstein and Higgins2013), in which we compared subgroup outcomes with high levels or presence of the moderator versus those with low levels or absence of the moderator (see Table 1). Furthermore, the positive or negative influence of specific subgroups on employment outcomes was assessed by investigating which subgroups' confidence intervals of treatment effect exceeded the upper (‘positive’ influence) or lower (‘negative’ influence) bound of the confidence interval of the overall effect size of treatment effect.
Outliers and publication bias
We addressed the potential influence of outliers (i.e., if the confidence interval [CI] of an individual study outcome exceeded the CI of the overall effect size) by comparing the overall effect size of the outcome, including the outliers, with the overall effect size when outliers are removed through an analysis of subgroup differences. Potential publication bias was detected by visual inspection of funnel plots.
Results
Study flow
Of the 1333 records retrieved through database search and reference tracking, 1170 records were excluded after the title and abstract screening. Of the remaining 163 reports, 115 reports were excluded after full-text selection (see Fig. 1 for reasons of exclusion). The remaining 48 reports reported the results of 32 studies.
Study characteristics
As shown in Table 2, the 32 studies included 3818 participants receiving IPS and 3847 participants receiving a control intervention. The mean age of the aggregated sample (n = 7665) was 38.9 years (study range: 20.4–51.0); 44.1% of the participants were female. A total of 3454 (45.1%) participants were diagnosed with a schizophrenia spectrum disorder (SSD), and 2587 (33.8%) had a main diagnosis of major depressive disorder (MDD). The remaining 1624 (21.2%) had other diagnoses, such as anxiety disorder, PTSD, SUD or personality disorders. Twenty studies met the criteria for SMI and five studies met the criteria for CMD.
a References of reports of included studies: A. Areberg and Bejerholm (Reference Areberg and Bejerholm2013); B. Bejerholm et al. (Reference Bejerholm, Areberg, Hofgren, Sandlund and Rinaldi2015); C. Bejerholm et al. (Reference Bejerholm, Larsson and Johanson2017); D. Bond et al. (Reference Bond, Salyers, Dincin, Drake, Becker, Fraser and Haines2007); E. Bond et al. (Reference Bond, Campbell and Becker2013); F. Bond et al. (Reference Bond, Kim, Becker, Swanson, Drake, Krzos, Fraser, O'Neill and Frounfelker2015); G. Burns et al. (Reference Burns, Catty, Becker, Drake, Fioritti, Knapp, Lauber, Rössler, Tomov, Van Busschbach, White and Wiersma2007); H. Burns and Cathy (Reference Burns and Catty2008); I. Kilian et al. (Reference Kilian, Lauber, Kalkan, Dorn, Rössler, Wiersma, Van Buschbach, Fioritti, Tomov, Catty, Burns and Becker2012); J. Drake et al. (Reference Drake, McHugo, Becker, Anthony and Clark1996); K. Christensen et al. (Reference Christensen, Wallstrøm, Stenager, Bojesen, Gluud, Nordentoft and Eplov2019); L. Clark et al. (Reference Clark, Xie, Becker and Drake1998); M. Davis et al. (Reference Davis, Leon, Toscano, Drebing, Ward, Parker, Kashner and Drake2012); N. Davis et al. (Reference Davis, Pilkinton, Poddar, Blansett, Toscano and Parker2014); O. Davis et al. (Reference Davis, Kyriakides, Suris, Ottomanelli, Drake, Parker, Mueller, Resnick, Toscano, Blansett, McCall and Huang2018a); P. Davis et al. (Reference Davis, Kyriakides, Suris, Ottomanelli, Mueller, Parker, Resnick, Toscano, Scrymgeour and Drake2018b); Q. Drake et al. (Reference Drake, McHugo, Bebout, Becker, Harris, Bond and Quimby1999); R. Dixon et al. (Reference Dixon, Hoch, Clark, Bebout, Drake, McHugo and Becker2002); S. Drake et al. (Reference Drake, Frey, Bond, Goldman, Salkever, Miller, Moore, Riley, Karakus and Milfort2013); T. Metcalfe et al. (Reference Metcalfe, Riley, McGurk, Hale, Drake and Bond2018b); U. Erickson et al. (Reference Erickson, Roes, DiGiacomo and Burns2021); V. Gold et al. (Reference Gold, Meisler, Santos, Carnemolla, Williams and Keleher2006); W. Hellström et al. (Reference Hellström, Bech, Hjorthøj, Nordentoft, Lindschou and Eplov2017); X. Howard et al. (Reference Howard, Heslin, Leese, McCrone, Rice, Jarrett, Spokes, Huxley and Thornicroft2010); Y. Heslin et al. (Reference Heslin, Howard, Leese, McCrone, Rice, Jarrett, Spokes, Huxley and Thornicroft2011); Z. Hoffmann et al. (Reference Hoffmann, Jäckel, Glauser and Kupper2012); AA. Hoffmann et al. (Reference Hoffmann, Jäckel, Glauser, Mueser and Kupper2014); AB. Jäckel et al. (Reference Jäckel, Kupper, Glauser, Mueser and Hoffmann2017); AC. Killackey et al. (Reference Killackey, Jackson and McGorry2008); AD. Killackey et al. (Reference Killackey, Allott, Jackson, Scutella, Tseng, Borland, Proffitt, Hunt, Kay-Lambkin, Chinnery, Baksheev, Alvarez-Jimenez, McGorry and Cotton2019); AE. Latimer et al. (Reference Latimer, Lecomte, Becker, Drake, Duclos, Piat, Lahaie, St-Pierre, Therrien and Xie2006); AF. Lehman et al. (Reference Lehman, Goldberg, Dixon, McNary, Postrado, Hackman and McDonnell2002); AG. Lones et al. (Reference Lones, Bond, McGovern, Carr, Leckron-Myers, Hartnett and Becker2017); AH. Michon et al. (Reference Michon, van Busschbach, Stant, Van Vugt, van Weeghel and Kroon2014); AI. Mueser et al. (Reference Mueser, Becker and Wolfe2001); AJ. Mueser et al. (Reference Mueser, Clark, Haines, Drake, McHugo, Bond, Essock, Becker, Wolfe and Swain2004); AK. Mueser et al. (Reference Mueser, Bond, Essock, Clark, Carpenter-Song, Drake and Wolfe2014); AL. Oshima et al. (Reference Oshima, Sono, Bond, Nishio and Ito2014); AM. Poremski et al. (Reference Poremski, Rabouin and Latimer2017); AN. Reme et al. (Reference Reme, Monstad, Fyhn, Sveinsdottir, Løvvik, Lie and Øverland2019); AO. Tsang et al. (Reference Tsang, Chan, Wong and Liberman2009); AP. Tsang et al. (Reference Tsang2011); AQ. Twamley et al. (Reference Twamley, Vella, Burton, Becker, Bell and Jeste2012); AR. Viering et al. (Reference Viering, Jäger, Bärtsch, Nordt, Rössler, Warnke and Kawohl2015); AS. Waghorn et al. (Reference Waghorn, Dias, Gladman, Harris and Saha2014); AT. Wong et al. (Reference Wong, Chiu, Tang, Mak, Liu and Chiu2008); AU. Zhang et al. (Reference Zhang, Tsui, Lu, Yu, Tsang and Li2017)
b A, active control group; E, excellent fidelity; F, Fair fidelity; G, good fidelity; NR, Not Reported; P, Passive control group;.
c IPS-15 item scale (Bond et al., Reference Bond, Becker, Drake and Vogler1997): item scale range: 15–75; Fidelity ratings: <55 = No IPS; 56–65 = Fair fidelity (F); >65 = Good fidelity (G); IPS-25 item scale (Bond, Peterson, Becker and Drake, Reference Bond, Peterson, Becker and Drake2012): item scale range: 25–125; Fidelity ratings: <74 = No IPS; 74–99 = Fair fidelity (F); 100–114 = Good fidelity (G); 115–125 = Exemplary fidelity (E).
Twenty-one studies compared IPS with an active control group and 12 studies compared IPS with a passive control group (including one study with both a passive and active control group). The overall study attrition rate (i.e., lost to follow-up) was 16.3% and only two studies reached a ‘high’ attrition rate exceeding 40%. Instruments for fidelity assessment differed between studies (see Table 2), but the majority of the studies (75.0%) achieved at least ‘good’ IPS programme fidelity.
Quality assessment
Quality assessment is reported in Fig. 2. Overall we found low levels of selection and attrition bias, but relatively higher levels of performance and detection bias. The majority of studies (81.3%) reported a low risk of selection bias (i.e., random sequence generation and allocation concealment). In the majority of studies (53.1%) the participants and personnel were not blinded or information about blinding was unclear (43.4%). However, given the nature of the intervention and the study design, it was generally not feasible to achieve proper blinding of participants, so a certain level of performance bias was inevitable. In nine studies (29.0%) the outcome assessors were not blinded, indicating a high risk of detection bias. This is a relatively large number of studies with a high risk of detection bias, compared with the other risk of bias domains. However, the outcomes we used in our meta-analysis (i.e., employment rate, job duration and wages) are objective outcome measures and not sensitive for the interpretation of the outcome assessor. Therefore, this might not have a large influence on the study outcomes. In most studies (67.7%) we found a low risk of attrition bias (i.e., incomplete outcome data). Five studies reported other sources of bias: three studies reported baseline differences between IPS and the control group that potentially influenced outcomes and one study indicated the potential influence of allegiance bias because specialists favoured one intervention over the other and one study had a low fidelity score during the first part of the study which may have negatively influenced study outcomes at the start of the study. There were no indications of selective outcome reporting in any of the 32 studies.
Overall meta-analysis
Thirty-one studies reported employment rate outcomes (see Table 3). A higher percentage of IPS participants (48.8%) than control group participants (28.3%) were employed during follow-up, showing small effect sizes (OR = 2.62 [2.37–2.89], p < 0.01). Outcomes were moderately heterogeneous (I2 = 74% [67–80%]; p < 0.01). The overall effect sizes of the employment rate were not influenced by the follow-up duration. However, we did find more favourable employment rate outcomes for IPS in non-European studies compared with European studies (χ2 = 10.54; p < 0.01) and in studies that compared IPS with an active control group compared with a passive control group (χ2 = 10.77; p < 0.01).
a Some studies have used multiple follow-up assessments or have multiple treatment arms. Therefore, some studies are included in the analysis of both follow-up subgroups and one study compared IPS with both an active and passive control group. Therefore, the total amount of studies and sample sizes analysed in each comparison is sometimes lower than the sum of studies analysed in both follow-up subgroups.
b Summary statistics for each of the three employment outcomes are assessed as follows: Employment rate: number and percentage of people in competitive employment at the follow-up assessment; Job duration: percentage of time within the study period that participants are employed; Wages: monthly salary in euros during the study period.
c d > 0 and OR > 1 indicates outcomes are beneficial for IPS compared to the control group; d < 0 and OR < 1 indicates outcomes are beneficial for the control group compared to IPS.
d Magnitude of effect (Chinn, Reference Chinn2000): Not clinically relevant [N]: d > −0.2 – <0.2; OR > 0.67 – <1.5; Small effect [S]: d ⩽ −0.20 and >−0.50 – ⩾0.20 and <0.50; OR ⩽ 0.67 and >0.29 – ⩾1.5 and <3.5; Medium effect [M]: d ⩽ −0.50 and >−0.80 – ⩾0.50 and <0.80; OR ⩽ 0.29 and >0.20 – ⩾3.5 and <5; Large effect [L]: d < −0.80 – >0.80; OR < 0.20 – >5.
Twenty-three studies reported job duration outcomes. Results indicated that IPS participants were longer employed than those in the control group during follow-up, showing small effect sizes (d = 0.41 [0.30–0.52], p < 0.01). Outcomes were moderately heterogeneous (I2 = 77% [69–83%]; p < 0.01). The overall effect sizes of job duration were not influenced by the type of control group, follow-up duration or region.
Fifteen studies reported outcomes of wages. Results indicated that IPS participants earned more wages during the study period than those in the control group, though effect sizes were small (d = 0.31 [0.19–0.44], p < 0.01). Outcomes were moderately heterogeneous (I2 = 65% [51–76%]; p < 0.01). The overall effect sizes of wages were not influenced by the type of control group, follow-up duration or region.
Moderating effects on overall outcomes
Sensitivity analysis outcomes were reported in Table 4 and Fig. 3. We excluded some moderators in the sensitivity analysis of job duration and wages, because these moderators were reported in less than ten studies.
a d > 0 and OR > 1 indicates outcomes are beneficial for IPS compared to the control group; d < 0 and OR < 1 indicates outcomes are beneficial for the control group compared to IPS.
b Underlined moderators were significant moderators of outcome.
c Magnitude of effect: Not clinically relevant [N]: d > −0.2 – <0.2; OR > 0.67 – <1.5; Small effect [S]: d ⩽ −0.20 and >−0.50 – ⩾0.20 and <0.50; OR ⩽ 0.67 and >0.29 – ⩾1.5 and <3.5; Medium effect [M]: d ⩽ −0.50 and >−0.80 – ⩾0.50 and <0.80; OR ⩽ 0.29 and >0.20 – ⩾3.5 and <5; Large effect [L]: d < −0.80 – >0.80; OR < 0.20 – >5.
d Summary statistics for each of the three employment outcomes are assessed as follows: Employment rate: number and percentage of people in competitive employment at the follow-up assessment; Job duration: percentage of time within the study period employed that participants are employed; Wages: monthly salary in euros during the study period.
We found significant favourable employment rate outcomes in the IPS group compared with the control group in all subgroups. However, IPS showed more favourable outcomes in studies targeting participants with SMI than studies targeting CMD (χ 2 = 10.79; df = 1; p < 0.01). These differences between both subgroups were specifically explained by differences in employment rates in the control group (i.e., 38.3% in the CMD subgroup versus 23.9% in the SMI subgroup; χ 2 = 28.84; df = 1; p < 0.01). We also found more favourable outcomes for IPS in subgroups with a majority diagnosed with SSD (χ 2 = 18.24; df = 1; p < 0.01), as well as in subgroups with a minority diagnosed with MDD (χ 2 = 5.36; df = 1; p < 0.05), and in subgroups with a lower baseline level of symptoms (χ 2 = 20.48; df = 1; p < 0.01).
Figure 3 shows all subgroup outcomes. Subgroups with SMI, the majority diagnosed with SSD, low symptom severity, and low comorbid alcohol and substance use problems at baseline had a positive influence on the relative effectiveness of IPS.
None of the potential moderators included in the sensitivity analysis had significant effects on either job duration and wages.
As the type of control group and the region in which the study is executed significantly influenced employment rate outcomes, above-mentioned moderating effects might be explained by an overrepresentation of a specific moderator in one of the subgroups based on the region or type of control group. However, chi-square analyses did not find any indications of overrepresentation in any of these subgroups. We could therefore not explain any moderating effects by regional differences or type of control group.
Assessment of outliers and publication bias
We found two negative outliers and six positive outliers for employment rate, three negative outliers and three positive outliers for job duration and one negative and one positive outlier for wages. Removing these outliers did not positively or negatively influence the study outcomes.
The funnel plots are presented in online Supplementary materials 3. For all outcomes (employment rate, job duration and wages) we found no indications of publication bias.
Discussion
This meta-analysis investigated the relative effectiveness of IPS for different subgroups based on diagnostic, clinical, functional and personal characteristics. Overall, we found that IPS is effective in improving employment outcomes regardless of sample characteristics. However, we did find that IPS was relatively less effective in supporting service users into competitive employment in European studies and in studies comparing IPS to a passive control group. Furthermore, we found that IPS was relatively more effective for people with SMIs, compared with CMD. We also found more favourable outcomes of IPS in subgroups in which the majority was diagnosed with a schizophrenia spectrum disorder (SSD), in which the minority was diagnosed with major depressive disorder (MDD), subgroups with a low baseline symptom severity, and subgroups with a low baseline level of substance and alcohol use problems. These subgroup effects could not be explained by an overrepresentation of non-European studies or an active control group within any subgroup. Despite the fact that we found overall effectiveness of IPS for all subgroups, the issue remains that in many studies the majority of service users that received IPS remain unemployed. This highlights the need for continuous refinement of the IPS model.
The fact that IPS was less effective on employment rate outcomes in European studies is most probably explained by the relatively extensive welfare systems with a disability benefit structure in most European countries. The risk of losing steady income from disability benefits after finding competitive employment (i.e., the ‘benefits trap’) might discourage service users from seeking employment (Burns and Cathy, Reference Burns and Catty2008; Metcalfe et al., Reference Metcalfe, Drake and Bond2018a).
We also found that IPS was relatively more effective on employment rate outcomes when compared with an active control group than when compared with a passive control group. We found a slightly larger employment rate in the IPS group (i.e., 50.1% vs 48.3%) but a slightly smaller employment rate in the control group (i.e., 26.7% vs. 29.4%) when IPS was compared with an active control group. However, differences in both IPS and control groups were negligible and therefore we could not give any clinical meaningful explanations for the differences between both types of the control groups.
Our findings that IPS is relatively more effective for people with SMI and SSD and relatively less effective for people with MDD and CMD are in line with previous research (Hellström et al., Reference Hellström, Pedersen, Christensen, Wallstroem, Bojesen, Stenager, Bejerholm, Van Busschbach, Michon and Eplov2021). The main explanation for differences in the effectiveness of IPS between CMD and SMI subgroups is that employment rate outcomes in the control group were larger in the CMD subgroup, whereas the outcomes were equal in the IPS group. Previous research found even more favourable employment outcomes in the control group for people with mood disorders or less severe thought disorders (Campbell et al., Reference Campbell, Bond, Drake, McHugo and Xie2010; Jonsdottir and Waghorn, Reference Jonsdottir and Waghorn2015). This may indicate that people with CMD also benefit from other vocational rehabilitation interventions. Nevertheless, our meta-analysis indicates that IPS leads to more favourable employment outcomes for people with CMD compared to any control group, and these indications of effectiveness were also found in another recently published meta-analysis (Probyn et al., Reference Probyn, Engedahl, Rajendran, Pincus, Naeem, Mistry, Underwood and Froud2021).
Another possible explanation for the differences between CMD and SMI subgroups is the fact that IPS is originally developed for people with SMI who are generally supported by professionals working in integrated treatment teams, whereas service users with CMD are often supported in different healthcare settings. Previous research indicated that the level of organisational characteristics, such as the type of clinical practice, service intensity and quality of mental health treatment could be an important prerequisite for successful implementation (Lockett et al., Reference Lockett, Waghorn and Kydd2018). This explanation is supported by two studies that conducted IPS for service users with CMD (Hellström et al., Reference Hellström, Bech, Hjorthøj, Nordentoft, Lindschou and Eplov2017; Poremski et al., Reference Poremski, Rabouin and Latimer2017) in another healthcare setting for this group. These differences can also partially be explained by the fidelity scores. Only fifty per cent of all studies that evaluated the effectiveness of IPS for people with CMD reached a fair fidelity. In contrast, 89% of the studies that evaluated the effectiveness of IPS for people with SMI achieved good or excellent fidelity. Given the fact that better fidelity scores lead to better outcomes in IPS (Bond et al., Reference Bond, Peterson, Becker and Drake2012; Kim et al., Reference Kim, Bond, Becker, Swanson and Langfitt-Reese2015; Lockett et al., Reference Lockett, Waghorn, Kydd and Chant2016; De Winter et al., Reference De Winter, Couwenbergh, van Weeghel, Bergmans and Bond2020), this might be an important explanation for the differences in outcomes between SMI and CMD subgroups. In addition to fidelity, other important factors, such as the quality of healthcare services and the intensity of employment support might also be relevant topics for further investigation. Therefore, poorer outcomes for people with CMD compared to people SMI might partially be explained by specific challenges in the implementation of IPS in a different healthcare setting, which underlines our recommendation to adapt implementation for specific subgroups.
We also found more favourable indications of the effectiveness of IPS for people with lower symptom severity and lower comorbid substance and alcohol use problems at the start of IPS. This is in line with previous studies which also indicated that lower symptom severity increased the odds of being employed for people who received any type of vocational rehabilitation (Michon et al., Reference Michon, van Weeghel, Kroon and Schene2005; Campbell et al., Reference Campbell, Bond, Drake, McHugo and Xie2010; Nygren et al., Reference Nygren, Markström, Bernspång, Svensson, Hansson and Sandlund2013). This might be explained by the fact that a lower symptom severity frees up more time for and focus on the adequate job support, because less focus on symptom stabilisation and intensive treatment programmes is needed.
The positive influence of low symptom severity on the effectiveness of IPS may contradict the superior outcomes of IPS for people with SMI compared with CMD, as SMIs are generally associated with higher symptom severity. However, four out of the five studies (80%) that reported outcomes of IPS for people with CMD had a high symptom severity at baseline. Therefore, symptom severity and the severity of illness are not interrelated in this meta-analysis.
This meta-analysis had several limitations. First, all our findings were analysed on a study level and subgroups were based on aggregated scores or percentages of the whole study sample. This analysis provides an overarching overview of the influence of specific service users' characteristics on the effectiveness of IPS, but does not reflect on the specific variability of individual client level characteristics or outcomes. However, despite this limitation, this meta-analysis gives valuable insights toward better understanding of making effective adaptations in the implementation of IPS in real-world settings. Analysis of outcomes on a study level inevitably leads to heterogeneity of outcomes because the context and setting in which the studies are executed differ (Ioannidis, Reference Ioannidis2008). Furthermore, this meta-analysis only focused on the effectiveness of IPS as a stand-alone intervention within a mental health population, in order to achieve a relatively homogeneous sample of studies. As a consequence, we did not include a number of relevant studies that exclusively investigating IPS with an add-on intervention (e.g., McGurk et al., Reference McGurk, Mueser, Xie, Welsh, Kaiser, Drake, Becker, Bailey, Fraser, Wolfe and McHugo2015; Tsang et al., Reference Tsang, Bell, Cheung, Tam and Yeung2016) or studies focused on populations with a high risk of developing mental disorders (e.g., Sveinsdottir et al., Reference Sveinsdottir, Lie, Bond, Eriksen, Tveito, Grasdal and Reme2019). This also partially explained the lack of available study data to investigate the influence of other relevant moderators (such as cognitive functioning). Second, some of our sensitivity analyses, in which we investigated moderating effects, were based on a relatively low number of studies. This might have limited the generalisability of outcomes for subgroups based on a small number of studies. Another limitation is that 12 (37.5%) of our included studies are conducted more than 10 years ago. During that time IPS was executed in a societal setting, that was applicable at that time, with most probably other welfare policies or treatment practices than implemented nowadays. Another potential limitation is the fact that the broad variety of studies might influence the interpretation and representativeness of some moderating effects. Our included studies investigated target groups with different diagnoses and clinical characteristics and were therefore in some cases using different assessment instruments. This was specifically the case in the moderating effects of the severity of symptoms and level of functioning. We tried to solve this issue by using normative data based on representative target groups that matches with each included study. However, this inevitably leads to heterogeneity in the outcomes. Interpretation of the findings on the influence of symptom severity and level of functioning on employment should therefore be handled with caution. Finally, we have executed a relatively high number of sensitivity analyses based on a relatively low number of studies. This increases the chance of false-positive outcomes and alpha inflation (Wang et al., Reference Wang, Sparks, Gonzales, Hess and Ledgerwood2017). We should therefore consider the results of this meta-analysis as exploratory and the findings suggesting potentially valuable trends for improving IPS for different target groups.
Overall this meta-analysis has shown that IPS is implemented for a wide variety of service users: IPS is effective for different subgroups, regardless of distinct diagnostic, clinical, functional and personal characteristics. However, future research should focus on the implementation of IPS for people with CMD and higher symptom severity. It is important to investigate whether, and if so, how to make more effective adaptations in the implementation of IPS to better meet the vocational needs of these groups.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796022000300
Availability of data and materials
The majority of relevant data and materials are presented in the tables and online Supplementary materials, as well as partially available in the review protocol that is submitted in PROSPERO (CRD42020220080). All other raw data is not available online. Further questions and requests about the availability of the data could be sent to the corresponding authors (Lars de Winter; Lwinter@kcphrenos.nl).
Conflict of interest
All authors gave final approval to the submitted manuscript and contributed to the draft, conception and design of the manuscript. Furthermore, all authors report no competing interests or financial relationships with commercial interests.