This research aims to study the psychometric properties of the Spanish version of the Outcome Rating Scale (ORS) (Harris, 2010; Miller, Duncan, Sparks, & Claud, 2003) in a Spanish clinical sample. The ORS is an instrument that assesses well-being, designed for routine outcome monitoring (ROM) in psychotherapy (Boswell, Kraus, Miller, & Lambert, 2015). It was developed as an ultra-brief alternative to the Outcome Questionnaire 45.2 (OQ–45.2) (Lambert et al., 1996). The OQ–45.2 comprises three scales (individual, relational and social functioning), corresponding to the three main areas that psychotherapy improves and that serve as valid indicators of treatment outcome. These areas were incorporated into the ORS as three visual analogue scales assessing individual, relational and social well-being. A fourth item, general well-being, was added.
Several studies have examined the psychometric properties of the ORS in different countries and cultures, and the scale has been translated into several languages, such as Slovak (Biescad & Timulak, 2014), Dutch (Hafkenscheid, Duncan, & Miller, 2010; Janse, Boezen-Hilberdink, van Dijk, Verbraak, & Hutschemaekers, 2014), and Spanish (Harris, 2010; Donoso & Grez, 2006). The ORS has also been applied to several kinds of populations, such as students (Bringhurst, Watson, Miller, & Duncan, 2006), primary-care users (DeSantis, Jackson, Duncan, & Reese, 2017), and rural clinical samples (Campbell & Hemsley, 2009), and in different formats, e.g., sign language (Munro & Rodwell, 2009). To date, all these studies of the ORS in English and other languages document similar psychometric properties.
Furthermore, the ORS has been used in different studies, mainly in those aimed at demonstrating the clinical usefulness, efficacy and effectiveness of feedback-informed treatments and ROM (Anker, Duncan, & Sparks, 2009; Anker, Owen, Duncan, & Sparks, 2010; Miller, Duncan, & Sorrell, 2006; Overington & Ionita, 2012; Reese, Norsworthy, & Rowlands, 2009; Reese, Usher, et al., 2009).
At present, Spain has only one instrument standardized and validated to assess psychotherapeutic outcome in routine care, the Clinical Outcomes in Routine Evaluation – Outcome Measure (CORE-OM) (Feixas et al., 2012; Trujillo et al., 2016). New instruments that give clinicians varied tools to evaluate psychotherapeutic outcome systematically are needed to ensure the effectiveness of the treatments delivered. In response to this need, this study explores the psychometric properties of the ORS in a Spanish clinical sample, comparing them with the properties of other instruments used in mental health settings and already validated in Spain.
Method
Participants and procedures
One hundred and sixty-five adult participants were recruited from different primary care centers in the city of Barcelona. These participants received psychotherapy in the context of an internship agreement between the University of Barcelona and the Catalan government, earmarked for students of the Master of Cognitive Social Therapy. The latter is a three-year training program that prepares students to practice psychotherapy following the models of cognitive-constructivist psychotherapies and systemic family therapy. Participants were referred by their primary care physicians. All participants received an initial assessment before the start of the treatment. Exclusion criteria were receiving another psychological treatment at the time of intake, the presence of psychotic symptoms, manic or hypomanic episodes, or suicidal ideation.
The treatment offered was individual psychotherapy with a maximum of sixteen sessions, spaced one or two weeks apart. All therapists were in the third year of the master’s degree and were supervised by experienced teachers and senior therapists.
The approval to conduct this study was given by the primary care centers where the study was conducted. All participants referred were asked to give informed consent for the use of the information from the assessment of their treatment progress and for research. Receiving psychological treatment was not conditional on such consent.
Instruments and measures
Outcome Rating Scale (ORS)
The ORS is a four-item scale measuring four areas of client functioning: individual, interpersonal, social, and general well-being. The items are answered on a ten-centimeter visual analog scale (VAS) ranging from negative at the left pole to positive at the right pole. To answer each item, the client makes a vertical mark on the VAS. Each mark is measured with a ruler to obtain the item score. The item scores are summed to obtain the total score, which can range from 0 to 40. High scores represent a good level of functioning and well-being; low scores represent poor functioning and distress.
The Spanish version used for this study is the version translated by Rafael S. Harris, Jr., obtained through the website of Scott D. Miller in 2012 (Harris, 2010).
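The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (four marks of 0 to 10 cm summed to a 0 to 40 total); the function name and the example measurements are hypothetical and do not come from the study.

```python
def ors_total(marks_cm):
    """Sum four VAS marks (each measured in cm, 0 to 10) into an ORS total (0 to 40)."""
    if len(marks_cm) != 4:
        raise ValueError("the ORS has exactly four items")
    for mark in marks_cm:
        if not 0.0 <= mark <= 10.0:
            raise ValueError("each mark must lie on the 10 cm line")
    return sum(marks_cm)


# Hypothetical ruler measurements: individual, interpersonal, social, general
example_total = ors_total([4.5, 6.0, 5.5, 5.0])
```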
Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM) (Core System Group, 1998)
The CORE-OM is an outcome questionnaire that assesses four factors across thirty-four items: well-being (four items), problems or symptoms (twelve items), general functioning (twelve items), and risk of self-harm and harm to others (six items). There are multiple versions of the instrument in different languages, developed for men and women. Shorter versions (eighteen, ten or five items) also exist for each gender (Feixas et al., 2012). The questionnaire is self-reported: the client answers each item on a five-option Likert scale ranging from never to most or all the time. The direct score ranges from 0 to 136. To obtain the final score, the direct score is divided by the total number of items, yielding a score from 0 to 4. The same procedure is applied to the subscales to obtain the score for each factor. A total score without the risk scale can also be calculated (Evans et al., 2002). High scores represent high levels of psychological distress. The original English version and the Spanish version have good psychometric properties (Trujillo et al., 2016). In this study, the thirty-four-item version was applied at the initial assessment, while the eighteen-item short version (Short Form B, CORE-SFB) was applied for monitoring during psychotherapy sessions.
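As a minimal sketch of the scoring arithmetic just described, the following Python function divides the direct score by the number of items. It assumes the responses have already been coded 0 to 4 in the distress direction; the function name and the response values are hypothetical.

```python
def core_mean_score(item_responses):
    """Direct score (sum of 0-4 responses) divided by the number of items."""
    if not item_responses:
        raise ValueError("at least one item response is required")
    if any(r not in (0, 1, 2, 3, 4) for r in item_responses):
        raise ValueError("responses must be integers from 0 to 4")
    return sum(item_responses) / len(item_responses)


# Hypothetical client: 28 non-risk items answered 2, 6 risk items answered 0
non_risk = [2] * 28
risk = [0] * 6
all_items_score = core_mean_score(non_risk + risk)  # total over 34 items
non_risk_score = core_mean_score(non_risk)          # total without the risk scale
```

The same function applies to any subscale, since each factor score is obtained by the same division.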
Beck Depression Inventory–II (BDI–II) (Beck, Steer, & Brown, 1996; Sanz, Navarro, & Vázquez, 2003; Sanz, Perdigón, & Vázquez, 2003)
The BDI–II is a self-reported questionnaire composed of twenty-one items answered on a Likert scale. The items have four answer options (scored 0 to 3), except for items sixteen and eighteen, which have seven. The BDI–II assesses two factors related to the DSM–IV diagnostic criteria for dysthymia and major depressive disorder: a somatic or motivational factor and a cognitive or affective factor. The total score ranges from 0 to 63 and is calculated by summing all items. According to the distribution of the scores in clinical samples, four severity groups can be distinguished: minimal depression (0–13), mild depression (14–19), moderate depression (20–28), and severe depression (29–63) (Sanz, Navarro, et al., 2003; Sanz, Perdigón, et al., 2003).
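The severity bands listed above amount to a simple lookup on the total score. A minimal Python sketch follows; the function name is hypothetical, and the cut-offs are exactly those given in the text.

```python
def bdi_ii_severity(total):
    """Map a BDI-II total score (0 to 63) to the severity band given in the text."""
    if not 0 <= total <= 63:
        raise ValueError("BDI-II total must be between 0 and 63")
    if total <= 13:
        return "minimal"
    if total <= 19:
        return "mild"
    if total <= 28:
        return "moderate"
    return "severe"
```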
Depression, Anxiety and Stress Scale–21 (DASS–21) (Lovibond & Lovibond, 1995a; Lovibond & Lovibond, 1995b)
The DASS–21 is a shorter version of the original forty-two-item DASS (Lovibond & Lovibond, 1995b). It is a self-reported questionnaire structured in three subscales: depression, anxiety and stress. Each subscale is composed of seven items answered on a four-option Likert scale (from 0 to 3). The score of each subscale is obtained by summing the answers to its items and multiplying the sum by two, so each subscale score ranges from 0 to 42. High scores on each subscale represent high levels of depression, anxiety and stress, respectively.
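The subscale scoring described above (seven items scored 0 to 3, summed and doubled) can be illustrated as follows. The function name and the example responses are hypothetical.

```python
def dass21_subscale(item_responses):
    """Score one DASS-21 subscale: sum of seven 0-3 responses, multiplied by two."""
    if len(item_responses) != 7:
        raise ValueError("each DASS-21 subscale has seven items")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("responses must be integers from 0 to 3")
    return 2 * sum(item_responses)


# Hypothetical responses to the seven items of one subscale
example_score = dass21_subscale([1, 2, 0, 3, 1, 0, 2])
```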
Data analyses
Traditional analyses to study psychometric properties were performed: Internal consistency, test-retest reliability, convergent validity, and sensitivity to change.
Internal consistency was reported as Cronbach’s alpha for the first administration (n = 147) and for all administrations in the sample (n = 1,875), with no missing item data in either case. Confidence intervals for Cronbach’s alpha for the first administration were computed with the method proposed by Feldt (1965) (Feldt, Woodruff, & Salih, 1987).
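For readers unfamiliar with the coefficient, Cronbach’s alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below implements this standard formula in plain Python; it is not the SPSS procedure used in the study, it does not compute the Feldt confidence interval, and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of k item scores per respondent (no missing values).
    """
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item example with three respondents
alpha_example = cronbach_alpha([[2, 3], [4, 5], [6, 4]])
```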
Test-retest reliability was analyzed by correlating the scores of each administration with those of the next, from the first to the fourth session. Convergent validity was analyzed between the ORS and the other instruments at initial assessment and session by session. Both analyses used nonparametric correlations (Spearman’s rho) because the scores did not conform to a normal distribution according to normality tests.
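Spearman’s rho is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they span. The following is a minimal sketch of that definition in plain Python, not the SPSS routine used in the study; the session scores in the example are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ORS totals for the same five clients at two consecutive sessions
session_1 = [18.5, 22.0, 30.1, 12.4, 25.0]
session_2 = [20.0, 24.5, 29.0, 15.2, 31.3]
rho = spearman_rho(session_1, session_2)
```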
Sensitivity to change was estimated with the Wilcoxon signed-rank test, comparing the first and the last session of therapy. A nonparametric hypothesis test was chosen because, although the score distribution did not show statistically significant heteroscedasticity, it was not Gaussian. Bootstrapped 95% confidence intervals (CI) for the difference in means and the effect sizes (as Pearson’s correlation coefficient r) were also computed. All analyses were performed with IBM SPSS 24.
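The text does not state how r was derived from the Wilcoxon statistic or which bootstrap variant was used. The sketch below therefore rests on two assumptions: the common convention r = z / √N, with N the total number of observations across both sessions, and a percentile bootstrap of the paired mean difference. Function names, default parameters and data are hypothetical.

```python
import random
from math import sqrt


def effect_size_r(z, n_observations):
    """Assumed convention: r = z / sqrt(N), N = total number of observations."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return z / sqrt(n_observations)


def bootstrap_mean_diff_ci(first, last, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference (last - first)."""
    if len(first) != len(last) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [b - a for a, b in zip(first, last)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        sum(rng.choices(diffs, k=len(diffs))) / len(diffs)
        for _ in range(n_resamples)
    )
    lo_idx = int(n_resamples * (1 - level) / 2)
    hi_idx = n_resamples - lo_idx - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical values: a z statistic of -3.0 from 50 clients measured twice
r_example = effect_size_r(-3.0, 100)
```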
Results
Characteristics of the sample
This clinical sample (N = 165) was made up of 120 (72.7%) women, and 45 (27.3%) men. The age of the participants ranged from 18 to 81 years, with a mean of 43.57 (SD = 13.3), and they presented a variety of psychological problems (Table 1).
From the overall sample, two participants did not start the treatment after intake. One-hundred and sixty-three participants received at least one session of psychotherapy. The mean number of sessions was 12.2 (SD = 5.0). There were 62 therapist participants who saw a mean of 3 (SD = 1.5) clients each. The mean, standard deviation and confidence intervals for each scale and total score of the instruments administered at intake and first session of therapy are shown in Table 2.
Note: µ = mean; σ = standard deviation; CI = confidence interval.
Internal consistency
Cronbach’s alpha was α = .91, 95% CI [.88, .93], for the first administration (n = 147) and α = .96 for all administrations in the sample (n = 1,875).
Test-retest reliability
Test-retest reliability was estimated from the scores at each administration from the first to the fourth session, correlating each score with the score at the subsequent administration (Table 3).
Note: Spearman rho correlation.
Convergent validity
Convergent validity was estimated through correlations between the instruments administered at intake and the ORS administered at the first session (Tables 4 and 5).
Note: Spearman rho correlation.
Furthermore, correlations between the ORS and CORE-SFB administered during all sessions were calculated (Table 6).
Notes: Spearman rho correlation.
All correlations reported were statistically significant (p < .05).
Sensitivity to change
The total scores of the ORS and CORE-SFB at the first and last sessions of therapy were compared with the Wilcoxon signed-rank test. Total ORS scores at the last session (Mdn = 31.0) were significantly higher than at the first session (Mdn = 19.6), z = –7.38, p < .05, r = .42. For the CORE-SFB, total scores at the first session (all items, Mdn = 1.55; non-risk items, Mdn = 1.73) were significantly higher than at the last session (all items, Mdn = 1.11; non-risk items, Mdn = 1.25): for all items, z = –4.94, p < .05, r = –.32, and for non-risk items, z = –5.04, p < .05, r = –.33.
Bootstrapped 95% CI for the difference in means and for the effect sizes were also considered. Both instruments revealed a statistically significant improvement of the participants, with medium effect sizes (Table 7).
Note: µ = mean; σ = standard deviation; CI = confidence interval.
Discussion
The present research studied the psychometric properties of the ORS in a Spanish clinical sample and is the first study of this instrument in Spain. Its properties in the sample were described and compared with those of other instruments already standardized in the country.
The total ORS scores at first administration are similar to those found in other clinical samples at intake (Anker et al., 2009; Anker et al., 2010; Biescad & Timulak, 2014; Hafkenscheid et al., 2010; Janse et al., 2014; Miller et al., 2003; Reese, Norsworthy, et al., 2009). Total scores of the CORE-OM at intake are consistent with the scores of the clinical sample in the study of Trujillo et al. (2016) at the same administration point (non-risk items, M = 1.86, SD = .78, CI = .84, 1.05; all items, M = 1.62, SD = .71, CI = .75, .94).
In terms of internal consistency, the results show that it is strong, with high homogeneity and cohesion among the items. This is in line with the findings of other studies (Anker et al., 2010; Bringhurst et al., 2006; Hafkenscheid et al., 2010; Janse et al., 2014; Miller et al., 2003; Reese, Usher, et al., 2009), including a previous Spanish translation (Donoso & Grez, 2006), and with the pattern observed in other ultra-brief scales (Boulet & Boss, 1991; Seidel, Andrews, Owen, Miller, & Buccino, 2017). Cronbach’s alpha coefficients are similar to those found in the studies of Campbell & Hemsley (2009) (α = .90) and Donoso & Grez (2006) (α ranged from .91 to .96), and higher than those found by Anker et al. (2009) (α = .83), Biescad & Timulak (2014) (α = .87), and Reese, Norsworthy, et al. (2009) (α = .88; .84).
In relation to test-retest reliability, the coefficients are adequate and higher than those reported in previous studies, e.g., Janse et al. (2014) (.64; .57; .69), but lower than those observed for the CORE-SFB. Test-retest correlations of the CORE-SFB are similar to those reported by Trujillo et al. (2016) for the CORE-OM. This could be because shorter measures nearly always have lower correlations than much longer measures such as the CORE. In this sense, the CORE might be capturing more stable aspects of people’s distress than the ORS. However, in previous findings, the ORS appears to be less sensitive than the CORE when recovered and improved clients are considered as separate groups, but more sensitive than the CORE when the whole group is considered (Biescad & Timulak, 2014).
Regarding convergent validity, the correlations between the total score of the ORS and the subscales of the DASS–21 seem to be lower than those reported by Campbell & Hemsley (2009) (–.71; –.46; –.60). The same holds for the correlation with the BDI–II when compared with the study of Biescad & Timulak (2014) (–.73). This could be because a questionnaire that asks about self-assessed well-being through a VAS may capture different areas of mental health functioning than traditional measures that focus on symptoms and discomfort. Nevertheless, correlations between the ORS and the CORE are strong, showing that both instruments measure similar aspects of the same underlying construct (well-being and/or psychological distress).
Concerning sensitivity to change, the ORS captured the improvement of the participants, with an effect size similar to those of the CORE-SFB. However, in the study of Biescad & Timulak (2014), the effect size of the ORS seems to be lower (d = .87) than the effect sizes of the CORE-OM (non-risk items, d = .95; all items, d = .98).
One limitation of this study is that it only assesses the psychometric properties in a clinical sample, so no comparison with non-clinical subjects could be made. In addition, a nonrandom sampling frame was applied, and the sample size is small. Another limitation is that the version of the ORS applied was the one translated into Spanish by the team of the original authors, without checking whether this translation is suitable for Peninsular Spanish speakers, so the results should be generalized with caution. Furthermore, the method used to assess test-retest reliability is not entirely accurate, because during the interval between administrations the effect of therapy or external factors might be expected to produce change. Duncan et al. (2003) have argued that lower test-retest reliability can be obtained in instruments that are sensitive to change.
In summary, this article presents the first study of the ORS in Spain, showing that the instrument seems to be valid for assessing well-being and psychotherapeutic outcome, and useful for obtaining feedback about the client’s progress during treatment. Clinicians can apply the ORS to monitor clients’ outcome and to demonstrate the effectiveness of treatments delivered to Spanish-speaking clients. Further research is required to adapt and standardize the ORS for the Spanish population.