
The application of Rasch measurement theory to psychiatric clinical outcomes research

Commentary on … Screening for depression in primary care

Published online by Cambridge University Press: 02 January 2018

Skye P. Barbic
Affiliation:
University of British Columbia, Vancouver, Canada
Stefan J. Cano*
Affiliation:
Modus Outcomes, Stotfold, UK
*Correspondence to Stefan Cano (stefan.cano@modusoutcomes.com)

Summary

This commentary argues for the importance of robust, meaningful assessment of clinical and functional outcomes in psychiatry. Outcome assessments should be fit for the purpose of measuring relevant concepts of interest in specific clinical settings. In addition, the measurement model selected to develop and test assessments can be critical for guiding care. Three types of measurement model are presented: classical test theory, item response theory and Rasch measurement theory. To optimise current diagnostic and treatment practices in psychiatry, careful consideration of these models is warranted.

Type
Original Papers
Creative Commons
This is an open-access article published by the Royal College of Psychiatrists and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2016 The Authors

Unlike in many other fields of medicine, most clinical outcomes in psychiatry are not directly observable and cannot be captured with diagnostic tests such as blood work or imaging. In recent years, the routine use of clinical outcome assessments (patient-reported outcomes, clinician-reported outcomes, observer-reported outcomes and performance outcomes) for measuring disease symptoms and treatment outcomes has been increasingly emphasised [1]. Clinical outcome assessments such as the Patient Health Questionnaire-9 (PHQ-9) [2] are now commonly used in clinical research and practice to assess the severity of a patient's depression and its improvement in response to treatment [3]. More broadly, as demand grows for a wide range of mental health services to be patient-centred, clinical outcome assessments are used to capture outcomes such as sustained symptom reduction, return to full functioning and optimal patient well-being [4].

To optimise mental healthcare, clinical outcome assessments used in psychiatry should be shown to be fit for purpose. They should appropriately capture the concept of interest (e.g. depression) in the context of use (e.g. patients attending primary care clinics reporting symptoms of depression) [1]. They should also be underpinned by an appropriate measurement model; that is, there should be evidence that the summed score of their individual items is ‘psychometrically sound’ [1]. To this end, there are three main psychometric approaches, based on three types of measurement model: classical test theory (CTT), Rasch measurement theory (RMT) and item response theory (IRT) [5].

The current dominant paradigm in clinical outcomes research is CTT, the foundations of which were laid down by Charles Spearman at the turn of the twentieth century [6]. CTT is associated with the psychometric properties most commonly recognised and understood by clinicians (e.g. reliability, validity and ability to detect change). However, four important limitations prevent CTT methods from meeting the scientific rigour demanded by high-stakes clinical decision-making [7]: (a) the measurements generated are ordinal rather than interval; (b) scores for persons and samples are scale dependent; (c) scale properties, such as reliability and validity, are sample dependent; and (d) the data can support group-level inferences but are not suitable for individual patient measurement.
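Limitation (c) can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not a method taken from this commentary: it uses Cronbach's alpha as the CTT reliability coefficient, nine hypothetical dichotomous items with invented difficulties (not PHQ-9 values), and simulated responses from two samples that differ only in how widely their members are spread on the latent variable. The helper names `cronbach_alpha` and `simulate` are hypothetical.

```python
import math
import random
import statistics

# Hypothetical item difficulties (logits) for nine 0/1 items; not PHQ-9 values.
DIFFICULTIES = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a persons-by-items matrix of item scores."""
    k = len(responses[0])
    # Variance of each item's scores across persons.
    item_vars = [statistics.pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed (total) scores across persons.
    total_var = statistics.pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate(n_persons: int, ability_sd: float, seed: int) -> list[list[int]]:
    """Simulate 0/1 responses to the same items for a sample with a given spread."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, ability_sd)  # person location on the latent variable
        row = []
        for b in DIFFICULTIES:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))  # logistic response probability
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Same items, two samples: one heterogeneous, one narrow (e.g. a restricted clinic population).
broad = simulate(2000, ability_sd=2.0, seed=1)
narrow = simulate(2000, ability_sd=0.3, seed=2)
print(f"alpha, broad sample:  {cronbach_alpha(broad):.2f}")
print(f"alpha, narrow sample: {cronbach_alpha(narrow):.2f}")
```

Because the items are identical in both runs, any difference in alpha comes from the sample alone; with these settings the broad sample should give a much higher coefficient than the narrow one.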

Georg Rasch, a Danish mathematician, argued that the core requirement of social measurement should be the same as that of physical measurement, and developed the simple logistic model now known as the ‘Rasch model’ [8]. In essence, RMT methods assess the extent to which observed clinical outcome assessment data (e.g. patient ratings on the items of the PHQ-9) ‘fit’ the ratings predicted by the Rasch model, which defines how a set of items should perform to generate reliable and valid measurements [8]. The difference between the expected and observed scores reveals the extent to which valid measurement is achieved. In turn, this gives rise to a range of investigations into how far the clinical outcome assessment under investigation is an appropriate measurement instrument (e.g. scale-to-sample targeting, adequacy of the response options, item and person fit, item dependency (or bias) and stability between subgroups) [7, 9]. Importantly, RMT addresses each of the four limitations of CTT described above [7]: (a) linear measurements can be constructed from ordinal-level data; (b) item estimates are free from the sample distribution, and person estimates are free from the scale distribution; (c) subsets of items from each scale, rather than all items, can be used (i.e. the foundation for item banking and computerised adaptive testing); and (d) estimates are suitable for individual person analyses rather than only for group comparison studies.
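Two of the ideas above, fit between observed and expected responses and the construction of linear measures from ordinal raw scores, can be sketched for the simplest case. The code assumes the dichotomous Rasch model (the PHQ-9 has four response categories, which in practice calls for a polytomous extension such as the rating scale or partial credit model), treats nine hypothetical item difficulties as already calibrated, and uses illustrative helper names (`rasch_p`, `raw_to_logit`, `outfit_mnsq`). It is a sketch of the technique, not the analysis used by any particular study or software package.

```python
import math

# Hypothetical item difficulties in logits, ordered from easiest to hardest.
DIFFICULTIES = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: list[float]) -> float:
    """Model-expected raw score for a person at location theta."""
    return sum(rasch_p(theta, b) for b in difficulties)


def raw_to_logit(raw: int, difficulties: list[float], tol: float = 1e-10) -> float:
    """Maximum-likelihood person location for a non-extreme raw score.

    With item difficulties treated as known, the raw score is a sufficient
    statistic, so the ML estimate solves expected_score(theta) == raw.
    Solved here with a damped Newton-Raphson iteration.
    """
    if not 0 < raw < len(difficulties):
        raise ValueError("extreme raw scores have no finite ML estimate")
    theta = 0.0
    for _ in range(200):
        # Test information at theta: the derivative of the expected score.
        info = sum(rasch_p(theta, b) * (1 - rasch_p(theta, b)) for b in difficulties)
        step = (raw - expected_score(theta, difficulties)) / info
        step = max(-1.0, min(1.0, step))  # damp large steps for stability
        theta += step
        if abs(step) < tol:
            break
    return theta


def outfit_mnsq(responses: list[int], theta: float, difficulties: list[float]) -> float:
    """Person outfit mean-square: average squared standardised residual."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += (x - p) ** 2 / (p * (1 - p))  # (observed - expected)^2 / model variance
    return total / len(difficulties)


# (a) Equal raw-score steps are not equal steps on the logit (interval) scale.
measures = {r: raw_to_logit(r, DIFFICULTIES) for r in range(1, len(DIFFICULTIES))}
for r, theta in measures.items():
    print(f"raw {r} -> {theta:+.2f} logits")

# Fit: two response strings with the same raw score (4) but different patterns.
theta4 = measures[4]
expected_pattern = [1, 1, 1, 1, 0, 0, 0, 0, 0]  # endorses the easiest items
reversed_pattern = [0, 0, 0, 0, 0, 1, 1, 1, 1]  # endorses only the hardest items
print(f"outfit, expected pattern: {outfit_mnsq(expected_pattern, theta4, DIFFICULTIES):.2f}")
print(f"outfit, reversed pattern: {outfit_mnsq(reversed_pattern, theta4, DIFFICULTIES):.2f}")
```

An outfit mean-square near 1 indicates responses about as noisy as the model predicts. The reversed pattern produces a value well above 1 even though its raw score is identical, which is the kind of misfit an RMT analysis would then try to explain. The printed measures also show that one raw-score point near the extremes spans more logits than one point in the middle of the scale.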

IRT is another body of psychometric methods, used to ascertain the degree to which a given model and its parameter estimates can account for the structure of, and statistical patterns in, a clinical outcome assessment dataset [10]. The distinction between RMT and IRT is subtle but important. IRT models are statistical models used to explain data, and the aim of an IRT analysis is to find the statistical model that best explains the observed data [9]. By contrast, the aim of RMT is to determine the extent to which observed clinical outcome assessment data satisfy the measurement model [8]. When the data do not fit the model, they are examined to try to explain the misfit. This is the central tenet of the Rasch model. Its defining property is its mathematical embodiment of the principle of invariant comparison: within a set of items assessing the same concept of interest, the comparison of two people is independent of which items are used. In this way, the Rasch model is taken as a criterion for the structure of the responses, rather than simply a statistical description of patients' responses. This is what distinguishes the RMT diagnostic paradigm from the IRT modelling paradigm [9].
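Invariant comparison can be checked numerically. Under the dichotomous Rasch model the log-odds of a positive response is θ - b, so the difference between two persons on any item is θn - θm and the item difficulty b cancels. The sketch contrasts this with the two-parameter logistic (2PL) model, chosen here as a representative IRT model (an assumption; the commentary does not name a specific one), in which the same difference becomes a_i(θn - θm) and so depends on each item's discrimination a_i. The items, person locations and helper names are hypothetical.

```python
import math

# Hypothetical items: (discrimination a, difficulty b). The Rasch model ignores a.
ITEMS = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.5)]
THETA_N, THETA_M = 1.0, -0.5  # two hypothetical persons


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


def p_rasch(theta: float, b: float) -> float:
    """Rasch model: every item has the same slope."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic IRT model: item-specific slope a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Log-odds difference between persons n and m, computed item by item.
rasch_diffs = [log_odds(p_rasch(THETA_N, b)) - log_odds(p_rasch(THETA_M, b)) for _, b in ITEMS]
two_pl_diffs = [log_odds(p_2pl(THETA_N, a, b)) - log_odds(p_2pl(THETA_M, a, b)) for a, b in ITEMS]

print("Rasch:", [round(d, 3) for d in rasch_diffs])   # Rasch: [1.5, 1.5, 1.5]
print("2PL:  ", [round(d, 3) for d in two_pl_diffs])  # 2PL:   [0.75, 1.5, 3.0]
```

Under the Rasch model all three items yield the same comparison of the two persons; under the 2PL the comparison is stretched or shrunk item by item, so it depends on which items happen to be administered.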

In this issue, Horton and Perry provide an example of diagnostic information that can be obtained using RMT methods but not using CTT or IRT methods [11]. The availability and increasing application of RMT psychometric methods for developing and evaluating clinical outcome assessments in psychiatry have important implications for future research and practice. By better understanding the strengths, weaknesses and measurement potential of such assessments, we can build an evidence base for optimising the organisation and delivery of healthcare in psychiatry [12].

Footnotes

See original paper, pp. 237–243, this issue.

Declaration of interest

None.

References

1 US Food and Drug Administration. Clinical Outcome Assessment Qualification Program. FDA, 2015. Available at: http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm284077.htm (accessed 23 November 2015).
2 Kroenke, K, Spitzer, R, Williams, J. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 606–13.
3 Thase, M. Translating clinical science into effective therapies. J Clin Psychiatry 2014; 75: e11.
4 Thornicroft, G, Slade, M. New trends in assessing the outcomes of mental health interventions. World Psychiatry 2014; 13: 118–24.
5 Cano, S, Hobart, J. The problem with health measurement. Patient Pref Adher 2011; 5: 279–90.
6 Spearman, CE. The proof and measurement of association between two things. Am J Psychol 1904; 15: 72–101.
7 Hobart, J, Cano, S. Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods. UK Health Techn Assess Prog (Monograph) 2009; 13: 1–200.
8 Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. Danish Institute for Education Research, reprinted: MESA Press, 1993.
9 Andrich, D. Rating scales and Rasch measurement. Expert Rev Pharmacoecon Outcomes Res 2011; 11: 571–5.
10 Lord, FM, Novick, MR. Statistical Theories of Mental Test Scores. Addison-Wesley, 1968.
11 Horton, M, Perry, A. Screening for depression in primary care: a Rasch analysis of the PHQ-9? BJPsych Bull 2016; doi: 10.1192/pb.bp.114.050294.
12 Barbic, S, Kidd, S, Davidson, L, McKenzie, K, O'Connell, M. Validation of the Brief Version of the Recovery Self-Assessment (RSA-B) using Rasch measurement theory. Psychiatr Rehabil J 2015; 38: 349–58.