
Review of Issues About Classical Change Scores: A Multilevel Modeling Perspective on Some Enduring Beliefs

Published online by Cambridge University Press:  01 January 2025

Zhengguo Gu* (Tilburg University), Wilco H. M. Emons (Tilburg University), and Klaas Sijtsma (Tilburg University)

*Correspondence should be made to Zhengguo Gu, Department of Methodology and Statistics, TSB, Tilburg University, PO Box 90153, 5000 LE Tilburg, The Netherlands. Email: z.gu@tilburguniversity.edu

Abstract

Change scores obtained in pretest–posttest designs are important for evaluating treatment effectiveness and for assessing change in individual test scores in psychological research. However, over the years the use of change scores has raised much controversy. In this article, we take a multilevel perspective to provide a structured treatise on several persistent negative beliefs about change scores, and we show that these beliefs originated from confounding the effects that within-person change and between-person differences in change have on change-score reliability. We argue that psychometric properties of change scores, such as reliability and measurement precision, should be treated at the appropriate levels within a multilevel framework. We show that, when examined at the appropriate levels within such a framework, the negative beliefs about change scores can be convincingly renounced. Finally, we summarize the conclusions about change scores to dispel the myths and to promote their potential and practical usefulness.
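To make the reliability issue discussed in the abstract concrete, the expression below is the standard classical-test-theory formula for the reliability of a difference (change) score, stated under the usual assumption of uncorrelated measurement errors at pretest and posttest. It is given here only for orientation and is not reproduced from the article itself; the symbols X (pretest score), Y (posttest score), and D = Y - X are introduced solely for this illustration.

\rho_{DD'} = \frac{\sigma_X^{2}\,\rho_{XX'} + \sigma_Y^{2}\,\rho_{YY'} - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y}{\sigma_X^{2} + \sigma_Y^{2} - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y}

Here \rho_{XX'} and \rho_{YY'} are the pretest and posttest reliabilities, \sigma_X and \sigma_Y the observed-score standard deviations, and \rho_{XY} the pretest–posttest correlation. The formula makes visible that change-score reliability decreases as \rho_{XY} increases, that is, as between-person differences in change shrink, even when within-person change itself is measured precisely; this is closely related to the confounding of within-person and between-person effects that the abstract refers to.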

Type
Original Paper
Copyright
Copyright © The Psychometric Society 2018

Footnotes

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-018-9611-3) contains supplementary material, which is available to authorized users.

Supplementary material
Gu et al. supplementary material (File, 16.3 KB)