Future of Psychometrics: Ask What Psychometrics Can Do for Psychology

Published online by Cambridge University Press: 01 January 2025

Klaas Sijtsma*
Affiliation: Tilburg University
* Requests for reprints should be sent to Klaas Sijtsma, Department of Methodology and Statistics, TSB, Tilburg University, PO Box 90153, 5000 LE Tilburg, The Netherlands. E-mail: k.sijtsma@uvt.nl

Abstract

I address two issues that were inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is understanding the problems that test constructors and researchers who use tests have with psychometric knowledge. I argue that this understanding is important for a field such as psychometrics, in which the dissemination of psychometric knowledge among test constructors and researchers in general is highly important. The second issue concerns the identification of psychometric research topics that are relevant to test constructors and test users but, in my view, do not receive enough attention in psychometrics. I discuss the influence of test length on decision quality in personnel selection and quality of difference scores in therapy assessment, and theory development in test construction and validity research. I also briefly mention the issue of whether particular attributes are continuous or discrete.

Type: Editorial Notes
Copyright © 2011 The Psychometric Society

Footnotes

This article is based on the author’s Presidential Address, presented at the International Meeting of the Psychometric Society 2011, July 18–22, 2011, Hong Kong, China.
