Optimal Appropriateness Measurement

Published online by Cambridge University Press:  01 January 2025

Michael V. Levine*
Affiliation:
University of Illinois
Fritz Drasgow
Affiliation:
University of Illinois
* Requests for reprints should be sent to Michael V. Levine, 210 Education Building, University of Illinois, 1310 South Sixth St., Champaign, IL 61820.

Abstract

The test-taking behavior of some examinees may be so idiosyncratic that their test scores may not be comparable to the scores of more typical examinees. Appropriateness measurement attempts to use answer patterns to recognize atypical examinees. In this report appropriateness measurement procedures are viewed as statistical tests for choosing between a null hypothesis of normal test-taking behavior and an alternative hypothesis of atypical test-taking behavior. Most powerful tests for inappropriateness are described together with methods for computing their power. A recursion greatly simplifying the calculation of optimal test statistics is described and illustrated.
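
As a rough illustration of the hypothesis-testing view described in the abstract (a sketch under stated assumptions, not the authors' procedure or their recursion): by the Neyman-Pearson lemma, the most powerful test of a simple null against a simple alternative rejects the null when the likelihood ratio of the observed answer pattern is large. The example below assumes dichotomous items, local independence given ability, and a discrete ability distribution. The function names, the probability tables, and the "item answered at chance" alternative are all hypothetical choices made for illustration.

```python
from math import prod


def marginal_likelihood(responses, item_probs_by_ability, ability_weights):
    """Marginal probability of a 0/1 answer pattern.

    Assumes local independence: for each (hypothetical) ability level k,
    the pattern probability is the product over items of p_ki if the item
    was answered correctly and 1 - p_ki otherwise. The result is the
    weighted sum over ability levels.
    """
    total = 0.0
    for probs, weight in zip(item_probs_by_ability, ability_weights):
        total += weight * prod(
            p if u == 1 else 1.0 - p for p, u in zip(probs, responses)
        )
    return total


def likelihood_ratio(responses, null_probs, alt_probs, ability_weights):
    """Ratio of the pattern's likelihood under the alternative to the null."""
    return (
        marginal_likelihood(responses, alt_probs, ability_weights)
        / marginal_likelihood(responses, null_probs, ability_weights)
    )


# Hypothetical two-item test with two equally likely ability levels.
weights = [0.5, 0.5]
null_probs = [[0.2, 0.2], [0.8, 0.8]]  # normal test-taking behavior
alt_probs = [[0.2, 0.5], [0.8, 0.5]]   # assumed: item 2 answered at chance
pattern = [1, 0]                       # item 1 correct, item 2 incorrect

ratio = likelihood_ratio(pattern, null_probs, alt_probs, weights)
```

An examinee would be flagged as atypical when `ratio` exceeds a cutoff chosen to give the desired false-alarm rate under the null hypothesis. The cutoff itself is not computed here.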

Type: Original Paper
Copyright © 1988 The Psychometric Society

Footnotes

The work reported in this article was supported by United States Office of Naval Research contracts N00014-79C-0752, NR 154-445 and N00014-83K-0397, NR 150-518, Michael V. Levine, Principal Investigator.
