Item Response Theory with Estimation of the Latent Population Distribution Using Spline-Based Densities

Published online by Cambridge University Press:  01 January 2025

Carol M. Woods*
Affiliation:
Washington University in St. Louis
David Thissen
Affiliation:
University of North Carolina at Chapel Hill
* Requests for reprints should be sent to Carol Woods, Washington University, Department of Psychology, Campus Box 1125, St. Louis, MO 63130-4899, USA. E-mail: cwoods@artsci.wustl.edu

Abstract

The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the population distribution. A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model. An example with real data compares the new method and the extant empirical histogram approach.
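
To make concrete how the assumed latent population distribution enters the scores mentioned above, the following is a minimal sketch of the 2PL response function and an EAP score computed on a fixed quadrature grid. It is not the spline-based estimator introduced in the paper: the item parameters, the response pattern, the grid, and the two-component mixture standing in for a nonnormal population are all hypothetical values chosen for illustration.

```python
import numpy as np

def two_pl(theta, a, b):
    """2PL probability of a positive response given latent trait theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_score(responses, a, b, nodes, weights):
    """Expected a posteriori score: posterior mean of theta on a grid.

    `weights` holds the latent density evaluated at `nodes`; it does not
    need to be normalized because the normalizing constant cancels.
    """
    p = two_pl(nodes[:, None], a[None, :], b[None, :])        # shape (Q, J)
    like = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)
    post = like * weights                                      # unnormalized posterior
    return float(np.sum(nodes * post) / np.sum(post))

# Hypothetical quadrature grid and two candidate latent densities.
nodes = np.linspace(-4.0, 4.0, 81)
normal_w = np.exp(-0.5 * nodes**2)                             # standard normal shape
skewed_w = (0.7 * np.exp(-0.5 * (nodes + 0.5) ** 2)
            + 0.3 * np.exp(-0.5 * (nodes - 1.5) ** 2))         # assumed nonnormal mixture

# Hypothetical item parameters (slopes a, thresholds b) and one response pattern.
a = np.array([1.0, 1.5, 0.8, 1.2])
b = np.array([-1.0, 0.0, 0.5, 1.0])
x = np.array([1, 1, 0, 0])

eap_normal = eap_score(x, a, b, nodes, normal_w)
eap_skewed = eap_score(x, a, b, nodes, skewed_w)
print(eap_normal, eap_skewed)
```

The same response pattern yields different EAPs under the two densities, which is why a misspecified population distribution can affect scores; in the paper's approach the density values would come from the estimated spline-based density rather than from a fixed functional form.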

Type
Original Paper
Copyright
Copyright © 2006 The Psychometric Society
