
The Relation of Item Difficulty and Inter-Item Correlation to Test Variance and Reliability

Published online by Cambridge University Press:  01 January 2025

Harold Gulliksen*
Affiliation:
College Entrance Examination Board

Abstract

Under assumptions that will hold for the usual test situation, it is proved that test reliability and variance increase (a) as the average inter-item correlation increases, and (b) as the variance of the item difficulty distribution decreases. As the average item variance increases, the test variance will increase, but the test reliability will not be affected. (It is noted that as the average item variance increases, the average item difficulty approaches .50). In this development, no account is taken of the effect of chance success, or the possible effect on student attitude of different item difficulty distributions. In order to maximize the reliability and variance of a test, the items should have high intercorrelations, all items should be of the same difficulty level, and the level should be as near to 50% as possible.
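The abstract's two claims can be illustrated numerically with Cronbach's alpha, which reduces to the Kuder-Richardson formula for dichotomous items. The sketch below is a hypothetical illustration, not the paper's own derivation: the function name, item counts, and the assumption of a single common inter-item correlation are all choices made for this example.

```python
def alpha_and_variance(ps, r):
    """Test variance and reliability (Cronbach's alpha) for dichotomous
    items with difficulties `ps` and a common inter-item correlation `r`.

    This is a minimal sketch under a compound-symmetry assumption, not
    the paper's general treatment.
    """
    k = len(ps)
    item_vars = [p * (1 - p) for p in ps]  # item variance p(1 - p), maximal at p = .50
    # Test variance = sum of item variances plus sum of covariances,
    # with cov(i, j) = r * sqrt(var_i * var_j) for i != j.
    test_var = sum(item_vars)
    for i in range(k):
        for j in range(k):
            if i != j:
                test_var += r * (item_vars[i] * item_vars[j]) ** 0.5
    reliability = (k / (k - 1)) * (1 - sum(item_vars) / test_var)
    return test_var, reliability

# (a) Raising the average inter-item correlation raises both
#     test variance and reliability (difficulties held at .50).
v_lo, a_lo = alpha_and_variance([0.5] * 10, 0.1)
v_hi, a_hi = alpha_and_variance([0.5] * 10, 0.3)

# (b) Spreading item difficulties away from .50 (increasing the variance
#     of the difficulty distribution) lowers both, relative to a test of
#     uniformly 50%-difficulty items at the same correlation.
v_eq, a_eq = alpha_and_variance([0.5] * 10, 0.2)
v_sp, a_sp = alpha_and_variance(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9], 0.2
)
```

Running the sketch shows the ordering the abstract states: the high-correlation test has the larger variance and reliability, and the equal-difficulty test outperforms the spread-difficulty one.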

Type: Original Paper
Copyright: © 1945 The Psychometric Society


Footnotes

* The desirability of determining this relationship has been indicated by previous writers. Work on the present paper arose out of some problems raised by Dr. Herbert S. Conrad in connection with an analysis of aptitude tests.

On leave for Government war research from the Psychology Department, University of Chicago.
