On More Powerful Tests of Judge Agreement with a Known Standard

Published online by Cambridge University Press:  01 January 2025

David H. Robinson*
Affiliation:
St. Cloud State University
*Requests for reprints should be sent to David H. Robinson, Department of Mathematics, St. Cloud State University, St. Cloud, MN 32611.

Abstract

To establish the existence of his abilities, a judge is given the task of classifying each of N = rs subjects into one of r known categories, each containing s of the subjects. An incomplete design is proposed whereby the judge is presented with b groups, each one containing n = rs/b < r subjects. The n different categories corresponding to members of the group are known. Using the total number of correct classifications, this method of grouping is compared to that in which the group size is equal to the number of categories. The incomplete grouping is shown to yield a more powerful test for discriminating between the null hypothesis that the judge is guessing the classifications and the alternative hypothesis that he has some definite abilities. The incomplete design is found to be most effective (powerful) when the number of subjects in a group is limited to two or three.
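
As an illustration of the test statistic described above, the following is a minimal sketch of the exact null distribution of the total number of correct classifications. It assumes, as one reading of the abstract, that a guessing judge assigns the n known categories to the n subjects of a group uniformly at random, one category per subject, so that the number correct in a group is the number of fixed points of a random permutation. The function names are hypothetical and are not taken from the paper. No alternative-hypothesis model is given in the abstract, so the sketch does not attempt a power comparison.

```python
from fractions import Fraction
from math import comb, factorial


def derangements(k):
    """Number of permutations of k items with no fixed point."""
    d = [1, 0]  # D(0) = 1, D(1) = 0
    for i in range(2, k + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[k]


def matches_pmf(n):
    """P(exactly k correct) for k = 0..n in one group of n subjects,
    ASSUMING the judge guesses a uniformly random one-to-one matching."""
    return [Fraction(comb(n, k) * derangements(n - k), factorial(n))
            for k in range(n + 1)]


def total_correct_pmf(n, b):
    """Null distribution of the total correct over b independent groups,
    obtained by convolving the single-group distribution b times."""
    single = matches_pmf(n)
    pmf = [Fraction(1)]
    for _ in range(b):
        new = [Fraction(0)] * (len(pmf) + n)
        for i, p in enumerate(pmf):
            for k, q in enumerate(single):
                new[i + k] += p * q
        pmf = new
    return pmf


def upper_tail(pmf, t):
    """P(total correct >= t) under the null hypothesis."""
    return sum(pmf[t:])
```

Under this assumption, with six groups of two subjects each, `upper_tail(total_correct_pmf(2, 6), 12)` gives the probability that a guessing judge classifies all twelve subjects correctly, which is (1/2)^6 = 1/64. Exact fractions are used so that tail probabilities can be compared with a chosen significance level without rounding error.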

Type
Original Paper
Copyright
Copyright © 1985 The Psychometric Society

Footnotes

The author is grateful for the suggestions of the referees and the editor, which greatly improved the paper.
