Confidence Bounds and Power for the Reliability of Observational Measures on the Quality of a Social Setting

Published online by Cambridge University Press: 01 January 2025

Yongyun Shin*
Affiliation:
Department of Biostatistics, Virginia Commonwealth University
Stephen W. Raudenbush
Affiliation:
Department of Sociology, University of Chicago
* Requests for reprints should be sent to Yongyun Shin, Department of Biostatistics, Virginia Commonwealth University, P.O. Box 980032, 830 East Main Street, Richmond, VA 23298-0032, USA. E-mail: yshin@vcu.edu

Abstract

Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting quality. A key aspect of the quality of such a summary measure is its reliability. In this paper we derive a confidence interval for reliability, a test for the hypothesis that the reliability meets a minimum standard, and the power of this test against alternative hypotheses. Next, we consider the problem of using data from a preliminary field study of the measurement procedure to inform the design of a later study that will test substantive hypotheses about the correlates of setting quality. The preliminary study is typically called the “generalizability study” or “G study” while the later, substantive study is called the “decision study” or “D study.” We show how to use data from the G study to estimate reliability, a confidence interval for the reliability, and the power of tests for the reliability of measurement produced under alternative designs for the D study. We conclude with a discussion of sample size requirements for G studies.

Type
Original Paper
Copyright
Copyright © 2012 The Psychometric Society

Footnotes

We wish to thank the editor and two reviewers for their helpful comments. The work reported here has been supported by the William T. Grant Foundation under the grant “Building Capacity for Evaluating Group-Level Interventions”.
