Book contents
- Frontmatter
- Contents
- Series editors' preface
- Acknowledgements
- Preface
- 1 Alternate paradigms
- 2 Curriculum-related testing
- 3 Criterion-referenced test items
- 4 Basic descriptive and item statistics for criterion-referenced tests
- 5 Reliability, dependability, and unidimensionality
- 6 Validity of criterion-referenced tests
- 7 Administering, giving feedback, and reporting on criterion-referenced tests
- References
- Index
5 - Reliability, dependability, and unidimensionality
Published online by Cambridge University Press: 05 October 2012
Summary
Introduction
In this chapter, we will begin by addressing three central issues involved in test consistency: reliability, dependability, and fit. As we will explain, these issues arise because tests are never perfect; that is, any set of test scores contains error. Estimating just how much that error contributes to examinees' scores centers on notions of reliability in norm-referenced testing (NRT), dependability in criterion-referenced testing (CRT), and what is termed fit in item response theory (IRT).
NRT reliability estimation relies heavily on correlational approaches, so we will first provide a brief explanation and review of how the Pearson product-moment correlation coefficient works and how it should be interpreted. Three basic correlational approaches will then be explained: (a) test–retest reliability, (b) equivalent forms reliability, and (c) internal consistency reliability (including split-half reliability adjusted by the Spearman-Brown prophecy formula, Cronbach's alpha, K-R20, and K-R21).
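To make the internal consistency estimates named above concrete, the following is a minimal Python sketch, not taken from the chapter itself. It assumes dichotomously scored items (1 = correct, 0 = incorrect) stored as one row per examinee, and it uses population variances, as is common in textbook presentations of K-R20 and K-R21. The function names and the tiny data set are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Adjust a half-test correlation to estimate full-test reliability."""
    return 2 * r_half / (1 + r_half)


def kr20(rows):
    """K-R20 from a matrix of 0/1 item scores (one row per examinee)."""
    k = len(rows[0])
    var_total = pvariance([sum(row) for row in rows])
    # Sum of item variances p * q, where p is the proportion correct.
    sum_pq = 0.0
    for j in range(k):
        p = mean([row[j] for row in rows])
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(k, mean_total, var_total):
    """K-R21, which needs only the number of items, mean, and variance."""
    return (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))


# Hypothetical data: five examinees, four items.
rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in rows]

# Split-half: first and third items versus second and fourth items.
half_a = [row[0] + row[2] for row in rows]
half_b = [row[1] + row[3] for row in rows]
split_half = spearman_brown(pearson_r(half_a, half_b))

kr20_estimate = kr20(rows)
kr21_estimate = kr21(len(rows[0]), mean(totals), pvariance(totals))
```

K-R21 assumes that all items are equally difficult, so it will generally come out no higher than K-R20 on the same data. Real analyses would of course use far more examinees and items than this toy matrix.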
CRT dependability will be discussed in terms of two general approaches to consistency estimation: threshold-loss methods and generalizability theory approaches. The threshold-loss methods will include the original agreement and kappa coefficients, which are based on two administrations of a CRT, and Subkoviak's (1980) short-cut methods, which allow the agreement and kappa coefficients to be estimated from a single test administration. The generalizability theory approaches will be discussed in terms of single and multiple sources of error, as well as in terms of the domain score approach, phi (Φ), and squared-error loss approaches, including Livingston's statistic and the more effective phi(lambda).
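As an illustration of the threshold-loss idea, the sketch below computes the agreement and kappa coefficients from two administrations of the same CRT. It is a minimal Python example under stated assumptions rather than the chapter's own procedure: examinees are classified as masters when their score is at or above a cut score, and the function name, scores, and cut score are all hypothetical. The single-administration short-cut methods and the generalizability theory coefficients are not covered here.

```python
def agreement_and_kappa(scores_1, scores_2, cut_score):
    """Return (agreement, kappa) for master/non-master decisions
    made on two administrations of the same test."""
    n = len(scores_1)
    masters_1 = [s >= cut_score for s in scores_1]
    masters_2 = [s >= cut_score for s in scores_2]

    # Proportion of examinees classified the same way both times.
    consistent = sum(a == b for a, b in zip(masters_1, masters_2))
    p_agree = consistent / n

    # Agreement expected by chance, from the marginal proportions.
    p1 = sum(masters_1) / n
    p2 = sum(masters_2) / n
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)

    # Kappa: agreement beyond chance, relative to the maximum possible.
    # Undefined (division by zero) when p_chance equals 1.
    kappa = (p_agree - p_chance) / (1 - p_chance)
    return p_agree, kappa


# Hypothetical scores for ten examinees on two administrations.
first = [9, 8, 7, 6, 5, 4, 3, 2, 8, 3]
second = [8, 9, 5, 7, 4, 5, 2, 3, 7, 6]
p_agree, kappa = agreement_and_kappa(first, second, cut_score=6)
```

The agreement coefficient is simply the proportion of consistent classifications, whereas kappa discounts the portion of that agreement that would be expected by chance alone, so kappa is normally the lower of the two.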
Type: Chapter
Criterion-Referenced Language Testing, pp. 149-211
Publisher: Cambridge University Press
Print publication year: 2002