1. The General Topic of the Book
In its broadest conception, this book is about how we choose to aggregate information from assessments to form scores. As a topic, it is as old as assessment itself. There are many adjacent topics that share considerable conceptual overlap with subscores. For example, someone asking “In what situations does it make sense to have subscores?” is essentially asking the inverse of “In what situations does it make sense to combine test scores into a higher-order composite?” Although the current work touches on critical historical developments in the subscore literature, its primary focus is the last quarter century. During this period, thought, research, and application related to subscores were tremendously impacted by federal education policy in the USA, specifically the No Child Left Behind Act (2001) and the Every Student Succeeds Act (2015). Both laws required provision of diagnostic information for individuals taking mandated assessments. This is the context and focus of the book: subscores to provide diagnostic information about individuals in educational assessments in the twenty-first century.
2. Book Overview & Structure
Anyone with any exposure to the existing knowledge base on subscores will inevitably have come across all four of the authors who contributed to this volume. They represent a significant portion of the collective human wisdom on the topic, and their research forms the basis of the current state of the art. As someone who has dabbled in this space, I was excited to see these four researchers come together to summarize their current thoughts on the topic of subscores.
The book consists of six chapters, along with preface, acknowledgements, appendix, glossary, and references. In the preface, the authors state their goal for the work:
Given the popularity of subscores and the high variability of their utility that is encountered in practice, the authors felt the need for a single, coherent, and authoritative source that provides a succinct and easily accessible summary of best practices supported by existing research. (p. xii)
Chapter 1 provides a very quick explanation of psychometrics, an illustrative example, an introduction to some key concepts (reliability, specificity, and orthogonality), and a very brief history of testing; it ends with a review of the chapter and a description of what follows. Chapter 2 mostly focuses on real examples of subscore reporting and closes with a distillation of some lessons learned. Chapter 3 serves as a more technical introduction to subscores, augmented subscores, and the evaluation of both (mostly focusing on PRMSE, the proportional reduction in mean squared error, and VAR, the value added ratio). Chapter 4 surveys 36 extant tests that report subscores and considers whether they should be reporting them. In the course of this exercise, some general lessons are reiterated about the situations in which one might expect subscores to add value. Chapter 5 covers what a psychometrician can consider doing when the subscores they have tried to estimate turn out not to be informative. Chapter 6 summarizes the work, while also debunking some popular “subscore myths” the authors have encountered in their travels.
3. What I Liked
For starters, if I’m going to read someone’s book about subscores, I can’t imagine a better set of experts. If there is such a thing as a canon in subscore literature, their work features prominently.
Chapter 1 does a nice job outlining the conceptual landscape and gives a good overview of the remainder of the book. Chapter 2 provides an excellent variety of score reports and does a good job highlighting strengths and weaknesses of the disparate approaches to score reporting. Chapter 3 is a great summary of a large volume of existing work. It covers subscores, augmented subscores, evaluation of subscores, and several adjacent topics. It is more technical than the other chapters in the work, but it is accessible to a wider range of readers than the source material it summarizes. Chapter 4 is heartbreaking, but in the best way possible. It shows how easy it is to waste time and energy making useless subscores and how often this is what we do in practice. Combined with Chapter 3, it also provides sound practical guidance about when users might expect subscores to be useful (or not). Chapter 5 is a great service to applied psychometricians, especially those who just figured out their subscores are useless. I deeply appreciated the myth busting that forms the latter part of Chapter 6. I have personally had most of these arguments and I will remain grateful to have this work to quote from (and cite) going forward.
The writing is good throughout and there are gems in each chapter. I think the summary of where things stand in Chapter 6 is particularly good:
It is the unfortunate result of essentially all serious research into the efficacy of subscores that despite what are often the heartfelt and earnest desires of many users of test results, extracting useful subscores from tests is extraordinarily unlikely unless the capacity for such subscores is built into the tests from their very beginning. (p. 136)
This sentence, along with the flowchart provided in Figure 5.1, forms a very pithy encapsulation of the current state of subscores.
4. What I Did Not Like
I have two bigger complaints and two little ones. The bigger ones first.
First, I think the chapters are in the wrong order. Chapter 1 is fine, but moving on to score reporting before talking about when scores are worth reporting was an odd step. I think it would have made more sense to have the chapters ordered 1, 3, 4, 5, 2, and 6. That would have created a very interesting possibility of incorporating the usefulness of subscores either as a structural element or at least as part of the discussion when reviewing score reporting. On the plus side, one can read the chapters in whatever order one wants, so this is partially remediable if the reader agrees with me.
Second, I was disappointed with how little the discussion focused on dimensionality. Although it showed up in places, the most salient inclusion was in Dr. Sinharay’s acknowledgement (“To my parents...who taught me that anything is possible except reporting subscores for unidimensional tests”). I think that deprives readers of a potentially valuable handle for the issues being presented. At the end of the day, the abysmal track record of producing useful subscores (fewer than one in four reviewed in Chapter 4 are useful) is likely a direct result of our success in building unidimensional tests.
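To make the dimensionality point concrete, here is a toy calculation of my own (it is not taken from the book). It assumes a test with just two subscales, each with true-score variance 1, equal reliability, and uncorrelated errors, and it compares two PRMSE values for a true subscore: one using the observed subscore as the predictor and one using the observed total. The function names are hypothetical and the setup is deliberately simplified.

```python
def prmse_subscore(reliability):
    """PRMSE when the observed subscore predicts the true subscore.

    For a linear predictor this is simply the subscore reliability.
    """
    return reliability


def prmse_total(true_corr, reliability):
    """PRMSE when the observed total (sum of both subscores) predicts one true subscore.

    Toy assumptions (mine, not the book's): two subscales, true-score
    variance 1 each, equal reliability, uncorrelated errors, and true
    scores correlated at `true_corr`.
    """
    error_var = 1.0 / reliability - 1.0   # from reliability = 1 / (1 + error_var)
    cov_true_total = 1.0 + true_corr      # Cov(t1, t1 + t2 + e1 + e2)
    var_total = 2.0 + 2.0 * true_corr + 2.0 * error_var
    # Squared correlation between t1 and the total, since Var(t1) = 1.
    return cov_true_total ** 2 / var_total


def has_added_value(true_corr, reliability):
    """The subscore adds value only if it beats the total as a predictor."""
    return prmse_subscore(reliability) > prmse_total(true_corr, reliability)


# Hold reliability fixed and let the test become more nearly unidimensional.
for true_corr in (0.50, 0.80, 0.95):
    print(true_corr,
          round(prmse_total(true_corr, 0.70), 3),
          has_added_value(true_corr, 0.70))
```

Under these assumptions, with a subscore reliability of 0.70, the subscore beats the total when the true subscores correlate at 0.50 but not when they correlate at 0.80 or 0.95. The closer a test gets to unidimensional, the harder it is for any subscore to tell us something the total does not.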
Moving on to more minor critiques, Chapter 2 never really makes a clear case about how subscores are specifically different from any other kind of score when it comes to best practices around score reporting. It isn’t at all obvious to me that there is anything materially different when it comes to score reporting for subscores. Perhaps the one exception would be an addendum along the lines of “how to stop people from interpreting useless subscores that we are required by law or policy to provide.” That’s probably an under-developed niche in the score reporting literature. My last quibble is that I would have loved these authors to expound more on the validity issues around augmented subscores. It is possible to fix poor reliability by giving up some specificity, but depending on how we go about doing that, we invoke different kinds of challenges to our validity argument. I admit that this is not necessarily in line with what the authors set out to do. But it is a critical topic, and I can’t think of a better set of scholars to write about it.
5. Who Should Buy This Book?
Anyone with basic psychometric training who needs to get up to speed on subscores quickly would benefit immensely from this book. Even someone who intends to dive deeper into the underlying research would be well served by starting here and then working through large chunks of the papers in the reference section. Stakeholders? It would take a very dedicated score consumer to end up here. On the other hand, anyone who is about to vote on whether to require subscores should be compelled to read Chapters 4 and 6.
6. Conclusion
In stating their goal for the book, the authors’ desired adjectives were “single,” “coherent,” “authoritative,” “succinct,” and “accessible.” How did they do? Well, there is just one book, so we’re one for one. In terms of coherence, it is both mostly coherent (defining coherence in terms of sticking together) and entirely coherent (defining coherence in terms of being intelligible). It is authoritative. It is succinct. The chapters are all accessible, just not to the same readers. So, all things considered, I think the authors achieved their goal. I look forward to being able to pull this book off the shelf the next time I must explain to someone why subscores are (probably) a horrible idea.