Temporally Dynamic, Cohort-Varying Value-Added Models

Garritt L. Page; Ernesto San Martín; David Torres Irribarra; Sébastien Van Bellegem

doi:10.1007/s11336-024-09979-0

Published online by Cambridge University Press: 01 January 2025

Garritt L. Page: Affiliation:
Brigham Young University
Ernesto San Martín*: Affiliation:
Millennium Nucleus on Intergenerational Mobility MOVI; LIDAM/CORE, Université catholique de Louvain; Pontificia Universidad Católica de Chile
David Torres Irribarra: Affiliation:
School of Psychology, Pontificia Universidad Católica de Chile
Sébastien Van Bellegem: Affiliation:
LIDAM/CORE, Université catholique de Louvain
*: Correspondence should be made to Ernesto San Martín, Faculty of Mathematics, Pontificia Universidad Católica de Chile, Vicuna Mackenna 4860, Macul, Santiago, Chile. Email: esanmar@uc.cl

Abstract

We aim to estimate school value-added dynamically in time. Our principal motivation for doing so is to establish school effectiveness persistence while taking into account the temporal dependence that typically exists in school performance from one year to the next. We propose two methods of incorporating temporal dependence in value-added models. In the first we model the random school effects that are commonly present in value-added models with an auto-regressive process. In the second approach, we incorporate dependence in value-added estimators by modeling the performance of one cohort based on the previous cohort’s performance. An identification analysis allows us to make explicit the meaning of the corresponding value-added indicators: based on these meanings, we show that each model is useful for monitoring specific aspects of school persistence. Furthermore, we carefully detail how value-added can be estimated over time. We show through simulations that ignoring temporal dependence when it exists results in diminished efficiency in value-added estimation while incorporating it results in improved estimation (even when temporal dependence is weak). Finally, we illustrate the methodology by considering two cohorts from Chile’s national standardized test in mathematics.

Keywords

School value persistence; Value-added models; Temporal dependence

Type: Application Reviews and Case Studies
Information: Psychometrika, Volume 89, Issue 3, September 2024, pp. 1074-1103

DOI: https://doi.org/10.1007/s11336-024-09979-0
Copyright: Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

1. Introduction

Value-added models are frequently used to evaluate the contributions of educational institutions or stakeholders to the educational process. In certain instances, these models have directly influenced education policies (Sass et al., 2012; Koedel et al., 2015; Kyriakides et al., 2018; Liu & Loeb, 2019; Hanushek, 2020). While there are several criticisms regarding their use (EPI Briefing Paper, 2010; Scherrer, 2011; Ehlert et al., 2014; Amrein-Beardsley & Holloway, 2019), these critiques are primarily directed at the contexts in which the models are applied, rather than at their intrinsic value in advancing educational research (McCaffey et al., 2004; Reynolds et al., 2014). Nonetheless, value-added models appear to be valuable for monitoring the effectiveness of schools when different measures at both the school and student levels are taken over time.

Broadly speaking, two overarching perspectives on value-added model building exist in the school effectiveness literature. The first considers an invariant group of people subjected to multiple measures over a time period. In this perspective, the school effect is constant over time, thus capturing the effect of the school after considering the full process; this type of approach is developed through the so-called growth models (Potthoff & Roy, 1964; Strenio et al., 1983; Fitzmaurice et al., 2004; Guldemond & Bosker, 2009; Bianconcini & Cagnone, 2012). In the second perspective, the composition of the group of individuals changes over time: each group is measured twice (pre- and post-test), allowing the identification of the school effect for each period; in this perspective, student achievement (which is often a standardized test score) is regressed onto previous attainment scores (i.e., the standardized test result at the beginning of the value-added period).

This paper focuses on the second perspective because we aim to investigate how the performance of a previous cohort influences school effectiveness as the school accepts a new cohort. For instance, in our case study, the first cohort consists of students who were in the 4th grade in 2012, took the pre-test, and then took the post-test in 2014 as 6th graders. The second cohort comprises students who took the pre-test in the 4th grade during the 2014 school year and took the post-test in 2016 as 6th graders. For similarly structured data, see Papay (2011, Figure 1).

When estimating a particular institution’s value-added across multiple cohorts of students, there is much interest in determining the extent to which a school’s effectiveness persists over time. One approach to modeling such persistence is to consider school and/or teacher effects such that “the effects cumulate over time” (Briggs & Weeks, 2011, p. 620). The underlying idea is that effectiveness varies over time, and school and/or teacher effects represent school or teacher impacts within each academic year (Sanders & Horn, 1994; Ballou et al., 2004). At the model specification level, the test score of a student at time t is determined by covariates, particularly the test score at time $t-1$, and a linear combination of school and/or teacher effects from t to the initial data collection time (McCaffey et al., 2004; Lockwood et al., 2007; Rothstein, 2010; Kinsler, 2012). It is worth noting that this approach typically requires cohort scores to be available for at least two time periods (Vanwynsberghe et al., 2017; Tymms et al., 2018).

An alternative approach to analyzing the persistence of school effectiveness involves assessing school value-added indicators over time. It is reasonable to assume that school effectiveness would generally exhibit stability, barring abrupt changes in faculty, resources, and leadership. From a statistical perspective, this suggests that value-added estimators would demonstrate temporal dependence. Consequently, proposing a model independently for each cohort of students overlooks the correlation among these estimators and may result in a loss of efficiency when estimating the persistence of school effectiveness. Unfortunately, fitting each cohort independently is currently common practice.

Efforts to address this temporal dependence in value-added models often involve two-step procedures in which each cohort is modeled separately and the resulting value-added estimates are then correlated (Gray et al., 2001; Thomas et al., 2007; Bellei et al., 2016). However, this approach is suboptimal, as highlighted by Leckie (2018), who recently explored this method and demonstrated persistent biases. As mentioned by Leckie (2018), a much preferable approach would be to incorporate temporal dependence coherently in a statistical model by jointly modeling cohorts. This can be achieved in various ways, with Leckie (2018) proposing one such approach.

Our goal here is therefore to further develop time-dependent value-added models. We specify a time-dependent value-added model to answer the following question: is it possible that the current effectiveness of a school as it takes in a new cohort is jointly influenced by both its previous effectiveness and some previous information from the former cohort? To answer this question, an identification analysis is necessary. The analysis we carry out shows that the parameters characterizing both the dependence on the previous school effectiveness and the dependence on the previous cohort are identified. By using the model-free definition of value-added (Manzi et al., 2014), we derive the corresponding school value-added indicators. Afterward, we establish an interpretation of school effectiveness persistence for the case of two cohorts, which consists of decomposing the school value-added for cohort 2 into two components: the first one corresponds to the expectation of the school value-added for cohort 2 conditionally on the school value-added for cohort 1, whereas the second component corresponds to the school value-added for cohort 2 minus the latter conditional expectation. This type of decomposition, typically used in other fields of psychometrics (e.g., Classical Test Theory (Zimmerman, 1975), Factor Analysis (Lord & Novick, 1968, Chapter 24) and School Effectiveness (Manzi et al., 2014)), has the advantage that the first component corresponds to the explanation of cohort 2’s school value-added by the school value-added for cohort 1, while the second component corresponds to everything in value-added for cohort 2 that cannot be explained by value-added for cohort 1. Thus, the persistence of school efficiency corresponds to an additive combination of both the school value-added for cohort 1 and the information coming from cohort 1. Moreover, we prove that the first additive component is related to a model that is nested in our general formulation, which we denote by Model 1: it is characterized by assuming that the school effects are correlated over time as in ARIMA-type models. Similarly, we prove that the second additive component is related to a separate model nested in our general formulation, which we denote by Model 2: it is characterized by assuming that the current school effect is influenced by the post-tests from previous cohorts as a kind of “information shock”.
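
The two-component decomposition described above can be written compactly. The display below is only a sketch: the shorthand $VA_{ti}$ for the value-added of school i for cohort t is introduced here for illustration and is not notation defined in the text so far. The second line is the standard property of a conditional expectation, which is what makes the second component the part of cohort 2's value-added that cohort 1's value-added cannot explain.

```latex
\begin{align*}
VA_{2i} &= \underbrace{E(VA_{2i} \mid VA_{1i})}_{\text{explained by } VA_{1i}}
         + \underbrace{\bigl\{ VA_{2i} - E(VA_{2i} \mid VA_{1i}) \bigr\}}_{\text{not explained by } VA_{1i}}, \\
E\bigl\{ VA_{2i} - E(VA_{2i} \mid VA_{1i}) \bigm| VA_{1i} \bigr\} &= 0 .
\end{align*}
```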

The rest of this paper is organized as follows. The time-dependent value-added models and the structural interpretation of the value-added indicators are derived and discussed in Sect. 2. Computation and model fitting, both based on Bayesian procedures, are briefly explained in Sect. 3. A simulation study exploring the impact of ignoring temporal dependence on value-added estimates is conducted in Sect. 4. Section 5 details a case study using data from the Chilean educational system. Conclusions and future work are gathered in Sect. 6.

2. Time-Dependent Value-Added Model

In this section we describe our approach to incorporating temporal dependence in value-added models. To begin, we introduce notation that will be used throughout the article. Let $Y_{tij}$ denote the jth measurement coming from the ith school for cohort t, where $j = 1, \ldots, n_{ti}$, $i = 1, \ldots, I$, and $t = 1, \ldots, T$ (in our application $T = 2$). Further, let $\boldsymbol{Y}_{ti} = (Y_{ti1}, \ldots, Y_{tin_{ti}})'$ be an $n_{ti} \times 1$ vector of response values for cohort t of the ith school. Let $\boldsymbol{X}_{tij}$ be a $p_t \times 1$ vector of covariates measured from the jth student at the ith school for cohort t, and let $\boldsymbol{X}_{ti} = (\boldsymbol{X}_{ti1}, \ldots, \boldsymbol{X}_{tin_{ti}})'$ denote the $n_{ti} \times p_t$ “stacked matrix” of all covariate vectors measured from the ith school for cohort t. Note that this matrix does not include a column vector of ones. When it is necessary to refer to the pth covariate, we will use $X_{tij,p}$. We will use $\alpha_{ti}$ to denote the latent effect of the ith school for cohort t. Finally, $\boldsymbol{\beta}_t$ will denote a $p_t \times 1$ vector of regression coefficients for cohort t; the remaining parameters will be made explicit in the sequential specification below. For the time being, the set of all parameters for cohort t (including $\boldsymbol{\beta}_t$) will be denoted by $\boldsymbol{\psi}_t$.

2.1. Sequential Model Specification

When more than one cohort is available, the school effect $\alpha_{ti}$ can in principle be influenced by two types of temporal factors: one that is unobserved and corresponds to the school effect of the previous cohort, namely $\alpha_{t-1i}$; and one that is observed and corresponds to (a function of) the post-tests from the previous cohort, namely $\boldsymbol{Y}_{t-1i}$. Thus, temporal dependence in value-added models is not only based on dependence between a school’s current cohort performance and its previous one (which is captured by $\alpha_{t-1i}$), but also includes the impact that the information shock contained in $\boldsymbol{Y}_{t-1i}$ has on current school performance.

As a result, a temporally dependent value-added model should be sequentially specified. More specifically, denote by $\boldsymbol{Y}_{1,i}^{t} = \{\boldsymbol{Y}_{1i}, \boldsymbol{Y}_{2i}, \ldots, \boldsymbol{Y}_{ti}\}$ (with similar notation for $\boldsymbol{X}_{1,i}^{t}$, $\alpha_{1,i}^{t}$ and $\boldsymbol{\psi}_{1}^{t}$) the collection of response values (and covariates, latent school effects and parameters, respectively) for the ith school from time period one to time period t. The joint distribution generating $\{(\boldsymbol{Y}_{1,i}^{t}, \boldsymbol{X}_{1,i}^{t}, \alpha_{1,i}^{t}, \boldsymbol{\psi}_{1}^{t}) : t = 1, \ldots, T\}$ for each school i is sequentially decomposed as follows:

$$\boldsymbol{Y}_{ti} \perp\!\!\!\perp \boldsymbol{Y}_{1,i}^{t-1}, \boldsymbol{X}_{1,i}^{T}, \alpha_{1,i}^{t}, \boldsymbol{\psi}_{1}^{T} \mid \boldsymbol{X}_{ti}, \alpha_{ti}, \boldsymbol{\beta}_{t}, \sigma_{t}^{2}, \quad t \geq 2; \tag{2.1}$$

$$(\boldsymbol{Y}_{ti} \mid \boldsymbol{X}_{ti}, \alpha_{ti}, \boldsymbol{\beta}_{t}, \sigma_{t}^{2}) \sim N(\boldsymbol{X}_{ti} \boldsymbol{\beta}_{t} + \alpha_{ti} \boldsymbol{\imath}_{n_{ti}}, \sigma_{t}^{2} I_{n_{ti}}), \quad t \geq 2; \tag{2.2}$$

$$\alpha_{ti} \perp\!\!\!\perp \boldsymbol{Y}_{1,i}^{t-1}, \boldsymbol{X}_{1,i}^{T}, \alpha_{1,i}^{t-1}, \boldsymbol{\psi}_{1}^{T} \mid \boldsymbol{Y}_{t-1i}, \alpha_{t-1i}, \phi_{0t}, \phi_{1t}, \gamma_{t}, \tau_{t}^{2}, \quad t \geq 2; \tag{2.3}$$

$$(\alpha_{ti} \mid \boldsymbol{Y}_{t-1i}, \alpha_{t-1i}, \phi_{0t}, \phi_{1t}, \gamma_{t}, \tau_{t}^{2}) \sim N(\phi_{0t} + \phi_{1t} \alpha_{t-1i} + \gamma_{t} \overline{Y}_{t-1i}, \tau_{t}^{2} (1 - \phi_{1t}^{2})), \quad t \geq 2; \tag{2.4}$$

$$\boldsymbol{Y}_{1i} \perp\!\!\!\perp \boldsymbol{X}_{2,i}^{T}, \boldsymbol{\psi}_{1}^{T} \mid \boldsymbol{X}_{1i}, \alpha_{1i}, \boldsymbol{\beta}_{1}, \sigma_{1}^{2}; \tag{2.5}$$

$$(\boldsymbol{Y}_{1i} \mid \boldsymbol{X}_{1i}, \alpha_{1i}, \boldsymbol{\beta}_{1}, \sigma_{1}^{2}) \sim N(\boldsymbol{X}_{1i} \boldsymbol{\beta}_{1} + \alpha_{1i} \boldsymbol{\imath}_{n_{1i}}, \sigma_{1}^{2} I_{n_{1i}}); \tag{2.6}$$

$$\alpha_{1i} \perp\!\!\!\perp \boldsymbol{X}_{1,i}^{T}, \boldsymbol{\psi}_{1}^{T} \mid \phi_{01}, \tau_{1}^{2}; \tag{2.7}$$

$$(\alpha_{1i} \mid \phi_{01}, \tau_{1}^{2}) \sim N(\phi_{01}, \tau_{1}^{2}); \tag{2.8}$$

$$\boldsymbol{X}_{1,i}^{T} \perp\!\!\!\perp \boldsymbol{\psi}_{1}^{T}; \tag{2.9}$$

$$\boldsymbol{X}_{1,i}^{T} \ \text{is left unspecified}; \tag{2.10}$$

$$\boldsymbol{\psi}_{1}^{T} \sim \pi_{\psi}, \tag{2.11}$$

where the parameter $\boldsymbol{\psi}_{t} \doteq (\boldsymbol{\beta}_{t}, \sigma_{t}^{2}, \tau_{t}^{2}, \phi_{0t}, \gamma_{t}, \phi_{1t}) \in \mathbb{R}^{p_{t}} \times \mathbb{R}_{+}^{2} \times \mathbb{R}^{2} \times [-1, 1]$ for $t \geq 2$ and $\boldsymbol{\psi}_{1} \doteq (\boldsymbol{\beta}_{1}, \sigma_{1}^{2}, \tau_{1}^{2}, \phi_{01}) \in \mathbb{R}^{p_{1}} \times \mathbb{R}_{+}^{2} \times \mathbb{R}$. Here, $\boldsymbol{\imath}_{n}$ is a vector of ones of length n and $\overline{Y}_{t-1i} = \frac{1}{n_{t-1i}} \sum_{j=1}^{n_{t-1i}} Y_{t-1ij}$. The symbol $\doteq$ means “defined as”. Note that in (2.1), (2.3) and (2.5), we consider $\boldsymbol{X}_{1,i}^{T}$ rather than $\boldsymbol{X}_{1,i}^{t}$ because both $\boldsymbol{Y}_{ti}$ and $\alpha_{ti}$ are related to the covariates at time t only, namely $\boldsymbol{X}_{ti}$; similarly for the parameters. To understand the logic behind this sequential specification, readers are encouraged to consult Section C of the Supplementary Material, where such a decomposition is explained for the case of $T = 2$ cohorts.
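
To make the sequential specification concrete, the following Python sketch draws data for $T = 2$ cohorts by following (2.8), (2.6), (2.4) and (2.2) in that order. It is an illustration under stated assumptions rather than the authors' code: the function name `simulate_two_cohorts`, all default parameter values, the standard normal covariates (the model leaves their distribution unspecified), and the balanced design with the same number of students in every school and cohort are hypothetical choices made here.

```python
import numpy as np


def simulate_two_cohorts(n_schools=200, n_students=30, p=2, phi0=(0.0, 0.0),
                         phi1=0.5, gamma=0.0, tau2=(1.0, 1.0),
                         sigma2=(1.0, 1.0), seed=0):
    """Draw two cohorts from (2.2), (2.4), (2.6) and (2.8) with T = 2.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical regression coefficients beta_1 and beta_2.
    beta = [np.full(p, 0.5), np.full(p, 0.5)]
    # (2.10) leaves the covariates unspecified; standard normal draws are an
    # arbitrary choice. A balanced design (n_ti = n_students) is also assumed.
    X = [rng.normal(size=(n_schools, n_students, p)) for _ in range(2)]

    # (2.8): school effects for the first cohort.
    alpha1 = rng.normal(phi0[0], np.sqrt(tau2[0]), size=n_schools)
    # (2.6): first-cohort post-tests, independent given (X_1i, alpha_1i).
    eps1 = rng.normal(0.0, np.sqrt(sigma2[0]), size=(n_schools, n_students))
    Y1 = X[0] @ beta[0] + alpha1[:, None] + eps1

    # (2.4): second-cohort school effects depend on alpha_1i and on the
    # school mean of the first cohort's post-tests.
    ybar1 = Y1.mean(axis=1)
    mean2 = phi0[1] + phi1 * alpha1 + gamma * ybar1
    sd2 = np.sqrt(tau2[1] * (1.0 - phi1 ** 2))
    alpha2 = rng.normal(mean2, sd2)
    # (2.2): second-cohort post-tests, independent given (X_2i, alpha_2i).
    eps2 = rng.normal(0.0, np.sqrt(sigma2[1]), size=(n_schools, n_students))
    Y2 = X[1] @ beta[1] + alpha2[:, None] + eps2

    return {"X": X, "Y": [Y1, Y2], "alpha": [alpha1, alpha2]}


sim = simulate_two_cohorts(n_schools=5000, n_students=10, phi1=0.8, seed=1)
corr = np.corrcoef(sim["alpha"][0], sim["alpha"][1])[0, 1]
print(f"empirical corr(alpha_1i, alpha_2i) = {corr:.3f}")
```

With `gamma=0` and equal values in `tau2`, the empirical correlation between the two sets of simulated school effects should lie near `phi1`; setting `gamma` to a nonzero value instead lets the previous cohort's mean post-test shift the current school effect.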

A few more detailed comments regarding the sequential specification of our time-dependent value-added model are warranted.

1. For each cohort t, condition (2.1) implies that $Y_{ti}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{Y}_{ti}$$\end{document} is stochastically determined by the covariates $X_{ti}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{X}_{ti}$$\end{document} and the random school effect $α_{ti}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _{ti}$$\end{document} ; the parameters characterizing this conditional distribution are $(β_{t}, σ_{t}^{2})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\beta _t,\sigma _t^2)$$\end{document} . 
Condition (2.2) not only defines a specific functional relationship of the conditional expectation $E (Y_{tij} ∣ X_{ti}, α_{ti})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(Y_{tij}\mid \varvec{X}_{ti},\alpha _{ti})$$\end{document} , but also makes explicit that, conditionally on $(X_{ti}, α_{ti}, β_{t}, σ_{t}^{2})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\varvec{X}_{ti},\alpha _{ti},\varvec{\beta }_t,\sigma _t^2)$$\end{document} , the $Y_{tij}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_{tij}$$\end{document} ’s are mutually independent. This condition, known as the Axiom of Local Independence, defines the school effect in the sense that it explains the heterogeneity that is present in the $Y_{tij}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_{tij}$$\end{document} ’s and that is not explained by the covariates $X_{tij}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{X}_{tij}$$\end{document} ; for details, see Manzi et al. (Reference Manzi, San Martín and Van Bellegem2014) and Page et al. 
(Reference Page, San Martín, Orellana and González2017). Similar comments can be made for conditions (2.5) and (2.6).
2. Conditions (2.3) and (2.4) specify the temporal dependence of the proposed value-added model: the school effect $α_{ti}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _{ti}$$\end{document} that impacts the t-th cohort’s performance is determined by both the school effect $α_{t - 1 i}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _{t-1i}$$\end{document} and the information shock contained in the post-tests of cohort $t - 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t-1$$\end{document} . This conditional model is parameterized by $(ϕ_{0 t}, ϕ_{1 t}, γ_{t}, τ_{t}^{2})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\phi _{0t}, \phi _{1t},\gamma _t,\tau _t^2)$$\end{document} . Conditions (2.7) and (2.8) specify the initial condition at the school effect level; this model is parameterized by $(ϕ_{01}, τ_{1}^{2})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\phi _{01},\tau _1^2)$$\end{document} .
3. Condition (2.10) means that the covariates $X_{1, i}^{T}$ are exogenous with respect to the school effect $α_{1 i}$. This exogeneity explains why the distribution of the covariates is left unspecified, as stated in condition (2.10). For details on exogeneity, see Engle et al. (1983) and Mouchart and Oulhaj (2006).

Remark 1

An important and well-known approach to modeling value-added with multiple test measures over time is provided by growth models. Growth models with latent variables have been considered to measure achievement, offering a pathway to model value-added across multiple measures (e.g., Bianconcini & Cagnone, 2012). In this remark, we aim to contrast this approach with the model studied in the current paper. Firstly, in growth models, an invariant group of people is subject to multiple measures over a time period. By contrast, in our approach, the composition of the group of individuals changes over time. Each group is measured twice (pre- and post-test), allowing the identification of the school effect for each period. Another difference lies in the interpretation of the random effect (or latent variable, in the parlance of growth models). Growth models consider the random effect to be invariant over time, meaning that it captures the value of the school after observing the full measures over the time period. In our modeling strategy, conversely, the latent random effect is dynamic, meaning that it changes over time as it captures the school effect over each period (1, 2], (2, 3], etc. $□$

2.2. Likelihood Function

The sequential specification (2.1)–(2.11) corresponds to a Bayesian decomposition of the joint distribution of $(Y_{1, i}^{T}, X_{1, i}^{T}, α_{1, i}^{T}, ψ_{1}^{T})$ for each school i across varying numbers of cohorts, denoted by T. The question is which criterion should guide the choice of the likelihood function or statistical model, that is, of the distribution that generates the observations alone. In other words, are the school effects $α_{1, i}^{T}$ treated as parameters of the likelihood function, or is the likelihood derived after integrating them out? This question is closely tied to the two overarching perspectives in value-added model construction: one treats the school effect as a random effect, while the other regards it as a fixed effect (see, among many others, Aitkin and Longford, 1986; Tekwe et al., 2004).

In the fixed-effect perspective, the school intercept is included among the parameters of the likelihood function, whereas in the random-effect perspective, the likelihood function is derived after integrating out the school effect, so that the school effect is not a parameter of the likelihood function. The decision between the fixed-effect and the random-effect approach should be based on the induced statistical model, that is, on our “understanding of the way in which the data are supposed to, or did in fact, originate” (Fisher, 1973, p. 8). In the context of school effectiveness, it is crucial to conceptualize the school effect by considering its impact on observable variables. In the fixed-effect perspective, the school effect contributes the same amount to the student achievement regressed onto previous attainment scores (and potentially other factors). Thus, a school in this framework is seen as an entity adding a constant effect to each student’s predicted achievement, without relating achievement scores among students. On the other hand, in a random-effect approach, the school effect is defined through the Axiom of Local Independence, which implies that the school effect explains the heterogeneity that is present in the students’ achievement and that is not explained by the previous attainment scores (and possibly other factors). Consequently, under this approach, a school is perceived as an entity that introduces heterogeneity in students’ achievements. This criterion is sometimes utilized in econometrics to differentiate between fixed-effect and random-effect models (Hsiao, 2014). Additionally, because the two approaches model different data generating processes, empirical comparisons between them, such as contrasting the corresponding estimates of value-added indicators (Longford, 2012; Clarke et al., 2015), might not necessarily aid in selecting the appropriate model. For details on this criterion, the reader is referred to the Supplementary Material, Appendix B.
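
The distinction between the two perspectives can be illustrated with a small simulation. The following is a minimal sketch for a single cohort without covariates; the names and numerical values (`tau_sq`, `sigma_sq`, `n_schools`, two students per school) are illustrative assumptions and are not taken from the paper. It suggests that, once the school effect is integrated out, two students of the same school remain correlated with covariance close to the school-effect variance, whereas conditioning on the school effect removes that dependence.

```python
import numpy as np

# Illustrative values only (assumptions, not estimates from the paper).
tau_sq = 1.0        # variance of the school effect alpha_i
sigma_sq = 2.0      # student-level (within-school) variance
n_schools = 50_000

rng = np.random.default_rng(0)
alpha = rng.normal(0.0, np.sqrt(tau_sq), size=n_schools)       # one effect per school
eps = rng.normal(0.0, np.sqrt(sigma_sq), size=(n_schools, 2))  # two students per school
y = alpha[:, None] + eps                                       # post-test scores

# Random-effect view: alpha is integrated out, so two students of the same
# school share it and their scores covary (theoretical value: tau_sq).
marginal_cov = np.cov(y[:, 0], y[:, 1])[0, 1]

# Conditionally on alpha (here: after subtracting it), the scores are
# independent, which is the Axiom of Local Independence.
conditional_cov = np.cov(y[:, 0] - alpha, y[:, 1] - alpha)[0, 1]

print(f"marginal covariance:    {marginal_cov:.3f} (theory: {tau_sq})")
print(f"conditional covariance: {conditional_cov:.3f} (theory: 0)")
```

The seed and sample size are arbitrary; any sufficiently large number of simulated schools should display the same pattern.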

Following Manzi et al. (2014) and Page et al. (2017), we adhere to the random-effect paradigm because of the underlying conception of school. Consequently, the likelihood function is derived after integrating out the school effects. The subsequent result derives the explicit joint distribution of the observations $Y_{1, i}^{T}$ given $(X_{1 i}^{T}, ψ_{1}^{T})$:

Theorem 1

Given the sequential model (2.1)–(2.11), the joint distribution of $Y_{1, i}^{T}$ given $(X_{1 i}^{T}, ψ_{1}^{T})$ is a normal distribution for every school i. For $T = 2$ cohorts, its expectation is given by

(2.12)

$$\begin{pmatrix} η_{02} ı_{n_{2 i}} + X_{2 i} β_{2} + γ_{2} \bar{X}_{1 i} β_{1} ı_{n_{2 i}} \\ ϕ_{01} ı_{n_{1 i}} + X_{1 i} β_{1} \end{pmatrix}$$

where $\bar{X}_{1 i} = \frac{1}{n_{1 i}} ı_{n_{1 i}}' X_{1 i}$ is a $1 \times p_{1}$ vector of empirical means at the school level so that $\bar{X}_{1 i} β_{1}$ is a scalar, and $η_{02} ≐ ϕ_{02} + ϕ_{01} (ϕ_{12} + γ_{2})$; and its variance-covariance matrix is given by

(2.13)

$$\begin{pmatrix} ω_{2 i} J_{n_{2 i}} + σ_{2}^{2} I_{n_{2 i}} & δ_{12 i} ı_{n_{2 i}} ı_{n_{1 i}}' \\ δ_{12 i} ı_{n_{1 i}} ı_{n_{2 i}}' & τ_{1}^{2} J_{n_{1 i}} + σ_{1}^{2} I_{n_{1 i}} \end{pmatrix},$$

where

(2.14)

$$ω_{2 i} ≐ τ_{1}^{2} (ϕ_{12} + γ_{2})^{2} + \frac{γ_{2}^{2} σ_{1}^{2}}{n_{1 i}} + τ_{2}^{2} (1 - ϕ_{12}^{2}), \qquad δ_{12 i} ≐ ϕ_{12} τ_{1}^{2} + γ_{2} \left(τ_{1}^{2} + \frac{σ_{1}^{2}}{n_{1 i}}\right)$$

and $J_{n} ≐ ı_{n} ı_{n}'$.

For $T > 2$ cohorts, the expectation is given by

(2.15)

$$E (Y_{1, i}^{t} ∣ X_{1 i}^{T}, ψ_{1}^{T}) = X_{ti} β_{t} + E (α_{ti} ∣ X_{1 i}^{T}, ψ_{1}^{T}) \quad \text{for } 3 \leq t \leq T,$$

where, for $3 \leq t \leq T$,

(2.16)

$$\begin{aligned} E (α_{ti} ∣ X_{1 i}^{T}, ψ_{1}^{T}) ={} & ϕ_{0 t} + \sum_{ℓ = 2}^{t} \prod_{k = ℓ}^{t} (ϕ_{1 k} + γ_{k}) ϕ_{0, ℓ - 1} \\ & + γ_{t} \bar{X}_{t - 1, i} β_{t - 1} + \sum_{ℓ = 2}^{t} \prod_{k = ℓ}^{t} (ϕ_{1 k} + γ_{k}) γ_{ℓ - 1} \bar{X}_{ℓ - 2, i} β_{ℓ - 2}; \end{aligned}$$

the variances are given by

$$V (Y_{ti} ∣ X_{1 i}^{T}, ψ_{1}^{T}) = V (α_{ti} ∣ X_{1 i}^{t}, ψ_{1}^{T}) \, ı_{n_{ti}} ı_{n_{ti}}';$$

and the covariances are given for every $1 ⩽ s < t ⩽ T$ by

$$\operatorname{cov} (Y_{si}, Y_{ti} ∣ X_{1 i}^{T}, ψ_{1}^{T}) = V (α_{si} ∣ X_{1 i}^{t}, ψ_{1}^{T}) \, ı_{n_{ti}} ı_{n_{si}}' \prod_{k = 0}^{t - s - 1} (ϕ_{1, t - k} + γ_{t - k}),$$

where

$$V (α_{ti} ∣ X_{1 i}^{t}, ψ_{1}^{T}) = V (α_{1 i} ∣ X_{1 i}^{t}, ψ_{1}^{T}) \prod_{k = 0}^{t - 1} A_{k} + \sum_{k = 0}^{t - 2} B_{k} \prod_{l = - 1}^{k - 1} A_{l}$$

with $A_{k} = (ϕ_{1, t - k} + γ_{t - k})^{2}$ and $B_{k} = n_{t - k - 1, i}^{- 1} γ_{t - k}^{2} σ_{t - k - 1}^{2} + τ_{t - k}^{2} (1 - ϕ_{1, t - k}^{2})$ for every $0 ⩽ k ⩽ t - 1$ and $A_{- 1} = 1$.

The proof of this result can be found in the Supplementary Material: Section C provides the details for $T = 2$ cohorts; Section D provides the details for $T \geq 3$ cohorts.
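
As a numerical companion to Theorem 1, the sketch below assembles the mean vector (2.12) and the covariance matrix (2.13)–(2.14) for a single school with $T = 2$ cohorts, and then iterates a one-step variance recursion built from the same ingredients as $A_{k}$ and $B_{k}$. The function names, the argument layout and the toy values are assumptions made for illustration. In particular, the recursion $V_{t} = (ϕ_{1 t} + γ_{t})^{2} V_{t - 1} + γ_{t}^{2} σ_{t - 1}^{2} / n_{t - 1, i} + τ_{t}^{2} (1 - ϕ_{1 t}^{2})$ with $V_{1} = τ_{1}^{2}$ is one reading of the theorem, chosen here because it reproduces $ω_{2 i}$ at $t = 2$.

```python
import numpy as np

def two_cohort_moments(X1, X2, beta1, beta2, phi01, phi02, phi12, gamma2,
                       tau1_sq, tau2_sq, sigma1_sq, sigma2_sq):
    """Mean (2.12) and covariance (2.13) of (Y_2i, Y_1i) for one school."""
    n1, n2 = X1.shape[0], X2.shape[0]
    one1, one2 = np.ones(n1), np.ones(n2)
    xbar1_beta1 = float(X1.mean(axis=0) @ beta1)   # scalar: school mean of X1 times beta1
    eta02 = phi02 + phi01 * (phi12 + gamma2)
    mean = np.concatenate([
        (eta02 + gamma2 * xbar1_beta1) * one2 + X2 @ beta2,  # cohort 2 block
        phi01 * one1 + X1 @ beta1,                           # cohort 1 block
    ])
    # Quantities defined in (2.14).
    omega2 = (tau1_sq * (phi12 + gamma2) ** 2 + gamma2 ** 2 * sigma1_sq / n1
              + tau2_sq * (1.0 - phi12 ** 2))
    delta12 = phi12 * tau1_sq + gamma2 * (tau1_sq + sigma1_sq / n1)
    v22 = omega2 * np.outer(one2, one2) + sigma2_sq * np.eye(n2)
    v21 = delta12 * np.outer(one2, one1)
    v11 = tau1_sq * np.outer(one1, one1) + sigma1_sq * np.eye(n1)
    cov = np.block([[v22, v21], [v21.T, v11]])
    return mean, cov, omega2, delta12

def school_effect_variances(phi1, gamma, tau_sq, sigma_sq, n):
    """Assumed recursion for V(alpha_t); position 0 of each list is cohort 1.

    phi1[0] and gamma[0] are never used, since cohort 1 has no predecessor.
    """
    variances = [tau_sq[0]]
    for t in range(1, len(tau_sq)):
        a = (phi1[t] + gamma[t]) ** 2
        b = gamma[t] ** 2 * sigma_sq[t - 1] / n[t - 1] + tau_sq[t] * (1.0 - phi1[t] ** 2)
        variances.append(a * variances[-1] + b)
    return variances

# Toy inputs (hypothetical): 4 students in cohort 1, 3 in cohort 2, 2 covariates.
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
mean, cov, omega2, delta12 = two_cohort_moments(
    X1, X2, beta1=np.array([0.5, -0.2]), beta2=np.array([0.3, 0.1]),
    phi01=0.2, phi02=-0.1, phi12=0.5, gamma2=0.3,
    tau1_sq=1.0, tau2_sq=0.8, sigma1_sq=2.0, sigma2_sq=1.5)
variances = school_effect_variances(
    phi1=[0.0, 0.5], gamma=[0.0, 0.3], tau_sq=[1.0, 0.8],
    sigma_sq=[2.0, 1.5], n=[4, 3])
```

With these inputs, the off-diagonal entries of the cohort 2 block equal $ω_{2 i}$, every entry of the cross-cohort block equals $δ_{12 i}$, and the second element of `variances` coincides with $ω_{2 i}$.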

Let us conclude by noting that in the Bayesian specification, if the distributions of random effects are interpreted as prior distributions, then the Bayesian model corresponds to the equivalent of the classical fixed-effect model. On the other hand, the Bayesian equivalent of the classical random-effect model entails using the probability distribution obtained after integrating the random effects as the likelihood function (or statistical model), which is the approach we have chosen. For further discussion on the importance of explicitly defining the likelihood function within a Bayesian framework, please refer to the Supplementary Material, Section A.

2.3. Parameter Identification for Two Cohorts

According to the Likelihood Principle, for a given model all the information that the data contain about the model parameters is given by the likelihood function (Lindley, 1983). We argue that such information is related to the identified parameters only; for a discussion, see Supplementary Material, Section A. Thus, in the context of the temporally dynamic, cohort-varying value-added model, parameter identifiability should be analyzed with respect to the conditional distribution of $Y_{1, i}^{T}$ given $(X_{1 i}^{T}, ψ_{1}^{T})$. It is worth emphasizing that, although both model specification and parameter estimation are entirely Bayesian, the identifiability of parameters should still be established beforehand. Furthermore, it should be noted that the prior distribution on $ψ_{1}^{T}$ has no impact on the identification analysis, except through events whose prior probability equals 0 or 1. For details, see Supplementary Material, Section A. This explains why, in the sequential specification, the prior distribution on $ψ_{1}^{T}$ in (2.11) remains unspecified.

Following the identification strategy outlined in San Martín et al. (2011), San Martín et al. (2013), San Martín et al. (2015), and Fariña et al. (2019), we demonstrate the identification of $(ψ_{1}, ψ_{2})$ for the case of $T = 2$ cohorts through the following arguments, which remain analogous for more than two cohorts:

1. From the conditional expectation of $Y_{1 i}$, $ϕ_{01}$ and $β_{1}$ are identifiable provided that $ı_{n_{1 i}} \notin R (X_{1 i})$, which holds by construction. Here $R (A)$ represents the space generated by the columns of the matrix $A$.
2. From the conditional expectation of $Y_{2 i}$, $η_{02} + γ_{2} \bar{X}_{1 i} β_{1}$ and $β_{2}$ are identifiable provided that $ı_{n_{2 i}} \notin R (X_{2 i})$, which holds by construction.
3. From the variance of $Y_{1 i}$, $τ_{1}^{2}$ and $σ_{1}^{2}$ are identified. Similarly, from the variance of $Y_{2 i}$, $σ_{2}^{2}$ and $ω_{2 i}$ are identified for every school i.
4. From the covariance between $Y_{1 i}$ and $Y_{2 i}$, $δ_{12 i}$ is identified for every school i. Now, if there exist at least two schools i and k whose first cohorts differ in their total number of students, then
$$δ_{12 i} - δ_{12 k} = γ_{2} σ_{1}^{2} \left(\frac{1}{n_{1 i}} - \frac{1}{n_{1 k}}\right),$$
from which it follows that $γ_{2}$ is identified, since $σ_{1}^{2}$ is identified by step 3. Furthermore, using the definition of $δ_{12 i}$, it follows that $ϕ_{12}$ is identified; and using the definition and identifiability of $ω_{2 i}$, it follows that $τ_{2}^{2}$ is identified.
5. Using the identifiability of $γ_{2}, β_{1}$ and $η_{02} + γ_{2} \bar{X}_{1 i} β_{1}$, it follows that $η_{02}$ is identified. Finally, from the definition of $η_{02}$, it follows that $ϕ_{02}$ is identified. $□$
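
The constructive steps above can be traced numerically. The following sketch starts from assumed "true" parameter values, computes the identified quantities $δ_{12 i}$, $δ_{12 k}$, $ω_{2 i}$ and $η_{02} + γ_{2} \bar{X}_{1 i} β_{1}$ from (2.14) and the definition of $η_{02}$, and then inverts steps 4 and 5. All function names and numerical values are hypothetical, and the inversion for $τ_{2}^{2}$ additionally assumes $ϕ_{12}^{2} \neq 1$ and $τ_{1}^{2} > 0$.

```python
import math

def delta12(phi12, gamma2, tau1_sq, sigma1_sq, n1):
    """Cross-cohort covariance delta_{12i} from (2.14)."""
    return phi12 * tau1_sq + gamma2 * (tau1_sq + sigma1_sq / n1)

def omega2(phi12, gamma2, tau1_sq, tau2_sq, sigma1_sq, n1):
    """Variance of the cohort 2 school effect, omega_{2i}, from (2.14)."""
    return (tau1_sq * (phi12 + gamma2) ** 2 + gamma2 ** 2 * sigma1_sq / n1
            + tau2_sq * (1.0 - phi12 ** 2))

def recover(delta_i, delta_k, omega_i, intercept_i, xbar1_beta1_i,
            n1_i, n1_k, phi01, tau1_sq, sigma1_sq):
    """Invert steps 4 and 5; phi01, tau1_sq, sigma1_sq come from steps 1 and 3."""
    if n1_i == n1_k:
        raise ValueError("schools i and k need different first-cohort sizes")
    # Step 4: the difference of the two covariances isolates gamma2.
    gamma2 = (delta_i - delta_k) / (sigma1_sq * (1.0 / n1_i - 1.0 / n1_k))
    phi12 = (delta_i - gamma2 * (tau1_sq + sigma1_sq / n1_i)) / tau1_sq
    tau2_sq = ((omega_i - tau1_sq * (phi12 + gamma2) ** 2
                - gamma2 ** 2 * sigma1_sq / n1_i) / (1.0 - phi12 ** 2))
    # Step 5: remove the covariate term, then use the definition of eta02.
    eta02 = intercept_i - gamma2 * xbar1_beta1_i
    phi02 = eta02 - phi01 * (phi12 + gamma2)
    return gamma2, phi12, tau2_sq, phi02

# Hypothetical "true" values used to generate the identified quantities.
true = dict(phi01=0.2, phi02=-0.1, phi12=0.5, gamma2=0.3,
            tau1_sq=1.0, tau2_sq=0.8, sigma1_sq=2.0)
n1_i, n1_k, xbar1_beta1_i = 25, 40, 1.7

d_i = delta12(true["phi12"], true["gamma2"], true["tau1_sq"], true["sigma1_sq"], n1_i)
d_k = delta12(true["phi12"], true["gamma2"], true["tau1_sq"], true["sigma1_sq"], n1_k)
w_i = omega2(true["phi12"], true["gamma2"], true["tau1_sq"], true["tau2_sq"],
             true["sigma1_sq"], n1_i)
eta02_true = true["phi02"] + true["phi01"] * (true["phi12"] + true["gamma2"])
intercept_i = eta02_true + true["gamma2"] * xbar1_beta1_i

gamma2_hat, phi12_hat, tau2_sq_hat, phi02_hat = recover(
    d_i, d_k, w_i, intercept_i, xbar1_beta1_i,
    n1_i, n1_k, true["phi01"], true["tau1_sq"], true["sigma1_sq"])
```

If the two schools had equal first-cohort sizes, the difference $δ_{12 i} - δ_{12 k}$ would vanish and `recover` would raise an error, which mirrors the role of the condition on cohort sizes in step 4.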

We summarize the previous arguments in the following proposition:

Proposition 1

In the statistical model parameterized by (2.12) and (2.13), the parameters $(ψ_{1}, ψ_{2})$ are identified provided that $ı_{n_{1 i}} \notin R (X_{1 i})$, $ı_{n_{2 i}} \notin R (X_{2 i})$, and there exist at least two schools i and k whose first cohorts differ in their total number of students.

In the remainder of this paper, the discussion is focused on $T = 2$ cohorts due to the nature of our case study.

2.4. Nested Value-Added Models

The sequential specification (2.1)–(2.11) for $T = 2$ cohorts (referred to as the “Full Model” or Model 3) aims to address whether a school’s effectiveness upon receiving a new cohort is jointly influenced by both the effectiveness of the school for the previous cohort and some observed information from that cohort. The identification analysis assures us that such a scenario is empirically plausible. The crux of the underlying model lies in the temporal dependency of the latent school effect $α_{2 i}$ for cohort 2 on both the latent school effect $α_{1 i}$ for cohort 1 and the mean of the cohort 1 post-tests $Y_{1 i j}$, as captured by the conditional distribution (2.4). At the statistical model level, the parameters of this conditional distribution, alongside $σ_{1}^{2}$ and $τ_{1}^{2}$, delineate both the within- and between-cohort dependencies among the post-test scores. As a matter of fact,

$$
\operatorname{cov}(Y_{2ir}, Y_{2is} \mid \boldsymbol{X}_{1,i}^{2}, \boldsymbol{\psi}_1^2) = \tau_1^2 (\phi_{12} + \gamma_2)^2 + \frac{\gamma_2^2 \sigma_1^2}{n_{1i}} + \tau_2^2 (1 - \phi_{12}^2), \quad r \neq s; \tag{2.17}
$$

$$
\operatorname{cov}(Y_{1ij}, Y_{2ik} \mid \boldsymbol{X}_{1,i}^{2}, \boldsymbol{\psi}_1^2) = \tau_1^2 (\phi_{12} + \gamma_2) + \frac{\gamma_2 \sigma_1^2}{n_{1i}}, \quad j \neq k. \tag{2.18}
$$

It can be noticed that the within-cohort covariance (2.17) is positive for all $\gamma_2 \in \mathbb{R}$ and $\phi_{12} \in [-1,1]$, and hence so is the corresponding correlation; this is a standard fact in value-added models. Even more interesting is that the post-test scores of the two cohorts are correlated; the sign of this correlation depends on both the sign of $\phi_{12}+\gamma_2$ (which in turn determines the sign of the correlation between the school effects $\alpha_{1i}$ and $\alpha_{2i}$) and the sign of $\gamma_2$:

1. If the information shock parameter $\gamma_2$ is such that $|\gamma_2| \ge 1$, then, for all schools, the sign of (2.18), as well as the sign of the correlation between $\alpha_{1i}$ and $\alpha_{2i}$, corresponds to the sign of $\gamma_2$.
2. If the information shock parameter $\gamma_2$ is such that $|\gamma_2| < 1$, then we distinguish the following cases:
   (a) If $\gamma_2 \in (-1,0)$ and $\phi_{12}+\gamma_2 < 0$, then, for all schools, the sign of (2.18) is negative.
   (b) If $\gamma_2 \in (-1,0)$ and $\phi_{12}+\gamma_2 > 0$, then, for a school $i$, the sign of (2.18) is positive if $\tau_1^2 > \sigma_1^2/n_{1i}$, and it is negative if $\tau_1^2 < \sigma_1^2/n_{1i}$.
   (c) If $\gamma_2 \in (0,1)$ and $\phi_{12}+\gamma_2 > 0$, then, for all schools, the sign of (2.18) is positive.
   (d) If $\gamma_2 \in (0,1)$ and $\phi_{12}+\gamma_2 < 0$, then, for a school $i$, the sign of (2.18) is positive if $\tau_1^2 < \sigma_1^2/n_{1i}$, and it is negative if $\tau_1^2 > \sigma_1^2/n_{1i}$.

   Note also that, for $|\gamma_2| < 1$, the sign of the correlation between $\alpha_{1i}$ and $\alpha_{2i}$ corresponds, for all schools, to the sign of $\phi_{12}+\gamma_2$.
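The closed forms (2.17) and (2.18) are easy to evaluate numerically. The following is a minimal sketch, not part of the original analysis: the function names and the numerical values of the variance components are hypothetical, chosen only to illustrate case 1 and cases 2(a) and 2(c), where the sign of (2.18) is the same for all schools.

```python
def within_cohort_cov(tau1_sq, tau2_sq, sigma1_sq, n1, phi12, gamma2):
    """Covariance (2.17) between two distinct cohort-2 post-tests in a school."""
    return (tau1_sq * (phi12 + gamma2) ** 2
            + gamma2 ** 2 * sigma1_sq / n1
            + tau2_sq * (1.0 - phi12 ** 2))


def between_cohort_cov(tau1_sq, sigma1_sq, n1, phi12, gamma2):
    """Covariance (2.18) between a cohort-1 and a cohort-2 post-test."""
    return tau1_sq * (phi12 + gamma2) + gamma2 * sigma1_sq / n1


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


# Illustrative (hypothetical) variance components and cohort-1 school size.
tau1_sq, tau2_sq, sigma1_sq, n1 = 1.0, 1.0, 4.0, 10

# Case 1: |gamma2| >= 1, so (2.18) takes the sign of gamma2.
case1 = between_cohort_cov(tau1_sq, sigma1_sq, n1, phi12=0.9, gamma2=-1.5)
# Case 2(a): gamma2 in (-1, 0) and phi12 + gamma2 < 0, so (2.18) is negative.
case2a = between_cohort_cov(tau1_sq, sigma1_sq, n1, phi12=0.2, gamma2=-0.5)
# Case 2(c): gamma2 in (0, 1) and phi12 + gamma2 > 0, so (2.18) is positive.
case2c = between_cohort_cov(tau1_sq, sigma1_sq, n1, phi12=-0.2, gamma2=0.5)

print(sign(case1), sign(case2a), sign(case2c))  # -1 -1 1
```

Setting `phi12=0.0` and `gamma2=0.0` in `within_cohort_cov` returns `tau2_sq`, the value that (2.17) takes when both parameters vanish.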

These considerations motivate the study of special cases of Model 3 that are based on particular values of the parameters $\phi_{12}$ and $\gamma_2$. This results in three models that are nested in Model 3:

1. Model 0 is obtained by setting $(\phi_{12},\gamma_2)=(0,0)$. In this case, the within-cohort covariance (2.17) is equal to $\tau_2^2$, whereas both the between-cohort covariance (2.18) and the covariance between $\alpha_{1i}$ and $\alpha_{2i}$ are equal to 0. In other words, all temporal dependence between cohorts is broken, and therefore this model reflects the situation in which a school does not learn from its past, nor does it intend to affect its future. Under the condition $(\phi_{12},\gamma_2)=(0,0)$, it follows that

   $$
   \alpha_{2i} \perp\!\!\!\perp \boldsymbol{X}_{1,i}^{2}, \boldsymbol{Y}_{1i}, \alpha_{1i}, \boldsymbol{\psi}_1^2 \mid \phi_{02}, \tau_2^2. \tag{2.19}
   $$

   Consequently, Model 0 can be specified hierarchically as

   (a) $(\boldsymbol{Y}_{2i} \mid \boldsymbol{X}_{2i}, \alpha_{2i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{2i}\boldsymbol{\beta}_2 + \alpha_{2i}\boldsymbol{\imath}_{n_{2i}}, \sigma_2^2 \boldsymbol{I}_{n_{2i}})$;
   (b) $(\alpha_{2i} \mid \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{02}, \tau_2^2)$;
   (c) $(\boldsymbol{Y}_{1i} \mid \boldsymbol{X}_{1i}, \alpha_{1i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{1i}\boldsymbol{\beta}_1 + \alpha_{1i}\boldsymbol{\imath}_{n_{1i}}, \sigma_1^2 \boldsymbol{I}_{n_{1i}})$;
   (d) $(\alpha_{1i} \mid \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{01}, \tau_1^2)$.
2. Model 1 is obtained by setting $\gamma_2 = 0$. In this case, the within-cohort covariance (2.17) is equal to $\tau_1^2\phi_{12}^2 + \tau_2^2(1-\phi_{12}^2)$, which is positive for all $\phi_{12} \in [-1,1]$. The between-cohort covariance (2.18) is equal to $\phi_{12}\tau_1^2$, which coincides with the covariance between $\alpha_{2i}$ and $\alpha_{1i}$: if the school effect for cohort 2 is negatively (resp., positively) correlated with the school effect for cohort 1, then the between-cohort correlation is negative (resp., positive). In other words, the relationship that the school effect has with its past school effect is reflected (at the sign level) in the observed relationship between the post-test scores of the two cohorts. Under the condition $\gamma_2 = 0$, it follows that

   $$
   \alpha_{2i} \perp\!\!\!\perp \boldsymbol{X}_{1,i}^{2}, \boldsymbol{Y}_{1i}, \boldsymbol{\psi}_1^2 \mid \alpha_{1i}, \phi_{02}, \phi_{12}, \tau_2^2. \tag{2.20}
   $$

   Consequently, Model 1 can be specified hierarchically as

   (a) $(\boldsymbol{Y}_{2i} \mid \boldsymbol{X}_{2i}, \alpha_{2i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{2i}\boldsymbol{\beta}_2 + \alpha_{2i}\boldsymbol{\imath}_{n_{2i}}, \sigma_2^2 \boldsymbol{I}_{n_{2i}})$;
   (b) $(\alpha_{2i} \mid \alpha_{1i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{02} + \phi_{12}\alpha_{1i}, \tau_2^2(1-\phi_{12}^2))$;
   (c) $(\boldsymbol{Y}_{1i} \mid \boldsymbol{X}_{1i}, \alpha_{1i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{1i}\boldsymbol{\beta}_1 + \alpha_{1i}\boldsymbol{\imath}_{n_{1i}}, \sigma_1^2 \boldsymbol{I}_{n_{1i}})$;
   (d) $(\alpha_{1i} \mid \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{01}, \tau_1^2)$.

   As can be recognized, this structure corresponds to an ARIMA-type model.
3. Model 2 is obtained by setting $\phi_{12} = 0$. In this case, the within-cohort covariance (2.17) is equal to $\tau_1^2\gamma_2^2 + \gamma_2^2\sigma_1^2/n_{1i} + \tau_2^2$, which is positive for all $\gamma_2 \in \mathbb{R}$. The between-cohort covariance (2.18) is equal to $\gamma_2(\tau_1^2 + \sigma_1^2/n_{1i})$: its sign is the sign of $\gamma_2$. Furthermore, if $\gamma_2 > 0$ (resp., $\gamma_2 < 0$), the between-cohort covariance is larger (resp., smaller) than the covariance between $\alpha_{2i}$ and $\alpha_{1i}$. In other words, the shock of information captured by $\gamma_2$ impacts both the correlation between the school effects and the between-cohort correlation: these dependency relationships provide an idea of what the model means by “shock of information”. Under the condition $\phi_{12} = 0$, it follows that

   $$
   \alpha_{2i} \perp\!\!\!\perp \boldsymbol{X}_{1,i}^{2}, \alpha_{1i}, \boldsymbol{\psi}_1^2 \mid \boldsymbol{Y}_{1i}, \phi_{02}, \gamma_2, \tau_2^2. \tag{2.21}
   $$

   Consequently, Model 2 can be specified hierarchically as

   (a) $(\boldsymbol{Y}_{2i} \mid \boldsymbol{X}_{2i}, \alpha_{2i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{2i}\boldsymbol{\beta}_2 + \alpha_{2i}\boldsymbol{\imath}_{n_{2i}}, \sigma_2^2 \boldsymbol{I}_{n_{2i}})$;
   (b) $(\alpha_{2i} \mid \boldsymbol{Y}_{1i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{02} + \gamma_2 \bar{Y}_{1i\bullet}, \tau_2^2)$;
   (c) $(\boldsymbol{Y}_{1i} \mid \boldsymbol{X}_{1i}, \alpha_{1i}, \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\boldsymbol{X}_{1i}\boldsymbol{\beta}_1 + \alpha_{1i}\boldsymbol{\imath}_{n_{1i}}, \sigma_1^2 \boldsymbol{I}_{n_{1i}})$;
   (d) $(\alpha_{1i} \mid \boldsymbol{\psi}_1^2) \sim \mathcal{N}(\phi_{01}, \tau_1^2)$.

Since

$$
\begin{aligned}
(2.19) &\implies (2.20) \implies (2.3) \text{ with } T = 2, \\
(2.19) &\implies (2.21) \implies (2.3) \text{ with } T = 2,
\end{aligned}
$$

Model 0 is nested in Model 1, which in turn is nested in Model 3; and Model 0 is nested in Model 2, which in turn is nested in Model 3.
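The hierarchical specifications lend themselves to direct simulation, which gives an empirical check of (2.17) and (2.18). The sketch below is illustrative only and rests on two assumptions that are not spelled out in this section: the cohort-2 school effect is drawn with mean $\phi_{02} + \phi_{12}\alpha_{1i} + \gamma_2 \bar{Y}_{1i\bullet}$ and variance $\tau_2^2(1-\phi_{12}^2)$ (a form that reduces to step (b) of Models 1 and 2 when $\gamma_2 = 0$ or $\phi_{12} = 0$), and the covariate contribution $\boldsymbol{X}\boldsymbol{\beta}$ is set to zero. All function names and parameter values are hypothetical.

```python
import random


def simulate_school(rng, n1, n2, phi01, phi02, phi12, gamma2,
                    tau1_sq, tau2_sq, sigma1_sq, sigma2_sq):
    """Draw one school's post-tests for two cohorts (covariate part set to 0)."""
    alpha1 = rng.gauss(phi01, tau1_sq ** 0.5)
    y1 = [alpha1 + rng.gauss(0.0, sigma1_sq ** 0.5) for _ in range(n1)]
    y1_bar = sum(y1) / n1
    # Assumed form of the cohort-2 school effect: mean shifted by alpha1 and by
    # the cohort-1 post-test mean, residual variance tau2^2 * (1 - phi12^2).
    mean2 = phi02 + phi12 * alpha1 + gamma2 * y1_bar
    alpha2 = rng.gauss(mean2, (tau2_sq * (1.0 - phi12 ** 2)) ** 0.5)
    y2 = [alpha2 + rng.gauss(0.0, sigma2_sq ** 0.5) for _ in range(n2)]
    return y1, y2


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(2024)
params = dict(n1=5, n2=5, phi01=0.0, phi02=0.0, phi12=0.5, gamma2=0.3,
              tau1_sq=1.0, tau2_sq=1.0, sigma1_sq=1.0, sigma2_sq=1.0)
schools = [simulate_school(rng, **params) for _ in range(40000)]

# One cohort-1 student against one cohort-2 student: compare with (2.18).
between_hat = sample_cov([y1[0] for y1, _ in schools],
                         [y2[1] for _, y2 in schools])
between_theory = 1.0 * (0.5 + 0.3) + 0.3 * 1.0 / 5

# Two distinct cohort-2 students of the same school: compare with (2.17).
within_hat = sample_cov([y2[0] for _, y2 in schools],
                        [y2[1] for _, y2 in schools])
within_theory = 1.0 * (0.5 + 0.3) ** 2 + 0.3 ** 2 * 1.0 / 5 + 1.0 * (1 - 0.5 ** 2)

print(round(between_hat, 2), round(between_theory, 3))
print(round(within_hat, 2), round(within_theory, 3))
```

Setting `gamma2=0.0`, `phi12=0.0`, or both in `params` simulates from Models 1, 2, and 0, respectively, which makes the nesting explicit: the same generator covers all four models.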

2.5. Value-Added Definition

The concept of school value-added refers to the difference between the expected grade of a student at a given school and the expected grade of the same student at an average school. It is, in other terms, the gain (or loss) in the expected score at a specific school compared to a baseline established by an average score.

A model-free definition of school value-added was introduced in Manzi et al. (2014). In the notation of the present paper, it is given by

$$
VA_{ti}(\boldsymbol{X}_{ti}) \doteq \frac{1}{n_{ti}} \sum_{j=1}^{n_{ti}} \left( E(Y_{tij} \mid \boldsymbol{X}_{tij}, \alpha_{i}) - E(Y_{tij} \mid \boldsymbol{X}_{tij}) \right). \tag{2.22}
$$

It may be necessary to clarify what “model-free definition” means in this context. This definition involves conditional expectations that do not refer to any specific model, although later in this paper we shall compute them within the context of the precise model presented in equations (2.1)–(2.11). Indeed, conditional expectations are related to the statistical distribution of data and, like expectations, can be defined and estimated without referring to any model. In probability theory, conditional expectations can be defined either through orthogonal projection of data (Neveu, 1972; Florens et al., 2007) or, more broadly, by the Radon–Nikodym Theorem (Kolmogorov, 1950; Billingsley, 1968). We view the aforementioned definition as advantageous because it provides a statistical interpretation of the concept of school value-added that does not rely on any specific model. Therefore, it can be applied to any particular situation in which the psychometrician assumes a model (or compares several).
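Because definition (2.22) only involves two conditional expectations, it can be evaluated for any pair of functions that supply them. The following minimal sketch makes this concrete; the function `value_added`, the linear conditional expectations, and all numerical values are hypothetical and serve only as an illustration.

```python
from typing import Callable, Sequence


def value_added(covariates: Sequence[Sequence[float]],
                expect_in_school: Callable[[Sequence[float]], float],
                expect_in_average_school: Callable[[Sequence[float]], float]) -> float:
    """Average over a school's students of E(Y | X, alpha) - E(Y | X), as in (2.22)."""
    diffs = [expect_in_school(x) - expect_in_average_school(x) for x in covariates]
    return sum(diffs) / len(diffs)


# Hypothetical linear example:
# E(Y | X, alpha) = alpha + x'beta and E(Y | X) = phi0 + x'beta.
beta, alpha, phi0 = [0.8, 0.1], 0.5, 0.2


def linear(x, intercept):
    """Linear predictor with the given intercept and the fixed slopes beta."""
    return intercept + sum(b * v for b, v in zip(beta, x))


# Covariate vectors of three students in the school (e.g., prior score, indicator).
students = [[250.0, 1.0], [270.0, 0.0], [300.0, 1.0]]
va = value_added(students, lambda x: linear(x, alpha), lambda x: linear(x, phi0))
print(round(va, 6))  # 0.3, i.e. alpha - phi0
```

Any other pair of conditional expectation functions, parametric or not, could be passed in place of the two linear ones, which is the sense in which the definition does not depend on a specific model.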

Definition (2.22) offers a characterization of the average or reference school that is entirely determined by the vector of covariates $\boldsymbol{X}_{ti}$. If the covariates change, it impacts not only the conceptualization of the average school but also the interpretation of school effectiveness and the value-added indicators themselves. Consequently, this definition implies that school effectiveness should not be viewed as a universally meaningful concept (i.e., a school being effective or ineffective in the same manner under all circumstances). Instead, it is a contextually idiosyncratic concept that should not be reified. The relevant context for this analysis is defined by the covariates included in the model. Therefore, their selection should be closely tied to the policy context requiring a value-added analysis, as well as to the social context within which an educational system operates.

Using (2.22), we derive the following value-added indicator for Model 3:

$$
\begin{aligned}
VA_{1i}(\boldsymbol{X}_{1i}) &= \alpha_{1i} - \phi_{01}, \\
VA_{2i}(\boldsymbol{X}_{2i}) &= \alpha_{2i} - \left(\phi_{02} + \phi_{01}(\phi_{12} + \gamma_2)\right) - \frac{\gamma_2}{n_{2i}} \boldsymbol{\beta}_1' \sum_{j=1}^{n_{2i}} E(\overline{\boldsymbol{X}}_{1i}' \mid \boldsymbol{X}_{2ij}).
\end{aligned} \tag{2.23}
$$

For further details, see “Appendix A.1”. The value-added indicators for Models 0, 1, and 2 are obtained by setting $\phi_{12} = \gamma_2 = 0$, $\gamma_2 = 0$, and $\phi_{12} = 0$, respectively.
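The indicators in (2.23) and their restrictions to the nested models can be written as two short functions. This is a sketch under stated assumptions: the function names are hypothetical, the parameter values are invented for illustration, and the average over cohort-2 students of $E(\overline{\boldsymbol{X}}_{1i}' \mid \boldsymbol{X}_{2ij})$ is taken as a given input rather than estimated.

```python
def va_cohort1(alpha1, phi01):
    """First line of (2.23): school effect centered at its mean."""
    return alpha1 - phi01


def va_cohort2(alpha2, phi01, phi02, phi12, gamma2, beta1, mean_expected_x1bar):
    """Second line of (2.23).

    mean_expected_x1bar is the average over cohort-2 students of
    E(Xbar_1i | X_2ij), supplied by the caller as a plain list.
    """
    centering = phi02 + phi01 * (phi12 + gamma2)
    adjustment = gamma2 * sum(b * m for b, m in zip(beta1, mean_expected_x1bar))
    return alpha2 - centering - adjustment


# Hypothetical values, for illustration only.
common = dict(alpha2=1.0, phi01=0.2, phi02=0.1, beta1=[0.5, 2.0],
              mean_expected_x1bar=[1.0, 0.25])
va_model3 = va_cohort2(phi12=0.5, gamma2=0.3, **common)
va_model0 = va_cohort2(phi12=0.0, gamma2=0.0, **common)  # reduces to alpha2 - phi02
va_model1 = va_cohort2(phi12=0.5, gamma2=0.0, **common)  # no covariate adjustment
va_model2 = va_cohort2(phi12=0.0, gamma2=0.3, **common)
print(va_model3, va_model0, va_model1, va_model2)
```

With these inputs the four indicators differ only through the centering constant and the $\gamma_2$-weighted covariate adjustment, since the school effect `alpha2` is held fixed across the calls.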

Note that $VA_{2i}(\boldsymbol{X}_{2i})$ represents the school effect $\alpha_{2i}$ (centered at 0 by $\phi_{02} + \phi_{01}(\phi_{12} + \gamma_2)$), adjusted by an additive term that depends on $(n_{2i})^{-1} \sum_{j=1}^{n_{2i}} E(\overline{\boldsymbol{X}}_{1i}' \mid \boldsymbol{X}_{2ij})$. Assuming this regression to be linear, it can be demonstrated that

\begin{matrix} \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) = (\begin{matrix} b_{10} & b_{11} & 0 & \dots & 0 \\ b_{20} & 0 & b_{22} & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & 0 \\ b_{p 0} & 0 & 0 & \dots & b_{pp} \end{matrix}) (\begin{matrix} 1 \\ {\bar{X}}_{2 i} \end{matrix}), \end{matrix}

where $p ≐ p_{1} = p_{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p \doteq p_1 = p_2$$\end{document} ; in this form, the regression parameters are identifiable. For more details, see “Appendix A.2”.

Remark 2

For $T \geq 3$ cohorts, the value-added $VA_{Ti}(\boldsymbol{X}_{Ti})$ is given by

$$
\begin{aligned}
VA_{Ti}(\boldsymbol{X}_{Ti}) &= \alpha_{Ti} - E(\alpha_{Ti} \mid \boldsymbol{X}_{Ti}, \boldsymbol{\psi}_1^T) \\
&= \alpha_{Ti} - E\left[ E(\alpha_{Ti} \mid \boldsymbol{X}_{1i}^T, \boldsymbol{\psi}_1^T) \mid \boldsymbol{X}_{Ti}, \boldsymbol{\psi}_1^T \right].
\end{aligned}
$$

Using (2.16), we conclude that

$$
\begin{aligned}
VA_{Ti}(\boldsymbol{X}_{Ti}) ={}& \alpha_{Ti} - \phi_{0T} - \sum_{\ell=2}^{T} \prod_{k=\ell}^{T} (\phi_{1k} + \gamma_k)\, \phi_{0,\ell-1} - \gamma_T\, E(\overline{\boldsymbol{X}}_{T-1,i}' \mid \boldsymbol{X}_{Ti})\, \beta_{T-1} \\
& - \sum_{\ell=2}^{T} \prod_{k=\ell}^{T} (\phi_{1k} + \gamma_k)\, \gamma_{\ell-1}\, E(\overline{\boldsymbol{X}}_{\ell-2,i}' \mid \boldsymbol{X}_{Ti})\, \beta_{\ell-2}.
\end{aligned}
$$

$\square$

2.6. Structural Interpretation of the Persistence of School Effectiveness

Time-dependent value-added models are intended to model the persistence of school effectiveness. Following Gray et al. (1996, 1999), the persistence of school effectiveness is described through trajectories of school value-added. The meaning of persistence therefore depends critically on the time-dependent value-added model that is used. Studies describing the persistence of school effectiveness have appeared in the literature, but they are (implicitly) based on Model 0 (see, e.g., Gray et al., 2001; Thomas et al., 2007; Bellei et al., 2016). These approaches are limited because they assume that a school’s current effectiveness is not affected by what the school did in the past. It seems reasonable to expect that a school’s past performance would be useful in determining its future effectiveness. Our approach, which is based on Model 3 and includes Model 1 and Model 2 as particular cases, overcomes this limitation.

Under Model 0, Model 1, Model 2, or Model 3, the school value-added for cohort 1 coincides with the corresponding school effect $\alpha_{1i}$ centered at 0 by $\phi_{01}$. To understand the extent to which the school value-added for cohort 1 explains the school value-added for cohort 2, we decompose the latter into two components: the first captures the part of the second value-added that is explained by the first, whereas the second corresponds to everything in the second value-added that is not explained by the first; that is,

(2.24)

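$$
\begin{aligned}
VA_{2i}(\boldsymbol{X}_{2i}) ={}& E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) \\
& + \Big(VA_{2i}(\boldsymbol{X}_{2i}) - E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big)\Big),
\end{aligned}
$$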

where

(2.25)

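$$
\begin{aligned}
E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) ={}& (\phi_{12} + \gamma_2)\, VA_{1i}(\boldsymbol{X}_{1i}) \\
& + \frac{\gamma_2}{n_{2i}} \sum_{j=1}^{n_{2i}} \big(\overline{\boldsymbol{X}}_{1i} - E(\overline{\boldsymbol{X}}_{1i} \mid \boldsymbol{X}_{2ij})\big)\, \beta_1;
\end{aligned}
$$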

for a proof, see Supplementary Material, Section E. It can also be verified that

(2.26)

$$
\mathrm{Var}\Big(E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) \,\Big|\, \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\Big) = \tau_1^2 (\phi_{12} + \gamma_2)^2,
$$

and

(2.27)

$$
\mathrm{Var}\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) = \omega_{2i}.
$$

Since, by construction, the two terms on the right-hand side of decomposition (2.24) are uncorrelated, the variance of the second term (typically called measurement error) is given by

(2.28)

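$$
\begin{aligned}
& \mathrm{Var}\Big(VA_{2i}(\boldsymbol{X}_{2i}) - E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) \,\Big|\, \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\Big) \\
& \qquad = \frac{\gamma_2^2 \sigma_1^2}{n_{1i}} + \tau_2^2 (1 - \phi_{12}^2);
\end{aligned}
$$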

for a proof, see Supplementary Material, Section E.
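
As a worked step, the three variance statements can be combined. Because the two terms of (2.24) are uncorrelated, the total variance in (2.27) is the sum of the variances in (2.26) and (2.28). The display below is only a restatement of those equations, and the numerical values are illustrative choices, not results from the paper:

$$
\omega_{2i} = \underbrace{\tau_1^2(\phi_{12}+\gamma_2)^2}_{(2.26)} + \underbrace{\frac{\gamma_2^2\sigma_1^2}{n_{1i}} + \tau_2^2(1-\phi_{12}^2)}_{(2.28)}.
$$

For instance, with $\tau_1^2 = \tau_2^2 = 100$, $\sigma_1^2 = 25$, $n_{1i} = 25$, $\phi_{12} = 0.5$, and $\gamma_2 = 0.25$, the first term equals $100 \times 0.75^2 = 56.25$ and the second equals $0.0625 + 75 = 75.0625$, so that $\omega_{2i} = 131.3125$.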

To facilitate the interpretation of (2.25), consider the case of only one covariate, namely the pre-test. In this case, $p = 1$ and therefore

$$
\frac{1}{n_{2i}} \sum_{j=1}^{n_{2i}} E(\overline{\boldsymbol{X}}_{1i} \mid \boldsymbol{X}_{2ij}) = E(\overline{\boldsymbol{X}}_{1i} \mid \overline{\boldsymbol{X}}_{2i}),
$$

so that the average pre-test of the first cohort is regressed on the average pre-test of the second cohort. Then:

1. In the explanation of $VA_{2i}(\boldsymbol{X}_{2i})$ by $(VA_{1i}(\boldsymbol{X}_{1i}),\, \boldsymbol{X}_{1,i}^2)$, there are two components:
   (a) The first component depends on $(\phi_{12}+\gamma_2)\,VA_{1i}(\boldsymbol{X}_{1i})$: the parameter $(\phi_{12}+\gamma_2)$ determines the sign of the correlation between $\alpha_{1i}$ and $\alpha_{2i}$. If this correlation is positive (resp., negative), $VA_{1i}(\boldsymbol{X}_{1i})$ is amplified (resp., contracted) as an explanatory factor of $VA_{2i}(\boldsymbol{X}_{2i})$.
   (b) The second component corresponds to the residual of the regression of the average pre-test of the first cohort on the average pre-test of the second cohort; this regression inverts the temporal order of the variables. Recall that the school has already treated the first cohort. If the average pre-test of the second cohort is uncorrelated with the average pre-test of the first cohort, then the initial information provided by the second cohort differs in every respect from that provided by the first cohort: the school faces a new initial condition and, consequently, the residual is larger. Conversely, if the average pre-test of the second cohort predicts the average pre-test of the first cohort, then the school has initial information similar to what it had when the first cohort was treated, and the residual is therefore smaller. This residual, pre-multiplied by $\gamma_2$, is the second explanatory factor.
   Note that this interpretation remains valid for the case of $T > 2$ cohorts, particularly regarding the role played by the parameters $\gamma_t$ and the regressions of the form $E(\overline{\boldsymbol{X}}_{t,i}' \mid \boldsymbol{X}_{Ti})$.
2. How much of the variance of $VA_{2i}(\boldsymbol{X}_{2i})$ does $E[VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2]$ explain? This question can be addressed by computing the so-called reliability, which in this case is given by
   $$\frac{\tau_1^2(\phi_{12}+\gamma_2)^2}{\omega_{2i}}.$$
   Note that it is always strictly less than 1, which means that $E[VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2]$ does not exhaust the entire explanation of $VA_{2i}(\boldsymbol{X}_{2i})$: in the persistence of school effectiveness there are always new aspects that $VA_{1i}(\boldsymbol{X}_{1i})$ cannot predict.
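
The reliability can also be evaluated numerically. The following is a minimal sketch, assuming that $\omega_{2i}$ is the sum of the variances in (2.26) and (2.28), which holds because the two terms of (2.24) are uncorrelated. The function name `reliability` and the numerical inputs are illustrative and are not taken from the authors' software.

```python
def reliability(tau1_sq, tau2_sq, sigma1_sq, n1, phi12, gamma2):
    """Share of Var(VA_2i) explained by VA_1i, following (2.26)-(2.28)."""
    # Variance of the explained part, equation (2.26)
    explained = tau1_sq * (phi12 + gamma2) ** 2
    # Variance of the unexplained part, equation (2.28)
    unexplained = gamma2 ** 2 * sigma1_sq / n1 + tau2_sq * (1.0 - phi12 ** 2)
    # Total variance omega_2i, equation (2.27), as the sum of both parts
    omega2 = explained + unexplained
    return explained / omega2


# Illustrative (hypothetical) inputs
r3 = reliability(100.0, 100.0, 25.0, 25, phi12=0.5, gamma2=0.5)  # both parameters free
r1 = reliability(100.0, 100.0, 25.0, 25, phi12=0.5, gamma2=0.0)  # gamma_2 = 0
r2 = reliability(100.0, 100.0, 25.0, 25, phi12=0.0, gamma2=0.5)  # phi_12 = 0
print(round(r3, 4), round(r1, 4), round(r2, 4))  # prints 0.5706 0.25 0.1996
```

In all three cases the ratio stays below 1, because the unexplained variance is strictly positive whenever $|\phi_{12}| < 1$.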

Along with this general interpretation of school persistence, it is instructive to show how it simplifies under Models 1 and 2.

Under Model 1, $\gamma_2 = 0$ and therefore

(2.29)

$$
E[VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2] = \phi_{12}\, VA_{1i}(\boldsymbol{X}_{1i}),
$$

and the reliability becomes equal to $\tau_1^2\phi_{12}^2/[\tau_1^2\phi_{12}^2+\tau_2^2(1-\phi_{12}^2)]$. As a result, the persistence of school value-added corresponds to a uniform reduction or shrinkage of $VA_{1i}(\boldsymbol{X}_{1i})$. Thus, an empirical analysis of the persistence of school effectiveness under Model 1 means analyzing to what extent a school that takes on new cohorts maintains the effectiveness achieved when accommodating the first cohort, leaving the additional aspects included in $VA_{2i}(\boldsymbol{X}_{2i})$ unexplained by the school’s past. What is clear is that Model 1 is very different from Model 3.

Under Model 2, $\phi_{12} = 0$ and therefore

(2.30)

$$
E\big(VA_{2i}(\boldsymbol{X}_{2i}) \mid VA_{1i}(\boldsymbol{X}_{1i}), \boldsymbol{X}_{1,i}^2, \boldsymbol{\psi}_1^2\big) = \gamma_2\, VA_{1i}(\boldsymbol{X}_{1i}) + \frac{\gamma_2}{n_{2i}} \sum_{j=1}^{n_{2i}} \big(\overline{\boldsymbol{X}}_{1i} - E(\overline{\boldsymbol{X}}_{1i} \mid \boldsymbol{X}_{2ij})\big)\, \beta_1,
$$

and the reliability becomes equal to

$$
\frac{\tau_1^2 \gamma_2^2}{\tau_1^2 \gamma_2^2 + \dfrac{\gamma_2^2 \sigma_1^2}{n_{1i}} + \tau_2^2}
$$

which is still strictly smaller than 1. As a result, when the sign of $\gamma_2 + \phi_{12}$ in Model 3 is determined by $\gamma_2$, the structural interpretation of the persistence of school effectiveness in Model 2 is practically the same as in Model 3. Under this interpretation, school effectiveness corresponds to what a school does with (or adds to) a new cohort after discounting what it learned to do (or add) while accommodating the previous cohort.

3. Computation and Model Fitting

We adopt a Bayesian approach, so prior distributions must be specified for all parameters of Models 0–3. For parameters common to all models we employ the following commonly used conjugate priors: $\tau_t^2 \sim IG(1, 1)$, $\phi_{0t} \sim \mathcal{N}(0, 10^2)$, $\beta_t \sim \mathcal{N}(0, 10^2)$, and $\sigma_{ti}^2 \sim IG(1, 1)$ for $t = 1, 2$ and $i = 1, \ldots, m$. Here $IG(a, b)$ denotes an inverse gamma distribution with shape $a$ and scale $b$. For Models 1 and 3 we use $\phi_{12} \sim UN(-1, 1)$, where $UN$ denotes a uniform distribution, and for Models 2 and 3 we use $\gamma_2 \sim \mathcal{N}(0, 10^2)$. Fitting Models 0–3 and estimating value-added across time as described in Sect. 2.5 requires sampling from the joint posterior distribution of all model parameters. To carry this out we developed a straightforward Markov chain Monte Carlo (MCMC) algorithm that uses a Gibbs sampler to update all parameters one at a time, except $\phi_{12}$. This parameter is updated within the MCMC sampler by a random-walk Metropolis step with a Gaussian proposal. All computer code needed to fit the models and estimate each school’s value-added is provided in the R package modernVA (Page, 2020).
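
To make the $\phi_{12}$ update concrete, the following Python sketch shows a generic random-walk Metropolis step with a Gaussian proposal and a uniform prior on $(-1, 1)$. It is not the modernVA implementation. The target density is a toy assumption: school effects are treated as known, intercepts are set to zero, and the second effect is taken to be normal with mean $\phi_{12}\alpha_{1i}$ and variance $\tau_2^2(1-\phi_{12}^2)$, a form suggested by (2.28) and (2.29) under Model 1. All function names, the step size, and the synthetic inputs are hypothetical.

```python
import math
import random


def log_target(phi, a1, a2, tau2_sq):
    """Toy log full conditional of phi_12 with a flat prior on (-1, 1)."""
    if not -1.0 < phi < 1.0:
        return -math.inf  # outside the support of the uniform prior
    var = tau2_sq * (1.0 - phi ** 2)
    sse = sum((y - phi * x) ** 2 for x, y in zip(a1, a2))
    return -0.5 * len(a1) * math.log(var) - 0.5 * sse / var


def rw_metropolis(a1, a2, tau2_sq, n_iter=4000, step=0.05, seed=1):
    """Random-walk Metropolis draws for phi_12 (step size is an assumption)."""
    rng = random.Random(seed)
    phi = 0.0
    current = log_target(phi, a1, a2, tau2_sq)
    draws = []
    for _ in range(n_iter):
        proposal = phi + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_target(proposal, a1, a2, tau2_sq)
        # Accept with probability min(1, exp(candidate - current));
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            phi, current = proposal, candidate
        draws.append(phi)
    return draws


# Synthetic, zero-centered school effects for the toy target
rng = random.Random(0)
true_phi, tau1_sq, tau2_sq = 0.5, 100.0, 100.0
a1 = [rng.gauss(0.0, math.sqrt(tau1_sq)) for _ in range(250)]
a2 = [true_phi * x + rng.gauss(0.0, math.sqrt(tau2_sq * (1.0 - true_phi ** 2)))
      for x in a1]

draws = rw_metropolis(a1, a2, tau2_sq)
kept = draws[1000:]  # discard an initial burn-in period
post_mean = sum(kept) / len(kept)
print(round(post_mean, 2))
```

Proposals outside $(-1, 1)$ receive a log density of minus infinity and are always rejected, which is how the uniform prior enters the step. In a full sampler this update would sit inside the Gibbs loop and use the current draws of the school effects rather than fixed values.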

4. Simulation Study

4.1. Design of the Simulation Study

The objective of the simulation study is to explore the impact that ignoring temporal dependence may have on value-added estimates. The experiment consisted of using Models 0, 1, 2, and 3 as data generating mechanisms and then, for each synthetic data set, using Models 0, 1, 2, and 3 to estimate value-added across time for each institution. In more detail, we set $I = 250$ and $n_i = 25$ for $i = 1, \ldots, I$ and, using Model 0, 1, 2, or 3, generated post-test scores (i.e., $Y_1$ and $Y_2$) by setting $\beta_1 = 0.6$, $\beta_2 = 0.75$, $\sigma_1^2 = \sigma_2^2 = 25$, and $\tau_1^2 = \tau_2^2 = 100$. The pre-test scores (i.e., $X_1$ and $X_2$) for both time periods were generated independently using $\mathcal{N}(0, 200)$. The intercept values at each time point were $\phi_{01} = \phi_{02} = 10$. Synthetic data sets based on Models 1, 2, and 3 were generated by considering $\phi_{12} \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$ and $\gamma_2 \in \{0.1, 0.25, 0.5, 1.0, 2.0\}$.
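
The design can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' simulation script. The cohort-2 school effect is assumed to be normal with mean $\phi_{02} + \phi_{12}\alpha_{1i} + \gamma_2 \bar{Y}_{1i}$ and variance $\tau_2^2(1-\phi_{12}^2)$, a form inferred from the centering term and the variance expressions of Sect. 2.6 and not copied from the model definition. In addition, $\mathcal{N}(0, 200)$ is read as having variance 200. The function `simulate` and its arguments are hypothetical.

```python
import math
import random


def simulate(phi12, gamma2, n_schools=250, n_students=25, seed=0):
    """Generate one synthetic data set under the assumed two-cohort model."""
    rng = random.Random(seed)
    beta = (0.6, 0.75)       # pre-test slopes for cohorts 1 and 2
    sigma_sq = (25.0, 25.0)  # student-level variances
    tau_sq = (100.0, 100.0)  # school-level variances
    phi0 = (10.0, 10.0)      # intercepts
    x_sd = math.sqrt(200.0)  # assumes N(0, 200) is parameterized by variance
    schools = []
    for _ in range(n_schools):
        x1 = [rng.gauss(0.0, x_sd) for _ in range(n_students)]
        x2 = [rng.gauss(0.0, x_sd) for _ in range(n_students)]
        alpha1 = rng.gauss(phi0[0], math.sqrt(tau_sq[0]))
        y1 = [alpha1 + beta[0] * x + rng.gauss(0.0, math.sqrt(sigma_sq[0]))
              for x in x1]
        ybar1 = sum(y1) / n_students
        # Assumed form of the cohort-2 school effect (see the lead-in)
        mean2 = phi0[1] + phi12 * alpha1 + gamma2 * ybar1
        alpha2 = rng.gauss(mean2, math.sqrt(tau_sq[1] * (1.0 - phi12 ** 2)))
        y2 = [alpha2 + beta[1] * x + rng.gauss(0.0, math.sqrt(sigma_sq[1]))
              for x in x2]
        schools.append({"x1": x1, "y1": y1, "x2": x2, "y2": y2,
                        "alpha1": alpha1, "alpha2": alpha2})
    return schools


data = simulate(phi12=0.5, gamma2=0.25)
mean_alpha1 = sum(s["alpha1"] for s in data) / len(data)
print(len(data), len(data[0]["y2"]))  # prints 250 25
```

Under this reading, `phi12=0` with `gamma2=0` corresponds to Model 0, `gamma2=0` alone to Model 1, and `phi12=0` alone to Model 2.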

For each data generating scenario, 1000 data sets were created and the four models were fit to each. For each model, the value-added at each time period was estimated for each school using posterior means of the estimators provided in (2.23) for $\phi_{12} = \gamma_2 = 0$ (value-added for Model 0), for $\gamma_2 = 0$ (value-added for Model 1), for $\phi_{12} = 0$ (value-added for Model 2), and for $\phi_{12} \neq 0$ and $\gamma_2 \neq 0$ (value-added for Model 3). Producing credible intervals for value-added estimates is straightforward once draws from the corresponding posterior distributions for each estimator are collected. To compare the methods, we average the mean squared error (MSE), coverage, and 95% credible interval widths across the 250 schools’ value-added estimates for each time point. Coverage was estimated by calculating the proportion of the 95% credible intervals that contained the true value-added values (i.e., the values calculated using (2.23) under the same parameter restrictions for each model). Results associated with using Model 1 as a data generating mechanism are provided in Fig. 1, those for Model 2 are displayed in Fig. 2, and those for Model 3 in Fig. 4. In Figs. 1 and 2, the first row corresponds to results associated with $VA_1(\boldsymbol{X}_1)$, the second row to $VA_2(\boldsymbol{X}_2)$, the third to $\alpha_1$, and the fourth to $\alpha_2$. The first column displays MSE values associated with the posterior mean estimator of $VA_1$, $VA_2$, $\alpha_1$, and $\alpha_2$ averaged over the $I = 250$ schools. The second column in both figures displays the 95% credible interval width averaged over the 250 schools, and the third column corresponds to the coverage of the 95% credible intervals.
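
The three comparison metrics can be computed from posterior draws as in the following sketch for a single synthetic data set. It assumes equal-tailed credible intervals based on linearly interpolated empirical quantiles, a choice the text does not specify. The names `quantile` and `summarize` and the toy inputs are hypothetical.

```python
import math


def quantile(sorted_xs, q):
    """Linearly interpolated empirical quantile of a sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def summarize(draws_by_school, truth, level=0.95):
    """Average squared error, interval width, and coverage across schools."""
    tail = (1.0 - level) / 2.0
    sq_err, width, covered = 0.0, 0.0, 0
    for draws, true_va in zip(draws_by_school, truth):
        s = sorted(draws)
        post_mean = sum(s) / len(s)  # posterior mean estimator
        lower, upper = quantile(s, tail), quantile(s, 1.0 - tail)
        sq_err += (post_mean - true_va) ** 2
        width += upper - lower
        covered += lower <= true_va <= upper
    m = len(truth)
    return {"mse": sq_err / m, "width": width / m, "coverage": covered / m}


# Toy input: two "schools" whose posterior draws are the integers 0..100
toy_draws = [list(range(101)), list(range(101))]
result = summarize(toy_draws, truth=[50.0, 99.0])
print(result)
```

In the toy input the first true value lies inside its interval and the second lies above it, so the coverage is 0.5. In the study these per-data-set summaries would then be averaged over the 1000 replicates.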

Figure 1 Results from the simulation study when using Model 1 as a data generating mechanism. The MSE, interval widths, and coverage values are averages across 1000 generated sets and the 250 schools.

Figure 2 Results from the simulation study when using Model 2 as a data generating mechanism. The MSE, interval widths, and coverage values are averages across 1000 generated sets and the 250 schools.

Figure 3 Model fits measured using log pseudo marginal likelihood (LPML) for models 0, 1, and 2.

Figure 4 Model fits using Model 3 as a data generating mechanism. The first three rows correspond to the MSE, interval width, and coverage of $VA_{2i}$. The last corresponds to the LPML. Columns indicate which $\gamma_2$ value was used to generate data and the x-axis tick marks indicate the same thing for $\phi_{12}$.

4.2. Conclusions of the Simulation Study

Focusing on Fig. 1, it appears that Model 1 performs best in estimating value-added for both time periods but much more so in the first time period. That is, the value-added estimates under Model 1 have the smallest MSE and the shortest credible interval widths while maintaining the same coverage rates as the other models. Even though Model 1 outperforms Model 3, Model 3’s performance is vastly superior to that of Models 0 and 2, which is to be expected. In addition, it appears that even if dependence between cohorts does not exist (i.e., $\phi_{12} = 0$), little is lost by using Model 1 (or 3), and in fact the benefits of using Model 1 (or 3) manifest themselves for relatively small values of $\phi_{12}$ (e.g., $\phi_{12} \approx 0.35$). Finally, it seems that incorporating any type of temporal dependence in a value-added model (even if misspecified) provides benefit, as even Model 2 outperforms Model 0 at estimating VA.

The same trends seen in Fig. 1 also appear in Fig. 2 (which displays results when Model 2 was used to generate data), although they are not as stark. It does seem that Model 2 overall outperforms Model 0 at estimating value-added, but differences are more apparent in the estimation of the individual school random effects. Interestingly, Model 1 performs better than Model 2 for $VA_1$, while for $VA_2$ the smaller MSE comes at a decrease in coverage for Model 1. Models 2 and 3 are similar in most scenarios. As before, even though the temporal dependence in Models 1 and 3 is misspecified in this scenario, there are clear benefits to including temporal dependence in some way, as both Models 1 and 3 outperform Model 0 at estimating value-added, particularly for the second cohort.

Next we provide Fig. 3, which displays the log pseudo marginal likelihood (LPML) values that can be used to evaluate model fit (see Christensen et al. 2011, Chapter 4.9.2). Larger LPML values indicate a better fit. Here we see that even for weak temporal dependence, incorporating the dependence in the value-added model results in better model fit. Interestingly, Model 1 tends to fit the data better than Model 2 even when Model 2 was used to generate data.
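For readers unfamiliar with LPML, a common Monte Carlo estimate takes, for each observation, the harmonic mean of its likelihood across MCMC draws (the conditional predictive ordinate, CPO) and sums the logarithms over observations. The sketch below assumes the log-likelihood of every observation has already been evaluated at every retained draw; the function names and toy values are illustrative and not taken from the paper.

```python
import math

def log_cpo(loglik_draws):
    """log CPO of one observation from its log-likelihood at each MCMC draw.

    CPO is the harmonic mean of the likelihood over draws, computed here on
    the log scale with a max-shift for numerical stability.
    """
    neg = [-ll for ll in loglik_draws]
    shift = max(neg)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in neg))
    return math.log(len(neg)) - log_sum

def lpml(loglik_by_observation):
    """Sum of log CPO values; larger values indicate better fit."""
    return sum(log_cpo(draws) for draws in loglik_by_observation)

# Toy input: two observations, each with likelihood values at two draws.
toy_loglik = [
    [math.log(0.5), math.log(0.25)],  # harmonic mean of 0.5 and 0.25 is 1/3
    [math.log(0.2), math.log(0.2)],   # constant likelihood, so CPO = 0.2
]
print(lpml(toy_loglik))
```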

Lastly, Fig. 4 contains results when Model 3 is the true data generating mechanism. Here we focus only on performance in estimating $VA_2$; trends are similar for the other parameters, and this allows us to be more concise in our description. For this figure, each column corresponds to the $\gamma_2$ value used to generate data, while the tick marks on the x-axis correspond to the $\phi_{12}$ values. The first three rows correspond to the MSE, coverage and interval width associated with $VA_2$. The last row corresponds to the LPML model fit metric.
Notice that as $\phi_{12}$ and $\gamma_2$ both increase (i.e., there is more temporal dependence), Model 3 outperforms the other models in terms of LPML. Model 1 has the smallest MSE values for $\gamma_2$ but at the cost of a substantial decrease in coverage. Model 3’s MSE and interval widths are smaller than those of Model 0 and quite comparable to those of Model 1 (though slightly smaller).

The take-home message from Figs. 1, 2, 3, and 4 is that incorporating temporal dependence in the model greatly improves value-added estimates when temporal dependence exists, and does so at a minimal cost when temporal dependence between cohorts is absent. Since the meanings of the value-added indicators in Models 1, 2, and 3 are very different in terms of school persistence (see the discussions in Sect. 2.6), model selection should be motivated by the intended use of the VA estimates.

5. Analysis of SIMCE Data

5.1. The SIMCE Test

Chile administers a yearly large-scale standardized test called SIMCE (Sistema de Medición de la Calidad de la Educación, Measurement System of Quality of Education). The main subjects of this test are Language and Mathematics. The SIMCE test was created at the end of the 1980s and coincided with the privatization of education, which introduced issues such as competition among schools, private and public providers, vouchers to fund schools, universal school choice, for-profit schools, and co-payment (from 1993 to 2015). In this context, the SIMCE test was an instrument to aid parents in school choice decision-making, and to provide information necessary for schools to undertake data-based decision-making that would enhance school improvement efforts; for more details, see Meckes and Carrasco (2010), Manzi and Preiss (2013), Manzi et al. (2014) and Page et al. (2017).

5.2. Data to be Used in the Value-Added Analysis

We explore the extent of persistence in school effectiveness using two cohorts from schools to which the SIMCE test was administered. Here we only consider the results from the Mathematics part of the exam. The first cohort took the SIMCE exam as 4th graders in 2012 and then again as 6th graders in 2014, and the second cohort took the SIMCE exam as 4th graders during the 2014 school year and again during the 2016 school year as 6th graders. Thus $X_{ij1}$ denotes the 4th grade SIMCE score of the ith student at the jth school in 2012 and $X_{ij2}$ the 4th grade SIMCE math score in 2014. Additionally, $Y_{ij1}$ denotes the 6th grade math SIMCE score in 2014 and $Y_{ij2}$ that of 2016. In total, the data set includes 2804 schools with the number of students per school ranging from 6 to 236 in cohort 1 and 6 to 258 in cohort 2.
To these data, we fit Models 0, 1, 2, and 3 by retaining 1,000 MCMC iterates after discarding the first 10,000 as burn-in and thinning by 200 (i.e., 210,000 total MCMC iterates were collected). Thinning was used, despite the potential loss of efficiency, to produce (essentially) independent samples from the posterior. The tdVA function in the modernVA R-package (Page, 2020) was used to fit all models and it took approximately 180 s for each.
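The iterate counts quoted above follow from simple arithmetic: the total is the burn-in plus the number of retained draws times the thinning interval. The short check below uses hypothetical helper names and assumes that every 200th iterate after burn-in is kept; it does not call the modernVA package.

```python
def total_iterations(retained, burn_in, thin):
    """Total MCMC iterations needed to retain `retained` thinned draws."""
    return burn_in + retained * thin

def kept_iterations(total, burn_in, thin):
    """1-based iteration numbers kept after burn-in, taking every `thin`-th."""
    return range(burn_in + thin, total + 1, thin)

total = total_iterations(retained=1000, burn_in=10_000, thin=200)
kept = kept_iterations(total, burn_in=10_000, thin=200)
print(total, len(kept))  # 210000 1000
```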

5.3. Results

Before exploring school persistence in the four models, in Table 1 we provide the LPML and WAIC model fit metrics for each model. Again, larger LPML values indicate a better fit while smaller WAIC values indicate the same thing. It appears that both LPML and WAIC favor Models 1–3 over Model 0, with Model 1 fitting best based on both LPML and WAIC. The comparisons between Models 0 and 1, between Models 0 and 2, between Models 1 and 3, and between Models 2 and 3 seem adequate with respect to the nested structure of the models. Next, in Table 2 we provide the posterior means and 95% credible intervals of $\beta_1$ and $\beta_2$ under each of the three models. Note that Models 2 and 3 produce very similar estimates of $\beta_1$ and $\beta_2$ while those for Models 0 and 1 differ slightly.
Even so, it appears that differences (in terms of magnitude) between the four models in estimating $\beta_1$ and $\beta_2$ are minor.

Table 1 Model fit metrics for models 0, 1, and 2. For LPML, larger value indicates better fit, while for WAIC, smaller value indicates better fit.

Table 2 Posterior summaries of $\beta_1$ and $\beta_2$ for each of the three models detailed in this paper.

5.3.1. School Persistence Under Model 1

As discussed in Sect. 2.6, persistence under Model 1 is based on $\phi_{12}$. Since $|\phi_{12}| < 1$, persistence under this model corresponds to a uniform reduction in $VA_{1i}(\boldsymbol{X}_{1i})$ based on the magnitude of $\phi_{12}$. For the SIMCE data the posterior mean of $\phi_{12}$ turned out to be 0.6 with a 95% credible interval of (0.57, 0.63).
Based on the simulation study, the magnitude of the estimated value of $\phi_{12}$ is large enough to conclude that there is moderate to strong school effectiveness persistence among the schools of Chile based on these two cohorts. This results in a slight reduction in the credible interval widths for $VA_{1i}(\boldsymbol{X}_{1i})$ and $VA_{2i}(\boldsymbol{X}_{2i})$ compared to Model 0. In fact, the average credible interval width (across the 2804 schools) for $VA_{1i}(\boldsymbol{X}_{1i})$ is 19.1 under Model 0 compared to 18.6 under Model 1, and for $VA_{2i}(\boldsymbol{X}_{2i})$ it is 18.9 under Model 0 compared to 18.2 under Model 1.

As noted in (2.29), $\phi_{12}$ corresponds to the slope when regressing $VA_2$ onto $VA_1$ without an intercept. To verify this, for schools that have an average pre-test score for cohort 1 between 261 and 269 and an average pre-test score for cohort 2 between 253 and 261 (this resulted in 85 schools), we fit a least squares regression of the estimated $VA_2$ onto the estimated $VA_1$ without an intercept.
The slope of this regression line turned out to be 0.603, which is very close to the posterior mean of $\phi_{12}$. This emphasizes the fact that $VA_2$ is in general smaller than $VA_1$ and, therefore, we expect that schools taking on the new second cohort will not necessarily maintain the effectiveness achieved for the first cohort.
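A least squares fit without an intercept has the closed-form slope $\sum_i x_i y_i / \sum_i x_i^2$. The sketch below applies this formula to hypothetical value-added estimates; the numbers are made up so that the slope is exactly 0.6 and are not the SIMCE estimates.

```python
def slope_through_origin(x, y):
    """Least squares slope of y on x with the intercept fixed at zero."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one nonzero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx

# Hypothetical estimated value-added for cohorts 1 and 2 at five schools.
va1_hat = [-4.0, -1.5, 0.0, 2.0, 5.0]
va2_hat = [0.6 * v for v in va1_hat]
print(slope_through_origin(va1_hat, va2_hat))
```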

As a matter of fact, to further explore the school effectiveness persistence through reduction or shrinkage of $VA_{2i}(\boldsymbol{X}_{2i})$ based on $\phi_{12}$, we provide Table 3. The table illustrates the “stability” of value-added estimates for cohort 1 and cohort 2 under Model 1 by presenting the percentage of schools (from the same 85 schools identified previously) according to the quartile in which they were located based on cohort 1 and cohort 2’s value-added estimates. Note that since value-added is a metric that makes a comparison to a “reference” school, comparing estimated value-added percentiles across time is more reasonable than comparing the value-added estimates themselves. Thus, the values in Table 3 correspond to the percent of schools that belong to a particular combination of value-added quartiles between the two cohorts. For example, the entry at the upper left corner shows that 13% of the 85 schools had value-added estimates for cohort 1 and cohort 2 that belonged to the first quartile, and the cell to its right shows that 5% of the 85 schools had a value-added estimate for cohort 1 that belonged to the second quartile, while that of cohort 2 belonged to the first quartile. Other entries in the table can be interpreted similarly. Thus, higher values on the diagonal would indicate that the difference in value-added between the two cohorts is small.
Notice that under Model 1 it appears that most of the differences between the two cohorts’ value-added can be found among the schools that are in the second and third quartiles. The majority of schools whose school effectiveness for cohort 1 is strong (or weak) also have strong (or weak) effectiveness for cohort 2. In fact, very few schools go from high value-added to low (or vice versa) under Model 1.
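A quartile cross-tabulation of this kind can be sketched as follows. The code assigns each school a quartile label by rank within each cohort and tabulates percentages with rows indexing the cohort 2 quartile and columns the cohort 1 quartile, matching the reading of the example cells above. The rank-based quartile rule and the toy values are assumptions, since the paper does not state how quartiles were formed.

```python
def quartile_labels(values):
    """Label each value 1-4 by its rank within the list (1 = lowest quartile)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    labels = [0] * n
    for rank, k in enumerate(order):
        labels[k] = rank * 4 // n + 1
    return labels

def quartile_crosstab(va1, va2):
    """4x4 percentage table: rows = cohort 2 quartile, columns = cohort 1 quartile."""
    q1, q2 = quartile_labels(va1), quartile_labels(va2)
    n = len(va1)
    table = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(q1, q2):
        table[b - 1][a - 1] += 100.0 / n
    return table

# Eight hypothetical schools whose quartile membership is the same in both cohorts.
va1_toy = [1, 2, 3, 4, 5, 6, 7, 8]
va2_toy = [2, 1, 4, 3, 6, 5, 8, 7]
stable = quartile_crosstab(va1_toy, va2_toy)
for row in stable:
    print(row)
```

The sum of the diagonal entries gives the percentage of schools that stay in the same quartile across cohorts.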

To highlight further the bearing that $\phi_{12}$ has on school effectiveness persistence under Model 1, we provide as a contrast Table 4, which displays the same information as Table 3 but for Model 0. Contrasting the results in these two tables, it is possible to see that differences between cohort 1 and cohort 2’s value-added estimates under Model 0 are greater than those under Model 1. This can be seen first in terms of values on the diagonal, which are smaller in Table 4 (36% of schools) than in Table 3 (42%). Similarly, under Model 0 almost a quarter of the schools (24%) exhibited a change in the results of 2 or even 3 quartiles, as opposed to the results under Model 1, where only 17% of the schools presented such large differences. Though this contrast indicates that Model 1 produces more stable results than Model 0, it is important to keep in mind that the stability of the results in absolute terms is low overall, which can be summarized through the use of Cohen’s Kappa, which corresponds to 0.24 in Table 3 and 0.38 in Table 4.
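For reference, Cohen’s Kappa compares the observed agreement (the diagonal share of a square table) with the agreement expected by chance from the row and column margins. The sketch below implements the unweighted version on hypothetical tables; the paper does not state which variant of Kappa was used, so this is an assumption, and the tables are not those of the SIMCE analysis.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts or percentages."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 4x4 tables of percentages.
perfect = [[25.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
chance = [[6.25] * 4 for _ in range(4)]
print(cohens_kappa(perfect), cohens_kappa(chance))
```

A value of 1 indicates that every school keeps its quartile, and a value of 0 indicates agreement no better than chance.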

The improvement of stability in the results is worth highlighting as a relevant by-product of the use of Model 1 in this applied setting. Under Model 0, 4% of schools would receive results for cohort 1 indicating that they are in the highest quartile only to then receive a report for cohort 2 indicating they are in the lowest quartile, a scenario which is effectively eliminated under Model 1. The stability of results across time in value-added models is particularly relevant in applied settings, as highly variable results can appear to school officials as hard to interpret or simply as noise, which can undermine the credibility of the system and diminish the overall usefulness of the results.

Summarizing, though any value-added results will involve some degree of instability in the relative positions of schools across time, the expectation is that the variation is attributable to a real effect from school-related changes. However, the stability of value-added results will inevitably be affected by multiple sources of error, including the reliability of the test scores, the intraclass correlation of results at the school level, and the uncertainty associated with the value-added estimates themselves. In this case, the lack of stability between cohort 1 and cohort 2’s value-added estimates under Model 0 could be due either to failing to account for temporal dependence between the school effects, or to some school changes between periods 1 and 2: this ambiguity is avoided by using Model 1.

Table 3 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 for Model 1.

The schools included in this table are those that have average pre-test scores in cohort 1 between 261 and 269 and for cohort 2 between 253 and 261 (this resulted in 85 schools). Thus, the schools included in the table have similar student abilities in both cohorts.

Table 4 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 for Model 0.

5.3.2. School Persistence Under Model 2

We now explore school persistence under Model 2. As mentioned in Sect. 2.6, school persistence based on Model 2 focuses on the extent to which a school is able to do “new things” with a new cohort. If the information about the cohorts is similar (i.e., the pre-test scores of both cohorts are similar), then the cohorts have similar abilities and teaching strategies devised for cohort 1 should be useful for cohort 2. The extent to which a school is able to do “new things” is reflected in $\gamma_2$. The posterior mean of $\gamma_2$ turned out to be 0.35 with 95% credible interval (0.34, 0.37).
Thus based on Model 2, persistence in school effectiveness depends on the way in which the corresponding school effect $\alpha_{2i}$ is corrected by $n_{2i}^{-1}\,\beta_1\gamma_2\,E(\bar{X}_{1i}\mid \bar{X}_{2i})$, where $\widehat{\beta}_1\times \widehat{\gamma}_2\approx 0.25$.

To quantify the amount of prior information provided to schools for each cohort by way of a pre-test, we provide Fig. 5. The figure contains a scatterplot of $\bar{X}_{2i}$ and $\bar{X}_{1i}$. The correlation between the pairs $(\bar{X}_{1i}, \bar{X}_{2i})$ turns out to be 0.68, indicating quite strong correlation between the pre-tests of cohort 1 and cohort 2. To further explore persistence under Model 2, we group schools based on the similarity between $\bar{X}_{1i}$ and $\bar{X}_{2i}$.
This is done by forming a group of schools whose residual value ( $r_{i}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_i$$\end{document} ) from the regression line found in Fig. 5 is $| r_{i} | < 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ |r_i| < 1 $$\end{document} (red group in Fig. 5), $1 \leq | r_{i} | < 2$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1 \le |r_i| < 2$$\end{document} (green group in Fig. 5), and $| r_{i} | > 2$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|r_i| > 2$$\end{document} (blue group in Fig. 5). For each group, tables similar to those in Sect. 
5.3.1 are provided based on schools that had significantly different value-added estimates between the two cohorts (significance was establish if 95% credible intervals for $V A_{1}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$VA_1$$\end{document} and $V A_{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$VA_2$$\end{document} failed to intersect). First note that there are only a few schools that fall within the middle quartiles. This is a result of only considering schools with significantly different value-added estimates between cohorts. Notice further that as the information provided schools becomes less correlated (i.e., points farther from the regression line), there is a higher percentage of schools that improve their quartile group: 40% for red schools, 42% for green schools and 49% for blue schools. Moreover, for red schools 14% do not change of quartile group, for green schools 13% do not change, and for blue schools 13% do not change. This indicates that the less similar the pre-test cohorts are, the better the persistence of the schools. This suggests that the schools in the sample are able to improve their results by doing “new things” for the new cohort (Tables 5, 6, and 7).
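
The grouping rule and the significance criterion described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the SIMCE analysis. The regression direction (${\bar{X}}_{2 i}$ on ${\bar{X}}_{1 i}$), the use of raw rather than standardized residuals, the assignment of $| r_{i} | = 2$ to the blue group, and all function names, variable names and numeric values are assumptions made for the example.

```python
import numpy as np

def residual_groups(x1_bar, x2_bar):
    """Group schools by |residual| from the least-squares line of x2_bar on x1_bar."""
    slope, intercept = np.polyfit(x1_bar, x2_bar, deg=1)
    resid = x2_bar - (intercept + slope * x1_bar)
    abs_r = np.abs(resid)
    # Assumption: raw residuals, and |r| = 2 is assigned to the blue group.
    labels = np.where(abs_r < 1, "red", np.where(abs_r < 2, "green", "blue"))
    return resid, labels

def intervals_disjoint(lo1, hi1, lo2, hi2):
    """True where the intervals [lo1, hi1] and [lo2, hi2] do not intersect."""
    return (hi1 < lo2) | (hi2 < lo1)

# Synthetic school-level pre-test means (hypothetical values, not SIMCE data).
rng = np.random.default_rng(0)
x1_bar = rng.normal(0.0, 2.0, size=500)
x2_bar = 0.7 * x1_bar + rng.normal(0.0, 1.5, size=500)
resid, labels = residual_groups(x1_bar, x2_bar)

# Hypothetical 95% credible-interval bounds for VA_1 and VA_2 of three schools.
va1_lo, va1_hi = np.array([-1.0, 0.2, 0.5]), np.array([0.0, 0.9, 1.5])
va2_lo, va2_hi = np.array([0.3, 0.5, -2.0]), np.array([1.1, 1.4, -0.1])
significant = intervals_disjoint(va1_lo, va1_hi, va2_lo, va2_hi)
```

In this sketch only the schools flagged in `significant` would enter the quartile tables, and `labels` decides which of the three tables a school belongs to.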

Figure 5 Scatter plot of ${\bar{X}}_{2 i}$ and ${\bar{X}}_{1 i}$. The strong linear dependence between the pre-tests of the two cohorts indicates that prior information about the two cohorts is similar.

Table 5 The percent of schools in each value-added quartile for cohort 1 and cohort 2, based on Model 2’s value-added estimates.

Schools included are those that correspond to the red points in Fig. 5 and that had significantly different value-added estimates in cohort 1 relative to cohort 2, based on the criterion that the posterior 95% credible intervals did not intersect. The total number of schools is 210.

Table 6 The percent of schools in each value-added quartile for cohort 1 and cohort 2, based on Model 2’s value-added estimates.

Schools included are those that correspond to the green points in Fig. 5 and that had significantly different value-added estimates in cohort 1 relative to cohort 2, based on the criterion that the posterior 95% credible intervals did not intersect. The total number of schools is 208.

Table 7 The percent of schools in each value-added quartile for cohort 1 and cohort 2, based on Model 2’s value-added estimates.

Schools included are those that correspond to the blue points in Fig. 5 and that had significantly different value-added estimates in cohort 1 relative to cohort 2, based on the criterion that the posterior 95% credible intervals did not intersect. The total number of schools is 229.
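
A cross-tabulation of this kind can be computed as follows. The sketch uses synthetic value-added estimates and assumes that quartiles are formed within the supplied sample and that a higher quartile index means higher value-added; the function names and all numeric values are hypothetical.

```python
import numpy as np

def quartile_index(values):
    """Return 0..3: the quartile of each value within its own sample."""
    cuts = np.quantile(values, [0.25, 0.5, 0.75])
    return np.searchsorted(cuts, values, side="right")

def transition_percentages(va1, va2):
    """4x4 table: entry (a, b) is the percent of schools in quartile a for
    cohort 1 and quartile b for cohort 2."""
    q1, q2 = quartile_index(va1), quartile_index(va2)
    counts = np.zeros((4, 4))
    np.add.at(counts, (q1, q2), 1)  # accumulate one count per school
    return 100.0 * counts / counts.sum()

# Synthetic value-added estimates (hypothetical, for illustration only).
rng = np.random.default_rng(4)
va1 = rng.normal(size=210)
va2 = 0.5 * va1 + rng.normal(size=210)

table = transition_percentages(va1, va2)
unchanged = np.trace(table)           # same quartile in both cohorts
improved = np.triu(table, k=1).sum()  # higher quartile in cohort 2
worsened = np.tril(table, k=-1).sum() # lower quartile in cohort 2
```

The three summaries partition the schools, so they add up to 100 percent.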

6. Conclusions

Value-added models are a plausible tool for monitoring the effectiveness of a school across time. From a policy perspective, it is important to identify schools whose effectiveness has changed dramatically (either improvement or deterioration) and those that maintain their effectiveness. The modeling challenge, however, is to disentangle the instability of value-added indicators due to internal and/or external changes affecting the effectiveness of schools from the instability due to the specification of value-added models. To meet this challenge, we formulate time-dependent value-added models, which are characterized by specifying the school effect related to cohort t as a function of the past performance of the school. By doing so, we intend to eliminate one source of instability that is contained in a value-added model where the school effects are mutually independent across time.

More specifically, we have proposed a value-added model that incorporates temporal dependence from two different perspectives, namely a dependence in the school random effects (Model 1) and a “shock” based on the post-test performance from the previous cohort (Model 2). The identification analysis indicated that both models are nested in Model 3, which incorporates both temporal dependencies. The value-added indicators induced by Models 1 and 2 have different statistical interpretations and, consequently, are useful for different policy purposes. Model 1 assumes that the school effects are correlated over time as in ARIMA-type models, whereas Model 2 assumes that the current school effect is influenced by the post-tests from previous cohorts as a kind of “information shock”. An empirical analysis of the persistence of school effectiveness under Model 1 means analyzing to what extent a school that takes on new cohorts maintains the effectiveness achieved when accommodating the first cohort. Such effectiveness may be viewed as a baseline: the farther a cohort is from the first, the harder it is to maintain the effectiveness achieved with that first cohort. An empirical analysis of the persistence of school effectiveness under Model 2 means analyzing to what extent a school that deals with a new cohort is capable of doing “new things” with it. The focus of the empirical analysis is on the future performance after taking into account the “shocks” of information. Nevertheless, Model 3 incorporates both time dependencies and allows us to see that the persistence of school effectiveness corresponds to an additive combination of both the school value-added for cohort 1 and the information coming from cohort 1: the first additive component is related to the ARIMA-type model, whereas the second additive component is related to the “information shock” model. It is important to point out that if the shock parameter $γ_{2}$ is large enough (in absolute value), then the persistence of school effectiveness under Models 2 and 3 is quite similar.

To show that this modeling strategy controls one source of instability, we have also compared Models 1, 2 and 3 with the traditional value-added Model 0 in terms of the stability of the school results across cohorts. The results of the applied example in this study show that the use of models that include temporal dependence improves the consistency of the school results when contrasted with the single-cohort model. This result is potentially relevant for the use of value-added models in applied settings, as high instability of the estimates can be hard to interpret or, worse, can be perceived as random by school officials, potentially affecting the credibility of the overall value-added system. Thus, a reasonable approach to employing our method is to first fit Model 3 (i.e., the Full Model) and then carry out hypothesis tests (we use Bayes factors) to determine whether it would be appropriate to use either Model 1 or Model 2 (i.e., assume that $ϕ_{12} = 0$ or $γ_{2} = 0$). What we provide in this paper is an all-encompassing modeling strategy regardless of which model a particular data set favors. The interpretation of value-added and persistence in school effectiveness for all models has been studied and all are based on theoretically sound arguments.

A related, and relevant, characteristic of the proposed models is the “shrinkage” effect associated with the use of random effects. We consider this another valuable characteristic of the proposed models, as the effect will tend to moderate results where there are fewer cases (i.e., less evidence) from which to draw inferences regarding the school value-added effect. This more conservative estimate will not necessarily disadvantage schools that have fewer students in a particular cohort: in systems that have incentives to both reward schools and punish those with lower VA estimates (and particularly in systems that only include incentives to punish schools), the conservative estimates will tend to shield these schools from suffering penalties due to extreme negative results. Though this effect will also preclude them from potentially being rewarded due to extreme positive results, we contend that this is a reasonable trade-off, particularly when these effects play out in the face of less empirical evidence than is available for other schools. This moderating effect is particularly relevant as part of the larger issue of inclusion or removal of schools with few observations as part of a national system, particularly given that the number of cases with common responses can vary from year to year. Though one approach is to remove these schools from the analysis, we consider that shrinkage effects constitute a feature that opens the possibility of maintaining them as part of a larger system even though in some years they would have been excluded based on an arbitrary cut point for sample size.
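
The direction and size of this moderating effect can be illustrated with the textbook normal random-intercept model with known variance components, in which the estimated school effect is the observed deviation from the overall mean multiplied by the weight $τ^{2} / (τ^{2} + σ^{2} / n)$. This is a generic sketch, not the Bayesian fit used in this paper; the symbols $τ^{2}$ (between-school variance) and $σ^{2}$ (within-school variance), the function names and all numeric values are assumptions introduced for the example.

```python
def shrinkage_weight(n, tau2, sigma2):
    """Weight on the observed school deviation; approaches 1 as n grows."""
    return tau2 / (tau2 + sigma2 / n)

def shrunken_effect(school_mean, grand_mean, n, tau2, sigma2):
    """Shrunken school effect under a normal random-intercept model
    with known variance components (a simplifying assumption)."""
    return shrinkage_weight(n, tau2, sigma2) * (school_mean - grand_mean)

# Two hypothetical schools with the same extreme negative raw deviation (-10)
# but very different cohort sizes.
small = shrunken_effect(school_mean=-10.0, grand_mean=0.0, n=5, tau2=4.0, sigma2=100.0)
large = shrunken_effect(school_mean=-10.0, grand_mean=0.0, n=200, tau2=4.0, sigma2=100.0)
```

With these values the small school receives a weight of 1/6 and the large school a weight of about 0.89, so the same raw deviation is pulled much closer to zero for the small school.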

From a modeling perspective, we emphasize that Model 3 (along with its nested models) includes parameters that can be considered as characterizations of an educational system, namely $ϕ_{12} + γ_{2}$, $γ_{2}$ and $ϕ_{12}$: the first one characterizes, for all schools, the sign of the correlation between school effect for cohort 1 and school effect for cohort 2; this correlation is essentially characterized by $ϕ_{12}$. The first and second parameters characterize, for all schools, the sign of the between-cohort correlation. As next steps in developing this modeling approach we propose specifying parameters like $ϕ_{12}$ and $γ_{2}$ at the sub-group level to identify population heterogeneity of these effects.

Last, but not least, we highlight the fact that the proposed models can be fit using the R package modernVA (Page, 2020).

Acknowledgements

The second, third and fourth authors were partially supported by the Millennium Nucleus on Intergenerational Mobility MOVI. Part of this research was developed in the context of a scientific consultancy for the National Agency for Quality Education of the Government of Chile. The perspectives developed in this paper do not necessarily represent those of the Agency. The authors thank two anonymous referees for their comments and questions, which helped improve the paper.

Data availability

The data used in this paper will be included in the supplementary material.

Declarations

Conflict of interest

The authors report no conflict of interest.

A. Appendix

A.1. Value-added in the Full Model

Let us provide the derivations for Model 2 only: for the first cohort, we have

\begin{matrix} V A_{1 i} (X_{1 i}) ≐ & \frac{1}{n_{1 i}} \sum_{j = 1}^{n_{1 i}} (E (Y_{1 i j} ∣ X_{1 i j}, α_{1 i}) - E (Y_{1 i j} ∣ X_{1 i j})) \\ = & \frac{1}{n_{1 i}} \sum_{j = 1}^{n_{1 i}} (X_{1 i j}^{'} β_{1} + α_{1 i} - E [E (Y_{1 i j} ∣ X_{2 i j}, X_{1 i j}) ∣ X_{1 i j}]) \\ = & \frac{1}{n_{1 i}} \sum_{j = 1}^{n_{1 i}} (X_{1 i j}^{'} β_{1} + α_{1 i} - E [ϕ_{01} + X_{1 i j}^{'} β_{1} ∣ X_{1 i j}]) \\ = & α_{1 i} - ϕ_{01} . \end{matrix}

Similarly, for cohort 2,

\begin{matrix} V A_{2 i} (X_{2 i}) ≐ & \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} (E (Y_{2 i j} ∣ X_{2 i j}, α_{2 i}) - E (Y_{2 i j} ∣ X_{2 i j})) \\ = & \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} (X_{2 i j}^{'} β_{2} + α_{2 i} - E [E (Y_{2 i j} ∣ X_{2 i j}, X_{1 i j}) ∣ X_{2 i j}]) \\ = & \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} (X_{2 i j}^{'} β_{2} + α_{2 i} - E [ϕ_{02} + ϕ_{01} (ϕ_{12} + γ_{2}) + X_{2 i j}^{'} β_{2} + γ_{2} {\bar{X}}_{1 i} β_{1} ∣ X_{2 i j}]) \\ = & α_{2 i} - [ϕ_{02} + ϕ_{01} (ϕ_{12} + γ_{2})] - \frac{γ_{2}}{n_{2 i}} β_{1}^{'} \sum_{j = 1}^{n_{2 i}} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) . \end{matrix}
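
The conditional mean that appears inside the expectation in the third line of the cohort-2 derivation can be checked numerically. The sketch below assumes one specification that reproduces that mean, namely $α_{2 i} = ϕ_{02} + ϕ_{12} α_{1 i} + γ_{2} {\bar{Y}}_{1 i} + \text{noise}$ with $E (α_{1 i}) = ϕ_{01}$. This form, the single scalar covariate, the unit variances and all parameter values are assumptions made for illustration, since the model equations of Sect. 2 are not reproduced in this section.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical parameter values.
phi01, phi02, phi12, gamma2, beta1 = 0.5, -0.3, 0.6, 0.2, 1.5
n_students, n_rep = 30, 200_000

# Cohort-1 covariates of one school, held fixed across replications.
x1 = rng.normal(size=n_students)
x1_bar = x1.mean()

# Replicate the school effects many times to approximate expectations.
alpha1 = phi01 + rng.normal(size=n_rep)
# The mean of n_students unit-variance student errors has sd 1/sqrt(n_students).
eps_bar = rng.normal(scale=1.0 / np.sqrt(n_students), size=n_rep)
y1_bar = x1_bar * beta1 + alpha1 + eps_bar
alpha2 = phi02 + phi12 * alpha1 + gamma2 * y1_bar + rng.normal(scale=0.5, size=n_rep)

simulated = alpha2.mean()
analytic = phi02 + phi01 * (phi12 + gamma2) + gamma2 * x1_bar * beta1
```

Under these assumptions `simulated` and `analytic` agree up to Monte Carlo error, which is why the cohort-2 school effect is centered by $ϕ_{02} + ϕ_{01} (ϕ_{12} + γ_{2})$ and by a term proportional to $γ_{2}$ that depends on the cohort-1 covariates.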


A.2. Correction factor for the school value-added under the Full Model

As discussed in Sect. 2.5, the school value-added for cohort 2 under the Full Model is equal to the centered school effect $α_{2 i} - (ϕ_{02} + ϕ_{01} γ_{2})$ plus a correction factor given by

\begin{matrix} \frac{γ_{2}}{n_{2 i}} β_{1}^{'} \sum_{j = 1}^{n_{2 i}} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}), \quad \text{where } (γ_{2}, β_{1}) \in \mathbb{R} \times \mathbb{R}^{p_{1}} . \end{matrix}

Then

\begin{matrix} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) = (\begin{matrix} E ({\bar{X}}_{1 i, 1} ∣ X_{2 i j}) \\ E ({\bar{X}}_{1 i, 2} ∣ X_{2 i j}) \\ ⋮ \\ E ({\bar{X}}_{1 i, p_{1}} ∣ X_{2 i j}) \end{matrix}) \end{matrix}

Now, for the k-th covariate, let us assume that $E ({\bar{X}}_{1 i, k} ∣ X_{2 i j}) = b_{k 0} + b_{k}^{'} X_{2 i j}$, where $b_{k}$ is a $p_{2} \times 1$ vector and $b_{k 0} \in \mathbb{R}$. One possibility is to consider $b_{k} = d_{k} e_{k}$ where $d_{k} \in \mathbb{R}$ and $e_{k}$ is the k-th vector of the canonical basis of $\mathbb{R}^{p_{2}}$. Therefore,

\begin{matrix} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) = & (\begin{matrix} b_{10} & b_{11} & b_{12} & \dots & b_{1 p_{2}} \\ b_{20} & b_{21} & b_{22} & \dots & b_{2 p_{2}} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ b_{p_{1} 0} & b_{p_{1} 1} & b_{p_{1} 2} & \dots & b_{p_{1} p_{2}} \end{matrix}) (\begin{matrix} 1 \\ X_{2 i j, 1} \\ X_{2 i j, 2} \\ ⋮ \\ X_{2 i j, p_{2}} \end{matrix}) ≐ B Z_{2 i j}, \end{matrix}

where $B$ is a $p_{1} \times (p_{2} + 1)$ matrix and $Z_{2 i j}$ is a $(p_{2} + 1) \times 1$ vector.

We stack the conditional expectations by considering the $n_{2 i}$ students:

\begin{matrix} (\begin{matrix} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i 1}) \\ E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i 2}) \\ ⋮ \\ E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i n_{2 i}}) \end{matrix}) = & (\begin{matrix} B Z_{2 i 1} \\ B Z_{2 i 2} \\ ⋮ \\ B Z_{2 i n_{2 i}} \end{matrix}) = (I_{n_{2 i}} \otimes B) (\begin{matrix} Z_{2 i 1} \\ Z_{2 i 2} \\ ⋮ \\ Z_{2 i n_{2 i}} \end{matrix}) \end{matrix}

Then

\begin{matrix} \sum_{j = 1}^{n_{2 i}} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) = & (ı_{n_{2 i}}^{'} \otimes I_{p_{1}}) (\begin{matrix} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i 1}) \\ E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i 2}) \\ ⋮ \\ E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i n_{2 i}}) \end{matrix}) \\ = & (ı_{n_{2 i}}^{'} \otimes I_{p_{1}}) (I_{n_{2 i}} \otimes B) (\begin{matrix} Z_{2 i 1} \\ Z_{2 i 2} \\ ⋮ \\ Z_{2 i n_{2 i}} \end{matrix}) \\ = & (ı_{n_{2 i}}^{'} \otimes B) (\begin{matrix} Z_{2 i 1} \\ Z_{2 i 2} \\ ⋮ \\ Z_{2 i n_{2 i}} \end{matrix}) \\ = & \sum_{j = 1}^{n_{2 i}} B Z_{2 i j} = B \sum_{j = 1}^{n_{2 i}} Z_{2 i j} \end{matrix}


Therefore,

\begin{matrix} \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) = B \frac{1}{n_{2 i}} \sum_{j = 1}^{n_{2 i}} Z_{2 i j} = B {\bar{Z}}_{2 i} = B (\begin{matrix} 1 \\ {\bar{X}}_{2 i, 1} \\ ⋮ \\ {\bar{X}}_{2 i, p_{2}} \end{matrix}) . \end{matrix}
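
The Kronecker-product steps leading to this average can be verified numerically. The sketch below treats the $Z_{2 i j}$ as vectors stacked into a single column, which is the reading under which the products are conformable; the dimensions and the random values of $B$ and $Z_{2 i j}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2, n = 2, 3, 5  # hypothetical dimensions; n plays the role of n_{2i}

B = rng.normal(size=(p1, p2 + 1))
# Row j of Z holds Z_{2ij}' = (1, X_{2ij,1}, ..., X_{2ij,p2}).
Z = np.column_stack([np.ones(n), rng.normal(size=(n, p2))])
z_stacked = Z.reshape(-1)   # Z_{2i1}, ..., Z_{2in} stacked into one column
iota_t = np.ones((1, n))    # transpose of the n-vector of ones

# (iota' kron I_{p1}) (I_n kron B) z  ==  (iota' kron B) z  ==  B sum_j Z_j
lhs = np.kron(iota_t, np.eye(p1)) @ np.kron(np.eye(n), B) @ z_stacked
mid = np.kron(iota_t, B) @ z_stacked
rhs = B @ Z.sum(axis=0)

# Dividing by n gives B times the school-level mean of Z.
avg = B @ Z.mean(axis=0)
```

The three quantities `lhs`, `mid` and `rhs` coincide, which is the mixed-product property of the Kronecker product applied to this case.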

Now, a linear regression is based on the decomposition

\begin{matrix} {\bar{X}}_{1 i}^{'} = E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) + ({\bar{X}}_{1 i}^{'} - E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j})) = E ({\bar{X}}_{1 i}^{'} ∣ X_{2 i j}) + u_{i} . \end{matrix}

Then, after averaging on j, we have

\begin{matrix} \underbrace{{\bar{X}}_{1 i}^{'}}_{p_{1} \times 1} = \underbrace{B}_{p_{1} \times (p_{2} + 1)} \underbrace{{\bar{Z}}_{2 i}}_{(p_{2} + 1) \times 1} + u_{i} . \end{matrix}

Therefore, assuming that there are I schools, we have

\begin{matrix} (\begin{matrix} {\bar{X}}_{11}^{'} \\ ⋮ \\ {\bar{X}}_{1 I}^{'} \end{matrix}) = (I_{I} \otimes B) (\begin{matrix} {\bar{Z}}_{21}^{'} \\ ⋮ \\ {\bar{Z}}_{2 I}^{'} \end{matrix}) + u . \end{matrix}
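
One way to use this stacked regression is to estimate $B$ by least squares across schools and then plug the estimate into the correction factor $γ_{2} β_{1}^{'} B {\bar{Z}}_{2 i}$. The sketch below does this on synthetic data. Ordinary least squares as the estimator, the plug-in values of $γ_{2}$ and $β_{1}$, the array names and the dimensions are all assumptions made for illustration; in practice $γ_{2}$ and $β_{1}$ would come from the fitted value-added model.

```python
import numpy as np

rng = np.random.default_rng(3)
n_schools, p1, p2 = 400, 2, 3  # hypothetical dimensions

# Synthetic school-level means: row i of Z2_bar is (1, X2_bar_i')',
# row i of X1_bar is X1_bar_i'.
B_true = rng.normal(size=(p1, p2 + 1))
Z2_bar = np.column_stack([np.ones(n_schools), rng.normal(size=(n_schools, p2))])
X1_bar = Z2_bar @ B_true.T + rng.normal(scale=0.1, size=(n_schools, p1))

# Row-wise, X1_bar = Z2_bar B' + U, so least squares returns an estimate of B'.
B_hat_t, *_ = np.linalg.lstsq(Z2_bar, X1_bar, rcond=None)
B_hat = B_hat_t.T

# Plug-in correction factor gamma2 * beta1' B Z2_bar_i, one value per school.
gamma2 = 0.2                    # hypothetical value
beta1 = np.array([1.0, -0.5])   # hypothetical value
correction = gamma2 * (Z2_bar @ B_hat.T @ beta1)
```

Because the regression is fitted on school-level means, each school contributes one row regardless of its cohort size; weighting by cohort size would be a possible variation that is not explored here.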

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09979-0.

The manuscript was handled by the ARCS Editor Dr. Nidhi Kohl.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Aitkin, M., Longford, N. (1986). Statistical modelling issues in school effectiveness studies. Journal of the Royal Statistical Society: Series A (General), 149(1), 1–26.CrossRef Google Scholar

Amrein-Beardsley, A., Holloway, J. (2019). Value-added models for teacher evaluation and accountability: Commonsense assumptions. Educational Policy, 33(3), 516–542.CrossRef Google Scholar

Ballou, D., Sanders, W., Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.CrossRef Google Scholar

Bellei, C., Vanni, X., Valenzuela, J. P., Contreras, D. (2016). School improvement trajectories: An empirical typology. School Effectiveness and School Improvement, 27, 275–292.CrossRef Google Scholar

Bianconcini, S., Cagnone, S. (2012). A general multivariate latent growth model with applications to student achievement. Journal of Educational and Behavioral Statistics, 37(2), 339–364.Google Scholar

Billingsley, P. (1968). Convergence of probability measures, Wiley.Google Scholar

Briggs, D. C., Weeks, J. P. (2011). The persistence of school-level value-added. Journal of Educational and Behavioral Statistics, 36, 616–637.CrossRef Google Scholar

Christensen, R., Johnson, W., Branscum, A., & Hanson, T. (2011). Bayesian ideas and data analysis: An introduction for scientists and statisticians. Taylor & Francis. https://books.google.com/books?id=qPERhCbePNcC.Google Scholar

Clarke, P., Crawford, C., Steele, F., Vignoles, A. (2015). Revisiting fixed-and random-effects models: Some considerations for policy-relevant education research. Education Economics, 23(3), 259–277.CrossRef Google Scholar

Ehlert, M., Koedel, C., Parsons, E., Podgursky, M. J. (2014). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in missouri. Statistics and Public Policy, 1(1), 19–27.CrossRef Google Scholar

Engle, R. E., Hendry, D. F., Richard, J. F. (1983). Exogeneity. Econometrica, 51, 277–304.CrossRef Google Scholar

EPI Briefing Paper. (2010). Problems with the use of student test scores to evaluate teachers. Economic Policy Institute.Google Scholar

Fariña, P., González, J., & San Martín, E. (2019). The use of an identifiability-based strategy for the interpretation of parameters in the 1PL-G and Rasch models. Psychometrika, 84, 511–528.CrossRef Google Scholar

Fisher, R. A. (1973). Statistical methods for research workers, Hafner Publishhing.Google Scholar

Fitzmaurice, G., Laird, N., Ware, J. (2004). Applied longitudinal analysis, Wiley.Google Scholar

Florens, J.-P., Marimoutou, V., Péguin-Feissolle, A. (2007). Econometric modeling and inference, Cambridge University Press.CrossRef Google Scholar

Gray, J., Goldstein, H., Jesson, D. (1996). Changes and improvements in schools’ effectiveness: Trends over five years. Research Papers in Education, 11, 35–51.CrossRef Google Scholar

Gray, J., Goldstein, H., Thomas, S. (2001). Predicting the future: The role of past performance in determining trends in institutional effectiveness at A level. British Educational Research Journal, 27, 391–405.CrossRef Google Scholar

Gray, J., Hopkins, D., Reynolds, D., Wilcox, B., Farrell, S., Jesson, D. (1999). Improving schools: Performance and potential, Open University Press.Google Scholar

Guldemond, H., Bosker, R. J. (2009). School effects on students’ progress: A dynamic perspective. School Effectiveness and School Improvement, 20(2), 255–268.CrossRef Google Scholar

Hanushek, E. A. (2020). Education production functions. In Bradley, S., Green, C. (Eds), The economics of education, Elsevier 161–170.CrossRef Google Scholar

Hsiao, C. (2014). Analysis of panel data, Cambridge University Press.CrossRef Google Scholar

Kinsler, J. (2012). Beyond levels and growth: Estimating teacher value-added and its persistence. Journal of Human Resources, 47(3), 722–753.CrossRef Google Scholar

Koedel, C., Mihaly, K., Rockoff, J. E. (2015). Value-added modeling: A review. Economics of Education Review, 47, 180–195.CrossRef Google Scholar

Kolmogorov, A. N. (1950). Foundations of the theory of probability, Chelsea Publishing Company.Google Scholar

Kyriakides, L., Georgiou, M. P., Creemers, B. P., Panayiotou, A., Reynolds, D. (2018). The impact of national educational policies on student achievement: A European study. School Effectiveness and School Improvement, 29(2), 171–203.CrossRef Google Scholar

Leckie, G. (2018). Avoiding bias when estimating the consistency and stability of value-added school effects. Journal of Educational and Behavioral Statistics, 43, 440–468.CrossRef Google Scholar

Lindley, D. V. (1983). Bayesian statistics: A review. In: CBMS-NSf regional conference series in applied mathematics, Philadelphia.Google Scholar

Liu, J., & Loeb, S. (2019). Engaging teachers: Measuring the impact of teachers on student attendance in secondary school. Journal of Human Resources, pp. 1216–8430R3.Google Scholar

Lockwood, J., McCaffrey, D. F., Mariano, L. T., Setodji, C. (2007). Bayesian methods for scalable multivariate value-added assessment. Journal of Educational and Behavioral Statistics, 32, 125–150.CrossRef Google Scholar

Longford, N. T. (2012). A revision of school effectiveness analysis. Journal of Educational and Behavioral Statistics, 37(1), 157–179.CrossRef Google Scholar

Lord, F. M., Novick, M. (1968). Statistical theories of mental test scores, Addison Wesley.Google Scholar

Manzi, J., & Preiss, D. (2013). Educational Assessment and Educational Achievement in South America. In J. Hattie & E. M. Anderman (Eds.), International guide to student achievement (p. chapter 9). Taylor and Friends.Google Scholar

Manzi, J., San Martín, E., Van Bellegem, S. (2014). School system evaluation by value added analysis under endogeneity. Psychometrika, 79(1), 130–153.CrossRef Google Scholar PubMed

McCaffey, D. F., Lockwood, J., Koretz, D. M., Louis, T. A., Hamilton, L. S. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29, 67–101.CrossRef Google Scholar

Meckes, L., Carrasco, R. (2010). Two decades of simce: An overview of the national assessment system in Chile. Assessment in Education: Principles, Policy and Practice, 17, 233–248.Google Scholar

Mouchart, M., & Oulhaj, A. (2006). The role of exogenous randomness in the identification of conditional models. Metron - International Journal of Statistics, LXIV:253–271.Google Scholar

Neveu, J. (1972). Martingales ‘a temps discret. Paris: Masson et CIE.Google Scholar

Page, G. L. (2020). modernva: An implementation of two modern education-based value-added models [Computer software manual]. https://CRAN.R-project.org/package=modernVA (R-package version 0.1.1).Google Scholar

Page, G. L., San Martín, E., Orellana, J., González, J. (2017). Exploring complete school effectiveness via quantile value-added. Journal of the Royal Statistical Society Series A, 180, 315–340.CrossRef Google Scholar

Papay, J. P. (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163–193.

Potthoff, R. F., Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51(3–4), 313–326.

Reynolds, D., Sammons, P., De Fraine, B., Van Damme, J., Townsend, T., Teddlie, C., Stringfield, S. (2014). Educational effectiveness research (EER): A state-of-the-art review. School Effectiveness and School Improvement, 25, 197–230.

Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. The Quarterly Journal of Economics, 125(1), 175–214.

Sanders, W. L., Horn, S. P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed-model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8(3), 299–311.

San Martín, E., González, J., Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80, 450–467.

San Martín, E., Jara, A., Rolin, J.-M., Mouchart, M. (2011). On the Bayesian nonparametric generalization of IRT-type models. Psychometrika, 76(3), 385–409.

San Martín, E., Rolin, J.-M., Castro, L. M. (2013). Identification of the 1PL model with guessing parameter: Parametric and semi-parametric results. Psychometrika, 78, 341–379.

Sass, T. R., Hannaway, J., Xu, Z., Figlio, D. N., Feng, L. (2012). Value added of teachers in high-poverty schools and lower poverty schools. Journal of Urban Economics, 72(2–3), 104–122.

Scherrer, J. (2011). Measuring teaching using value-added modeling: The imperfect panacea. NASSP Bulletin, 95(2), 122–140.

Strenio, J. F., Weisberg, H. I., & Bryk, A. S. (1983). Empirical Bayes estimation of individual growth-curve parameters and their relationship to covariates. Biometrics, 71–86.

Tekwe, C. D., Carter, R. L., Ma, C.-X., Algina, J., Lucas, M. E., Roth, J., Resnick, M. B. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11–36.

Thomas, S., Peng, W. J., Gray, J. (2007). Modelling patterns of improvement over time: Value added trends in English secondary school performance across ten cohorts. Oxford Review of Education, 33, 261–295.

Tymms, P., Merrell, C., Bailey, K. (2018). The long-term impact of effective teaching. School Effectiveness and School Improvement, 29(2), 242–261.

Vanwynsberghe, G., Vanlaar, G., Van Damme, J., De Fraine, B. (2017). Long-term effects of primary schools on educational positions of students 2 and 4 years after the start of secondary education. School Effectiveness and School Improvement, 28(2), 167–190.

Zimmerman, D. W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395–412.

Figure 3 Model fits measured using log pseudo marginal likelihood (LPML) for models 0, 1, and 2.

Figure 4 Model fits using Model 3 as a data generating mechanism. The first three rows correspond to the MSE, interval width, and coverage of $VA_{2i}$. The last corresponds to the LPML. Columns indicate which $\gamma_2$ value was used to generate data and the x-axis tick marks indicate the same thing for $\phi_{12}$.

Table 1 Model fit metrics for models 0, 1, and 2. For LPML, a larger value indicates better fit, while for WAIC, a smaller value indicates better fit.

Table 2 Posterior summaries of $\beta_1$ and $\beta_2$ for each of the three models detailed in this paper.

Table 3 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 for Model 1.

Table 4 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 for Model 0.

Figure 5 Scatter plot of $\bar{X}_{2i}$ and $\bar{X}_{1i}$. The strong linear dependence between the pre-test of the two cohorts indicates that prior information about the two cohorts is similar.

Table 5 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 based on Model 2’s value-added estimates.

Table 6 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 based on model 2’s value-added estimates.

Table 7 The percent of schools in each value-added quartile based on cohort 1 and cohort 2 based on Model 2’s value-added estimates.

Page et al. Supplementary material


