Likert-type rating scales are still by far the most popular choice for measuring personality traits. However, they are susceptible to several response biases that threaten the validity and accuracy of the obtained trait scores, including acquiescence, extremity/centrality biases, and leniency tendencies (Paulhus & Vazire, 2007; Wetzel et al., 2016). As a potential remedy, questionnaires that employ comparative judgments between two or more alternative items have gained a lot of attention (e.g., Brown & Maydeu-Olivares, 2013; Cheung & Chan, 2002; Paulhus, 1991; Saville & Willson, 1991). This is true specifically for forced-choice (FC) formats, where the comparative judgments can be expressed as a set of binary decisions on item pairs, forcing respondents to endorse either one or the other item (e.g., Brown & Maydeu-Olivares, 2011; Hontangas et al., 2015). Perhaps most importantly, FC formats and comparative judgments more generally have the potential to reduce faking of responses, as they prevent all items from being endorsed maximally at the same time (Cao & Drasgow, 2019; Wetzel et al., 2016). Faking tends to occur particularly in high-stakes situations, such as personnel selection, where an individual’s responses and the subsequently estimated trait scores are used for inter-individual decision making (Cao & Drasgow, 2019).
If the faking tendency varies across individuals in a given population, this will strongly bias the trait scores obtained from any fakeable questionnaire, thus invalidating their use for individual-level diagnostic decisions. Accordingly, developing faking-resistant personality questionnaires that yield highly accurate trait estimates even in high-stakes situations would be a major breakthrough for the fields of psychological diagnostics, differential psychology, and their areas of application.
In the simplest case, scoring such comparative tests proceeds by counting how often each of the items is endorsed, a procedure often referred to as classical scoring. For a long time, the lack of alternatives to the classical scoring procedure has been a major barrier to the application of comparative judgments in the context of personality measurement (Baron, 1996; Hicks, 1970). This is because classical scoring fixes the within-person score mean across traits by design of the scoring rule (Baron, 1996). Hence, we obtain only ipsative trait estimates that enable comparisons within but not across individuals. For example, we can compare extraversion to emotional stability of Person A but not extraversion of Person A to extraversion of Person B. Of course, for individual-level diagnostic decisions, we need normative trait scores that enable comparisons both within and across individuals.
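The ipsativity of classical scoring can be illustrated with a small sketch. The three traits, the pair layout, and the random responses below are hypothetical and serve only to show that counting endorsements fixes the within-person score total regardless of how a person responds.

```python
import random

# Hypothetical toy design: 3 traits; every pair compares items from two
# different traits. Trait labels and pair layout are illustrative only.
PAIRS = [("E", "A"), ("E", "C"), ("A", "C"),
         ("A", "E"), ("C", "E"), ("C", "A")]


def classical_scores(choices):
    """Count, per trait, how often an item of that trait was endorsed.

    choices[n] is 0 if the first item of pair n was endorsed, 1 otherwise.
    """
    scores = {"E": 0, "A": 0, "C": 0}
    for (first, second), choice in zip(PAIRS, choices):
        endorsed = first if choice == 0 else second
        scores[endorsed] += 1
    return scores


rng = random.Random(1)
person_a = [rng.randint(0, 1) for _ in PAIRS]
person_b = [rng.randint(0, 1) for _ in PAIRS]

scores_a = classical_scores(person_a)
scores_b = classical_scores(person_b)

# Whatever the responses, each person's scores add up to the number of pairs,
# so the within-person mean across traits is identical for everyone.
print(scores_a, sum(scores_a.values()))
print(scores_b, sum(scores_b.values()))
```

Because both totals equal the number of pairs, a high score on one trait necessarily comes at the expense of another, which is why such scores support only within-person comparisons.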
1. Obtaining Normative Trait Scores from Comparative Judgments
The Thurstonian Item Response Theory (TIRT) model has been proposed as a way to obtain normative trait scores from FC questionnaires (Brown & Maydeu-Olivares, 2011, 2012). It is perhaps the most widely applied IRT model for FC data and belongs to a wider class of models that all aim to obtain normative trait scores from such data (see Brown, 2016a, for an overview and unifying framework). Although these models differ from each other in the details of how the latent person and item parameters relate to the comparative responses, they all share the same mechanism by which they can achieve normative scoring: differential weighting of responses. When responses are differentially weighted, the within-person score means may vary across individuals and so between-person comparisons become possible, provided that the weighting itself is valid. From the perspective of (latent) linear factor analysis models, including TIRT models (Brown, 2016a), there are two ways to achieve differential weighting: First, ensure that factor loadings differ between the compared items. Second, invert some of the items so that positively keyed items are also compared to negatively keyed items, so-called unequally keyed item pairs (Bürkner et al., 2019a). The second mechanism can be understood as an extreme case of the former because inverted items have, by definition, negative factor loadings, which implies particularly strong factor loading differences when compared to items with positive factor loadings.
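The effect of differential weighting can be checked numerically. The sketch below anticipates the linear factor structure of Sect. 2 and assumes that the systematic part of a pairwise comparison is the loading-weighted difference of the two trait scores; the function name, loadings, and trait values are made up for illustration.

```python
def systematic_part(lam1, lam2, eta1, eta2):
    """Loading-weighted difference of two trait scores (assumed model form)."""
    return lam1 * eta1 - lam2 * eta2


eta = (0.5, -0.3)                        # hypothetical trait scores of one person
shifted = (eta[0] + 1.0, eta[1] + 1.0)   # same profile, all traits raised by 1

# Hypothetical loading pairs (first item, second item) for three designs.
designs = {
    "equal loadings": (0.7, 0.7),
    "unequal loadings": (0.9, 0.5),
    "unequally keyed": (0.7, -0.7),
}

changes = {}
for name, (lam1, lam2) in designs.items():
    before = systematic_part(lam1, lam2, *eta)
    after = systematic_part(lam1, lam2, *shifted)
    changes[name] = after - before
    print(f"{name:17s} change after shift: {changes[name]:+.2f}")
```

With equal loadings, raising all traits by the same amount leaves the comparison unchanged, so only the shape of the trait profile is visible. With differing or oppositely signed loadings, the overall elevation shifts the comparison, which is the information that normative scoring relies on.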
Various simulation studies have demonstrated that normative scores can indeed be obtained by means of TIRT (or comparable) modeling approaches (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019a; Lee & Smith, 2020; Schulte et al., 2020), although satisfactory estimation accuracy cannot easily be achieved under all practically relevant conditions (see Sect. 1.1 for details). Several real-world studies have compared TIRT-scored FC questionnaires to rating scales and have found that FC estimates correlate substantially with corresponding rating scale estimates (Guenole et al., 2018; Lee et al., 2018; Watrin et al., 2019). Where investigated, validities were also mostly similar between the two formats (Anguiano-Carrasco et al., 2015; Brown & Bartram, 2013; Lee et al., 2018; Watrin et al., 2019). At least, no consistent pattern favoring one format over the other could be found. With regard to fakability, meta-analytic evidence indicates that score inflation between honest and faking conditions can be lower for FC than for rating scale estimates if the FC comparisons are set up with faking resistance in mind (Cao & Drasgow, 2019). However, it remains unclear if score inflation alone captures all relevant aspects of fakability (Schulte et al., 2021). More detailed literature reviews on applications of FC questionnaires can be found in Bartram (2007) and Cao and Drasgow (2019).
As the responses to FC item pairs or blocks can be represented by a set of binary indicators, they contain comparably little information about the model parameters. To increase the information per pairwise comparison, generalizations of the original (binary) TIRT model for the analysis of other comparative judgment formats have been proposed as well. These include models for proportion-of-total (compositional) formats (Brown, 2016b) and for (ordinal) graded paired comparisons (Brown & Maydeu-Olivares, 2018; see also Sect. 2). Regardless of the specific response format, the mechanism by which normative scores can be obtained from comparative judgments (i.e., differential weighting of responses) remains the same.
1.1. The Paradox of Comparative Judgments in High-Stakes Situations
In order for comparative judgments to have the potential to be faking resistant in high-stakes situations, the items being compared need to be equally socially desirable (Bürkner et al., 2019a; Wetzel et al., 2020). Otherwise, we can expect almost all individuals to choose the more socially desirable option independent of their true personality traits (Bürkner et al., 2019a), which is indeed what happens in practice (e.g., Schulte et al., 2021). This response behavior is of course understandable, but de facto renders responses on pairs of items with differing social desirability completely uninformative. Designing pairs of equally keyed items that are equally socially desirable is by no means trivial and requires careful item design and pretesting under realistic conditions, but it can be done (e.g., Wetzel et al., 2020). Achieving the same for unequally keyed pairs is much more complicated, though: Items keyed in the objectively less desirable direction would have to appear as equally desirable as items keyed in the objectively more desirable direction (Bürkner et al., 2019a).
On the other hand, existing research suggests that, with few exceptions discussed below, including unequally keyed item pairs is required to obtain normative trait scores from FC questionnaires (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019a; Lee & Smith, 2020; Schulte et al., 2020). Otherwise, estimated trait scores remain partially ipsative and have insufficient accuracy for use in individual-level diagnostic decisions. Accordingly, we are stuck between two bad options leading to the same outcome in the end: Either include unequally keyed item pairs and risk them being completely uninformative in practice, or directly include only equally keyed item pairs and still end up with highly inaccurate trait scores.
How can we solve this paradox? Two potential paths toward a solution have been identified. First, we can develop unequally keyed item pairs where both items have roughly the same social desirability and can thus be reasonably applied in high-stakes situations (see Wetzel et al., 2020, for some initial evidence in this direction). Second, we can carefully design tests consisting only of equally keyed item pairs so that they alone are sufficient to ensure satisfactory estimation accuracy. It is this second path that I will focus on below, approaching it mainly from a statistical perspective. For example, how should we choose factor loadings in purely equally keyed designs to maximize information on the trait scores? This is a question of optimal design (see Sect. 1.2). Also, how can we reduce the information lost through the binary decision (or ranking) process that is used in FC formats? This leads to the idea of using rating scales to indicate the degree of preference for one or the other item, instead of giving respondents only a binary choice to express their preferences. Modeling the degree of preference in turn implies the application of ordinal models of comparative judgments (Brown & Maydeu-Olivares, 2018; see Sect. 2 for details). Intuitively, using an ordinal comparative rating with more than two possible response categories, and a corresponding ordinal model, should yield more information about trait scores than a binary decision. This is indeed the case, as shown in Online Supplement A.
Another direction leading along the second path is the observation initially made by Baron (1996) that measuring a higher number of traits can lead to a noticeable increase in estimation accuracy of trait scores. This can go up to a point where sufficient to excellent accuracy can be achieved using only equally keyed item pairs (see Bürkner et al., 2019a; Schulte et al., 2020, for extensive simulations with up to 30 traits). While Baron (1996) provided some explanation for this behavior (see Online Supplement E), the understanding of why a higher number of traits can improve estimation accuracy is still incomplete and requires further investigation (see Sect. 4.3).
In summary, there remain a lot of open research questions related to the applicability of comparative judgments to measure personality in high-stakes situations. Although the present research was motivated primarily by these questions, the obtained results apply to comparative judgments more generally independent of the specific application context.
1.2. Optimal Design in IRT
In TIRT models and IRT more generally, we aim for an efficient and accurate estimation of person and/or item parameters. Toward this goal, applying principles of optimal (experimental) design can be highly beneficial (Atkinson et al., 2007). In this context, one usually distinguishes between two types of optimal design problems: optimal test designs and optimal sampling designs. In the former, we select items with specific properties for the efficient estimation of person parameters. In the latter, we select people with specific trait scores for the efficient estimation of item parameters. The designs studied in this paper are all optimal test designs, that is, item parameters are treated as known and, at least for the purpose of mathematical argumentation, as freely selectable in order to optimize the efficiency of person parameter estimation. In the literature, optimal designs are investigated from both frequentist and Bayesian perspectives, and I use both perspectives in the present paper as well (see Sect. 3.1 and Online Supplement B, respectively).
In order to quantify the amount of information contained in data $y$ about model parameters $\eta$, optimal design utilizes the Fisher information matrix (or simply Fisher information), which is generally defined as

$$I(\eta) = \mathbb{E}_{y}\left[ \left( \frac{\partial}{\partial \eta} l(\eta) \right) \left( \frac{\partial}{\partial \eta} l(\eta) \right)^{T} \right], \tag{1}$$

where $l(\eta)$ denotes the log-likelihood of the model evaluated at parameter values $\eta$. In words, the Fisher information is the square of the log-likelihood’s gradient with respect to the parameters in expectation over possible data (Lehmann & Casella, 2006). The Fisher information plays a crucial role in both frequentist and Bayesian statistics and constitutes an important tool to study theoretical properties of models. For example, in frequentist statistics, the Fisher information is the inverse of the covariance matrix of an (asymptotically) efficient estimator (Lehmann & Casella, 2006). Understanding the Fisher information of a model provides important insights about how accurately parameters can be estimated from a given study design. Thus, the Fisher information plays a major role also in this paper.
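The verbal definition (expected squared gradient of the log-likelihood) can be checked by simulation in a model where the Fisher information is known in closed form. The following is a minimal sketch, assuming a normal model with unknown mean and known standard deviation, for which the information from n independent observations is n divided by the variance; this toy model and all names in the code are illustrative and are not the TIRT model studied in this paper.

```python
import random


def score(y, eta, sigma):
    """Gradient of the normal log-likelihood with respect to the mean eta."""
    return sum(yi - eta for yi in y) / sigma**2


def fisher_information_mc(eta, sigma, n_obs, n_rep=20000, seed=1):
    """Monte Carlo estimate of E[score^2] over data sets generated at eta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        y = [rng.gauss(eta, sigma) for _ in range(n_obs)]
        total += score(y, eta, sigma) ** 2
    return total / n_rep


eta_true, sigma, n_obs = 0.3, 2.0, 5
estimate = fisher_information_mc(eta_true, sigma, n_obs)
analytic = n_obs / sigma**2  # closed form for the normal mean model
print(f"Monte Carlo: {estimate:.3f}, analytic: {analytic:.3f}")
```

The inverse of the analytic value, the variance divided by n, is the variance of the sample mean, which illustrates the link between the Fisher information and the covariance of an efficient estimator.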
1.3. Summary of Contributions
The primary goal of this paper is to enable accurate and efficient estimation of people’s latent traits using models of comparative judgments, while keeping an eye specifically on the applicability under high-stakes situations. Toward achieving this goal, the paper contributes to the psychological and statistical literature in several ways: First, I extend the mathematical theory of ordinal comparative judgment models with a specific focus on TIRT models (Sect. 2.1 and Online Supplement A). Second, I provide optimal test designs for comparative judgments that maximize estimation accuracy of people’s traits from both frequentist and Bayesian statistical perspectives (Sect. 3.1 and Online Supplement B, respectively). Third, I derive analytic upper bounds for the accuracy of these trait estimates achievable through ordinal comparative judgments and corresponding TIRT models (Sect. 3.3). Fourth, I perform numerical experiments that complement results obtained in earlier simulation studies (Sect. 4) and specifically explain why measuring a higher number of traits can be beneficial for estimation accuracy (Sect. 4.3). Fifth and lastly, I extend recommendations for the practical application of paired comparisons for the measurement of personality, specifically in high-stakes situations (Sect. 5). All mathematical proofs are provided in Appendix A and materials required to replicate the numerical results can be found on OSF (https://osf.io/2g76w/). The online supplement containing additional analytical results and numerical experiments can also be found on OSF.
2. Ordinal Thurstonian IRT Models
Building on Thurstone’s law of comparative judgment (Thurstone, 1927), TIRT models are used to describe individuals’ responses on item pairs (or item blocks represented by a set of item pairs) using a latent variable approach (Brown & Maydeu-Olivares, 2011, 2012). Under a Thurstonian model, we assume that each item $i$ has a latent utility $u_{pi}$ that describes the item’s psychological value or desirableness for person $p$. Assuming a one-dimensional linear factor structure for each item (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019a), such that item $i$ loads on trait $t$, we define

$$u_{pi} = \lambda_i \eta_{pt} + \varepsilon_{pi}, \tag{2}$$

where $\eta_{pt}$ is the trait score of person $p$ on trait $t$, $\lambda_i$ is the factor loading of item $i$, and $\varepsilon_{pi}$ is the person- and item-specific unique factor, which is considered an error term.
In a pairwise comparative format, the utilities of two items $i_1$ and $i_2$ are subtracted to yield the latent response $\tilde{y}_{pn}$ of person $p$ on item pair $n$:

$$\tilde{y}_{pn} = u_{p i_1[n]} - u_{p i_2[n]} = \lambda_{i_1[n]} \eta_{p,t_1[n]} - \lambda_{i_2[n]} \eta_{p,t_2[n]} + \varepsilon_{p i_1[n]} - \varepsilon_{p i_2[n]}, \tag{3}$$

where $i_1[n]$ and $i_2[n]$ denote the $1\mathrm{st}$ and $2\mathrm{nd}$ item belonging to the $n\mathrm{th}$ pair, which load on the traits $t_1[n]$ and $t_2[n]$, respectively.
Overall, a total number of $T$ traits is measured. The unique factors $\varepsilon_{pi}$ are assumed to be normally distributed with mean zero and standard deviation $\psi_{i}$. The corresponding variance $\psi^2_{i}$ is called the uniqueness of the $i\mathrm{th}$ item. Item parameters can be standardized, without loss of information in person parameters, by setting $\psi^2_{i} = 1 - \lambda^2_{i}$. I will use standardized item parameters throughout this paper.
Person parameters $\eta_p$ are assumed to be normally distributed with mean 0 and covariance matrix $\Sigma$. The covariance matrix is denoted as $\Phi$ in Brown and Maydeu-Olivares (2011), but I use $\Sigma$ here instead, because the cumulative distribution function of the standard normal distribution is also denoted as $\Phi$.
For identification, the marginal variances of $\eta_p$ are fixed to 1 so that $\Sigma$ is also the correlation matrix of $\eta_p$. As a result of these assumptions, the error part $\varepsilon_{p i_1[n]} - \varepsilon_{p i_2[n]}$ of $\tilde{y}_{pn}$ is normally distributed with mean zero and standard deviation $\varphi_n := \sqrt{\psi^2_{i_1[n]} + \psi^2_{i_2[n]}}$ (Brown & Maydeu-Olivares, 2011, 2018).
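These distributional assumptions can be made concrete with a short simulation. The sketch below assumes utilities of the form loading times trait score plus unique factor, a latent response equal to the difference of the two utilities, independent unique factors, and (as a simplification) uncorrelated traits; the loadings are hypothetical values chosen so that $\varphi_n$ comes out as 1.

```python
import math
import random
import statistics

lam1, lam2 = 0.8, 0.6                  # hypothetical standardized loadings
psi1 = math.sqrt(1 - lam1**2)          # unique SDs from psi^2 = 1 - lambda^2
psi2 = math.sqrt(1 - lam2**2)
phi_n = math.sqrt(psi1**2 + psi2**2)   # SD of the error part of the pair

rng = random.Random(42)
residuals = []
for _ in range(20000):
    # Uncorrelated standard normal traits (simplifying assumption).
    eta1, eta2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = lam1 * eta1 + rng.gauss(0, psi1)   # utility of the first item
    u2 = lam2 * eta2 + rng.gauss(0, psi2)   # utility of the second item
    y_latent = u1 - u2                      # latent response on the pair
    # Remove the systematic part to isolate the error part.
    residuals.append(y_latent - (lam1 * eta1 - lam2 * eta2))

mean_resid = statistics.fmean(residuals)
sd_resid = statistics.pstdev(residuals)
print(f"phi_n = {phi_n:.3f}, simulated error mean = {mean_resid:.3f}, "
      f"simulated error SD = {sd_resid:.3f}")
```

The simulated error part has a mean near zero and a standard deviation near $\varphi_n$, whereas the latent response itself additionally varies with the trait scores.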
In practice, we can never observe $\tilde{y}_{pn}$ directly but only its categorized version $y_{pn}$, which is the response of person $p$ to item pair $n$ on a binary or ordinal (Likert) scale (Brown & Maydeu-Olivares, 2018).
Formally, we assume that the observed response $y_{pn}$ arises from the categorization of $\tilde{y}_{pn}$ based on a vector $\tau_n = (\tau_{n1}, \ldots, \tau_{nK})$ of ordered inner thresholds that partition the values of $\tilde{y}_{pn}$ into the $K+1$ observable categories:

$$y_{pn} = k \quad \text{if} \quad \tau_{n(k-1)} < \tilde{y}_{pn} \le \tau_{nk}, \qquad k = 1, \ldots, K+1. \tag{4}$$
For notational convenience, the outer thresholds are set to \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\tau _{n0} = -\infty $$\end{document} and \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\tau _{n(K+1)} = \infty $$\end{document} . Taken all of these assumptions together (Brown & Maydeu-Olivares, Reference Brown and Maydeu-Olivares2018), the probability that person \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$p$$\end{document} selects response category \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$y_{p n} = k$$\end{document} on item pair \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$n$$\end{document} is given by
where $\Lambda_n \eta_p := \lambda_{i_1[n]} \eta_{p,t_1[n]} - \lambda_{i_2[n]} \eta_{p,t_2[n]}$ is the systematic part of $\tilde{y}_{pn}$ and $\Phi$ denotes the cumulative distribution function of the standard normal distribution with corresponding density function $\phi$. The binary TIRT model (Brown & Maydeu-Olivares, 2011, 2012) arises as a special case of the ordinal TIRT model when the comparative judgments have only two response categories (i.e., $K = 1$). Conversely, in the theoretical case of infinitely many response categories (i.e., $K = \infty$), the ordinal TIRT model becomes the linear factor model (3) on the latent continuous response $\tilde{y}$ (Schmidt & Schwabe, 2015). Thus, the ordinal TIRT model bridges the gap between the binary TIRT model as a lower bound and a latent linear factor model as an upper bound (see Online Supplement A for technical details).
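To make the categorization step concrete, the following Python sketch computes the $K+1$ category probabilities implied by a normally distributed latent response. It is only an illustration under stated assumptions: category $k$ is taken to correspond to the interval between $\tau_{n(k-1)}$ and $\tau_{nk}$, the argument `sd` stands in for the error standard deviation $\varphi_n$ of the latent response, and the function and argument names are hypothetical.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, with explicit handling of +/- infinity."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(linpred: float, inner_thresholds: list, sd: float = 1.0) -> list:
    """Probabilities of the K + 1 ordered response categories.

    linpred          -- systematic part of the latent response (Lambda_n eta_p)
    inner_thresholds -- ordered inner thresholds tau_n1 <= ... <= tau_nK
    sd               -- assumed error standard deviation (stand-in for varphi_n)
    """
    # Outer thresholds are set to -inf and +inf, as in the text.
    tau = [-math.inf, *inner_thresholds, math.inf]
    cdf = [std_normal_cdf((t - linpred) / sd) for t in tau]
    # Assumed indexing: category k covers (tau_{k-1}, tau_k].
    return [cdf[k] - cdf[k - 1] for k in range(1, len(tau))]


# Ordinal case with K = 3 inner thresholds (four categories).
ordinal = category_probs(0.3, [-1.0, 0.0, 1.0])
# Binary special case with K = 1.
binary = category_probs(0.3, [0.0])
print(ordinal, binary)
```

With a single threshold at zero, the second probability reduces to $\Phi$ evaluated at the linear predictor, which is the familiar binary probit form.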
Ordinal comparative judgments are typically employed directly in the form of item pairs instead of in blocks of more than two items (Brown & Maydeu-Olivares, 2018; Schulte et al., 2021). Accordingly, in the following, I will assume that item pairs have been measured directly. This comes without loss of generality, as blocks of more than two items can be expressed equivalently by a set of item pairs (subject to certain constraints on the item parameters; Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019a). As a result, the above formulation of Thurstonian IRT models naturally extends to blocks of items in that such blocks simply increase the total number of item pairs implied by a questionnaire.
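As a small illustration of how a block translates into item pairs, the sketch below enumerates all unordered pairs within a block. The helper name `block_to_pairs` and the item labels are hypothetical, and the constraints on the item parameters mentioned above are not modeled here.

```python
from itertools import combinations


def block_to_pairs(block):
    """Return all unordered item pairs implied by a block of items."""
    return list(combinations(block, 2))


# A block of three items implies 3 pairs; a block of four implies 6.
triplet_pairs = block_to_pairs(["A", "B", "C"])
quad_pairs = block_to_pairs(["A", "B", "C", "D"])
print(triplet_pairs)
print(len(quad_pairs))
```

In general, a block of $b$ items implies $b(b-1)/2$ item pairs, which is why larger blocks increase the number of rows of the design rapidly.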
2.1. Fisher Information of Thurstonian IRT Models
Below, I will study the Fisher information of ordinal TIRT models with respect to the person parameters $\eta$, assuming the item parameters $\lambda$, $\psi$, and $\tau$ to be known, an approach commonly used in the literature (Atkinson et al., 2007; van der Linden & Hambleton, 2013). For this purpose, I will rewrite Eq. (5) in terms of an equivalent ordinal regression model (Bürkner & Vuorre, 2019). This simplifies the notation and makes important model properties more visible. Define the design matrix $X \in \mathbb{R}^{N \times T}$ of the regression model as
and define the standardized thresholds as $\alpha_{nk} := \tau_{nk} / \varphi_n$. For notational convenience, I will drop the person index $p$ and simply write $\eta$ in the following, except where the index is required to avoid ambiguity. Then, the ordinal TIRT model can be written equivalently as an ordinal regression model with
where $X_n$ is the row vector denoting the $n$th row of the design matrix $X$. Note that this is simply a rewritten version of Eq. (5). Now define
which I will call the information factor (of the $n$th item pair based on $K$ thresholds) for reasons that will become apparent shortly. Using basic calculus, it can be shown that the Fisher information matrix of the person parameters $\eta$ equals
(see Brown & Maydeu-Olivares, 2018; Samejima, 1969; or Online Supplement A for derivations). In the limiting case of infinitely many response categories, $\mathbb{I}_\mathrm{TIRT}(\eta)$ will converge to the information matrix of a normal linear regression model (Schmidt & Schwabe, 2015; see also Online Supplement A). It is well known (e.g., Atkinson et al., 2007) that the Fisher information matrix of the regression coefficients of a normal linear regression model is given by
Thus, we obtain the following natural limits of the information factor:
Corollary 2.1
Let $\Phi$ be the cumulative distribution function of the standard normal distribution with corresponding density function $\phi$, and let $(s_k)_{0 \le k \le K+1}$ be a sequence of ordered values such that $-\infty = s_0 \le s_1 \le \ldots \le s_K \le s_{K+1} = \infty$. Then, for finite $K$, the following inequalities hold:
Moreover, if $\lim_{K \rightarrow \infty} \Phi(s_k) - \Phi(s_{k-1}) = 0$ for all $k \in \{1, \ldots, K\}$, then
Corollary 2.1 implies that $1 - \mathrm{I}_{nK}(\eta)$ can be interpreted as the percentage of information lost through the response categorization process during measurement of item pair $n$. If we could directly observe the latent variable $\tilde{y}_n$ underlying the observed response $y_n$, we would have $\mathrm{I}_{nK}(\eta) = 1$ and no information would be lost through response categorization. Of course, measuring $\tilde{y}_n$ is impossible in reality, but $\mathrm{I}_{nK}(\eta)$ still approaches $1$ rather quickly as $K$ increases, provided that the threshold vector $\tau_n$ has a mean of roughly $0$ (see Fig. 1, darker lines). For example, for 10 response categories ($K = 9$), the median information factor across item pairs already exceeds 85% under reasonable assumptions. However, as the threshold means deviate further from zero, the convergence of $\mathrm{I}_{nK}(\eta)$ becomes much slower (see Fig. 1, brighter lines). This is highly relevant for the application of TIRT models in high-stakes situations where social desirability is an issue, as I will elaborate in the Discussion.
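As a hedged worked example of the binary lower bound, suppose the information factor takes the usual form for an ordinal probit model (Samejima, 1969), that is, the sum over categories of the squared derivative of the category probability divided by that probability. The display defining the information factor is not reproduced here, so this form is an assumption. For $K = 1$ with standardized threshold $\alpha_{n1}$ and $z := \alpha_{n1} - X_n \eta$, the two categories then contribute

$$
\mathrm{I}_{n1}(\eta) = \frac{\phi(z)^2}{\Phi(z)} + \frac{\phi(z)^2}{1 - \Phi(z)} = \frac{\phi(z)^2}{\Phi(z)\,\bigl(1 - \Phi(z)\bigr)},
\qquad
\mathrm{I}_{n1}(\eta)\Big|_{z = 0} = \frac{1/(2\pi)}{1/4} = \frac{2}{\pi} \approx 0.64 .
$$

Under this assumption, even a perfectly centered binary comparison retains at most about 64% of the information in the latent response, and less when $z$ is far from zero.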
Convergence of the information factor toward $1$ is not uniform across different values of $X_n \eta$ (see Fig. 2 for an illustration). Rather, convergence for very small or large $X_n \eta$ values, corresponding to bigger differences between compared trait scores, is slower than for values close to the threshold mean (compare median lines in Fig. 2). Additionally, the variation of the information factor across different threshold vectors, corresponding to different items, is larger for more extreme $X_n \eta$ values (compare shaded areas in Fig. 2). Still, using an increasing number of thresholds greatly increases the obtainable Fisher information from comparative judgments across the board, which is clearly visible in particular when contrasted with the binary approach (yellow line in Fig. 2).
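The behavior described in this section can be explored numerically with a short Python sketch. It assumes the Samejima-type form of the information factor (sum over categories of the squared difference of normal densities at the category boundaries, divided by the category probability) and uses evenly spaced thresholds purely for illustration. The function names and the threshold layout are hypothetical and do not reproduce the simulation settings behind Figs. 1 and 2.

```python
import math


def pdf(z):
    """Standard normal density; zero at +/- infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal CDF (math.erf handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def information_factor(linpred, alphas):
    """Assumed information factor for standardized thresholds `alphas`
    at linear predictor `linpred` (the value of X_n eta)."""
    s = [-math.inf] + [a - linpred for a in alphas] + [math.inf]
    total = 0.0
    for k in range(1, len(s)):
        prob = cdf(s[k]) - cdf(s[k - 1])
        if prob > 0.0:  # skip numerically empty categories
            diff = pdf(s[k]) - pdf(s[k - 1])
            total += diff * diff / prob
    return total


def spaced_thresholds(K, center=0.0, half_width=1.5):
    """K evenly spaced thresholds around `center` (illustrative choice)."""
    if K == 1:
        return [center]
    step = 2.0 * half_width / (K - 1)
    return [center - half_width + i * step for i in range(K)]


for K in (1, 4, 9):
    centered = information_factor(0.0, spaced_thresholds(K))
    extreme = information_factor(2.0, spaced_thresholds(K))
    print(f"K={K}: factor at X_n eta = 0 -> {centered:.3f}, at X_n eta = 2 -> {extreme:.3f}")
```

Because only the differences between thresholds and the linear predictor enter the computation, evaluating at a linear predictor of 2 with centered thresholds has the same effect as shifting the threshold mean away from a linear predictor of zero.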
3. Upper Information Bounds
In Corollary 2.1, we have seen that the information from the linear normal model on the latent variable $\tilde{y}$ provides a natural and sharp upper bound for the Fisher information obtainable from ordinal comparative judgments. This is because the latent linear model does not suffer from any loss of information through response categorization. Thus, we can use this model to study the maximal Fisher information obtainable by a given test design. Although the maximal information cannot be fully achieved in practice, we have seen above that a close approximation is realistic. What is more, many of the central properties of this latent linear model, which I will study analytically below, apply to its ordinal counterparts as well. Accordingly, there is a lot to be learned from studying such an ideal model even if we are not able to fully achieve it in practice.
For the purpose of the upcoming mathematical analysis, I will assume factor loadings $\lambda_i$ and error variances $\psi_i^2$ to be known or at least estimable with sufficient precision so that their uncertainty does not affect person parameter estimates to a relevant degree. Such precision seems to be achievable already when including data of as few as 300 individuals in the analysis, as suggested by the simulation study of Schulte et al. (2020). When applying IRT models in practice, it is common to measure several hundreds or perhaps even thousands of individuals. Accordingly, in practice, the assumption of known item parameters hardly affects the estimation accuracy obtained for individuals' trait scores. And even if the information difference were noticeable, assuming item parameters to be known implies more information on the person parameters than when item parameters are estimated. As such, the general goal to provide an upper information bound remains unaffected. Of course, we rarely know the exact values of item parameters in practice, and whether or not item parameters are estimated makes a big difference with regard to the required estimation algorithms and their stability (e.g., Bürkner et al., 2019a; van der Linden & Hambleton, 2013), but these considerations are not the focus of the present paper.
Under the above assumptions, we can formally write down the likelihood of the latent normal model on $\tilde{y}$ as a linear regression
where $n = 1, \ldots, N$ indexes item pairs and $X_n$ denotes the $n$th row of the test design matrix $X$ defined in Eq. (6). In Sect. 2, it was mentioned that the person parameters $\eta$ are assumed to come from a multivariate normal distribution, an assumption made for two reasons: (a) fixing the marginal variance of this distribution ensures joint identification of item and person parameters and (b) modeling the correlation matrix of this distribution enables sharing of information across parameters of the same person as well as across people. Reason (a) is obsolete when assuming item parameters to be known. However, reason (b) remains highly relevant and will be discussed in detail in Sect. 3.2. For now, I will not consider such a distribution or, equivalently, assume it to have infinite marginal variances.
3.1. Maximizing the Test Design Information
As already mentioned earlier, the Fisher information matrix of the regression coefficients of a linear model is given by
It is well known (e.g., Atkinson et al., 2007) that $\Sigma_{\hat{\eta}} := M^{-1}$ is the covariance matrix of the maximum likelihood estimator
Thus, the larger the information, the smaller the uncertainty in the parameter estimates, as is the case more generally for (asymptotically) efficient estimators (Atkinson et al., 2007). The Fisher information $M$ obtainable on the latent space of $\tilde{y}$ depends on the factor loadings $\lambda_i$ and the uniquenesses $\psi_i^2$. As per Eq. (2), both parameters are related to each other via $\psi_i^2 = v_i - \lambda_i^2$, where $v_i$ denotes the variance of the utilities $u_{pi}$ across people. For ordinal paired comparisons, $v_i$ is not identified, so we can set $v_i = 1$ without loss of generality, which leads to standardized item parameters and $\psi_i^2 = 1 - \lambda_i^2$. Independently of how we fix the utility variances, $M = M(\lambda)$ can be written to only depend on the factor loadings, which will thus be the primary target of investigation (see Footnote 1).
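To illustrate how $M$ depends only on the factor loadings, the following Python sketch builds a small design matrix and computes $M = X^\top X$. The exact form of Eq. (6) is not reproduced in this text, so the row construction is an assumption: each row carries the first loading with positive sign and the second with negative sign, both divided by the error standard deviation of the latent response, which under the standardization $\psi_i^2 = 1 - \lambda_i^2$ is taken to be $\sqrt{2 - \lambda_{i_1}^2 - \lambda_{i_2}^2}$. All function names are hypothetical.

```python
import math


def design_row(t1, t2, lam1, lam2, n_traits):
    """Assumed row of X for one item pair comparing traits t1 and t2.

    Uses the standardization psi_i^2 = 1 - lambda_i^2, so the error
    variance of the latent response is (1 - lam1^2) + (1 - lam2^2).
    """
    sd = math.sqrt((1.0 - lam1 ** 2) + (1.0 - lam2 ** 2))
    row = [0.0] * n_traits
    row[t1] = lam1 / sd
    row[t2] = -lam2 / sd
    return row


def fisher_information(X):
    """M = X^T X for a normal linear model with unit error variance."""
    T = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(T)] for a in range(T)]


# Two item pairs on traits 0 and 1: one equally keyed, one unequally keyed.
X = [
    design_row(0, 1, 0.7, 0.7, n_traits=2),
    design_row(0, 1, 0.7, -0.7, n_traits=2),
]
M = fisher_information(X)
print(M)
```

In this small example the off-diagonal entries of $M$ cancel, so the two trait estimates are uncorrelated; this is only a demonstration of the mechanics under the assumed parameterization.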
We can now ask how to choose factor loadings in order to maximize information or, equivalently, minimize uncertainty. Such questions can be investigated by means of optimal design, and several optimality criteria can be applied (see Atkinson et al., 2007; Berger & Wong, 2009, for an overview). Probably the most common criterion is D-optimality, which aims to maximize the determinant of the Fisher information, $\det(M)$, or equivalently minimize the determinant of the inverse information, $\det(M^{-1}) = \det(M)^{-1}$. The popularity of D-optimality can be explained by a combination of mathematical convenience and intuitiveness of interpretation, as the square root of $\det(M^{-1})$ is proportional to the volume of the $T$-dimensional confidence ellipsoid of an (asymptotically) efficient estimator (Atkinson et al., 2007). As above, $T$ is used to denote the total number of estimated parameters per person, that is, the total number of measured traits. Minimizing $\det(M^{-1})$ can be interpreted as minimizing the joint uncertainty of the estimated trait scores per person. For comparability across different numbers of traits, it is sensible to define the D-optimality criterion as
which is a strictly monotonic transformation of $\det(M^{-1})$ that accounts for the volume change induced by increased dimensionality. If we assume some symmetry in the design, we obtain an insightful analytical result about the D-optimal test design for comparative judgments.
Theorem 3.1
Let $\lambda_\mathrm{max} \in (0, 1)$ be the maximally achievable standardized factor loading and let $\lambda_{n1}$ and $\lambda_{n2}$ be the two standardized factor loadings for item pair $n \in \{1, \ldots, N\}$, with $\lambda_{n1} \ge 0$ without loss of generality.
Assume that each trait $i$ is compared to every other trait $j \ne i$ an even number $R_{ij} \ge 0$ of times. Further, without loss of generality, assume that every two consecutive pairs $m$ and $m+1$, such that $m$ is odd, belong to the same trait combination. Then, for any number of traits $T \ge 2$ and any even number of comparisons $R_{ij} \ge 0$ per trait combination, the D-optimal design is obtained if $\lambda_{n1} = \lambda_\mathrm{max}$,
$\lambda_{n2} = (-1)^{n}\lambda_\mathrm{max}$, and each trait appears in the same number of item pairs in total.
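As a minimal numerical sketch of this design (not part of the original derivation), the snippet below builds the design matrix for a small test and checks that the resulting Fisher information is diagonal. It assumes that each item pair contributes one row with entry $\lambda_{n1}$ in the column of the first trait and $-\lambda_{n2}$ in the column of the second trait, and it ignores any rescaling by error variances. The function name `theorem_design` and these conventions are illustrative assumptions.

```python
import itertools

import numpy as np


def theorem_design(n_traits, reps, lam_max):
    """Design with lambda_n1 = lam_max and lambda_n2 = (-1)^n * lam_max.

    Assumed row convention (hypothetical, for illustration only): pair n
    comparing traits (i, j) has +lambda_n1 in column i and -lambda_n2 in
    column j.
    """
    if reps % 2 != 0:
        raise ValueError("reps must be even")
    rows = []
    n = 0  # running 1-based index of the item pair
    for i, j in itertools.combinations(range(n_traits), 2):
        for _ in range(reps):
            n += 1
            row = np.zeros(n_traits)
            row[i] = lam_max
            row[j] = -((-1) ** n) * lam_max
            rows.append(row)
    return np.array(rows)


X = theorem_design(n_traits=4, reps=2, lam_max=0.8)
M = X.T @ X  # Fisher information of the linear model with unit error variance
off_diagonal = M - np.diag(np.diag(M))
print(np.round(M, 3))
```

Because consecutive pairs on the same trait combination alternate between unequal and equal keying, their cross-products cancel and only the diagonal of `M` remains.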
Among other things, this result implies that half of the item pairs should be equally keyed and the other half should be unequally keyed. This is particularly relevant, as it not only states that mixed keyed designs are preferable but also specifies exactly what the ratio between the number of equally and unequally keyed comparisons should be for the test to be optimal. To build an intuition, it is helpful to look at the Fisher information of a single item with trait scores parameterized by their mean $\bar{\eta}$ and difference $\eta_d$ such that $\eta_1 = \bar{\eta} + \eta_d / 2$ and $\eta_2 = \bar{\eta} - \eta_d / 2$. Then, the information in the direction of $\bar{\eta}$ equals
while the information in the direction of $\eta_d$ equals
In words, the trait score mean requires high factor loading differences, as provided by unequally keyed item pairs, while the trait score difference requires high factor loading sums, as provided by equally keyed item pairs.
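The two information terms can be recovered with a short calculation. As a sketch, assume that a single item pair has the latent linear predictor $\lambda_1 \eta_1 - \lambda_2 \eta_2$ with unit error variance; the error scaling is an assumption made here for simplicity and changes the result only by a common positive factor. Substituting the reparameterization gives

$$
\begin{aligned}
\lambda_1 \eta_1 - \lambda_2 \eta_2
  &= \lambda_1 \left(\bar{\eta} + \tfrac{\eta_d}{2}\right)
   - \lambda_2 \left(\bar{\eta} - \tfrac{\eta_d}{2}\right) \\
  &= (\lambda_1 - \lambda_2)\, \bar{\eta}
   + \tfrac{1}{2} (\lambda_1 + \lambda_2)\, \eta_d .
\end{aligned}
$$

Under this assumption, the information in the direction of $\bar{\eta}$ is proportional to $(\lambda_1 - \lambda_2)^2$ and the information in the direction of $\eta_d$ is proportional to $(\lambda_1 + \lambda_2)^2 / 4$. The first term is large when the loadings have opposite signs, the second when they have the same sign.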
It might be argued that D-optimality is not an ideal measure for evaluating comparative judgment designs, as we are primarily interested in minimizing the marginal variances of each trait rather than the determinant of the whole covariance matrix. In other words, we might be more interested in achieving A-optimality (Atkinson et al., 2007). I define
$$C_A(M) := \sqrt{\frac{1}{T} \, \mathrm{tr}\left(M^{-1}\right)}$$
as the A-optimality criterion, which is a strictly monotonic transformation of the sum of marginal variances. The scaling via $1/T$ ensures that the score is comparable across varying number of traits $T$, and the square-root transform enables interpretation in terms of standard deviations instead of variances. Thus, $C_A$ can be interpreted as an average marginal standard deviation across trait estimates. It turns out that for the comparative judgment designs considered above, the D-optimal design is also A-optimal:
Theorem 3.2
Under the assumptions of Theorem 3.1, the D-optimal design is also A-optimal.
Through the proofs of Theorems 3.1 and 3.2, it not only becomes clear that an equal number of equally and unequally keyed pairs is optimal but also why this is the case, namely that the diagonal elements of the Fisher information are maximized, while the off-diagonal elements become zero. Conversely, when applying a design with only equally keyed item pairs, we can achieve the same diagonal elements in theory, but obtain highly negative off-diagonal elements, which drastically increases (worsens) both the inverse Fisher information’s determinant (D-optimality) and its trace (A-optimality). In the most extreme case of an equally keyed design, where $\lambda_{n1} = \lambda_{n2}$ for each item pair $n = 1, \ldots, N$, the design no longer even identifies the person parameters (see also Brown, 2016a).
The optimal design derived above is in fact far better than potential alternatives, as a simple example shows. Suppose we have $T = 5$ traits and a symmetric design with $R = 2$ comparisons for each trait combination such that the design consists of a total of $N = 20$ paired comparisons.
Suppose further that $\lambda_{n1} = \bar{\lambda} + \lambda_{\Delta} / 2 \in (0, 0.8]$ and $\lambda_{n2} = \bar{\lambda} - \lambda_{\Delta} / 2 \in (0, 0.8]$ for each trait combination, so that $\bar{\lambda}$ is the factor loading mean and $\lambda_{\Delta}$ is the factor loading difference per item pair. In this example, every trait has the same number of higher and lower factor loadings such that the information for each trait is the same due to symmetry. The corresponding mixed keyed design is obtained by switching the sign of half of the $\lambda_{n2}$ to be negative.
For varying $\bar{\lambda}$ and $\lambda_{\Delta}$, I illustrate the implied D-optimality criterion in Fig. 3. On the right-hand side of Fig. 3, an illustration for the mixed keyed design is shown, which clearly has the highest determinant when the factor loadings are maximal within the considered range ($\bar{\lambda} = 0.8$ and $\lambda_{\Delta} = 0$).
In contrast, when considering an equally keyed design (left-hand side of Fig. 3), we see the importance of balancing high mean factor loadings (i.e., high $\bar{\lambda}$) with high differences between factor loadings within the same item pair (i.e., high $\lambda_{\Delta}$). What is more, the optimal equally keyed design offers only a fraction of the information from the (mixed keyed) optimal design ($C_D(M) \approx 0.40$ vs. $C_D(M) \approx 0.15$) under the given conditions. The practical implications of the information difference between mixed and equally keyed designs can be better grasped when investigating the marginal standard deviations of the ML estimator, that is, the A-optimality criterion.
When comparing the left-hand and right-hand side of Fig. 4, we not only see optimality patterns very similar to those for the D-optimality criterion, but also that the optimal mixed keyed design implies marginal SDs about half as large as the corresponding marginal SDs implied by the optimal equally keyed design ($C_A(M) \approx 0.37$ vs. $C_A(M) \approx 0.73$). The absolute values are not particularly small in either case, but this is simply the result of using only $N = 20$ item pairs for the purpose of this illustration. More detailed numerical experiments are provided in Sect. 4.
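The comparison can be sketched numerically for $T = 5$ and $R = 2$. The snippet below is an illustration under several assumptions that are not taken from the text: rows of the design matrix hold $\lambda_{n1}$ and $-\lambda_{n2}$ divided by $\sqrt{\psi_{n1}^2 + \psi_{n2}^2}$ with $\psi^2 = 1 - \lambda^2$, the higher loading alternates between the two traits of a combination, and the A-criterion is computed as $\sqrt{\mathrm{tr}(M^{-1}) / T}$ following its verbal description. The names `build_design` and `a_criterion` are hypothetical.

```python
import itertools

import numpy as np


def build_design(n_traits, lam_mean, lam_diff, mixed):
    """Two item pairs per trait combination (R = 2).

    Assumptions (illustrative, not taken from the text):
    - a row has +lambda_n1 / s in column i and -lambda_n2 / s in column j,
      with s = sqrt(psi_n1^2 + psi_n2^2) and psi^2 = 1 - lambda^2;
    - the higher loading alternates between the two traits of a combination;
    - in the mixed keyed design, the first pair of each combination has the
      sign of lambda_n2 switched.
    """
    hi = lam_mean + lam_diff / 2
    lo = lam_mean - lam_diff / 2
    scale = np.sqrt((1 - hi**2) + (1 - lo**2))
    rows = []
    for i, j in itertools.combinations(range(n_traits), 2):
        for k, (lam_n1, lam_n2) in enumerate([(hi, lo), (lo, hi)]):
            if mixed and k == 0:
                lam_n2 = -lam_n2
            row = np.zeros(n_traits)
            row[i] = lam_n1 / scale
            row[j] = -lam_n2 / scale
            rows.append(row)
    return np.array(rows)


def a_criterion(X):
    """Square root of the average marginal variance of the ML estimator."""
    M = X.T @ X
    return np.sqrt(np.trace(np.linalg.inv(M)) / M.shape[0])


c_a_mixed = a_criterion(build_design(5, lam_mean=0.8, lam_diff=0.0, mixed=True))
c_a_equal = a_criterion(build_design(5, lam_mean=0.5, lam_diff=0.6, mixed=False))
print(round(c_a_mixed, 3), round(c_a_equal, 3))
```

Under these assumptions, the mixed keyed design with all loadings at 0.8 evaluates to 0.375, and the chosen equally keyed configuration (loadings 0.8 and 0.2) to about 0.73, which matches the magnitudes reported above. The snippet does not search for the equally keyed optimum. Setting `lam_diff=0.0` with `mixed=False` yields a singular information matrix, in line with the non-identification noted earlier.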
It is of high practical relevance to understand how the optimal designs on the latent linear TIRT model generalize to ordinal TIRT models. Fortunately, the optimal designs generalize quite nicely:
Theorem 3.3
Let $C: \mathrm{Matrix}^{T \times T} \rightarrow \mathbb{R}$ be an optimal design criterion based on the Fisher information such that, without loss of generality, lower values are considered more optimal. Let $M^\star := \sum_{n=1}^N M^\star_n := \sum_{n=1}^N X^{\star \mathrm{T}}_n X^\star_n$ be the optimal Fisher information of a normal linear model according to criterion $C$ with corresponding optimal design matrix $X^\star$.
Assume that $C(M^\star) \le C(M)$ implies $C(c \, M^\star) \le C(c \, M)$, for any Fisher information matrix $M = \sum_{n=1}^N X^{\mathrm{T}}_n X_n$ based on a permissible design matrix $X$ and any constant $c \in \mathbb{R}^+$. Then, the optimal design matrix $X^\star$ for the normal linear model on $\tilde{y}$ is also optimal for all corresponding ordinal models on $y$ independently of the number of thresholds $K$.
Theorem 3.3 applies very generally to any ordinal model with a normal distribution and a linear predictor on the latent scale. Most relevant for the purposes of this paper, Theorem 3.3 immediately implies the following result:
Corollary 3.4
Under the assumptions of Theorem 3.1, the D- and A-optimal factor loadings for the latent linear model on $\tilde{y}$ are also D- and A-optimal for the corresponding ordinal models on $y$ independently of the number of thresholds $K$.
However, neither Theorem 3.3 nor Corollary 3.4 makes a statement about the optimal ordinal thresholds, only about the optimal factor loadings. Deriving optimal thresholds for ordinal TIRT models is an interesting research direction but beyond the scope of the present paper. In the following, I will continue with studying the properties of latent linear TIRT models.
3.2. Adding Prior Information
Investigating the test design information alone tells only half of the story even in the linear case. This is because the person parameters in IRT models constitute latent variables that are assumed to come from an underlying distribution describing the variation of the parameters across people (van der Linden & Hambleton, 2013). In particular, this is true for the TIRT model, which assumes a multivariate normal distribution for the person parameters $\eta$ (see Sect. 2 for details). From a Bayesian perspective, this distribution can be understood as a prior and we can formally extend linear regression model (13) as
In practical applications of TIRT models, we would usually treat $\Sigma_{\mathrm{prior}}$ as a hyperparameter that is estimated from the data. Here, again for the purpose of the mathematical analysis, I assume $\Sigma_{\mathrm{prior}}$ to be known rather than estimated, an assumption I will come back to in the Discussion. If $\Sigma_{\mathrm{prior}}$ is known, the above model becomes a special case of Bayesian linear regression. It is well known that a normal prior is conjugate to a normal likelihood (Gelman et al., 2013), and so we can derive the posterior distribution of $\eta$ analytically as
with posterior covariance matrix
and posterior mean
where $M = X^{\mathrm{T}} X$ and $\hat{\eta}_\mathrm{ML}$ is the maximum likelihood estimate of $\eta$:
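For orientation, the standard conjugate result for Bayesian linear regression can be written out as follows. This is a sketch under assumptions made here rather than stated in the surrounding text: the prior on $\eta$ has mean zero and the residuals of the latent linear model have unit variance, so that $M = X^{\mathrm{T}} X$ is the Fisher information.

$$
\begin{aligned}
\eta \mid \tilde{y} &\sim \mathrm{multinormal}(\mu_{\mathrm{post}}, \Sigma_{\mathrm{post}}), \\
\Sigma_{\mathrm{post}} &= \left(M + \Sigma_{\mathrm{prior}}^{-1}\right)^{-1}, \\
\mu_{\mathrm{post}} &= \Sigma_{\mathrm{post}} \, M \, \hat{\eta}_\mathrm{ML}, \\
\hat{\eta}_\mathrm{ML} &= M^{-1} X^{\mathrm{T}} \tilde{y}.
\end{aligned}
$$

In this form, $\mu_{\mathrm{post}} = \Sigma_{\mathrm{post}} X^{\mathrm{T}} \tilde{y}$, and the posterior mean approaches the maximum likelihood estimate as $\Sigma_{\mathrm{prior}}^{-1}$ tends to zero.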
The posterior mean $\mu_{\mathrm{post}}$ is commonly known as the expected a-posteriori (EAP) estimator and applied extensively both in full and empirical Bayes approaches (Gelman et al., 2013). The expected value and covariance matrix of $\mu_{\mathrm{post}}$ over data $\tilde{y}$ are given, respectively, by
and
The subscript of $\mathbb{E}$ and $\mathrm{Var}$ indicates over which variable integration is performed. To measure the accuracy of the EAP estimator, investigating its covariance matrix is insufficient on its own as the estimator may be biased. For this reason, I additionally consider the mean squared error (MSE) matrix
which can be expressed as the variance plus the bias squared. The MSE matrix can be used to obtain root-mean-square error (RMSE) estimates as an important measure for predictive accuracy:
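The quantities of this subsection can be sketched numerically for a fixed $\eta$. The snippet assumes the standard conjugate formulas with zero prior mean and unit error variance, so that the EAP estimator equals $\Sigma_{\mathrm{post}} X^{\mathrm{T}} \tilde{y}$. The design matrix, the true $\eta$, and the prior covariance are arbitrary hypothetical values, and a Monte Carlo run is included only as a plausibility check of the analytic MSE.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small example: 3 traits, 12 item pairs, arbitrary design rows.
X = rng.normal(size=(12, 3))
eta = np.array([1.0, -0.5, 0.3])       # fixed true person parameters
Sigma_prior = np.eye(3)                # prior covariance, assumed known

M = X.T @ X
Sigma_post = np.linalg.inv(M + np.linalg.inv(Sigma_prior))

# Moments of the EAP estimator over repeated data y ~ normal(X eta, I).
mean_eap = Sigma_post @ M @ eta        # expected value of the EAP estimator
var_eap = Sigma_post @ M @ Sigma_post  # covariance matrix of the EAP estimator
bias = mean_eap - eta
mse = var_eap + np.outer(bias, bias)   # variance plus squared bias
rmse = np.sqrt(np.diag(mse))           # one RMSE value per trait

# Monte Carlo check: simulate data sets and recompute the MSE empirically.
y = X @ eta + rng.normal(size=(20000, 12))  # each row is one data set
eap_draws = y @ X @ Sigma_post              # each row is Sigma_post X^T y
errors = eap_draws - eta
mse_mc = errors.T @ errors / len(errors)
print(np.round(rmse, 3), np.round(np.sqrt(np.diag(mse_mc)), 3))
```

The bias term vanishes only when the prior is flat, which is why the covariance matrix of the estimator alone understates its error.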
3.3. Marginalizing over Person Parameters
In the above equations, I condition on a fixed vector $\eta$ of true person parameters. In other words, I investigate the estimates of only a single person at a time. From the perspective of model estimation, this is valid as all the item parameters (i.e., $\lambda_i$ and $\psi_i$ per item $i$) and person hyperparameters (i.e., $\Sigma_{\mathrm{prior}}$) were considered to be known for the purpose of the mathematical analysis performed here. If they were not known, the data of all people would have to be modeled jointly in order to estimate item parameters and person hyperparameters, as is done in the full version of the TIRT model.
However, even if the simplified model can be estimated separately per person, we still aim to compare different individuals and hence we need to consider multiple $\eta$ parameters and compare their estimates. This can be done by interpreting the multinormal prior over $\eta$ in Eq. (21) as a sampling distribution from which true person parameter values can be drawn (Gelman et al., 2013). More precisely, one assumes that $\eta$ is sampled according to
This resembles the approach in simulation studies, except that the results below are derived analytically instead of empirically via repeated sampling from the distribution. Because the latent scale of $\eta$ is arbitrary, I set the marginal variances ${\Sigma_{\eta}}_{ii} = 1$ as is standard in TIRT models and many other latent variable models for reasons of identification (Brown & Maydeu-Olivares, 2011). If $\Sigma_{\eta} = \Sigma_{\mathrm{prior}}$, then the model’s prior assumes the correct data generating process. If $\Sigma_{\eta} \ne \Sigma_{\mathrm{prior}}$, then the prior constitutes a form of model misspecification, whose influence on the obtained posterior estimates can be investigated.
This is sensible in the context of TIRT models, as simulation studies have shown potentially strong biases in the correlation hyperparameters that constitute $\Sigma_{\mathrm{prior}}$ (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019a); biases we can mimic by letting $\Sigma_{\mathrm{prior}}$ deviate from $\Sigma_{\eta}$.
It is now possible to study $\eta$-dependent quantities not only conditional on $\eta$ but also marginalized over its distribution (30). In particular, we can study the first two moments of $\bar{\mu}_{\mathrm{post}}$ that evaluate to
and
Due to the positive semi-definiteness of $\Sigma_{\mathrm{prior}}$, $\Sigma_{\eta}$, and $M$, and because of $\mathrm{det}(\Sigma_{\mathrm{post}}) \le \mathrm{det}(M^{-1})$ as implied by Eq. (23), we see that the marginal variances
Moreover, if the a-priori covariance matrix \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\Sigma _{\mathop {\mathrm {prior}}}$$\end{document} is finite, inequality (33) holds strictly so that the variance of \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\bar{\mu }_{\mathop {\mathrm {post}}}$$\end{document} is smaller than the variance of \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\eta $$\end{document} . In other words, the prior leads to a shrinkage of estimates in expectation and thus contributes to the bias and MSE of \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\mu _{\mathop {\mathrm {post}}}$$\end{document} , a well-known result of adding prior information (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013). This is not to say that shrinkage is undesirable as it also decreases the variance, thus leading to a bias-variance trade-off (Gelman & Hill, Reference Gelman and Hill2006). 
However, in diagnostic practice, we are not directly interested in estimating the true scale of $\eta$ correctly, but in comparing (the estimates of) different people's traits. Accordingly, the difference in the scales of $\bar{\mu}_{\mathrm{post}}$ and $\eta$ is irrelevant and any linear transformation of the true scale will do equivalently well. Also, when fitting TIRT models in practice, we estimate both person parameters and factor loadings simultaneously. This implies that the prior scale is required to identify the scale of $\eta$ and thus no shrinkage will occur in this case.
To remove the shrinkage-induced scale difference, I standardize $\mu_{\mathrm{post}}$ so that its expectation $\bar{\mu}_{\mathrm{post}}$ has the same variance as $\eta$. Formally, this is done as follows: Let $S$ be a diagonal matrix with diagonal elements equal to the inverse of the marginal standard deviations of $\bar{\mu}_{\mathrm{post}}$ over $\eta$, that is,
$$S_{ii} = \left( \mathrm{Var}_{\eta}\left(\bar{\mu}_{\mathrm{post}}\right)_{ii} \right)^{-1/2}, \qquad (34)$$
and define the scaled EAP estimator $\delta_{\mathrm{post}}$ of $\eta$ as
$$\delta_{\mathrm{post}} = S \mu_{\mathrm{post}}. \qquad (35)$$
By definition, $\delta_{\mathrm{post}}$ then satisfies $\mathrm{Var}_{\eta}\left(\bar{\delta}_{\mathrm{post}}\right)_{ii} = 1$.
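The standardization can be illustrated numerically. The following is a minimal sketch, assuming the textbook conjugate normal setting in which $\Sigma_{\mathrm{post}} = (\Sigma_{\mathrm{prior}}^{-1} + M)^{-1}$ and $\bar{\mu}_{\mathrm{post}} = \Sigma_{\mathrm{post}} M \eta$. This setting, the function name `scaling_matrix`, and the example matrices are assumptions made for illustration only; where they differ from Eqs. (20) to (23), the latter take precedence.

```python
import numpy as np

def scaling_matrix(sigma_prior, sigma_eta, M):
    """Return (S, V): diagonal scaling matrix and Var_eta(mu_bar_post).

    Assumption (illustrative): mu_bar_post = Sigma_post @ M @ eta with
    Sigma_post = inv(inv(Sigma_prior) + M).
    """
    sigma_post = np.linalg.inv(np.linalg.inv(sigma_prior) + M)
    A = sigma_post @ M                       # linear map from eta to mu_bar_post
    V = A @ sigma_eta @ A.T                  # covariance of mu_bar_post over eta
    S = np.diag(1.0 / np.sqrt(np.diag(V)))   # inverse marginal standard deviations
    return S, V

# Hypothetical two-trait example (values chosen for illustration only)
sigma_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_prior = sigma_eta.copy()
M = np.array([[2.0, 0.5], [0.5, 3.0]])       # hypothetical test information

S, V = scaling_matrix(sigma_prior, sigma_eta, M)
scaled_var = np.diag(S @ V @ S)              # marginal variances after scaling
print(np.diag(V), scaled_var)
```

In this example the unscaled marginal variances on the diagonal of `V` fall below one (shrinkage), whereas the scaled variances equal one by construction.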
The MSE matrix of $\delta_{\mathrm{post}}$ is given by
which is now free of any bias caused solely by scale differences. Since $\eta$ is multivariate normally distributed with covariance matrix $\Sigma_{\eta}$, the matrix $\eta \eta^{\mathrm{T}}$ is Wishart distributed with one degree of freedom (Srivastava, 2003):
$$\eta \eta^{\mathrm{T}} \sim \mathrm{Wishart}(\Sigma_{\eta}, \nu), \quad \nu = 1. \qquad (37)$$
Because of $1 = \nu \le T - 1$, this Wishart distribution is singular (Srivastava, 2003), but its mean still exists and is equal to $\Sigma_{\eta}$ as in the non-singular case. It follows that
For the $i$th trait, we can obtain an expected RMSE marginalized over $\eta$ via
$$\overline{\mathrm{RMSE}}(\delta_{\mathrm{post},i}, \eta_i) = \sqrt{\mathbb{E}_{\eta}\left( \mathrm{MSE}_{\tilde{y}}(\delta_{\mathrm{post}}, \eta) \right)_{ii}}. \qquad (39)$$
I have deliberately taken the expectation before the square root so that the expression is analytical, in line with common approaches to express average square roots of variance-like terms. The formulas for $\mathbb{E}_{\eta}\left( \mathrm{MSE}_{\tilde{y}}(\mu_{\mathrm{post}}, \eta) \right)$ and $\overline{\mathrm{RMSE}}(\mu_{\mathrm{post},i}, \eta_i)$ of the unscaled posterior means $\mu_{\mathrm{post}}$ can simply be obtained from Eqs. (38) and (39) by dropping the scaling matrix $S$.
As the test information increases, the RMSE decreases. Further, as the test information approaches infinity (i.e., $\det(M) \Rightarrow \infty$), we have $\Sigma_{\mathrm{post}} \Rightarrow M^{-1}$ and $S \Rightarrow I$ provided that $M$ is invertible (i.e., the model is identified). Hence, we get
such that also $\overline{\mathrm{RMSE}}(\delta_{\mathrm{post},i}, \eta_i) \Rightarrow 0$, as it should be.
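The limiting behavior of the expected RMSE can be checked numerically. The sketch below assumes a textbook linear-Gaussian model, $\tilde{y} \mid \eta \sim \mathrm{multinormal}(\eta, M^{-1})$ with a zero-mean normal prior, so that $\mu_{\mathrm{post}} = \Sigma_{\mathrm{post}} M \tilde{y}$ and $\Sigma_{\mathrm{post}} = (\Sigma_{\mathrm{prior}}^{-1} + M)^{-1}$. This model, the function name `expected_rmse`, and all matrices are illustrative assumptions and need not match the derivation above term by term. The only ingredient taken from the text is $\mathbb{E}_{\eta}(\eta \eta^{\mathrm{T}}) = \Sigma_{\eta}$.

```python
import numpy as np

def expected_rmse(sigma_prior, sigma_eta, M, scaled=True):
    """Expected RMSE per trait under an assumed linear-Gaussian model."""
    T = M.shape[0]
    sigma_post = np.linalg.inv(np.linalg.inv(sigma_prior) + M)
    A = sigma_post @ M                           # E_y(mu_post) = A @ eta
    if scaled:
        V = A @ sigma_eta @ A.T
        S = np.diag(1.0 / np.sqrt(np.diag(V)))   # standardization matrix
    else:
        S = np.eye(T)                            # dropping S gives the unscaled case
    B = S @ A - np.eye(T)                        # bias operator: E_y(delta) - eta = B @ eta
    sampling_cov = S @ sigma_post @ M @ sigma_post @ S
    # Expectation over eta replaces eta eta^T by Sigma_eta
    mse = sampling_cov + B @ sigma_eta @ B.T
    return np.sqrt(np.diag(mse))                 # expectation taken before the square root

# Hypothetical example: increase test information by a factor c
sigma_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_prior = np.eye(2)
M0 = np.array([[2.0, 0.5], [0.5, 3.0]])
rmse_by_scale = {c: expected_rmse(sigma_prior, sigma_eta, c * M0)
                 for c in (1.0, 100.0, 1e6)}
for c, rmse in rmse_by_scale.items():
    print(c, rmse)
```

Under these assumptions, the expected RMSE shrinks toward zero as the information multiplier grows, mirroring the limit stated above.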
As a second important metric of predictive accuracy, I consider the reliability, that is, the proportion of variance in $\mu_{\mathrm{post}}$ explained by the true values $\eta$, or equivalently, the squared correlation between $\mu_{\mathrm{post}}$ and $\eta$ (Brown & Maydeu-Olivares, 2011). In contrast to the RMSE, the reliability requires variation in $\eta$ by definition and so there are no reliability coefficients of individual $\eta$ realizations.
The estimates $\mu_{\mathrm{post}}$ and $\delta_{\mathrm{post}}$ of $\eta$ differ only in their overall scale, which implies that their reliabilities are the same. Accordingly, it is sufficient to derive the reliability for $\mu_{\mathrm{post}}$.
For this purpose, we first compute the cross-covariance matrix of $\mu_{\mathrm{post}}$ and $\eta$ over the joint distribution $p(\tilde{y}, \eta)$ implied by Eqs. (20) and (30) as
and the covariance matrix of $\mu_{\mathrm{post}}$ as
where $\varepsilon_{\mathrm{post}} \sim \mathrm{multinormal}(0, \Sigma_{\mathrm{post}})$ is an error term uncorrelated with both $\bar{\mu}_{\mathrm{post}}$ and $\eta$ that describes the difference between $\mu_{\mathrm{post}}$ and its data expectation $\bar{\mu}_{\mathrm{post}}$.
Since ${\Sigma_{\eta}}_{ii} = 1$ by definition, the reliability for the $i$th trait integrated over parameters $\eta$ and data $\tilde{y}$ is then given by
In the special case where the prior matches the sampling distribution, that is, $\Sigma_{\mathrm{prior}} = \Sigma_{\eta}$, the covariance matrix of $\mu_{\mathrm{post}}$ simplifies to
so that the reliability simplifies to
As the test information increases, the reliability increases. Further, as the test information approaches infinity, we get
such that $\mathrm{Rel}(\mu_{\mathrm{post},i}) \Rightarrow 1$, independently of whether or not $\Sigma_{\mathrm{prior}} = \Sigma_{\eta}$.
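The reliability as a squared correlation can likewise be sketched in code. The example assumes the textbook linear-Gaussian model $\tilde{y} \mid \eta \sim \mathrm{multinormal}(\eta, M^{-1})$ with $\mu_{\mathrm{post}} = \Sigma_{\mathrm{post}} M \tilde{y}$. In this assumed model the sampling covariance of $\mu_{\mathrm{post}}$ is derived from the model itself rather than taken from the error term $\varepsilon_{\mathrm{post}}$ above, so the numbers illustrate the qualitative behavior only. The function name `reliability` and all matrices are hypothetical.

```python
import numpy as np

def reliability(sigma_prior, sigma_eta, M):
    """Squared correlation between mu_post and eta per trait (assumed model)."""
    sigma_post = np.linalg.inv(np.linalg.inv(sigma_prior) + M)
    A = sigma_post @ M
    cross_cov = A @ sigma_eta                                   # Cov(mu_post, eta)
    cov_mu = A @ sigma_eta @ A.T + sigma_post @ M @ sigma_post  # Cov(mu_post)
    rel = np.diag(cross_cov) ** 2 / (np.diag(cov_mu) * np.diag(sigma_eta))
    return rel, sigma_post

# Hypothetical two-trait example
sigma_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
M0 = np.array([[2.0, 0.5], [0.5, 3.0]])

# Prior equal to the sampling covariance
rel_match, sigma_post = reliability(sigma_eta, sigma_eta, M0)

# Mismatched (diagonal) prior with very large test information
rel_mismatch_large, _ = reliability(np.eye(2), sigma_eta, 1e6 * M0)
print(rel_match, rel_mismatch_large)
```

Under this assumed model and with $\Sigma_{\mathrm{prior}} = \Sigma_{\eta}$, the law of total variance gives a reliability of one minus the posterior variance of the respective trait. With very large information, the reliability approaches one even for the mismatched prior.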
We can now also ask how we should design our tests such that they maximize reliability or minimize (expected) RMSE of the person parameter estimates. These are questions that can, in the present context, be answered by means of Bayesian optimal design (Bürkner et al., 2019b; Chaloner & Verdinelli, 1995), and I provide a thorough discussion and results on this topic in Online Supplement B.
4. Numerical Experiments
In Sect. 3, I have derived analytic upper information bounds for TIRT models. Based on these analytic results, I will perform numerical experiments to gain a better understanding of the relative influence of several factors related to test design on the obtainable estimation accuracy. These experiments build on and extend existing simulation studies performed for binary TIRT models (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019b; Schulte et al., 2020).
4.1. Simulation Design
To study the behavior of person parameter accuracy more thoroughly, I vary the following factors in a fully crossed manner:
• The number of traits $T = 2, 3, 5, 10, 20, 30$ representing the full range of how many traits one might want to measure in practice.
• The total number of item pairs $B = 30, 90, 270$ ranging from very short to very long tests. The number of item pairs per trait equals $B_T = 2 B / T$ such that, for constant $B$, tests with a higher number of traits have fewer item pairs per trait. This stands in contrast to previous simulation studies where, if at all, $B_T$ rather than $B$ was varied (see Online Supplement D for additional experiments with varying $B_T$).
• The mean standardized factor loading $\bar{\lambda} = 0.5, 0.65, 0.8$ ranging from medium to high values.
• The difference between the two factor loadings of an item pair $\lambda_{\Delta} = 0.1, 0.2, 0.3$ ranging from small to large factor loading differences.
• The design type: Either a mixed keyed design (half equally and half unequally keyed pairs) denoted as (+/-) or a fully equally keyed design denoted as (+).
• The sampling correlation matrix $\Sigma_{\eta}$: Either diagonal or one of two conditions inspired by the NEO-PI-R (Costa & McCrae, 1992; Ostendorf & Angleitner, 2004) described in the following. For $T \le 5$ traits, take a random subset of length $T$ from the correlation matrix of the Big Five scores measured by the NEO-PI-R. For $T \ge 10$, take a random subset of length $T$ from the correlation matrix of the 30 Big Five sub-scores using the approach of Schulte et al. (2020). These correlation matrices, denoted as NEO(+/-), contain a mix of negatively, positively, and uncorrelated traits.
Alternatively, create another NEO-PI-R correlation matrix, denoted as NEO(+), by inverting neuroticism into emotional stability, which results in all ($T \le 5$) or most ($T \ge 10$) correlations being non-negative (and select a subset of traits as before).
• The prior correlation matrix $\Sigma_{\mathrm{prior}}$: Either diagonal or equal to $\Sigma_{\eta}$.
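The construction of the two NEO-inspired sampling correlation matrices listed above can be sketched as follows. The matrix `R_placeholder` contains made-up values and is explicitly not the published NEO-PI-R correlation matrix. The helper names `random_trait_subset` and `invert_trait`, and the position of neuroticism at index 0, are likewise hypothetical.

```python
import numpy as np

def random_trait_subset(R, T, rng):
    """Pick T traits at random and return the corresponding sub-matrix."""
    idx = np.sort(rng.choice(R.shape[0], size=T, replace=False))
    return R[np.ix_(idx, idx)], idx

def invert_trait(R, k):
    """Reverse the direction of trait k, flipping the sign of its correlations."""
    d = np.ones(R.shape[0])
    d[k] = -1.0
    return R * np.outer(d, d)   # equals D @ R @ D with D = diag(d)

# Placeholder correlations (NOT the NEO-PI-R values); trait 0 plays the
# role of neuroticism and correlates negatively with the other traits.
R_placeholder = np.array([
    [ 1.0, -0.3, -0.1, -0.2, -0.3],
    [-0.3,  1.0,  0.3,  0.1,  0.2],
    [-0.1,  0.3,  1.0,  0.1,  0.0],
    [-0.2,  0.1,  0.1,  1.0,  0.2],
    [-0.3,  0.2,  0.0,  0.2,  1.0],
])

rng = np.random.default_rng(1)
R_mixed, idx_mixed = random_trait_subset(R_placeholder, 3, rng)  # NEO(+/-)-like
R_pos_full = invert_trait(R_placeholder, 0)                      # neuroticism inverted
R_pos, idx_pos = random_trait_subset(R_pos_full, 3, rng)         # NEO(+)-like
```

Inverting a single trait leaves the diagonal and the positive definiteness of the matrix unchanged, because it only changes the sign of one variable.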
Experimental conditions are defined by fully crossing the above factors. For each of the conditions, $S = 10$ simulation trials were run. In each trial, a test design meeting the criteria of the condition was obtained using the Thurstonian IRT package (Bürkner, 2019). Analytical (expected) RMSE and reliability scores were computed on that basis (see Sect. 3.3). Since the RMSE can also be computed on a per-person basis and hence may vary across people, $\eta$ parameter values were drawn from the sampling distribution (30) for $J = 50$ people and individual analytic RMSE scores were obtained. In practice, several hundred people would be required to accurately estimate item parameters of Thurstonian IRT models (e.g., Bürkner et al., 2019b). However, as we assume item parameters to be known, the number of people has no influence on the accuracy of person parameter estimates.
Because multiple traits are estimated for each person, the total number of estimated person parameters is in fact equal to $J T$ per trial. Hence, $J = 50$ per trial is sufficient to show relevant RMSE variations across conditions and $\eta$ parameter values.
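The fully crossed design can be enumerated programmatically. The sketch below uses the factor levels listed above; the dictionary keys and variable names are hypothetical labels chosen for illustration, and no test designs are generated here.

```python
from itertools import product

# Factor levels as listed in the simulation design
factors = {
    "T": [2, 3, 5, 10, 20, 30],
    "B": [30, 90, 270],
    "mean_loading": [0.5, 0.65, 0.8],
    "loading_diff": [0.1, 0.2, 0.3],
    "design": ["+/-", "+"],
    "sigma_eta": ["diagonal", "NEO(+/-)", "NEO(+)"],
    "sigma_prior": ["diagonal", "sigma_eta"],
}

# Fully cross all factors into one dictionary per condition
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]

# Item pairs per trait, B_T = 2 B / T
for cond in conditions:
    cond["B_T"] = 2 * cond["B"] / cond["T"]

n_trials = 10   # simulation trials per condition
n_people = 50   # people per trial
print(len(conditions), len(conditions) * n_trials)
```

Note that for a diagonal $\Sigma_{\eta}$ the two prior options coincide, so the number of distinct conditions is smaller than the number of crossed combinations.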
4.2. Results
Below, I present selected results of a subset of conditions from which all major conclusions regarding the influence of the above factors can be drawn. Additional results are provided in Online Supplement C and a complete overview is available on OSF (https://osf.io/2g76w/). Obtained reliability scores for $B = 90$ item pairs are illustrated in Figure 5. As expected based on the analytical findings above as well as existing simulation evidence for binary TIRT models (Brown & Maydeu-Olivares, 2011; Bürkner et al., 2019b; Schulte et al., 2020), mixed keyed designs imply a different reliability pattern and uniformly higher reliability values than equally keyed designs, especially for smaller numbers of traits ($T \le 10$).
For mixed keyed designs and high mean factor loadings ($\bar{\lambda} = 0.8$), reliabilities are always in a satisfactory ($\mathrm{Rel} > 0.8$) to excellent ($\mathrm{Rel} > 0.9$) range independent of all other factors varied in the numerical experiments. In contrast, for low mean factor loadings ($\bar{\lambda} = 0.5$), reliabilities are satisfactory or better only for smaller numbers of traits ($T \le 10$).
In general, reliabilities decline in all mixed keyed conditions as the number of traits $T$ increases, although this decline is much more noticeable for lower mean factor loadings. The factor loading difference $\lambda_{\Delta}$ plays no role for mixed keyed designs, as unequally keyed pairs already inform the within-person trait mean well enough (see also Sect. 3.1).
For equally keyed designs, reliabilities are most influenced by the number of traits $T$ and the factor loading difference $\lambda_{\Delta}$. For high factor loading differences ($\lambda_{\Delta} = 0.3$), equally keyed designs provide only slightly worse reliabilities than the corresponding mixed keyed designs, with reliabilities decreasing slightly with increasing number of traits $T$. Especially when coupled with high mean factor loadings ($\bar{\lambda} = 0.8$), satisfactory to excellent reliabilities can be achieved across the whole range of $T$.
In contrast, for small factor loading differences ($\lambda_{\Delta} = 0.1$), reliabilities tend to be unsatisfactory, especially for a smaller number of traits ($T \le 5$). They only reach a satisfactory level for high mean factor loadings ($\bar{\lambda} = 0.8$) as the number of traits increases further. In general, the reliabilities of equally keyed designs tend to converge to the corresponding reliabilities of mixed keyed designs as $T$ increases, but convergence is not fully achieved in all conditions for the investigated $T \le 30$ traits.
The influence of the sampling correlation matrix $\Sigma_\eta$ is noticeable only for equally keyed designs, where $\Sigma_\eta = \mathrm{Neo(+/-)}$ provides uniformly higher reliabilities than $\Sigma_\eta = \mathrm{Neo(+)}$, especially for a lower number of traits ($T \le 5$) and small factor loading differences ($\lambda_{\Delta} = 0.1$). I will explain this finding in Sect. 4.3.
Using a misspecified prior (diagonal $\Sigma_{\mathrm{prior}}$, while $\Sigma_\eta$ is one of the NEO correlation matrices) is only problematic for a higher number of traits ($T \ge 10$), but there it may be quite detrimental to the achievable reliabilities, especially for small mean factor loadings ($\bar{\lambda} = 0.5$). Finally, using a higher total number of item pairs $B$ implies uniformly higher reliabilities (compare Fig. 5 to the results provided in Online Supplement C).
For the same selected conditions, distributions of obtained RMSE values are illustrated in Figure 6. Qualitatively, results from reliabilities and RMSEs paint a similar picture, except that the latter is more nuanced since the RMSE can be computed on a per-person basis, while the reliability is, by definition, an expectation over people. For mixed keyed designs and a smaller number of traits, individual RMSE values are almost constant across the whole range of $\eta$ values within each condition. In contrast, for equally keyed designs, RMSE values vary considerably across $\eta$ values within each condition and are noticeably bigger on average than for the corresponding mixed keyed design. For a higher number of traits ($T = 20, 30$ in particular), differences in RMSE distributions between equally and mixed keyed designs become smaller or even vanish completely for some conditions. This extends the results of Schulte et al. (2020), who investigated RMSEs obtained from binary TIRT models for varying number of traits and found that, for a very high number of traits, average RMSEs obtained from equally and mixed keyed designs are highly similar.
Similar to the results obtained for the reliabilities, a misspecified prior (diagonal $\Sigma_{\mathrm{prior}}$, while $\Sigma_\eta$ is one of the NEO correlation matrices) increases the expected RMSE primarily for a higher number of traits ($T \ge 10$; see Online Supplement C). However, at the same time, there is a striking variation across trait scores with respect to how much a misspecified prior changes the RMSE of the trait score estimates.
As illustrated in Figure 7, the within-person mean $\bar{\eta}$ over trait scores can explain some of the within-condition variation of RMSE scores in the equally keyed design conditions for a small number of traits ($T \le 5$) and small factor loading differences ($\lambda_{\Delta} = 0.1$). In those cases, RMSE scores are particularly high if a person has a very negative or very positive mean, corresponding to overall low and high trait scores. This is the result of partial ipsativity of trait estimates induced by equally keyed designs which, as demonstrated here, is still present even in the limiting case of the latent linear model and thus cannot be eliminated through the application of ordinal TIRT models. Explained in more detail, trait scores of people with low variation between traits are estimated closer to zero, inducing strong biases and hence high RMSE for people with low or high average trait scores.
This pattern is much less visible for a higher number of traits ($T \ge 10$), a finding which I investigate and explain further in Sect. 4.3. Figure 7 also illustrates that the within-person RMSE mean may vary strongly across people within the same condition, even when their $\bar{\eta}$ values are similar. Mean RMSE scores may differ by a factor of $2$ or even $3$ between people with the lowest and highest mean RMSE within a condition. Notably, this variation can neither be explained by variation across simulation trials within each condition nor by shrinkage of parameter estimates induced by the prior (see Online Supplement C). Together, the results demonstrate that differences between traits of the same person can be estimated well as long as equally keyed item pairs are present, confirming theoretical results (see also Brown & Maydeu-Olivares, 2011).
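The shrinkage of within-person means described above can be reproduced in a small Gaussian sketch. It is only an illustration under assumptions that are not taken from the paper: a balanced design in which every ordered trait pair is presented equally often, unit residual variance, a standard normal prior with identity covariance, and hypothetical loadings of 0.75 and 0.65 (difference 0.1). Under these assumptions, the expected posterior mean for a person with true scores $\eta$ is $(M + I)^{-1} M \eta$, so the expected bias can be computed directly.

```python
import numpy as np

def equally_keyed_information(n_traits, lam_first, lam_second, weight):
    """Fisher information of a balanced, equally keyed pair design.

    Assumes a latent linear response with mean
    lam_first * eta[a] - lam_second * eta[b] and unit residual variance.
    """
    info = np.zeros((n_traits, n_traits))
    for a in range(n_traits):
        for b in range(n_traits):
            if a == b:
                continue
            x = np.zeros(n_traits)
            x[a] = lam_first      # first item of the pair loads on trait a
            x[b] = -lam_second    # second item loads on trait b
            info += weight * np.outer(x, x)
    return info

def expected_bias(info, eta):
    """Expected posterior-mean bias under a N(0, I) prior (an assumption)."""
    posterior_cov = np.linalg.inv(info + np.eye(len(eta)))
    return posterior_cov @ info @ eta - eta

# Hypothetical design: T = 3 traits, each ordered pair presented 5 times.
info = equally_keyed_information(3, 0.75, 0.65, weight=5.0)

flat_high = np.array([1.5, 1.5, 1.5])     # high mean, no variation between traits
contrasting = np.array([1.5, 0.0, -1.5])  # zero mean, strong variation between traits

bias_flat = expected_bias(info, flat_high)
bias_contrast = expected_bias(info, contrasting)
print("bias, flat high profile:  ", np.round(bias_flat, 3))
print("bias, contrasting profile:", np.round(bias_contrast, 3))
```

In this toy setting, the flat high profile is pulled almost entirely toward zero, whereas the contrasting profile with zero mean is recovered with little bias, which is the qualitative pattern attributed to partial ipsativity.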
4.3. Increasing Test Information by Measuring More Traits
In the above numerical experiments, I systematically varied the total number of item pairs $B$ rather than the number of item pairs per trait $B_T$. Comparing different numbers of traits for an arbitrary but fixed $B$ (my approach above) implies fixing the overall test length. In contrast, comparing different numbers of traits for an arbitrary but fixed $B_T$ implies that the test becomes longer for a higher number of traits, up to a point where the test length becomes impractically high for most applications. In a simulation study for binary TIRT models, Schulte et al. (2020) used the approach of systematically varying $B_T$ and found that a higher number of traits implies strong increases in estimation accuracy, especially for equally keyed designs.
Results of additional numerical experiments mirroring the simulation design from Sect. 4.1, apart from systematically varying $B_T$ rather than $B$, confirm that this behavior can be found in latent linear TIRT models as well (see Online Supplement D).
A comprehensive explanation for the apparent benefit of a higher number of traits when using equally keyed designs has been lacking so far. Baron (1996) identified one mechanism related to $\Sigma_\eta$ in that highly skewed true trait patterns, where most traits of an individual are either very high or low, become less likely as the number of traits increases (see Online Supplement E for a more detailed discussion). Below, I discuss two additional mechanisms, related to the Fisher information matrix $M$ and the prior correlation matrix $\Sigma_{\mathrm{prior}}$, respectively. Together, each of the three essential matrices ($M, \Sigma_\eta, \Sigma_{\mathrm{prior}}$) has a corresponding mechanism by which a higher number of traits benefits estimation accuracy.
The mechanism related to $M$ is the increase in per-trait information in equally keyed designs through measuring more traits while holding $B_T$ constant. Figure 8 shows the scaled D- and A-optimality criteria (Eqs. (16) and (19)) for varying number of traits $T$. Due to the scaling by $T$, these criteria are comparable across varying number of traits. For equally keyed designs, it is clearly visible that the per-trait information improves consistently in all investigated conditions as more traits are measured.
The information improvement is particularly strong for smaller $T$ and then gradually flattens out toward a lower asymptote that depends on the mean factor loading $\bar{\lambda}$ and on $B_T$. In contrast to equally keyed designs, the per-trait information for mixed keyed designs is constant across traits and equals the asymptote that equally keyed designs are only approaching for a higher number of traits.
When holding the total number of pairs $B$ constant, instead of the number of pairs per trait $B_T$, a higher number of traits no longer implies an increase in per-trait information (smaller criterion values) but rather an almost linear decrease in information (see Figure 9), with a slope depending on $\bar{\lambda}$, $\lambda_{\Delta}$, and $B$.
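The contrast between fixing $B_T$ and fixing $B$ can be illustrated numerically. The following is a minimal sketch, not the paper's computation: it assumes a balanced equally keyed design with fractional pair weights (an approximate design), unit residual variance, hypothetical loadings of 0.75 and 0.65, and it takes the scaled A-criterion to be $\mathrm{tr}(M^{-1})/T$, which may differ in detail from Eq. (19). The values `pairs_per_trait` and `total_pairs` are likewise hypothetical.

```python
import numpy as np

def equally_keyed_information(n_traits, lam_first, lam_second, weight):
    """Fisher information of a balanced, equally keyed pair design
    (latent linear model, unit residual variance; both are assumptions)."""
    info = np.zeros((n_traits, n_traits))
    for a in range(n_traits):
        for b in range(n_traits):
            if a == b:
                continue
            x = np.zeros(n_traits)
            x[a] = lam_first
            x[b] = -lam_second
            info += weight * np.outer(x, x)
    return info

def scaled_a_criterion(info):
    """Average variance bound per trait, trace(M^{-1}) / T (assumed form)."""
    return np.trace(np.linalg.inv(info)) / info.shape[0]

trait_counts = [3, 5, 10, 20, 30]
lam_first, lam_second = 0.75, 0.65  # hypothetical loadings, difference 0.1
pairs_per_trait = 20                # hypothetical B_T
total_pairs = 60                    # hypothetical B

fixed_per_trait, fixed_total = [], []
for T in trait_counts:
    # Each trait takes part in 2 * (T - 1) of the T * (T - 1) ordered pairs.
    w_per_trait = pairs_per_trait / (2 * (T - 1))
    w_total = total_pairs / (T * (T - 1))
    fixed_per_trait.append(scaled_a_criterion(
        equally_keyed_information(T, lam_first, lam_second, w_per_trait)))
    fixed_total.append(scaled_a_criterion(
        equally_keyed_information(T, lam_first, lam_second, w_total)))

for T, a_bt, a_b in zip(trait_counts, fixed_per_trait, fixed_total):
    print(f"T={T:2d}  fixed B_T: {a_bt:.3f}  fixed B: {a_b:.3f}")
```

Smaller criterion values mean more information. In this sketch the criterion falls as $T$ grows when $B_T$ is held fixed and rises when $B$ is held fixed, mirroring the qualitative behavior described for Figures 8 and 9.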
The mechanism related to $\Sigma_{\mathrm{prior}}$ is the increase in variance of each trait that can be explained by means of all other traits. If prior correlations are nonzero, different trait estimates will inform each other through the prior and thus be pushed along the axes implied by the correlation structure. Clearly, the influence of the prior is only helpful if $\Sigma_{\mathrm{prior}}$ contains some true prior information, that is, relates closely enough to the sampling correlation matrix $\Sigma_{\eta}$. If that is the case, as one trait becomes estimated more accurately, so will all other traits correlated with it.
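How trait estimates inform each other through the prior can be seen in a two-trait sketch. It assumes a Gaussian linear model with a normal prior, for which the posterior covariance is $(M + \Sigma_{\mathrm{prior}}^{-1})^{-1}$; the information matrix and the correlation values below are hypothetical and chosen so that the responses inform trait 1 only.

```python
import numpy as np

def posterior_covariance(info, prior_cov):
    """Posterior covariance in a Gaussian linear model with a normal prior."""
    return np.linalg.inv(info + np.linalg.inv(prior_cov))

# Hypothetical setup: the responses carry information about trait 1 only.
info = np.diag([4.0, 0.0])

for rho in (0.0, 0.5, 0.8):
    prior_cov = np.array([[1.0, rho], [rho, 1.0]])
    post = posterior_covariance(info, prior_cov)
    print(f"rho={rho:.1f}  posterior variance of trait 2: {post[1, 1]:.3f}")
```

With an uncorrelated prior, trait 2 keeps its prior variance of 1. With prior correlations of 0.5 and 0.8, its posterior variance drops to 0.8 and 0.488, although no response informs trait 2 directly.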
Of course, if all traits are mutually uncorrelated, the variance explained would still be zero no matter the number of traits. However, in practice, the more traits are measured, the higher their (absolute) correlations will be in expectation. This is simply due to the fact that the number of mutually independent personality traits, or other constructs one aims to measure via comparative judgments, is naturally limited. Not even the Big-Five personality scores are completely uncorrelated with each other. For example, the main Big-Five scores measured by NEO-PI-R have an average absolute correlation of $|r| = 0.19$, while the corresponding 30 Big-Five sub-scores (six per main Big-Five dimension) have an average absolute correlation of $|r| = 0.26$. Even if the average absolute correlation remained constant for an increasing number of traits, the number of traits itself will lead to an increase in variance explained per trait as more other traits can be used for prediction.
For the main Big-Five NEO-PI-R scores, the average proportion of variance explained per trait by means of all other traits is $R^2 = 0.23$, while for the corresponding 30 sub-scores, one finds $R^2 = 0.81$. This large proportion of variance explained in the latter case is mostly but not exclusively driven by other sub-scores belonging to the same main dimension.
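The claim that the variance explained per trait grows with the number of traits, even at a constant correlation level, can be checked with a short sketch. It uses the standard identity $R_i^2 = 1 - 1/(\Sigma^{-1})_{ii}$ for a correlation matrix $\Sigma$ and a hypothetical equicorrelation matrix with $r = 0.2$. This matrix is not one of the NEO-PI-R matrices, so the numbers are not comparable to the values reported above.

```python
import numpy as np

def variance_explained(corr):
    """R^2 of each trait regressed on all others: 1 - 1 / diag(corr^{-1})."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))

def equicorrelation(n_traits, r):
    """Correlation matrix with the same correlation r between all traits."""
    return (1.0 - r) * np.eye(n_traits) + r * np.ones((n_traits, n_traits))

for T in (5, 10, 30):
    r2 = variance_explained(equicorrelation(T, 0.2)).mean()
    print(f"T={T:2d}  mean R^2 = {r2:.3f}")
```

Under this assumption, the mean $R^2$ rises from 0.100 for five traits to about 0.176 for thirty traits, purely because more predictors are available.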
These implications of the between-parameter shrinkage property of joint normal priors hold not only when $\Sigma_{\mathrm{prior}}$ is known, as assumed in this paper, but also when it is estimated from the data (Gelman & Hill, 2006). Of course, $\Sigma_{\mathrm{prior}}$ has to be estimated without substantial bias in order to be informative for the trait scores themselves. As shown by previous simulation studies, estimates of $\Sigma_{\mathrm{prior}}$ may be substantially biased for a smaller number of traits ($T \le 5$) but become much less biased or even unbiased as the number of traits increases further up to $T = 30$ (Bürkner et al., 2019b; Schulte et al., 2020).
This provides some sort of additional sub-mechanism by which the prior improves estimation accuracy when measuring a higher number of traits.
5. Discussion
The presented research is driven by the main goal of obtaining accurate and efficient estimates of people’s latent traits using models of comparative judgments, while keeping an eye specifically on the applicability under high-stakes situations. Any procedure applied in a high-stakes context needs to be faking resistant (or at least have a realistic potential to be faking resistant) while at the same time yielding trait score estimates that are accurate enough for individual-level diagnostic decisions. Toward this goal, two paths have been identified. First, develop unequally keyed item pairs where both items have roughly the same social desirability and can thus be reasonably applied in high-stakes situations. Second, carefully design tests consisting only of equally keyed item pairs so that they alone are sufficient to ensure satisfactory estimation accuracy. This paper focuses on the second path, approaching it mainly from a statistical point of view.
Progress was made from several perspectives all related to answering the main question. Below, I will connect these perspectives into a single, coherent picture based on which practical recommendations can be given. Walking along the second path is full of obstacles that we have to get around one by one. As a reminder, unless explicitly stated otherwise, I consider higher trait scores representing the more desirable (better) end of the scale. Because the choice of trait direction is essentially arbitrary, it does not pose any actual restriction, but makes thinking about the discussed problems easier.
5.1. Obtaining Accurate Trait Estimates from Equally Keyed Item Pairs
For the purpose of individual-level diagnostic decisions, using highly accurate trait score estimates is not only sensible but ethically required. One way to increase the information from comparative judgments is to employ ordinal rather than binary response scales (see Section 2 as well as Brown & Maydeu-Olivares, 2018; Schulte et al., 2021). In this context, I have established that the information factor (defined in Eq. (8)) can be used as an intuitive metric to quantify the Fisher information gained in comparison with binary measurement models, or to quantify the Fisher information lost in comparison with latent linear models. When using two response categories per item pair (i.e., binary decisions), the average expected (Fisher) information is only about 13% of the maximal achievable information under conditions not unrealistic for TIRT models with item pairs matched for social desirability. In comparison, when using as few as five response categories, the average expected information is already about 65% of the maximum and even increases to about 85% when using ten response categories. Even if we consider the cognitive complexity of the response process and treat the ranking of a triplet (two decisions leading to three binary responses) as equivalently complex to a single ordinal response, the latter still contains substantially more information (39% vs. 65% or 85% of the maximum, on average).
Upon studying the information factor in detail, we have seen that, as the number of response categories increases, its distribution across individual responses becomes narrower and is substantially higher than zero in a much wider range of the latent scale. This leads me to conjecture that, as we increase the number of response categories, the ordinal model may also become more robust to moderate amounts of variation in social desirability. However, this needs to be verified in empirical studies with faking instructions. In any case, such robustness will have its limits: If one of the two items compared in an item pair is much more socially desirable than the other, and if people react to the social desirability, this will imply a strong shift in the ordinal thresholds toward the end of the response scale belonging to the less desirable item (see also Schulte et al., 2021). This in turn pushes the latent responses more into the tails, beyond the range where the convergence of the information factor to its maximum is still reasonably fast. Thus, social desirability matching continues to be mandatory also in ordinal models of comparative judgments, but we may get away with slightly less perfect matching compared to the binary case.
Although ordinal models certainly help to increase information on individuals’ trait scores, more measures need to be taken to obtain trait estimates applicable for individual-level diagnostic decisions in high-stakes situations. Again also for ethical reasons, trait estimates of all individuals should have roughly the same accuracy, for example, as quantified via their RMSE. Thus, not only large average RMSEs (or small reliabilities) are problematic in practice, but also large RMSE variations across traits and individuals. The numerical experiments performed in this paper have demonstrated that, when only using equally keyed item pairs, trait score estimates of individuals with particularly low or high (within-person) average scores tend to be comparably less accurate. In personnel selection, or other high-stakes situations, individuals with low or high average scores are of primary interest. Accordingly, it is particularly problematic if trait scores of specifically those individuals cannot be estimated with sufficient accuracy.
From a statistical perspective, the reason for the above problem is that Fisher information about within-person average trait scores is only provided through unequally keyed item pairs or factor loading differences. If we restrict ourselves to equally keyed item pairs only, all we are left with are the factor loading differences. Of course, those differences cannot become arbitrarily large because standardized factor loadings very close to one are hard to achieve in practice (e.g., Costa & McCrae, Reference Costa and McCrae1992). Also, the closer factor loadings get to zero, the less information the corresponding items provide. This induces a natural trade-off between high mean factor loadings and high factor loading differences of item pairs, a trade-off I have approached in this paper in a principled manner by means of optimal design theory.
Different design criteria find different trade-offs between high factor loading means and differences to be optimal. Under several of the considered (frequentist and Bayesian) optimality criteria, very high or even maximal factor loading differences are optimal. That is, one of the two items in a pair should have a maximal factor loading (whatever we consider to be maximally achievable in a given practical setting), while the other should have a very small, or even zero, factor loading. Using items whose factor loadings are all zero may be statistically optimal but comes with two practical problems. First, one has to design items that do not load on any of the desirable traits of interest but still have comparable social desirability with items that do load strongly on such a trait. A weaker version of this problem is likely to hold also for positive but comparatively small factor loadings. I speculate that the more factor loadings differ from each other, the harder it becomes to match the social desirability of the corresponding items, but this needs to be investigated empirically in future research. Second, individuals will almost surely vary in whatever non-modeled traits determine the responses to the items that have only zero (or very small) factor loadings on the modeled traits. As a result, the corresponding variation of the non-modeled traits will be ignored. This is essentially a misspecification of the latent model structure with potentially detrimental effects on the validity of the estimated model (e.g., Hu & Bentler, Reference Hu and Bentler1998). Consequently, very high factor loading differences pose a risk to the practical validity of the trait score estimates, and we should thus be careful not to trust their statistical optimality too much.
In the numerical experiments, factor loading differences were varied in a comparatively small range (0.1–0.3). As expected, the bigger the factor loading differences (within the considered range), the lower the RMSEs become across the board, that is, also for individuals with more extreme average trait scores. However, it is only with a higher number of traits (starting between 5 and 20 traits depending on condition) that the U-shape of individual RMSE scores as a function of within-person average trait scores starts to vanish. So how does a higher number of traits help with estimation accuracy in the tails even if the number of administered item pairs is held constant? Multiple mechanisms have been identified. One of these mechanisms is that more extreme average trait scores simply become less likely the more traits are considered (Baron, Reference Baron1996, also discussed in Online Supplement E). To put it another way, by measuring a large-enough number of traits, we increase the within-person variation in trait scores relative to the between-person variation; and the former kind of variation can be estimated well with only equally keyed items. Unfortunately, measuring more traits also has the drawback that, in a fixed-length test, the number of items measuring a single trait decreases, thus reducing the average by-trait information. So, again, we find a trade-off between different mechanisms, this time in the number of measured traits. For small factor loading differences, this trade-off reached its optimum at around 10 traits, whereas for higher factor loading differences, the optimum was a little smaller, somewhere between 5 and 10 traits. This optimum also depends on other factors influencing estimation accuracy (see below) and may very well be higher than 10 depending on these factors. Accordingly, these optimal values should be considered with care and taken as a rough rule of thumb only.
The structure of the inter-trait correlation matrix is another factor that can have a strong influence on the achievable estimation accuracy. Measuring traits with a mix of negative, positive, and zero correlations implies higher estimation accuracy than measuring traits that are mostly zero or positively correlated. The mechanism behind this finding turns out to be the same one that is behind the beneficial properties of measuring more traits, that is, reducing the probability of individuals having more extreme average trait scores. Accordingly, we can get away with measuring fewer traits if some of those traits turn out to be negatively correlated. Two notes of caution: First, the above statements apply only when, for all traits, higher values represent the desirable end of the scale. Of course, we can invert some traits to artificially create negative inter-trait correlations, but this does not, in fact, change anything. We simply switch labels so that unequally keyed item pairs, between a higher-means-better trait and a lower-means-better trait, become the ones that can well be matched for social desirability, whereas equally keyed item pairs suddenly become practically problematic. Second, the correlations between traits are usually not something that is under the control of the test developer. Instead, they are determined by the traits considered relevant for the given high-stakes situation and by the population of individuals taking the test. That is, in some situations, we may be lucky and find these negative correlations between the measured desirable traits, while in other situations, we may not. So we cannot rely on negative inter-trait correlations in general, but it will help if they occur.
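The shared mechanism behind the number of traits and the correlation structure can be illustrated with a small calculation. For standardized traits with correlation matrix $\Phi$, the within-person average has variance $\mathbf{1}^T \Phi \mathbf{1} / T^2$, so its standard deviation shrinks when more traits are measured or when some correlations are negative. The sketch below is only an illustration under these assumptions; the helper names and the example matrices are hypothetical and are not the NEO-PI-R based matrices used in the numerical experiments.

```python
import math


def sd_of_average(corr):
    """SD of the within-person average of standardized traits.

    corr is a T x T correlation matrix given as nested lists; the
    variance of the average is the sum of all entries divided by T^2.
    """
    n_traits = len(corr)
    total = sum(sum(row) for row in corr)
    return math.sqrt(total) / n_traits


def equicorrelation(n_traits, rho):
    """Correlation matrix with the same correlation rho for all pairs."""
    return [[1.0 if i == j else rho for j in range(n_traits)]
            for i in range(n_traits)]


# Hypothetical 4-trait example with mixed signs: traits 1-2 and 3-4
# correlate at +0.3 within each block and at -0.3 across blocks.
signs = [1.0, 1.0, -1.0, -1.0]
mixed = [[1.0 if i == j else 0.3 * signs[i] * signs[j] for j in range(4)]
         for i in range(4)]

sd_uncorrelated_5 = sd_of_average(equicorrelation(5, 0.0))
sd_uncorrelated_20 = sd_of_average(equicorrelation(20, 0.0))
sd_positive_4 = sd_of_average(equicorrelation(4, 0.3))
sd_zero_4 = sd_of_average(equicorrelation(4, 0.0))
sd_mixed_4 = sd_of_average(mixed)
```

A smaller standard deviation of the average means that extreme within-person averages, which are the hard cases for equally keyed item pairs, occur less often in the population.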
5.2. Practical Recommendations
In summary, the results of the present study suggest that achieving practically and ethically acceptable estimation accuracy for inter-individual decision making using only equally keyed item pairs is indeed possible but requires the careful consideration of several factors. For example, acceptable reliabilities ($\text{Rel} > 0.8$) can be achieved if all of the following conditions are met or exceeded: An ordinal measurement model with five response categories is applied to 90 item pairs, ten traits are measured with some of them being negatively correlated, mean factor loadings are high (0.8), and factor loading differences are medium (0.2). If some conditions are exceeded, for example, by increasing the number of response categories or the number of item pairs, other conditions may be relaxed while still retaining acceptably accurate estimates. As different application contexts of comparative judgments may each come with their own idiosyncrasies, one should understand these conditions as rough guidelines rather than definitive recommendations. If in doubt, the analyses performed in this paper can be replicated with the materials provided online (https://osf.io/2g76w/) and adjusted to the given application context.
5.3. Limitations and Future Research
There are several limitations of the present study which should be taken into account and expanded upon in future research. First, most of the presented analytical proofs and numerical experiments focus on latent linear TIRT models that provide an upper bound of the Fisher information obtainable from ordinal (including binary) TIRT models. Thus, the obtained results only indicate the maximal achievable accuracy by means of TIRT models in a given situation, which will not be fully achieved in practice (although one may come quite close; see above). This choice was made to allow for an extensive mathematical analysis that is much harder for the ordinal models themselves, due to the nonlinearity of the ordinal response categorization. Still, important results such as the optimal designs for factor loadings turn out to also hold in the same manner for ordinal models. This underlines the practical relevance of the obtained analytical and numerical findings. What is more, extreme average trait scores, which turned out to be difficult to estimate for any kind of TIRT model, can be estimated by the ordinal models almost as well as by the latent linear models. This is because the corresponding latent responses are located in the center of the latent response distribution, where information of the ordinal models approaches optimality very quickly.
Apart from the present study, existing research on ordinal TIRT models (Brown & Maydeu-Olivares, Reference Brown and Maydeu-Olivares2018; Schulte et al., Reference Schulte, Kaup, Bürkner and Holling2021) has only applied them in empirical settings, whereas comprehensive simulation studies on these models are still missing. Such simulation studies may also help to validate the approximate results obtained in this paper. For example, we may use the average information factor to adjust the Fisher information matrix of the latent linear model in order to approximate the expected reliabilities and RMSEs obtained from ordinal models. However, until further validation, the goodness of this approximation remains unclear. Due to the already extensive scope of this paper, such simulations were not performed here, but they provide an interesting area for future research.
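The adjustment mentioned above might be sketched as follows for the simplest case of two traits. The sketch assumes a normal prior on the traits with correlation matrix $\Phi$, so that the posterior covariance under the latent linear model with Fisher information $M$ is $(M + \Phi^{-1})^{-1}$, and it scales $M$ by an average information factor $c$ to mimic an ordinal model. Treating the posterior standard deviation as an approximate RMSE and defining reliability as one minus the ratio of posterior to prior variance are assumptions made for this illustration and may differ from the definitions used in the paper; all function names and numbers are hypothetical.

```python
import math


def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def approximate_accuracy(fisher_linear, prior_corr, info_factor):
    """Approximate per-trait RMSE and reliability for two traits.

    The latent linear Fisher information is scaled by info_factor
    before it is combined with the prior precision.
    """
    prior_precision = inverse_2x2(prior_corr)
    posterior_precision = [
        [info_factor * fisher_linear[i][j] + prior_precision[i][j]
         for j in range(2)]
        for i in range(2)
    ]
    posterior_cov = inverse_2x2(posterior_precision)
    rmse = [math.sqrt(posterior_cov[i][i]) for i in range(2)]
    reliability = [1.0 - posterior_cov[i][i] / prior_corr[i][i]
                   for i in range(2)]
    return rmse, reliability


# Hypothetical inputs: diagonal information of 10 per trait and
# uncorrelated standardized traits.
fisher = [[10.0, 0.0], [0.0, 10.0]]
prior = [[1.0, 0.0], [0.0, 1.0]]
rmse_linear, rel_linear = approximate_accuracy(fisher, prior, 1.0)
rmse_ordinal, rel_ordinal = approximate_accuracy(fisher, prior, 0.65)
```

With these made-up inputs, scaling the information by 0.65 lowers the approximate reliability only moderately, which is the kind of statement that dedicated simulation studies would need to confirm.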
Another limitation of the present study is that the item parameters were considered known instead of being estimated from the data. Again, this choice was made to enable a deeper analytical treatment of the TIRT models. Of course, in practice, item parameters will almost always be estimated, which makes a big difference with regard to the required estimation algorithms and their stability (e.g., Bürkner et al., Reference Bürkner, Schwabe and Holling2019b; van der Linden & Hambleton, Reference van der Linden and Hambleton2013). However, with respect to the estimation accuracy of trait scores, this choice may actually not be that relevant: In the TIRT simulation study of Schulte et al. (Reference Schulte, Holling and Bürkner2020), estimation accuracy of trait scores saturated already with 300 (or more) individuals. This indicates that uncertainty in item parameters becomes irrelevant to trait score accuracy quite quickly. When applying IRT models in practice, it is common to measure several hundreds or perhaps even thousands of people. Accordingly, in practice, the assumption of known item parameters is unlikely to affect the information obtained on individuals’ trait scores to a relevant degree.
A related limitation, again motivated by the requirements of mathematical analysis, was the fixation (rather than estimation) of the inter-trait correlation matrix. What is more, a lot of the presented numerical results were obtained by fixing the correlation matrix to its true value. To investigate the robustness of these results to prior misspecification, the same experiments were run using a diagonal prior correlation matrix, essentially assuming traits to be uncorrelated a priori. This kind of misspecification turned out to affect estimation accuracy noticeably only for higher numbers of traits and even then only in situations where the test design provided comparatively little information (e.g., small average factor loadings or small factor loading differences). To better put this into perspective, two things should be considered: First, the assumption of a priori zero correlations constitutes a strong prior misspecification given that the chosen true correlation matrices based on the NEO-PI-R contain a lot of highly nonzero values (Costa & McCrae, Reference Costa and McCrae1992). Second, estimation accuracy was found to be sufficient for individual diagnostics only under conditions where there is quite a lot of test design information, that is, conditions where sensitivity to the prior was low anyway. In practice, the correlation matrix usually represents a hyper-parameter estimated jointly from the data of all individuals. Previous simulation studies have shown that the trait correlation matrix tends to be estimated inaccurately and with substantial bias precisely in those situations where individual trait scores are estimated inaccurately as well (Brown & Maydeu-Olivares, Reference Brown and Maydeu-Olivares2011; Bürkner et al., Reference Bürkner, Schwabe and Holling2019b). Conversely, if trait scores were estimated accurately, so was their correlation matrix.
In summary, with regard to the results of the present paper, it appears unlikely that the conditions identified as yielding sufficiently accurate trait score estimation would yield much different results if the correlation matrix were estimated from the data as part of the model fitting procedure. A more systematic investigation of this topic is desirable and could be an interesting area for future research.
Lastly, a highly important area for future research is to better understand faking behavior of individuals and faking resistance of personality tests. Comparative judgments are commonly understood as being able to reduce faking because they prevent all items from being endorsed maximally at the same time (Cao & Drasgow, Reference Cao and Drasgow2019; Wetzel et al., Reference Wetzel, Böhnke, Brown, Leong, Bartram, Cheung, Geisinger and Iliescu2016). Honest-faking studies have shown that TIRT models of comparative judgments can indeed decrease score inflation compared to standard rating scale models (Schulte et al., Reference Schulte, Kaup, Bürkner and Holling2021; Wetzel et al., Reference Wetzel, Frick and Brown2020; see also Cao & Drasgow, Reference Cao and Drasgow2019). However, one of these studies (Schulte et al., Reference Schulte, Kaup, Bürkner and Holling2021) considered not only score inflation but also the correlation between honest and faking scores as a metric of faking resistance. Unexpectedly, they found that rating scales led to higher (better) honest-faking correlations. These inconsistent findings also call for more theoretical research to properly define faking resistance in the first place, on which basis a better understanding of faking and faking resistance can then be obtained by further empirical studies.
6. Conclusion
In this paper, I have investigated the information obtainable from comparative judgments by means of TIRT models using a combination of analytical and numerical approaches. The obtained results suggest that it is indeed possible to design personality tests that yield trait score estimates sufficiently accurate for individual-level diagnostic decisions, while having a realistic potential to prevent faking in high-stakes situations. However, reaching this goal requires the careful joint consideration of several aspects of test design, including number of response categories, number of item pairs, number of measured traits, correlations between traits, average factor loadings, and factor loading differences. While these results are encouraging, they remain to be validated empirically to demonstrate that tests meeting the given requirements can indeed be constructed and successfully applied in practice in high-stakes situations. If that practical validation succeeds, this would be a major breakthrough for the fields of psychological diagnostics, differential psychology, and their areas of application.
Acknowledgements
Partially funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC 2075—390740016.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Appendix
Proof of Corollary 2.1
Consider the Fisher information $\mathbb{I}_\mathrm{TIRT}(\eta)$ of an ordinal TIRT model from Eq. (9) and the Fisher information $\mathbb{I}(\eta)$ of the corresponding linear normal model from Eq. (10). Choose observation n arbitrarily and choose the K inner thresholds $\tau_k$ such that $s_k = \tau_k - X_n \eta$ for some arbitrary choice of design matrix X with nonzero $n\mathrm{th}$ row. Then, comparing the Fisher information matrices reveals
where the first inequality is because of the positive (semi-)definiteness of the Fisher information and the second inequality is due to Theorem 1.4 in Online Supplement A. The equality to 1 in the limit of infinite thresholds
is now an immediate consequence of Theorem 1.5 in Online Supplement A.
Proof of Theorem 3.1
In case of standardized item parameters (which we can assume without loss of generality), we have
as the square of the denominator occurring in the $n\mathrm{th}$ row $X_n$ of the design matrix. Let us start with the minimal non-trivial design of $T = 2$ traits and $R_{12} = 2$ comparisons between the $1\mathrm{st}$ and the $2\mathrm{nd}$ trait, so that we have a total of $N = 2$ comparisons. Then, we obtain the Fisher information as
and the determinant evaluates to
Clearly, the first of the two additive terms is maximized if $\lambda_{in} \in \{-\lambda_\mathrm{max}, \lambda_\mathrm{max}\}$ for all $1 \le i,n \le 2$ (note that $\phi_1^2$ and $\phi_2^2$ are also minimized by this choice), while the second additive term is minimized if
This equation has many solutions, one of which is given by $\lambda_{11} = \lambda_{21} = \lambda_{22} = \lambda_\mathrm{max}$ and $\lambda_{12} = -\lambda_\mathrm{max}$. But this solution also maximizes the first term, which proves its optimality in case of $T = 2$ and $R_{12} = 2$.
For an arbitrary number $T \ge 2$ of traits and an even number $R_{ij} \ge 0$ of comparisons between traits $i \ne j$, we note that every comparison contributes information only to two traits and the corresponding four elements of M, while all other elements remain unchanged due to additivity of the Fisher information.
That is, having already included an even number of $\tilde{N} < N$ comparisons to the design, adding two more comparisons $\tilde{N}+1$ and $\tilde{N}+2$ of traits $i \ne j$, we get
and $(M_{\tilde{N}+2})_{kl} = (M_{\tilde{N}})_{kl}$ for all other elements, where $M_{\tilde{N}}$ and $M_{\tilde{N}+2}$ denote the Fisher information after a total of $\tilde{N}$ and $\tilde{N}+2$ comparisons, respectively.
If we plug in our solution from the $T = R_{12} = 2$ case above, that is, choose $\lambda_{(\tilde{N}+1)1} = \lambda_{(\tilde{N}+2)1} = \lambda_{(\tilde{N}+2)2} = \lambda_\mathrm{max}$ and $\lambda_{(\tilde{N}+1)2} = -\lambda_\mathrm{max}$, we see that the ij and ji increments are 0 such that $(M_{\tilde{N}+2})_{ij} = (M_{\tilde{N}})_{ij}$ and $(M_{\tilde{N}+2})_{ji} = (M_{\tilde{N}})_{ji}$, while the diagonal elements ii and jj increase maximally in the space of all admissible factor loadings. We apply this solution to all such sets of two item pairs, while ensuring that all traits appear in the same number of pairs in total.
We denote the resulting Fisher information of this design (suggestively) as $M_\mathrm{max}$. By induction, we see that all off-diagonal elements $(M_\mathrm{max})_{ij} = 0$, and thus the determinant then simply equals
The diagonal elements of any M are of the form
where $J_i$ is the index set of all items belonging to the $i\mathrm{th}$ trait and $n_j$ is the comparison to which the $j\mathrm{th}$ item belongs.
Since the total number of factor loadings sums to $\sum_{i=1}^T |J_i| = 2N$, the product $\prod_{i=1}^T M_{ii}$ is maximal if each $\frac{1}{\varphi^2_{n_j}} \lambda^2_j$ is maximal and all $M_{ii}$ are equal (square maximization property), both of which are satisfied by $M_\mathrm{max}$.
It remains to be shown that $\det(M_\mathrm{max})$ is indeed maximal within the space of all admissible designs, not just within the space of all designs implying a diagonal Fisher information. For this purpose, we apply the Cholesky decomposition to $M$ such that $M = L L^T$ with a lower triangular matrix $L$. This is always possible if $M$ is positive definite; and if $M$ were only positive semi-definite, we would have $\det(M) = 0$ and thus a degenerate design, which is certainly not optimal. We obtain $\det(M) = \det(L) \det(L^T) = \det(L)^2$ and, since $L$ is lower triangular, $\det(L) = \prod_{i=1}^T L_{ii}$.
Further, as $M_\mathrm{max}$ is diagonal, we see that $L_\mathrm{max}$ is also diagonal and $(L_\mathrm{max})_{ii} = \sqrt{(M_\mathrm{max})_{ii}}$. We now only need to show that $L_{ii} \le (L_\mathrm{max})_{ii}$ for every admissible design. Applying the Cholesky-Banachiewicz algorithm to determine $L$, we see that
$$L_{ii} = \sqrt{M_{ii} - \sum_{j=1}^{i-1} L_{ij}^2}.$$
From $\prod_{i=1}^T M_{ii} \le \prod_{i=1}^T (M_\mathrm{max})_{ii}$ and $L_{ij}^2 \ge (L_\mathrm{max})^2_{ij} = 0$ for $i \ne j$, we obtain maximality of $(L_\mathrm{max})_{ii}$ and hence the maximality of $\det(M_\mathrm{max})$.
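The bound $\det(M) = \prod_i L_{ii}^2 \le \prod_i M_{ii}$ used in this argument can be checked numerically. The following is a minimal sketch, not part of the original proof: the matrix `M` holds purely illustrative values rather than an information matrix from an actual design, and `cholesky` is a hypothetical helper implementing the Cholesky-Banachiewicz recursion in plain Python.

```python
import math

def cholesky(M):
    """Cholesky-Banachiewicz: return lower triangular L with M = L L^T."""
    T = len(M)
    L = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for j in range(i + 1):
            # Sum over the already computed entries to the left.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # Diagonal entry: L_ii = sqrt(M_ii - sum_k L_ik^2)
                L[i][i] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

# Illustrative symmetric positive definite matrix (assumed values).
M = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
T = len(M)

L = cholesky(M)
det_M = math.prod(L[i][i] ** 2 for i in range(T))   # det(M) = det(L)^2
diag_product = math.prod(M[i][i] for i in range(T))

print("det(M)        =", det_M)
print("prod of M_ii  =", diag_product)
print("L_ii^2 <= M_ii:", all(L[i][i] ** 2 <= M[i][i] + 1e-12 for i in range(T)))
```

In this example, the off-diagonal entries of $M$ reduce each $L_{ii}^2$ below $M_{ii}$ (except for $i = 1$), so the determinant falls short of the product of the diagonal entries; equality would require a diagonal $M$.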
Proof of Theorem 3.2
Under the assumptions of Theorem 3.1, we need to show that the design $\lambda_{n1} = \lambda_\mathrm{max}$ and $\lambda_{n2} = (-1)^{n}\lambda_\mathrm{max}$ for all comparisons $n \in \{1, \ldots, N\}$, which leads to the design matrix $M_\mathrm{max}$, minimizes $\sum_{i=1}^T (M^{-1})_{ii}$. This is equivalent to minimizing the sum of the (univariate) variances of any efficient estimator.
Let us first consider all designs such that $M$ is diagonal. In that case, $M^{-1}$ is also diagonal with diagonal entries $(M^{-1})_{ii} = 1 / M_{ii}$. From the proof of Theorem 3.1, we see that $\sum_{i=1}^T M_{ii}$ is bounded and maximized by any design with $\lambda_{n1} = \pm \lambda_\mathrm{max}$ and $\lambda_{n2} = \pm \lambda_\mathrm{max}$, a class of designs to which the $M_\mathrm{max}$ design belongs. Because the function
with support in $x > y \in \mathbb{R}^{+}$ is minimized at $y = 0$ for any fixed $x \in \mathbb{R}^{+}$, it follows that $\sum_{i=1}^T (M^{-1})_{ii}$ is minimized if both $\sum_{i=1}^T M_{ii}$ is maximized and $M_{ii} = M_{jj}$ for all traits $i,j \in \{1, \ldots, T\}$.
These requirements are fulfilled by $M_\mathrm{max}$, proving its A-optimality within the space of all designs that imply a diagonal Fisher information.
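The role of equal diagonal entries can be illustrated with a small numerical comparison. The sketch below is only an illustration under assumed values: `balanced` and `unbalanced` are hypothetical diagonals with the same total $\sum_i M_{ii}$, and `a_criterion_diagonal` is a helper name introduced here, not one used in the paper.

```python
def a_criterion_diagonal(diag):
    """Sum of 1 / M_ii, i.e. the trace of the inverse of a diagonal M."""
    return sum(1.0 / d for d in diag)

# Two hypothetical diagonals with the same total information (sum = 12).
balanced = [4.0, 4.0, 4.0]
unbalanced = [6.0, 4.0, 2.0]

a_balanced = a_criterion_diagonal(balanced)      # 1/4 + 1/4 + 1/4
a_unbalanced = a_criterion_diagonal(unbalanced)  # 1/6 + 1/4 + 1/2

print("equal totals:", sum(balanced) == sum(unbalanced))
print("balanced   :", a_balanced)
print("unbalanced :", a_unbalanced)
```

For a fixed total, moving information from one trait to another increases the sum of the inverse diagonal entries, which is why the balanced allocation is preferred in the diagonal case.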
It remains to be shown that the $M_\mathrm{max}$ design is indeed A-optimal within the space of all admissible designs. For this purpose, we will again use the Cholesky decomposition such that $M = L L^T$ with a lower triangular matrix $L$. We first note that $M^{-1} = (L^{-1})^{\mathrm{T}} L^{-1}$, by standard rules of matrix inversion. Since $L$ is lower triangular, $L^{-1}$ is also lower triangular and $(L^{-1})_{ii} = 1 / L_{ii}$. Together, this implies
For any non-diagonal $M$, we have $\sum_{j=i+1}^T (L^{-1})_{ji}^2 \ge 0$, and thus, together with the results from the diagonal case, we conclude
which completes the proof.
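As a numerical sanity check of the chain $(M^{-1})_{ii} \ge 1 / L_{ii}^2 \ge 1 / M_{ii}$ underlying this argument, the following sketch starts from an assumed lower triangular factor `L` (illustrative values only), forms $M = L L^T$, and inverts `L` by forward substitution. The helper `invert_lower` is hypothetical and written in plain Python for self-containment.

```python
def invert_lower(L):
    """Invert a lower triangular matrix by forward substitution."""
    T = len(L)
    inv = [[0.0] * T for _ in range(T)]
    for c in range(T):              # solve L x = e_c for each unit vector e_c
        for r in range(c, T):
            rhs = 1.0 if r == c else 0.0
            s = sum(L[r][k] * inv[k][c] for k in range(c, r))
            inv[r][c] = (rhs - s) / L[r][r]
    return inv

# Assumed lower triangular Cholesky factor (illustrative values).
L = [[2.0, 0.0, 0.0],
     [0.5, 1.5, 0.0],
     [0.25, 0.1, 1.2]]
T = len(L)

# M = L L^T
M = [[sum(L[i][k] * L[j][k] for k in range(T)) for j in range(T)]
     for i in range(T)]

L_inv = invert_lower(L)

# Diagonal of M^{-1} = (L^{-1})^T L^{-1}: (M^{-1})_ii = sum_j (L^{-1})_ji^2
M_inv_diag = [sum(L_inv[j][i] ** 2 for j in range(T)) for i in range(T)]

a_value = sum(M_inv_diag)                           # trace of M^{-1}
lower_bound = sum(1.0 / M[i][i] for i in range(T))  # value if M were diagonal

print("trace(M^-1)      =", a_value)
print("sum of 1 / M_ii  =", lower_bound)
```

The trace of $M^{-1}$ exceeds $\sum_i 1 / M_{ii}$ here because the off-diagonal entries of `L` are non-zero; with a diagonal `L`, both quantities coincide.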
Proof of Theorem 3.3
Consider an ordinal model with latent linear normal structure and an arbitrary but fixed number of thresholds $K$. Let $X^+$ be the optimal design matrix and $\tau^+$ be the optimal threshold vector of that ordinal model. Let $\tau^\star$ be the optimal threshold vector given the choice of $X^\star$ as design matrix that is optimal under the latent linear model. We need to show that
Let $\mathbb{O}^K$ be the set of all real ordered vectors of length $K$ and let $s_\mathrm{max} := \text{arg}\max_{s \in \mathbb{O}^K} I_{K}(s)$ be the ordered vector maximizing the information factor $I_{K}$ in the parameterization of Eq. (11). It follows that $\tau^+_{nk} = s_{\mathrm{max},k} + X^+_n \eta$ and $\tau^\star_{nk} = s_{\mathrm{max},k} + X^\star_n \eta$ for each $k \in \{1, \ldots, K\}$ are the optimal thresholds belonging to $X^+$ and $X^\star$, respectively. These solutions are always valid as $s_\mathrm{max}$ is ordered and both $X^+_n \eta$ and $X^\star_n \eta$ are independent of $k$; thus $\tau^+_{n} = (\tau^+_{n1}, \ldots, \tau^+_{nK})$ and $\tau^\star_{n} = (\tau^\star_{n1}, \ldots, \tau^\star_{nK})$ are ordered as well.
Accordingly, we get ${I}^\star_{nK} = I^+_{nK} = I_{K}(s_\mathrm{max}) =: c_\mathrm{max}$, where ${I}^\star_{nK}$ and $I^+_{nK}$ are the information factors based on $\tau^\star_{n}$ and $\tau^+_{n}$, respectively. Notice that $c_\mathrm{max}$ is independent of $n$.
Define $M^+_n := X^{+ \mathrm{T}}_n X^+_n$. If $(X^\star, \tau^\star)$ were not optimal for the ordinal model, we would have
which implies
because ${I}^\star_{nK} = I^+_{nK} = c_\mathrm{max}$ for all $n$. However, due to optimality of $X^\star$ for the linear model, we have
Thus, by assumption on C, it follows that
which contradicts (63). Accordingly, $X^\star$ must also be an optimal design matrix for the ordinal model. As the ordinal model was arbitrarily chosen, this conclusion holds for all such ordinal models.
Proof of Corollary 3.4
Using basic properties of the determinant, we see that the D-optimal criterion $C_D$ defined in (16) satisfies $C_D(c \, M) = C_D(M) / c$ for any $c \in \mathbb{R}^+$. Similarly, using basic properties of matrix inversion, we see that the A-optimal criterion $C_A$ defined in (19) satisfies $C_A(c \, M) = C_A(M) / \sqrt{c}$. Accordingly, the assumptions of Theorem 3.3 are satisfied for both criteria as
and
for any permissible Fisher information matrix $M$. This completes the proof as the design matrix $X$ in ordinal TIRT models only depends on the factor loadings $\lambda$.
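The two scaling properties can be verified numerically for a small example. Since Eqs. (16) and (19) are not reproduced in this section, the sketch below assumes the forms $C_D(M) = \det(M)^{-1/T}$ and $C_A(M) = \sqrt{\operatorname{tr}(M^{-1})}$, which are consistent with the stated scaling behavior but should be read as assumptions; the matrix `M`, the constant `c`, and the function names are likewise illustrative.

```python
import math

def det2(M):
    """Determinant of a symmetric 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def c_d(M):
    """Assumed D-criterion: det(M)^(-1/T) with T = 2 traits."""
    return det2(M) ** (-1.0 / 2.0)

def c_a(M):
    """Assumed A-criterion: sqrt(trace(M^{-1})); for 2x2, trace(M^{-1}) = trace(M) / det(M)."""
    return math.sqrt((M[0][0] + M[1][1]) / det2(M))

def scale(M, c):
    """Multiply every entry of M by the positive constant c."""
    return [[c * entry for entry in row] for row in M]

# Illustrative positive definite information matrix and scaling constant.
M = [[3.0, 0.5],
     [0.5, 2.0]]
c = 4.0

print("C_D(cM) == C_D(M) / c      :", math.isclose(c_d(scale(M, c)), c_d(M) / c))
print("C_A(cM) == C_A(M) / sqrt(c):", math.isclose(c_a(scale(M, c)), c_a(M) / math.sqrt(c)))
```

Under these assumed definitions, multiplying the information matrix by a common factor rescales both criteria monotonically and therefore does not change which design is preferred.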