Identification of Factor Scores by Regression with External Variables in Exploratory Factor Analysis

Naoto Yamashita

doi:10.1017/psy.2025.10025

Identification of Factor Scores by Regression with External Variables in Exploratory Factor Analysis

Published online by Cambridge University Press: 16 June 2025

Naoto Yamashita

Show author details

Naoto Yamashita*: Affiliation:
Faculty of Sociology, Kansai University, Suita, Osaka, Japan
*: Email: nyam@kansai-u.ac.jp

Article contents

Abstract
Introduction
Existing procedures
Proposed method
Related procedures and generalization
Numerical simulation
Applications to real data
Concluding remarks
Data availability statement
Author contributions
Funding statement
Competing interests
References

Rights & Permissions

Abstract

Factor score indeterminacy is a characteristic property of factor analysis (FA) models. This research introduces a novel procedure, regression-based factor score exploration (RFE), which uniquely determines factor scores and simultaneously estimates other parameters of the FA model. RFE uniquely determines factor scores by minimizing a loss function that balances FA and multivariate regression, regulated by a tuning parameter. Theoretical aspects of RFE, including the uniqueness of factor scores, the relationship between observed and latent variables, and rotational indeterminacy, are examined. Additionally, clustering-based factor exploration (CFE) is presented as a variant of RFE, derived by generalizing the penalty term to enable the clustering of factor scores. It is demonstrated that CFE creates cluster structures more accurately than the existing method. A simulation study shows that the proposed procedures accurately recover true parameter matrices even in the presence of error-contaminated data, with lower computational demand compared to existing methods. Real data examples illustrate that the proposed procedures provide interpretable results, demonstrating high relevance to the factor scores obtained by existing methods.

Keywords

clustering factor score indeterminacy penalized estimation regression

Information

Type: Theory and Methods
Information: Psychometrika , First View , pp. 1 - 14

DOI: https://doi.org/10.1017/psy.2025.10025 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society

1 Introduction

Factor analysis (FA, Adachi, Reference Adachi2019; Bartholomew, Reference Bartholomew, Knott and Moustaki2011; Harman, Reference Harman1976) is a method used to explain the variations among multiple observed variables by assuming unobserved variables called factors. Let $\mathbf { x}$ be a p-dimensional vector of random variables having mean vector ${\boldsymbol \mu }$ and covariance matrix $\boldsymbol { \Sigma }$ . The FA model with r common factors is formally defined as

(1)

$$ \begin{align} \mathbf{ x} = {\boldsymbol \mu} + \boldsymbol { \Lambda}\mathbf{ f} + \mathbf{ e}, \end{align} $$

where $\mathbf { f}$ denotes an r-dimensional random vector representing common factor scores, which satisfies $E[\mathbf { f}] = \mathbf { 0}_r$ and $V[\mathbf { f}] = \mathbf { I}_r$ where $\mathbf { 0}_r$ and $\mathbf { I}_r$ are the r-dimensional vector filled with zeros and the $r\times r$ identity matrix, respectively, and $\boldsymbol { \Lambda } (p \times r)$ is the matrix of factor loadings. Furthermore, $\mathbf { e}$ denotes a p-dimensional random vector of unique factor scores, which satisfies $V[\mathbf { e}] = \boldsymbol { \Psi }^2 = \mathrm {diag}(\psi _1^2, \ldots , \psi _p^2)$ . This assumption indicates that $\mathbf { e}$ is decomposed as $\mathbf { e} = \boldsymbol { \Psi }\mathbf { u}$ where $\mathbf { u}$ is a p-dimensional random vector satisfying $E[\mathbf { u}] = \mathbf { 0}_p$ and $V[\mathbf { u}] = \mathbf { I}_p$ . The matrix $\boldsymbol { \Psi }$ is assumed to be positive-semidefinite; $\boldsymbol { \Psi }> \mathbf { O}$ . $\mathbf { f}$ and $\mathbf { u}$ are assumed to be orthogonal: $E[\mathbf { f}\mathbf { u}^{\prime }] = \ _r\mathbf { O}_p$ where $_r\mathbf { O}_p$ denotes $r\times p$ matrix filled with zeros. The unknown parameters in the model (1) are estimated by minimizing the discrepancy between the sample covariance matrix $\mathbf { S}$ and $\boldsymbol { \Sigma } = \boldsymbol { \Lambda }\boldsymbol { \Lambda }^{\prime } + \boldsymbol { \Psi }^2$ over $\boldsymbol { \Lambda }$ and $\boldsymbol { \Psi }$ . For example, in maximum likelihood estimation, the negative log-likelihood function under the normality assumption on $\mathbf { x}$ is defined as $l_{neg}(\boldsymbol { \Lambda }, \boldsymbol { \Psi }) = \mathrm {tr}\mathbf { S}\boldsymbol { \Sigma }^{-1} - \log |\mathbf { S}\boldsymbol { \Sigma }^{-1}|$ which is minimized to estimate $\boldsymbol { \Lambda }$ and $\boldsymbol { \Psi }$ (Jöreskog, Reference Jöreskog1967; Lawley, Reference Lawley1940).

Alternative formulations of FA have been proposed by several authors (de Leeuw, Reference de Leeuw2004; Sočan, Reference Sočan2003; Stegeman, Reference Stegeman2016). Among these procedures, matrix decomposition FA (MDFA; Adachi, Reference Adachi2022; Adachi & Trendafilov, Reference Adachi and Trendafilov2018) and its theoretical and empirical properties have been investigated in recent years. The distinguishing property of MDFA is that it fits the FA model directly to the data matrix; thus, it is not formulated as a covariance fitting problem. To present the formulation of MDFA, let $\mathbf { X}$ be the $n \times p$ data matrix composed of horizontally stacked $\mathbf { x}_1, \ldots , \mathbf { x}_n$ , where $\mathbf { x}_i$ is the p-dimensional vector of the ith observation; $\mathbf { X} = \{\mathbf { x}_1, \ldots , \mathbf { x}_n\}^{\prime }$ . Note that $\mathbf { X}$ is assumed to be a column-centered matrix. Similarly, let $\mathbf { F} = \{ \mathbf { f}_1, \ldots , \mathbf { f}_n \}^{\prime }$ and $\mathbf { U} = \{ \mathbf { u}_1, \ldots , \mathbf { u}_n \}^{\prime }$ be the matrices of common and unique factor scores for n observations, respectively. MDFA minimizes the least squares loss function

(2)

$$ \begin{align} L_{MDFA}(\mathbf{ Z}, \mathbf{ A}) = \sum_{i=1}^{n}||\mathbf{ x}_i - (\boldsymbol { \Lambda}\mathbf{ f}_i + \boldsymbol { \Psi}\mathbf{ u}_i)||^2 = ||\mathbf{ X} - (\mathbf{ F}\boldsymbol { \Lambda}^{\prime} + \mathbf{ U}\boldsymbol { \Psi})||^2 \end{align} $$

over the $n \times (p + r)$ matrix of factor scores $\mathbf { Z} = [\mathbf { F}, \mathbf { U}]$ and the $p \times (p + r)$ matrix $\mathbf { A} = [\boldsymbol { \Lambda }, \boldsymbol { \Psi }]$ . The loss function in (2) evaluates the loss of fitness of the model part $\mathbf { F}\boldsymbol { \Lambda }^{\prime } + \mathbf { U}\boldsymbol { \Psi }$ against the data matrix $\mathbf { X}$ . The constraint on $\mathbf { Z}$ is noted as

(3)

$$ \begin{align} \mathbf{ 1}_n^{\prime}\mathbf{ Z} = \mathbf{ 0}_{p+r}^{\prime},\ n^{-1}\mathbf{ Z}^{\prime}\mathbf{ Z} = \mathbf{ I}_{p+r}. \end{align} $$

Factor indeterminacy is a characteristic property of FA (Maraun, Reference Maraun1996; Steiger, Reference Steiger1979, Reference Steiger1996), which causes the standard and unique factor scores, $\mathbf { f}$ and $\mathbf { u}$ , to be not uniquely determined for all observations. Factor scores are also indeterminate in MDFA. Let the singular value decomposition of $n^{-1/2}\mathbf { X}\mathbf { A}$ be defined as

(4)

$$ \begin{align} n^{-1/2}\mathbf{ X}\mathbf{ A}= \tilde{\mathbf{K}}\tilde{\boldsymbol { \Theta}}\tilde{\mathbf{L}}^{\prime} = [\mathbf{ K}, \mathbf{ K}_{\perp}] \begin{bmatrix} \boldsymbol { \Theta} & \ \\ \ & \boldsymbol { \Theta}_{\perp} \end{bmatrix} \begin{bmatrix} \mathbf{ L}^{\prime}\\ \mathbf{ L}_{\perp}^{\prime} \end{bmatrix}, \end{align} $$

where $\tilde {\boldsymbol { \Theta }}$ is a $(p+r) \times (p+r)$ diagonal matrix of the singular values of $n^{-1/2}\mathbf { X}\mathbf { A}$ in descending order, and $\tilde {\mathbf {K}}(n \times (p+r))$ and $\tilde {\mathbf {L}}((p+r) \times (p+r))$ are matrices of the corresponding left and right singular vectors, respectively. Furthermore, the first p columns of $\tilde {\mathbf {K}}$ and $\tilde {\mathbf {L}}$ are denoted as $\mathbf { K}$ and $\mathbf { L}$ , respectively, and $\mathbf { K}_{\perp }$ and $\mathbf { L}_{\perp }$ are matrices of the remaining columns. $\boldsymbol { \Theta }$ is a $p \times p$ diagonal matrix of $\tilde {\boldsymbol { \Theta }}$ ’s first p diagonal elements, and $\boldsymbol { \Theta }_{\perp }$ is a diagonal matrix of the remaining elements. The optimal $\mathbf { Z}$ which minimizes (2) is given by $ \mathbf { Z} = n^{1/2}\tilde {\mathbf {K}}\tilde {\mathbf {L}}^{\prime } = n^{1/2}\mathbf { K}\mathbf { L}^{\prime } + n^{1/2}\mathbf { K}_{\perp }\mathbf { L}_{\perp }^{\prime }. $ This suggests that the matrix of the optimal factor scores comprises two parts: $n^{1/2}\mathbf { K}\mathbf { L}^{\prime } = \mathbf { Z}_d = [\mathbf { F}_d, \mathbf { U}_d]$ and $n^{1/2}\mathbf { K}_{\perp }\mathbf { L}_{\perp }^{\prime }= \mathbf { Z}_u = [\mathbf { F}_u, \mathbf { U}_u]$ . Here, we have $\text {rank}(n^{-1/2}\mathbf { X}\mathbf { A}) = p$ when $\mathbf { X}$ is a full-column-rank matrix where $\text {rank}(\bullet )$ denotes rank of a matrix. The r smallest singular values of $n^{-1/2}\mathbf { X}\mathbf { A}$ are zero, and $\boldsymbol { \Theta }_{\perp } = \mathbf { O}_{r \times r}$ . Therefore, $\mathbf { K}_{\perp }$ and $\mathbf { L}_{\perp }$ are not uniquely determined as long as they satisfy $\mathbf { K}^{\prime }\mathbf { K}_{\perp } = \mathbf { O}_{p \times r}$ and $\mathbf { L}^{\prime }\mathbf { L}_{\perp } = \mathbf { O}_{p \times r}$ . $\mathbf { Z}_u$ is not unique, indicating that the factor score matrix $\mathbf { Z}$ in MDFA is not uniquely determined. Note that $\mathbf { K}_{\perp }$ is not necessary to be column centered, while $\mathbf { K}$ is column centered as long as $\mathbf { 1}_n^{\prime }\mathbf { X} = \mathbf { 0}^{\prime }_p$ , and therefore $\mathbf { 1}^{\prime }_n\mathbf { Z} = \mathbf { 0}^{\prime }_{p+r}$ may not be satisfied in MDFA. MDFA explicitly separates the unique and non-unique parts of factor scores, $\mathbf { Z}_d$ and $\mathbf { Z}_u$ , allowing further consideration of factor score indeterminacy, unlike standard FA. Thus, we address this issue within the MDFA framework.

In this study, we propose a new method for identifying factor scores in the framework of MDFA. The proposed method is as an alternative to the existing procedure detailed in the next section, addressing its problems. Moreover, the proposed method proved to determine common and unique factor scores uniquely. The proposed method can be generalized to include various procedures dealing with latent variables.

To enhance the accessibility of our proposed methods, we have developed the R package mdfaScoreIden, available at https://github.com/nyamashita/mdfaScoreIden. This package includes the procedures introduced in this article.

The article is structured as follows. The next section reviews existing methods for specifying factor scores. The third section defines the proposed method, its optimization algorithm, and its properties, including the uniqueness of estimated scores. The fourth section presents a general formulation with special cases. The fifth and sixth sections report simulation studies and real data examples, respectively. The final section discusses conclusions and future research directions.

2 Existing procedures

The current section introduces two existing procedures to estimate or specify factor scores and addresses their limitations, as a motivation for the present study.

Although factor scores in FA are not uniquely determined, individual scores are often of interest in applied research. For example, Sergi et al. (Reference Sergi, Picconi, Tommasi, Saggino, Ebisch and Spoto2021) examined emotional intelligence, anxiety, and depression by regressing factor scores by FA, as did Gass (Reference Gass1996) and Zammuner (Reference Zammuner1998). To meet such needs, various post-hoc estimation methods exist, including regression, Bartlett’s (Bartlett, Reference Bartlett1937), and Anderson & Rubin’s (Anderson & Rubin, Reference Anderson, Rubin and Neyman1956) methods. Bartlett’s method estimates $\mathbf { F}$ by minimizing $ ||(\mathbf { X} - \mathbf { F}\hat {\boldsymbol { \Lambda }}^{\prime })\hat {\boldsymbol {\Phi }}^{-1/2}||^2 $ using parameter estimates $\hat {\boldsymbol { \Lambda }}$ , $\hat {\boldsymbol { \Psi }}$ , and the rotated factor correlation matrix $\hat {\boldsymbol {\Phi }}$ . The estimated factor scores, $\hat {\mathbf {F}}$ , are given by $ \hat {\mathbf {F}} = \mathbf {X}\hat {\boldsymbol { \Psi }}^{-1}\hat {\boldsymbol { \Lambda }}(\hat {\boldsymbol { \Lambda }}^{\prime } \hat {\boldsymbol {\Phi }}\hat {\boldsymbol { \Lambda }})^{-1\prime }. $ as a linear combination of observed variables in $\mathbf { X}$ .

In MDFA, Uno et al. (Reference Uno, Adachi and Trendafilov2019) proposed clustered common factor exploration (CCFE) as a pioneering method for resolving factor score indeterminacy. CCFE assumes that individuals are classified into a few clusters, and those in the same cluster share similar common factor scores” (Uno et al., Reference Uno, Adachi and Trendafilov2019), a desirable property for $\mathbf { F}$ . Based on this assumption, the indeterminate part of $\mathbf { Z}$ , $\mathbf { Z}_u$ , is identified as follows.

First, by applying MDFA, the uniquely determined part of $\mathbf { Z}$ is computed as a linear combination of the p observed variables in $\mathbf { X}$ . Next, let $\mathbf { M}(n \times k)$ be an unknown membership matrix with elements 0 or 1 and row sums equal to 1, and $\mathbf { C}(k \times p)$ be an unknown cluster centroid matrix. Then,

(5)

$$ \begin{align} L_{CCFE}(\mathbf{ M}, \mathbf{ C}, \mathbf{ Z}_u) = ||\mathbf{ MC} - (\mathbf{ F}_d + \mathbf{ F}_u)||^2 + ||\mathbf{ U}_u||^2 \end{align} $$

is minimized to determine $\mathbf { M}$ , $\mathbf { C}$ , and $\mathbf { Z}_u = [\mathbf { F}_u,\ \mathbf { U}_u]$ , where $\mathbf { F}_d$ is the matrix of uniquely determined common factor scores. The minimization of $L_{CCFE}(\mathbf { M}, \mathbf { C}, \mathbf { Z}_u)$ uniquely determines $\mathbf { Z}_u$ , as well as $\mathbf { Z}$ .

CCFE effectively identifies well-clustered common factor scores but has two limitations. First, classifying individuals into clusters based on factor scores is rarely ideal. In FA, it is often more meaningful to compare scores by demographics like gender or age rather than by estimated clusters. Factor scores numerically represent construct tendencies, such as intelligence, allowing comparisons across groups to reveal meaningful differences. For example, Weisberg et al. (Reference Weisberg, DeYoung and Hirsh2011) analyzes personality traits by comparing Big5 factor scores by gender, while Hughes et al. (Reference Hughes, Costello, Pearman, Razavi, Bedford-Petersen, Ludwig and Srivastava2021) examines socioeconomic indicators. These studies relate factor scores to external criteria, which CCFE does not. As a result, CCFE clusters may not align with demographic groups, making interpretation difficult. Recent advances in exploratory FA using an additional set of variables are found in Carrioza et al. (Reference Carrizosa, Guerrero, Romero-Morales and Satorra2019) to improve interpretability of extracted factors.

The second problem is that CCFE creates clustered common factor scores, even when the scores have no cluster structure. To demonstrate this issue, the CCFE algorithm with three clusters was specifically applied to a $500 \times 6$ data matrix $\mathbf { X}$ , which was synthesized in the following manner. $\mathbf { X}$ was synthesized as $\mathbf { X} = \mathbf { F}_{T}\boldsymbol { \Lambda }_T^{\prime } + \mathbf { U}_T\boldsymbol { \Psi }_T + 0.1 \times \mathbf { E}$ where $\mathbf { F}_T\ (500 \times 2)$ and $\mathbf { U}_T\ (500 \times 6)$ are randomly-generated matrices of factor scores satisfying $n^{-1}\mathbf { Z}^{\prime }\mathbf { Z} = \mathbf { I}_8$ , with $\mathbf { Z}_T = [\mathbf { F}_T, \mathbf { U}_T]$ . $\boldsymbol { \Lambda }_T (6 \times 2)$ and $\boldsymbol { \Psi }_T (6 \times 6)$ are defined as

(6)

$$ \begin{align} \boldsymbol { \Lambda}_T= \begin{bmatrix} 0.6 & 0.7 & -0.8 & 0.2 & 0.0 & 0.0\\ 0.0 & 0.0 & 0.3 & 0.7 & -0.8 & 0.9 \end{bmatrix}^{\prime} \end{align} $$

and $\boldsymbol { \Psi }_T = \textrm {dg}(\mathbf { I}_6 - \boldsymbol { \Lambda }_T\boldsymbol { \Lambda }_T^{\prime })$ , where $\textrm {dg}(\mathbf { H})$ stands for the diagonal matrix substituting the off-diagonal elements of the square matrix $\mathbf { H}$ with zeros. $\mathbf { E}\ (500 \times 6)$ is a matrix of residuals filled with randomly generated elements drawn from the uniform distribution $U(-1, 1)$ . Figure 1a,b displays scatter plots of true and estimated factor scores using CCFE.

Figure 1 Result of CCFE applied to an artificial data having no cluster structure; (a) true common factor scores and (b) common factor scores estimated by CCFE.

The figure on the left indicates that the common factor scores exhibit no cluster structure, which is used in the data matrix $\mathbf { X}$ ’s generation process. However, the estimated factor scores on the right exhibit a three-cluster structure, which is unreasonable. The cluster structure created here by CCFE can be considered as illusionary because it does not exist in the data generation process. Regrettably, CCFE is the only method for determining factor scores in MDFA. Given the critical issues highlighted here, developing an alternative to CCFE is imperative.

3 Proposed method

This section first presents the mathematical formulation of the proposed method, followed by the iterative algorithm for estimating the unknown parameters. Subsequent subsections discuss the proposed method’s theoretical properties. The specification of the tuning parameter involved in the proposed method is also addressed.

3.1 Formulation

We propose a new procedure for specifying the factor scores, called regression-based factor score exploration (RFE). RFE is formulated as the minimization of the least-squares loss function

(7)

$$ \begin{align} L_{RFE}(\mathbf{ Z}, \mathbf{ A}, \mathbf{ W}) = ||\mathbf{ X} - \mathbf{ Z}\mathbf{ A}^{\prime}||^2 + \lambda||\mathbf{ Y} - \mathbf{ F}\mathbf{ W}^{\prime}||^2 \end{align} $$

over the parameter matrices under the constraint on $\mathbf { Z}$ in (3). The first two parameter matrices, $\mathbf { Z}$ and $\mathbf { A}$ , match those in MDFA. In contrast, the newly introduced $\mathbf { W}(q \times r)$ represents regression coefficients for q external criteria on r common factors. $\mathbf { Y}$ , the $n \times q$ matrix of column-centered external criteria, is included. The first term in (7) is the MDFA’s loss, while the second represents multivariate regression loss, where $\mathbf { Y}$ is explained by $\mathbf { F}$ , with a predefined positive tuning parameter $\lambda $ . Unlike post-hoc factor score estimation, RFE jointly estimates all parameter matrices including uniquely determined factor scores.

3.2 Algorithm

The closed-form solution for minimizing (7) does not exist, and the parameter matrices are estimated by the following iterative algorithm starting from their initial values.

To derive the algorithm, we first find the optimal $\mathbf { Z}$ that minimizes the loss function in (7). First, the two block matrices $\tilde {\mathbf {X}}(n \times (2p+q))$ and $\mathbf { B}((2p+q) \times (p+r))$ are defined as $\tilde {\mathbf {X}} = [\mathbf { X}, \lambda ^{1/2}\mathbf { Y}, \ _n\mathbf { O}_p]$ and

(8)

$$ \begin{align} \mathbf{ B} = \begin{bmatrix} \mathbf{ A}^{\prime} & \tilde{\mathbf{W}}^{\prime} \end{bmatrix}^{\prime} \end{align} $$

respectively, where $\tilde {\mathbf {W}} = \textrm {bdiag}(\lambda ^{1/2}\mathbf { W}, \ _{p}\mathbf { O}_p)$ is a $(p+q) \times (p+r)$ matrix, and $\textrm {bdiag}(\bullet )$ denotes a block diagonal matrix of the arguments with appropriate dimensions. Using these, equation (7) can be rewritten as $ L_{RFE}(\mathbf { Z}, \mathbf { A}, \mathbf { W}) = \mathrm {tr}\tilde {\mathbf {X}}^{\prime }\tilde {\mathbf {X}} -2\mathrm {tr}\tilde {\mathbf {X}}^{\prime }\mathbf { Z}\mathbf { B}^{\prime } + n\mathrm {tr}\tilde {\mathbf {W}}^{\prime }\tilde {\mathbf {W}}. $ Therefore, minimizing equation (7) with respect to $\mathbf { Z}$ is equivalent to maximizing $f_{\mathbf { Z}}(\mathbf { Z}) = \mathrm {tr}(\tilde {\mathbf {X}}^{\prime }\mathbf {Z}\mathbf {B}^{\prime })$ under (3). Furthermore, the maximization of $f_{\mathbf { Z}}(\mathbf { Z})$ is achieved as $f_{\mathbf { Z}}(\mathbf { Z}) \leq \mathrm {tr}\tilde {\mathbf {\Delta }}$ using the singular value decomposition

(9)

$$ \begin{align} n^{-1/2}\tilde{\mathbf{X}}\mathbf{ B} = n^{-1/2}\mathbf{ X}^{\#}\mathbf{ B}^{\#} = \tilde{\mathbf{U}}\tilde{\boldsymbol{\Delta}}\tilde{\mathbf{V}}^{\prime} \end{align} $$

following ten Berge (Reference ten Berge1993), with $\mathbf { X}^{\#} = [\mathbf { X}, \lambda ^{1/2}\mathbf { Y}]$ and $\mathbf { B}^{\#}$ being the matrix omitting the last p rows of $\mathbf { B}$ . $\tilde {\mathbf { U}}(n \times (p+r))$ and $\tilde {\mathbf { V}}((p+r) \times (p+r))$ are matrices with the corresponding left and right singular vectors, respectively. $f_{\mathbf { Z}}(\mathbf { Z})$ is maximized when

(10)

$$ \begin{align} \mathbf{ Z} = \sqrt{n}\tilde{\mathbf{ U}}\tilde{\mathbf{ V}}^{\prime}. \end{align} $$

When $\mathbf { X}$ and $\mathbf { Y}$ are column-centered matrices, $\mathbf { 1}_n^{\prime }\tilde {\mathbf { U}} = \mathbf { 0}_{p+r}^{\prime }$ , so it is easy to verify that the $\mathbf { Z}$ in equation (10) satisfies (3). It will be shown that the $\mathbf { Z}$ in equation (10) is uniquely determined.

Next, we seek $\mathbf { A}$ that minimizes equation (7). Following Adachi & Trendafilov (Reference Adachi and Trendafilov2018), the optimal $\mathbf { A}$ is given by

(11)

$$ \begin{align} \mathbf{ A} = \left[ n^{-1}\mathbf{ X}^{\prime}\mathbf{ F}, \textrm{dg}\left( n^{-1}\mathbf{ X}^{\prime}\mathbf{ U} \right) \right]. \end{align} $$

Finally, the $\mathbf { W}$ that minimizes the loss function (7) is obtained as

(12)

$$ \begin{align} \mathbf{ W} = \mathbf{ Y}^{\prime}\mathbf{ F}(\mathbf{ F}^{\prime}\mathbf{ F})^{-1} = n^{-1}\mathbf{ Y}^{\prime}\mathbf{ F}. \end{align} $$

The derivation of the update formulae are presented in Supplementary Material 1.

Based on the above discussion, the iterative algorithm for parameter estimation in RFE can be summarized as follows:

Step 1: Assign appropriate initial values to $\mathbf { A}$ and $\mathbf { W}$ .
Step 2: Update $\mathbf { Z}$ using equation (10).
Step 3: Update $\mathbf { A}$ using equation (11).
Step 4: Update $\mathbf { W}$ using equation (12).
Step 5: If the decrease in the objective function between Steps 2–4 updates is less than $\epsilon $ , terminate the algorithm. Otherwise, return to Step 2.

The function value is guaranteed to converge as discussed in Supplementary Material 1. Note that random initialization is necessary, and the algorithm will be started from M different initial values. The solution that provides the smallest function value will be adopted as the final solution.

The number of common factors r and the tuning parameter $\lambda $ must be predefined, with $\lambda $ detailed in the next section. Given $\lambda $ , r can be determined using standard methods like parallel analysis (Humphreys & Montanelli, Reference Humphreys and Montanelli1975).

3.3 Properties

This subsection presents five distinctive properties that help emphasize the differences and similarities between the proposed method and existing procedures.

Uniqueness of factor scores

Theorem 1. The factor score matrix $\mathbf { Z}$ defined in (10) is uniquely determined when the following conditions hold:

(13)

$$ \begin{align} \mathrm{rank}(\mathbf{ X}) = p,\ \mathrm{rank}(\mathbf{ Y}) \geq r, \ \mathrm{rank}(\mathbf{ B}^{\#}) = p+r \end{align} $$

and $\mathrm {rank}([\mathbf {X}, \mathbf {Y}]) = \mathrm {rank}(\mathbf {X}) + \mathrm {rank}(\mathbf {Y})$ , indicating that columns of $\mathbf { X}$ and $\mathbf { Y}$ are linearly independent.

The proof is found in Supplementary Material 1. Among the conditions (13) necessary to uniquely determine $\mathbf { Z}$ , the first conditions are generally satisfied in FA, as well as the third condition also holds considering (11) and (12). In contrast, the second condition and the linear independence of $\mathbf { X}$ and $\mathbf { Y}$ introduce a requirement for external criteria in RFE. Specifically, to ensure the unique determination of $\mathbf { Z}$ , the number of external criteria must be at least equal to the number of factors when $\mathbf { Y}$ is a full-column-rank matrix.

Here, denote the parameter estimates obtained by RFE as $\hat {\mathbf {Z}} = [\hat {\mathbf {F}}, \hat {\mathbf {U}}]$ , $\hat {\mathbf {A}} = [\hat {\boldsymbol { \Lambda }}, \hat {\boldsymbol { \Psi }}]$ , and $\hat {\mathbf {W}}$ . Based on the uniqueness of $\mathbf {Z}$ discussed above, the data matrix $\mathbf {X}$ can be uniquely decomposed as $\mathbf {X} = \hat {\mathbf {F}}\hat {\boldsymbol { \Lambda }}^{\prime } + \hat {\mathbf {U}}\hat {\boldsymbol { \Psi }} + \hat {\mathbf {E}}$ , where $\hat {\mathbf {E}}$ is the $n \times p$ . Therefore, RFE allows to uniquely compute individual-wise residual by $\hat {\mathbf {E}} = \mathbf {X} - (\hat {\mathbf {F}}\hat {\boldsymbol { \Lambda }}^{\prime } + \hat {\mathbf {U}}\hat {\boldsymbol { \Psi }})$ , which indicates the degree of incompliance of the observed data to FA model.

Relationship between the observed and latent variables

Theorem 2. The factor score matrix $\mathbf { Z}$ obtained by (10) is a linear combination of p and q observed variables in $\mathbf { X}$ and $\mathbf { Y}$ , respectively.

See Supplementary Material 2 for the proof. The theorem indicates that the column space of the factor score matrix estimated by RFE, $Sp(\mathbf { Z})$ , is included in the column space of $\mathbf { X}^{\#} = [\mathbf { X}, \lambda ^{1/2}\mathbf { Y}]$ , $Sp(\mathbf { X}^{\#})$ ; $Sp(\mathbf { Z}) \subset Sp(\mathbf { X}^{\#})$ . This indicates that the factor score matrix obtained by RFE fundamentally differs from the factor scores obtained by standard FA followed by post-hoc estimation methods, which are represented as linear combinations of the observed variables $\mathbf { X}$ that are the subject of FA. However, the factor scores obtained by RFE cannot be represented in this way, and they are represented as linear combinations of $\mathbf { X}^{\#}$ , which includes the external criteria used in RFE.

Rotational indeterminacy

Equation (7) can be rewritten using the block diagonal matrix $\tilde {\mathbf { T}} = \textrm {bdiag}(\mathbf { T}, \mathbf { I}_p)$ , where $\mathbf { T}$ is an $r \times r$ non-singular matrix satisfying $\mathrm {diag}(\mathbf { T}^{\prime }\mathbf { T}) = \mathbf { I}_r$ , as $ L_{RFE}(\mathbf { Z}, \mathbf { A}, \mathbf { W}) = ||\mathbf { X} - \mathbf { Z}\tilde {\mathbf { T}}\tilde {\mathbf { T}}^{-1}\mathbf {A}^{\prime }||^2 + \lambda ||\mathbf { Y} - \mathbf { F}\mathbf { T}\mathbf { T}^{-1}\mathbf { W}^{\prime }||^2$ . From this, it can be seen that the parameter matrices $\mathbf { F}$ , $\boldsymbol { \Lambda }$ , and $\mathbf { W}$ in RFE are indeterminate with respect to nonsingular transformations by $\mathbf { T}$ . $\mathbf { T}$ can be specified by using the oblique rotational procedure, and see Supplementary Material 3 for details of the specification.

RFE as a penalized FA procedure

As previously mentioned, the balance between the first and second terms in the overall objective function, or the extent to which the FA estimates are influenced by the regression represented by the second term, is controlled by the tuning parameter $\lambda $ . Here, the second term in equation (7) can be expressed as

(14)

$$ \begin{align} \lambda||\mathbf{ Y} - \mathbf{ FW}^{\prime}||^2 = \tilde{\lambda} P_{\mathbf{ W}}(\mathbf{ W}) + 2\tilde{\lambda}P_{\mathbf{ F}, \mathbf{ W}}(\mathbf{ F}, \mathbf{ W}) + const, \end{align} $$

where $\tilde {\lambda } = n\lambda $ , $P_{\mathbf { W}} = ||\mathbf { W}||^2$ , and $P_{\mathbf { F}, \mathbf { W}}(\mathbf { F}, \mathbf { W}) = \mathrm {tr}\left (-n^{-1}\mathbf { Y}^{\prime }\mathbf { F}\mathbf { W}^{\prime }\right )$ , and $const.$ represents a constant term not related to $\mathbf { F}$ and $\mathbf { W}$ . Also, $n^{-1}\mathbf { Y}^{\prime }\mathbf { F}\mathbf { W}^{\prime }$ represents the covariance matrix of the q external criteria and the q composite variables weighted by $\mathbf { F}$ and $\mathbf { W}$ , considering $\mathbf { 1}_n^{\prime }\mathbf { Y} = \mathbf { 0}_q^{\prime }$ and $\mathbf { 1}_n^{\prime }\mathbf { FW}^{\prime } = \mathbf { 0}_q^{\prime }$ . From the above discussion, the objective function of RFE can be expressed as $ L_{RFE}(\mathbf { Z}, \mathbf { A}, \mathbf { W}) = L_{MDFA}(\mathbf { Z}, \mathbf { A}) + \tilde {\lambda } P_{\mathbf { W}}(\mathbf { W}) + 2\tilde {\lambda }P_{\mathbf { F}, \mathbf { W}}(\mathbf { F}, \mathbf { W}). $ This indicates that RFE represents penalized MDFA, where penalty terms are added to various parameters of FA. The first penalty term $P_{\mathbf { W}}(\mathbf { W})$ is an $L_2$ penalty term on $\mathbf { W}$ . The second penalty term $P_{\mathbf { F}, \mathbf { W}}(\mathbf { F}, \mathbf { W})$ acts to increase the sum of the column covariances of $\mathbf { Y}$ and $\mathbf { FW}^{\prime }$ , thereby forming $\mathbf { FW}^{\prime }$ proportional to $\mathbf { Y}$ .

Several methods for penalized estimation of parameter matrices in FA have been proposed (Choi et al., Reference Choi, Zou and Oehlert2011; Hirose & Terada, Reference Hirose and Terada2023; Hirose & Yamamoto, Reference Hirose and Yamamoto2015). These procedures aim to simplify $\boldsymbol { \Lambda }$ , which plays a central role in interpreting the FA’s results. In contrast, the RFE proposed in this study introduces penalized estimation of parameter matrices to uniquely determine factor scores by removing the indeterminacy of factor scores.

3.4 Specification of $\lambda $

When executing the RFE algorithm, it is necessary to set the value of $\lambda $ , which controls the balance between the first and second terms of the equation (7). The article recommends a semi-automatic procedure for selecting $\lambda $ within a predefined range, as detailed in Supplementary Material 5. Alternatively, a small fixed value such as $\lambda = 0.01$ can be used, as supported by the following observations. Theorem 1 shows that $\mathbf { Z}$ is uniquely determined under condition (13) when $\lambda> 0$ . Moreover, the simulation in the next section indicates that the RFE solution remains accurate and stable for small $\lambda $ . Another simulation, reported in Supplementary Material 5, confirms that the semi-automatic procedure consistently selects $\lambda = 0.01$ as optimal, demonstrating the effectiveness of using a small $\lambda $ .

4 Related procedures and generalization

The loss function in (7) can be generalized by considering not only the regression of $\mathbf { Y}$ on $\mathbf { F}$ but also the reverse regression:

(15)

$$ \begin{align} L_{MDFA}(\mathbf{ Z}, \mathbf{ A}) + \lambda||\mathbf{ Y}\mathbf{ V} - \mathbf{ F}\mathbf{ W}||^2. \end{align} $$

In this section, we demonstrate that by setting penalty terms on the parameter matrices added to the objective function of FA in various ways, we can endow the parameter matrices in FA with useful properties.

By setting the matrices other than $\mathbf { Y}$ in the second term of (15), we can represent RFE, where $\mathbf { V}=\mathbf { I}_q$ and $\mathbf { W}$ is treated as a free parameter.

Next, by setting $\mathbf { Y} = \mathbf { F}$ and treating it as a free parameter, while also setting $\mathbf { V}=\mathbf { I}_r$ , equation (15) corresponds to the objective function of MDSEM by Yamashita (Reference Yamashita2024). MDSEM is a modeling framework capable of handling various models, including regressions among latent variables, which works as a structural equation modeling (Bentler, Reference Bentler1980; Mulaik, Reference Mulaik1986). Note that the factor scores in MDSEM are not uniquely determined as proved in Yamashita (Reference Yamashita2024).

Another procedure is obtained by letting $\mathbf { Y}$ be the $n \times k$ membership matrix $\mathbf { M}$ , and $\mathbf { V}$ be the $k \times r$ centroid matrix $\mathbf { C}$ , with $\mathbf { W}=\mathbf { I}_r$ . This method formulates the second term as the objective function of K-means clustering analysis, which classifies the common factor scores $\mathbf { F}$ into k clusters:

(16)

$$ \begin{align} L_{CFE}(\mathbf{ Z}, \mathbf{ A}, \mathbf{ M}, \mathbf{ C}) = ||\mathbf{ X} - \mathbf{ ZA}^{\prime}||^2 + \lambda||\mathbf{ MC} - \mathbf{ F}||^2. \end{align} $$

Thus, similar to CCFE by Uno et al. (Reference Uno, Adachi and Trendafilov2019), the obtained factor score matrix is expected to form k clusters. Hereafter, we will refer to this method as clustering-based factor exploration (CFE). The parameter estimation algorithm in CFE is detailed in Supplementary Material 5.

A key limitation of CCFE is that it can create artificial cluster structures even when no cluster structure exists in the common factor scores. In contrast, CFE avoids this issue, as shown in Figure 2, where the cluster boundaries appear less distinct than in CCFE’s results. This difference is further reflected in the within-cluster variance and the distance between cluster centroids. For CFE, the minimum within-cluster variance and maximum between-cluster distance are 0.177 and 2.008, respectively, compared to 0.056 and 2.494 in CCFE. The clusters estimated by CCFE exhibit smaller within-cluster variance and more widely separated centroids than those obtained by CFE, indicating that CFE’s clusters are less distinct. This occurs because CFE balances FA and clustering through its penalization term in the loss function (16). Specifically, the tuning parameter was set to $\lambda = 10^{-8}$ , as determined by the procedure in Supplementary Material 4, suggesting that clustering plays a minimal role in (16).

Figure 2 Factor scores estimated by CFE are applied to artificial data with no cluster structure.

5 Numerical simulation

A numerical experiment was conducted to evaluate the effectiveness of the proposed method in accurately reproducing the true parameter matrices. This section reports the design and results of the experiment.

5.1 Design

The artificial data matrices $\mathbf { X}$ and $\mathbf { Y}$ , to which the RFE proposed in the article is applied, were generated using the following procedure. First, the true factor score matrix $\mathbf { Z}_T = [\mathbf { F}_T, \mathbf { U}_T]$ was randomly generated to satisfy (3). Specifically, an $n\times (p+r)$ matrix $\mathbf { Z}_T^{*}$ of the same size as $\mathbf { Z}_T$ was generated with elements randomly drown from a uniform distribution $U(-1, 1)$ . Then, $\mathbf { K}_{\mathbf { Z}}$ , the left singular vector matrix of the column-standardized $\mathbf { Z}_T^{*}$ , was obtained as $\mathbf { Z}_T = \sqrt {n}\mathbf { K}_{\mathbf { Z}}$ . Next, $\mathbf { Y}$ was constructed as $\mathbf { Y} = \mathbf { F}_T\mathbf { W}_T + \rho (\theta _{\mathbf { Y}})\mathbf { E}_{\mathbf { Y}}$ . Here, $\mathbf { W}_T$ is the true value of $\mathbf { W}$ , which is set as

(17)

$$ \begin{align} \mathbf{ W}_T = \begin{bmatrix} \# & \# & 0 & 0 \\ 0 & 0 & \# & \# \\ 0 & \# & 0 & \# \end{bmatrix}. \end{align} $$

The matrix’s non-zero elements noted as # were generated from a uniform distribution $U(0.6, 0.9)$ , with their signs randomly determined. Additionally, $\mathbf { E}_{\mathbf { Y}}$ is an $n \times q$ error matrix with elements generated from the standard normal distribution $N(0, 1)$ . The function $\rho (\theta _{\mathbf { Y}})$ adjusts the relative size of the error matrix, with $\theta _{\mathbf { Y}} \in [0, 1]$ representing the variance explained, defined in Yamashita & Mayekawa (Reference Yamashita and Mayekawa2015). Similarly, using the $n \times p$ error matrix $\mathbf { E}_{\mathbf { X}}$ whose elements were generated from $N(0, 1)$ , $\mathbf { X}$ was generated as $ \mathbf { X} = \mathbf { Z}_T[\boldsymbol { \Lambda }_T, \boldsymbol { \Psi }_T]^{\prime } + \rho (\theta _{\mathbf { X}})\mathbf { E}_{\mathbf { X}}. $ Here, $\boldsymbol { \Lambda }_T$ is the factor loading matrix defined as

(18)

$$ \begin{align} \boldsymbol { \Lambda}_T = \begin{bmatrix} \# & \# & 0 & 0 & 0 & 0 \\ 0 & 0 & \# & \# & 0 & 0 \\ 0 & 0 & 0 & 0 & \# & \# \end{bmatrix}^{\prime}. \end{align} $$

The non-zero elements were randomly generated from a uniform distribution $U(0.6, 0.9)$ , with their signs randomly chosen. The true value of the uniquenesses’ square root $\boldsymbol { \Psi }_T$ was set as $\boldsymbol { \Psi }_T = \textrm {dg}(\mathbf { I}_p - \boldsymbol { \Lambda }_T\boldsymbol { \Lambda }_T^{\prime })$ .

Three levels of error in the data, $\rho = \rho _{\mathbf { X}} = \rho _{\mathbf { Y}} = \{0.90, 0.85, 0.60\}$ , were considered. For each level, 100 pairs of $(\mathbf { X}, \mathbf { Y})$ were generated using the above procedure with $(n, p, q, r) = (100, 6, 3, 3)$ For each pair of data matrices, RFE was executed with six different values of $\lambda $ satisfying $\log (\lambda ) = -3, -2, \ldots , 2$ , and the estimated parameter matrices $\{\hat {\boldsymbol { \Lambda }}, \hat {\boldsymbol { \Psi }}, \hat {\mathbf {W}}, \hat {\mathbf {F}}, \hat {\mathbf {U}}\}$ were obtained.

The deviation of the estimated parameter matrices from their true values was evaluated using the root mean square error of approximation (RMSEA) for any matrix $\mathbf { M}(s \times t)$ and its true counterpart $\mathbf { M}_T$ : $RMSEA(\mathbf { M}, \mathbf { M}_T) = (st)^{-1}\sqrt {||\mathbf { M} - \mathbf { M}_T||^2}$ . The accuracy of the true values’ recovery for $\hat {\boldsymbol { \Lambda }}$ , $\hat {\mathbf {W}}$ , $\hat {\mathbf {F}}$ , and $\hat {\mathbf {U}}$ was evaluated using $RMSEA(\hat {\boldsymbol { \Lambda }}, \boldsymbol { \Lambda }_T)$ , $RMSEA(\hat {\mathbf {W}}, \mathbf { W}_T)$ , $RMSEA(\hat {\mathbf {F}}, \mathbf {F}_T)$ , and $RMSEA(\hat {\mathbf {U}}, \mathbf {U}_T)$ , respectively. For evaluating the recovery of $\boldsymbol { \Psi }_T$ , the uniquenesses calculated from the estimates and their true values were compared using $RMSEA(\mathrm {diag}(\hat {\boldsymbol \psi }), \mathrm {diag}({\boldsymbol \psi }_T))$ . $\hat {\boldsymbol \psi }$ and ${\boldsymbol \psi }_T$ denote the vectors of the diagonal elements of $\hat {\boldsymbol { \Psi }}^2$ and $\boldsymbol { \Psi }_T^2$ , respectively, which indicate the vectors of uniqueness.

The frequency of local solutions in each application of RFE was evaluated as follows. Let $M=100$ denote the number of initial values in RFE. After executing the algorithm, let the function values of equation (7) obtained post-convergence be $\hat {L}_{RFE}^{(1)}, \ldots , \hat {L}_{RFE}^{(100)}$ . The minimum function value is $\hat {L}_{RFE}^{(min)} = \min (\hat {L}_{RFE}^{(1)}, \ldots , \hat {L}_{RFE}^{(100)})$ . The number of j satisfying $\hat {L}_{RFE}^{(j)} - \hat {L}_{RFE}^{(min)}> 10^{-3}$ was counted as the frequency of local solutions, and this value divided by M was taken as the proportion of local solutions.

For comparison, we estimate maximum likelihood FA and Anderson & Rubin’s factor scores. RMSEA is computed for loadings, uniqueness, and common factor scores, but unique factor scores and $\mathbf { W}$ are unavailable in the existing method.

5.2 Result

Table 1 shows the median and standard deviation of RMSEA for the five parameter matrices for $\lambda = 0.01$ . The whole result is shown in Supplementary Material 6. For all parameter matrices, even in the case with the smallest variance explained rate, the median RMSEA is mostly below 0.1 with a small variance. This result suggests that the true parameter matrices recovered better than FA and Anderson & Rubin’s factor score. Additionally, the recovery accuracy of the parameter matrices does not change significantly with different values of $\lambda $ . This indicates that RFE is relatively robust to the value of $\lambda $ in terms of recovery accuracy of the parameter matrices, which is practically useful. The median and standard deviation of the frequency of local solutions are presented. Uno et al. (Reference Uno, Adachi and Trendafilov2019) empirically demonstrated that CCFE frequently encounters local solutions. In contrast, the proposed RFE method, as shown in the table, rarely encounters local solutions. Therefore, from the perspective of computational efficiency, RFE is superior to existing methods.

Table 1 Medians and standard deviations (s.d.) of $RMSEA$ s of the estimated parameter matrices obtained by RFE in each condition and frequency of local minimum for $\lambda = 0.01$

6 Applications to real data

6.1 Big five inventory data

In the first application, we applied RFE and related existing methods to responses from a 25-item Big Five Inventory (Goldberg, Reference Goldberg, Mervielde, Deary, Fruyt and Ostendorf1999). A total of 2,800 participants completed the inventory as part of an online personality assessment project (Revelle et al., Reference Revelle, Wilt, Rosenthal, Gruszka, Matthews and Szymura2010). For this analysis, we used responses to ten items designed to serve as indicators of two latent factors: Neuroticism and Openness. Additionally, the dataset included respondents’ gender.

After removing cases with missing values, we obtained a $2634 \times 10$ data matrix of item responses, denoted as $\mathbf { X}$ , for FA, and a $2634 \times 2$ dummy-coded matrix for gender, denoted as $\mathbf { Y}$ , to uniquely determine the factor scores. We applied RFE with two factors and compared it with maximum likelihood FA (MLFA). The regularization parameter for RFE was set to $\lambda = 0.01$ . Geomin rotation (Yates, Reference Yates1987) was applied to the factor loading matrices obtained by both RFE and MLFA.

Table 2 compares the factor loadings and uniquenesses from RFE and MLFA, showing similar estimates. The simple structure observed in both methods supports interpreting the two factors as Neuroticism and Openness, consistent with MLFA, thereby supporting the validity of RFE.

Table 2 Factor loading matrices, uniquenesses (Uniq.), and inter-factor correlation matrices obtained by RFE and FA with maximum likelihood estimation (MLFA)

Note: The loadings larger than 0.4 in absolute are bolded.

Factor scores of MLFA were estimated post hoc using Bartlett’s method. The correlation coefficients between the Bartlett’s scores and those obtained via RFE were 0.942 and 0.917 for the two factors, respectively, indicating that RFE produces factor scores comparable to those from conventional FA with post hoc estimation. The critical difference between RFE and existing methods lies in the factor score distributions across demographic groups. RFE emphasizes score differences between individuals—particularly between genders—more strongly than methods such as Bartlett’s and those of ten Berge et al. (Reference ten Berge, Krijnen, Wansbeek and Shapiro1999). Table 3 shows that gender differences in factor scores are more clearly captured with RFE = 0.95. These findings indicate that RFE not only yields unique factor scores but also enhances sensitivity to group-level differences that traditional methods tend to obscure.

Table 3 Factor score differences between genders (male $-$ female) and 95% confidence intervals of Cohen’s (Reference Cohen1988) d obtained by RFE, Bartlett’s, and ten Berge’s methods

Weisberg et al. (Reference Weisberg, DeYoung and Hirsh2011) reported that among the Big Five factors, Neuroticism shows the largest gender difference, while differences in Openness are often masked by overlapping score distributions. Our method reveals such subtle differences more clearly, making it useful for studies comparing groups, such as Weisberg et al. (Reference Weisberg, DeYoung and Hirsh2011).

6.2 Job impression dataset

In the fourth section, we proposed CFE as a variant of RFE. In this subsection, we empirically demonstrate the usefulness of CFE by comparing it with the existing method CCFE by Uno et al. (Reference Uno, Adachi and Trendafilov2019).

We applied CFE and CCFE to the occupational impression data (Committee for Guiding Psychological Experiments, 1985; Uno et al., Reference Uno, Adachi and Trendafilov2019) collected in Japan. The occupational impression data consists of ratings of fourteen occupations listed in Figure 3 using six adjectives: Useful, Good, Firm, Quick, Noisy, and Busy. The respondents were asked to evaluate the extent to which their impression of each occupation applies to each adjective and their responses were averaged over the respondents; therefore, the data matrix $\mathbf { X}$ used for the analysis is a $14 \times 6$ matrix. We applied CFE and CCFE to the standardized data matrix $\mathbf { X}$ with the number of factors and clusters set to 2 and 3, respectively. These settings for the number of factors and clusters were the same as those used by Uno et al. (Reference Uno, Adachi and Trendafilov2019) for comparison. In CFE, the value of $\lambda $ was set to 0.01. Additionally, to identify the orthogonal matrix $\mathbf { R}$ which makes the factor scores obtained by CFE and CCFE comparable, $\mathbf { R}$ was determined to minimize $ ||\hat {\mathbf {F}}_{CFE}\mathbf { R} - \hat {\mathbf {F}}_{CCFE}||^2 $ where $\hat {\mathbf {F}}_{CFE}$ and $\hat {\mathbf {F}}_{CCFE}$ represent the estimated common factor score matrices obtained by CFE and CCFE, respectively. The optimal $\mathbf { R}$ can be obtained by the orthogonal Procrustes rotation of $\hat {\mathbf {F}}_{CFE}$ with the target matrix $\hat {\mathbf {F}}_{CCFE}$ (Gower & Dijksterhuis, Reference Gower and Dijksterhuis2004). The CFE algorithm converged successfully in the current application. Standard FA, in contrast, can produce improper solutions with small samples in $\mathbf { X}$ , yielding near-zero uniqueness ( $<0.005$ ), close to negative variance. MDFA, CFE, and CCFE mitigate this issue, as discussed in the final section.

Figure 3 Factor scores estimated by CFE with three different $\lambda $ s (a-c) and CFE (d) applied to the job impression dataset. Note: In all figures, the horizontal and vertical axes of the figures stand for positive image and busyness, respectively. The occupations are abbreviated as follows: monk (MO) bank clerk (BC), comic artist (CT), designer (DE), nurse (NU), professor (PR), doctor (DR), police officer (PO), journalist (JO), sailor (SA), athlete (AT), novelist (NO), actor (AC), and cabin attendant (CA)

The estimated factor loading matrices and uniquenesses are shown in Supplementary Material 9. The two factor loading matrices exhibit very similar simple structures, with the first factor showing large positive loadings on Useful, Good, and Firm, which can be interpreted as a positive image of the occupation. The second factor shows high positive loadings on Quick, Noisy, and Busy, which can be interpreted as representing the busyness of the occupation. Figure 3 shows scatter plots of the estimated common factor scores obtained by RFE using three different values of $\lambda $ , including $\lambda = 0.01$ , and the factor scores obtained by CCFE. The plots of the factor scores obtained by RFE in Figure 3a and those obtained by CCFE in Figure 3d are very similar, demonstrating that CFE provides consistent factor score estimates with the existing method. The correlation coefficients between the factor scores obtained by CFE and CCFE were very high, 0.999 for the first factor and 0.990 for the second factor. Furthermore, the individual memberships obtained by both methods were completely consistent. The first cluster is composed of police officer, nurse, and doctor, the second cluster is composed of bank clerk, novelist, professor, and monk. The remaining seven occupations are in the third cluster. Figure 3 shows that comic artists and athletes are closely positioned within the same cluster, reflecting their shared cultural image as “good” and “busy” professions in Japan, where comic artists are also seen as engaging in demanding work (Ristola, Reference Ristola2016).

Figure 3b,d shows the plots of the factor scores obtained by applying CFE with different $\lambda $ values. The factor scores for $\lambda = 0.001$ are similar to those of CFE with $\lambda = 0.01$ and CCFE, but the factor scores for $\lambda = 1.000$ considerably differ from the others. The factor scores obtained with $\lambda = 1.000$ are positioned closer to the cluster centers for each individual, resulting in larger within-cluster variance than the other results. This indicates that the proposed CFE method allows adjusting the degree of cluster cohesion of the factor scores by tuning the value of $\lambda $ . Moreover, since the individual membership information output by CFE perfectly matches that of CCFE for all $\lambda $ values, it can be inferred that the cluster structures of individuals extracted by both methods are consistent with each other.

7 Concluding remarks

Factor score indeterminacy is a fundamental property of FA, but scores often need specification for further analysis. The proposed method RFE addresses this by integrating score estimation into FA using external criteria. Further, it generalizes to methods like CFE and MDSEM. This study focuses on CFE, which extends RFE’s penalty to enable clustering while avoiding artificial structures, improving on Uno et al. (Reference Uno, Adachi and Trendafilov2019).

Even though MDFA is less popular than standard FA, it offers several advantages. First, unlike standard FA, which treats factor scores as random variables, MDFA estimates them directly as model parameters (Adachi, Reference Adachi2019), making factor score specification reasonable. Second, MDFA avoids improper solutions where uniqueness is negative, as $\boldsymbol { \Psi }^2$ ensures non-negativity. Third, MDFA allows for observation-specific residual evaluation without additional computations, unlike standard FA (Adachi, Reference Adachi2022). Moreover, MDFA has proven to have statistical properties, including consistency (Terada, Reference Terada2024), making it a promising FA procedure.

The key difference between RFE and CCFE is their approach to factor score estimation. RFE uses penalized FA to uniquely estimate all parameters, including factor scores, making $\boldsymbol { \Lambda }$ and $\boldsymbol { \Psi }$ differ from standard FA and aligning RFE more with multi-task learning than strict score identification. CCFE, however, resolves factor score indeterminacy while keeping $\boldsymbol { \Lambda }$ and $\boldsymbol { \Psi }$ identical to standard FA. Simulations show RFE estimates true parameters well when $\lambda $ is small, making its penalization impact minimal and unlikely to affect interpretation.

The unique determination of $\mathbf { Z}$ allows $\mathbf { X}$ to be decomposed as $\mathbf { X} = \hat {\mathbf {F}}\hat {\boldsymbol { \Lambda }}^{\prime } + \hat {\mathbf {U}}\hat {\boldsymbol { \Psi }} + \hat {\mathbf {E}}$ , where the first term represents common factor scores, the second unique factor scores, and the third unexplained variance shown. This decomposition enables individual analysis without post-hoc calculations required in standard FA. However, interpreting unique factors requires more discussion, as they are rarely considered in FA.

Semiparametric and regularized FA methods perform well under the assumptions on loading homogeneity and technical identification constraints (Guo & Li, Reference Guo and Li2022; Li et al., Reference Li, Zhang and Kong2018). This study instead highlights the importance of individual-level factor scores, which are often of primary interest in psychological research. Such applications require scores to be uniquely specified and interpretable. The proposed method addresses this need by resolving indeterminacy through regularization with external information. To the best of our knowledge, no existing method has provided a unified approach that ensures both uniqueness and interpretability of individual scores. Our method uniquely determines factor scores in a way that reflects meaningful patterns, such as demographic differences, which are difficult to capture with existing approaches.

Despite their advantages, the proposed methods have limitations. A key issue is the reliance on well-chosen external criteria in RFE; if poorly selected, factor scores may be ambiguous or less interpretable. Proper $\lambda $ selection in RFE and CFE is also critical, as improper tuning can distort clustering or score determination. While sufficiently small $\lambda $ works well, further research is needed for broader applicability.

Future research directions include extending RFE and CFE to more complex models, such as hierarchical or multi-level FA and structural equation modeling, to broaden their applicability. Additionally, exploring integrating alternative penalty functions within the RFE framework could improve parameter estimation and interpretability.

In conclusion, RFE and CFE represent significant advancements in FA, offering unique solutions to the limitations of traditional methods. Their robust performance in simulations and real data applications underscores their practical relevance and potential for broad application across various research domains. These methods provide a solid foundation for future developments in FA and related fields, promising enhanced interpretability and flexibility in understanding complex data structures.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psy.2025.10025.

Data availability statement

R package mdfaScoreIden is available via GitHub. Please visit https://github.com/nyamashita/mdfaScoreIden for detail.

Acknowledgements

The author is deeply grateful to the associate editor and the three anonymous reviewers for their careful reviews and constructive comments for improving the quality of the paper.

Author contributions

The author confirms sole responsibility for the following: study conception and mathematical development, data collection, analysis and interpretation of results, and manuscript preparation.

Funding statement

This research was supported by JSPS KAKENHI Grant Number 23K16854.

Competing interests

The author declares no competing interests exist.

References

Adachi, K. (2019). Factor analysis: Latent variable, matrix decomposition, and constrained uniqueness formulations. Wiley Interdisciplinary Reviews: Computational Statistics, 11(3), e1458.Google Scholar

Adachi, K. (2022). Factor analysis procedures revisited from the comprehensive model with unique factors decomposed into specific factors and errors. Psychometrika, 87(3), 967–991.Google Scholar

Adachi, K., & Trendafilov, N. T. (2018). Some mathematical properties of the matrix decomposition solution in factor analysis. Psychometrika, 83, 407–424.Google Scholar

Anderson, T. W., & Rubin, H. (1956). Statistical Inference in factor analysis. In Neyman, J. (Ed.), Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, 5 (pp. 111–150). University of California Press.Google Scholar

Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. John Wiley & Sons.Google Scholar

Bartlett, M. S. (1937). The statistical conception of mental factors. British Journal of Psychology, 28, 97.Google Scholar

Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31(1), 419–456.Google Scholar

Carrizosa, A., Guerrero, V., Romero-Morales, D., & Satorra, A. (2019). Enhancing interpretability in factor analysis by means of mathematical optimization. Multivariate Behavioral Research, 55, 748–762.Google Scholar

Choi, J., Zou, H., & Oehlert, G. (2011). A penalized maximum likelihood approach to sparse factor analysis. Statistical Interface, 3(4), 429–436.Google Scholar

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2^nd ed.) Academic Press.Google Scholar

Committee for Guiding Psychological Experiments. (1985). Experiments and tests: The foundations of psychology (Explanation version). Bifukan (in Japanese).Google Scholar

de Leeuw, J. (2004). Least squares optimal scaling of partially observed linear systems. In K. Montfort, J. Oud, & A. Satorra (Eds.), Recent developments on structural equation models: Theory and applications (pp. 121–134). Springer.Google Scholar

Gass, C. S. (1996). MMPI-2 variables in attention and memory test performance. Psychological Assessment, 8(2), 135–138.Google Scholar

Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde, I., Deary, I. J., De Fruyt, F., & Ostendorf, F. (Eds.), Personality psychology in Europe (Vol. 7, pp. 7–28). Tilburg University Press.Google Scholar

Gower, J. C., & Dijksterhuis, G. B. (2004). Procrustes problems (Vol. 30). OUP Oxford.Google Scholar

Guo, C., & Li, J. (2022) Homogeneity and structure identification in semiparameteric factor models. Journal of Business & Ecomonic Statistics, 40, 408–422.Google Scholar

Harman, H. H. (1976). Modern factor analysis (3^rd ed.) The University of Chicago Press.Google Scholar

Hirose, K., & Terada, Y. (2023). Sparse and simple structure estimation via prenet penalization. Psychometrika, 88, 1381–1406.Google Scholar

Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing, 25, 863–875.Google Scholar

Hughes, B. T., Costello, C. K., Pearman, J., Razavi, P., Bedford-Petersen, C., Ludwig, R. M., & Srivastava, S. (2021). The Big Five across socioeconomic status: Measurement invariance, relationships, and age trends. Collabra: Psychology, 7(1), 24431.Google Scholar

Humphreys, L. G., & Montanelli, R. G. Jr. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10(2), 193–205.Google Scholar

Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482.Google Scholar

Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82.Google Scholar

Li, J., Zhang, W., & Kong, E. (2018). Factor models for asset returns based on transformed factors. Journal of Economitrics, 207, 432–448.Google Scholar

Maraun, M. D. (1996). Metaphor taken as math: Indeterminancy in the factor analysis model. Multivariate Behavioral Research, 31(4), 517–538.Google Scholar

Mulaik, S. A. (1986). Factor analysis and Psychometrika: Major developments. Psychometrika, 51, 23–33.Google Scholar

Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality–cognition link. In Gruszka, A., Matthews, G., & Szymura, B. (Eds.), Handbook of individual differences in cognition: Attention, memory and executive control (pp. 27–49). Springer.Google Scholar

Ristola, J. (2016). Blood, sweat, ink, and tears: Exploitation of labour in the Japanese animation industry. In GLRC graduate student symposium 2016 (pp. 85–93).Google Scholar

Sergi, M. R., Picconi, L., Tommasi, M., Saggino, A., Ebisch, S. J. H., & Spoto, A. (2021). The role of gender in the association among the emotional intelligence, anxiety and depression, Frontiers in Psychology, 12, 747702.Google Scholar

Sočan, G. (2003). The incremental value of minimum rank factor analysis [PhD thesis]. University of Groningen.Google Scholar

Stegeman, A. (2016). A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts. Computational Statistics & Data Analysis, 99, 189–203.Google Scholar

Steiger, J. H. (1979). Factor indeterminacy in the 1930’s and the 1970’s some interesting parallels. Psychometrika, 44, 157–167.Google Scholar

Steiger, J. H. (1996). Coming full circle in the history of factor indeterminancy. Multivariate Behavioral Research, 31(4), 617–630.Google Scholar

ten Berge, J. M. F. (1993). Least squares optimization in multivariate analysis. DSWO Press, Leiden University.Google Scholar

ten Berge, J. M. F., Krijnen, W. P., Wansbeek, T., & Shapiro, A. (1999). Some new results on correlation-preserving factor scores prediction methods. Linear Algebra and its Applications, 289, 311–318.Google Scholar

Terada, Y. (2024). Statistical properties of matrix decomposition factor analysis [Preprint]. arXiv preprint arXiv:2403.06968.Google Scholar

Uno, K., Adachi, K., & Trendafilov, N. T. (2019). Clustered common factor exploration in factor analysis. Psychometrika, 84(4), 1048–1067.Google Scholar

Weisberg, Y. J., DeYoung, C. G., & Hirsh, J. B. (2011). Gender differences in personality across the ten aspects of the Big Five. Frontiers in Psychology, 2, 11757.Google Scholar

Yates, A. (1987). Multivariate exploratory data analysis: A perspective on exploratory factor analysis. Suny Press.Google Scholar

Yamashita, N. (2024). Matrix decomposition approach for structural equation modeling as an alternative to covariance structure analysis and its theoretical properties. Structural Equation Modeling: A Multidisciplinary Journal, 31(5), 817–834.Google Scholar

Yamashita, N., & Mayekawa, S. (2015). A new biplot procedure with joint classification of objects and variables by fuzzy c-means clustering. Advances in Data Analysis and Classification, 9, 243–266.Google Scholar

Zammuner, V. L. (1998). Concepts of emotion: “Emotionness”, and dimensional ratings of Italian emotion words. Cognition & Emotion, 12(2), 243–272.Google Scholar

Figure 1 Result of CCFE applied to an artificial data having no cluster structure; (a) true common factor scores and (b) common factor scores estimated by CCFE.

Figure 2 Factor scores estimated by CFE are applied to artificial data with no cluster structure.

Table 1 Medians and standard deviations (s.d.) of $RMSEA$s of the estimated parameter matrices obtained by RFE in each condition and frequency of local minimum for $\lambda = 0.01$

Table 2 Factor loading matrices, uniquenesses (Uniq.), and inter-factor correlation matrices obtained by RFE and FA with maximum likelihood estimation (MLFA)

Table 3 Factor score differences between genders (male $-$ female) and 95% confidence intervals of Cohen’s (1988) d obtained by RFE, Bartlett’s, and ten Berge’s methods

Figure 3 Factor scores estimated by CFE with three different $\lambda $s (a-c) and CFE (d) applied to the job impression dataset. Note: In all figures, the horizontal and vertical axes of the figures stand for positive image and busyness, respectively. The occupations are abbreviated as follows: monk (MO) bank clerk (BC), comic artist (CT), designer (DE), nurse (NU), professor (PR), doctor (DR), police officer (PO), journalist (JO), sailor (SA), athlete (AT), novelist (NO), actor (AC), and cabin attendant (CA)

Yamashita supplementary material

File 145.3 KB

Article contents

Identification of Factor Scores by Regression with External Variables in Exploratory Factor Analysis

Abstract

Keywords

Information

1 Introduction

2 Existing procedures

3 Proposed method

3.1 Formulation

3.2 Algorithm

3.3 Properties

Uniqueness of factor scores

Relationship between the observed and latent variables

Rotational indeterminacy

RFE as a penalized FA procedure

3.4 Specification of $\lambda $

4 Related procedures and generalization

5 Numerical simulation

5.1 Design

5.2 Result

6 Applications to real data

6.1 Big five inventory data

6.2 Job impression dataset

7 Concluding remarks

Supplementary material

Data availability statement

Acknowledgements

Author contributions

Funding statement

Competing interests

References

Yamashita supplementary material

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests