This paper presents a cross-language study of lexical semantics within the framework of distributional semantics. We used a wide range of predefined semantic categories in Mandarin and English and compared the clusterings of these categories using FastText word embeddings. Three dimensionality reduction techniques were applied to map 300-dimensional FastText vectors onto two-dimensional planes: multidimensional scaling (MDS), principal components analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). The results show that t-SNE provides the clearest clustering of semantic categories, improving markedly on PCA and MDS. In both languages, we observed similar differentiation between verbs, adjectives, and nouns as well as between concrete and abstract words. In addition, the methods applied in this study, especially Procrustes analysis, make it possible to trace subtle differences in the structure of the semantic lexicons of Mandarin and English.
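To make the Procrustes step concrete, the following is a minimal sketch of a rotation-only Procrustes alignment of two 2-D point configurations. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the coordinates, and the omission of scaling and reflection are all choices made here for brevity.

```python
import math

def center(points):
    """Translate a list of (x, y) points so their centroid is the origin."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

def procrustes_rotation_2d(source, target):
    """Return (theta, residual): the rotation angle that best aligns the
    centered source with the centered target in the least-squares sense,
    and the sum of squared distances that remains after rotating."""
    a, b = center(source), center(target)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(cross, dot)  # closed-form optimum in two dimensions
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in a]
    residual = sum((rx - bx) ** 2 + (ry - by) ** 2
                   for (rx, ry), (bx, by) in zip(rotated, b))
    return theta, residual

# Hypothetical 2-D coordinates for four translation-equivalent words.
english = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
angle = math.radians(30)
# The second configuration is the first one rotated by 30 degrees and shifted.
mandarin = [(math.cos(angle) * x - math.sin(angle) * y + 5.0,
             math.sin(angle) * x + math.cos(angle) * y - 3.0)
            for x, y in english]

theta, residual = procrustes_rotation_2d(english, mandarin)
```

With real embeddings the residual would not vanish, and the per-word residuals are what would point to words whose positions differ between the two lexicons.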
This paper presents a new hierarchical classes model, called Tucker2-HICLAS, for binary three-way three-mode data. Like any three-way hierarchical classes model, the Tucker2-HICLAS model includes a representation of the association relation among the three modes and a hierarchical classification of the elements of each mode. A distinctive feature of the Tucker2-HICLAS model, which is closely related to the Tucker3-HICLAS model (Ceulemans, Van Mechelen & Leenen, 2003), is that one of the three modes is minimally reduced, so that the differences among the association patterns of the elements of this mode are maximally retained in the model. Moreover, compared to Tucker3-HICLAS, Tucker2-HICLAS implies three rather than four different types of parameters and as such is simpler to interpret. Two types of Tucker2-HICLAS models are distinguished: a disjunctive and a conjunctive type. An algorithm for fitting the Tucker2-HICLAS model is described and evaluated in a simulation study. The model is illustrated with longitudinal data on interpersonal emotions.
This paper describes the conjunctive counterpart of De Boeck and Rosenberg's hierarchical classes model. Both the original model and its conjunctive counterpart represent the set-theoretical structure of a two-way two-mode binary matrix. However, unlike the original model, the new model represents the row-column association as a conjunctive function of a set of hypothetical binary variables. The conjunctive nature of the new model further implies that it may represent some conjunctive higher order dependencies among rows and columns. The substantive significance of the conjunctive model is illustrated with empirical applications. Finally, it is shown how conjunctive and disjunctive hierarchical classes models relate to Galois lattices, and how hierarchical classes analysis can be useful to construct lattice models of empirical data.
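A conjunctive association rule can be sketched as follows. The sketch assumes the convention that a row is associated with a column only if the row has every hypothetical binary variable (bundle) that the column has; the function name and the small bundle matrices are illustrative and are not taken from the paper.

```python
def conjunctive_association(row_bundles, col_bundles):
    """Reconstruct a binary rows x columns matrix under a conjunctive rule.

    row_bundles[i][r] and col_bundles[j][r] are 0/1 memberships in bundle r.
    Assumed rule: x[i][j] = 1 only if row i has ALL bundles that column j has.
    """
    return [[int(all(a >= b for a, b in zip(row, col)))
             for col in col_bundles]
            for row in row_bundles]

# Hypothetical example with two bundles.
row_bundles = [[1, 1], [1, 0], [0, 0]]
col_bundles = [[1, 0], [1, 1], [0, 0]]
reconstructed = conjunctive_association(row_bundles, col_bundles)
# reconstructed == [[1, 1, 1], [1, 0, 1], [0, 0, 1]]
```

Under this convention a column with no bundles is associated with every row, and a column with both bundles is associated only with rows that have both, which is the kind of higher-order dependency a disjunctive rule cannot express with the same bundles.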
Two-mode binary data matrices arise in a variety of social network contexts, such as the attendance or non-attendance of individuals at events, the participation or lack of participation of groups in projects, and the votes of judges on cases. A popular method for analyzing such data is two-mode blockmodeling based on structural equivalence, where the goal is to identify partitions for the row and column objects such that the clusters of the row and column objects form blocks that are either complete (all 1s) or null (all 0s) to the greatest extent possible. Multiple restarts of an object relocation heuristic that seeks to minimize the number of inconsistencies (i.e., 1s in null blocks and 0s in complete blocks) with ideal block structure is the predominant approach for tackling this problem. As an alternative, we propose a fast and effective implementation of tabu search. Computational comparisons across a set of 48 large network matrices revealed that the new tabu-search heuristic always provided objective function values that were better than those of the relocation heuristic when the two methods were constrained to the same amount of computation time.
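The objective described above can be sketched in a few lines. The sketch assumes that each block is labelled complete or null according to whichever label yields fewer inconsistencies, so a block contributes the smaller of its count of 1s and its count of 0s; the function name and the toy matrix are illustrative, and neither the relocation heuristic nor the tabu search is shown.

```python
def block_inconsistencies(matrix, row_clusters, col_clusters):
    """Count inconsistencies with ideal (complete or null) block structure.

    matrix is a list of 0/1 rows; row_clusters[i] and col_clusters[j] give
    the cluster label of row i and column j.
    """
    ones, sizes = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            block = (row_clusters[i], col_clusters[j])
            ones[block] = ones.get(block, 0) + value
            sizes[block] = sizes.get(block, 0) + 1
    # A block is treated as complete if it has more 1s than 0s, else as null.
    return sum(min(ones[b], sizes[b] - ones[b]) for b in sizes)

# Hypothetical 4 x 4 attendance matrix (rows: individuals, columns: events).
matrix = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
good = block_inconsistencies(matrix, [0, 0, 1, 1], [0, 0, 1, 1])  # 2
poor = block_inconsistencies(matrix, [0, 1, 0, 1], [0, 0, 1, 1])  # 8
```

A relocation or tabu-search heuristic would repeatedly move one row or column object to another cluster and re-evaluate this count.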
In this paper, hierarchical and non-hierarchical tree structures are proposed as models of similarity data. Trees are viewed as intermediate between multidimensional scaling and simple clustering. Procedures are discussed for fitting both types of trees to data. The concept of multiple tree structures shows great promise for analyzing more complex data. Hybrid models in which multiple trees and other discrete structures are combined with continuous dimensions are discussed. Examples of the use of multiple tree structures and hybrid models are given. Extensions to the analysis of individual differences are suggested.
A monotone invariant method of hierarchical clustering based on the Mann-Whitney U-statistic is presented. The effectiveness of the complete-link, single-link, and U-statistic methods in recovering tree structures from error-perturbed data is evaluated. The U-statistic method is found to be consistently more effective in recovering the original tree structures than either the single-link or complete-link methods.
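The abstract does not say how the U-statistic enters the merging criterion, so the following sketch shows only the statistic itself and why a method built on it is monotone invariant. The counting convention (pairs with x < y, ties counted as one half) and the sample values are assumptions made for illustration.

```python
def mann_whitney_u(xs, ys):
    """Number of (x, y) pairs with x < y, counting ties as one half."""
    total = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total

# Hypothetical dissimilarities, e.g. within-cluster versus between-cluster.
within = [1, 2, 3]
between = [2, 5, 6]
u = mann_whitney_u(within, between)  # 7.5

# A strictly increasing transformation leaves every comparison unchanged,
# so U is the same on the transformed data.
u_cubed = mann_whitney_u([x ** 3 for x in within], [y ** 3 for y in between])
```

Because U depends only on the order of the dissimilarities, any clustering rule computed from it gives the same tree for all monotone rescalings of the data.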
Extended redundancy analysis (ERA), a generalized version of redundancy analysis (RA), has been proposed as a useful method for examining interrelationships among multiple sets of variables in multivariate linear regression models. As a limitation of the extant RA or ERA analyses, however, parameters are estimated by aggregating data across all observations even in a case where the study population could consist of several heterogeneous subpopulations. In this paper, we propose a Bayesian mixture extension of ERA to obtain both probabilistic classification of observations into a number of subpopulations and estimation of ERA models within each subpopulation. It specifically estimates the posterior probabilities of observations belonging to different subpopulations, subpopulation-specific residual covariance structures, component weights and regression coefficients in a unified manner. We conduct a simulation study to demonstrate the performance of the proposed method in terms of recovering parameters correctly. We also apply the approach to real data to demonstrate its empirical usefulness.
A Generalized INDCLUS model, termed GINDCLUS, is presented for clustering three-way two-mode proximity data. In order to account for the heterogeneity of the data, both a partition of the subjects into homogeneous classes and a covering of the objects into groups are simultaneously determined. Furthermore, the availability of information which is external to the three-way data is exploited to better account for such heterogeneity: the weights of both classifications are linearly linked to external variables allowing for the identification of meaningful classes of subjects and groups of objects. The model is fitted in a least-squares framework, and an efficient Alternating Least-Squares algorithm is provided. An extensive simulation study and an application on benchmark data are also presented.
In many psychological research domains stimulus-response profiles are explained by conjecturing a sequential process in which some variables mediate between stimuli and responses. Charting sequential processes is often a complex task because (1) many possible mediating variables may exist, and (2) interindividual differences may occur in the relationship between these mediating variables and the response. Recently, Ceulemans and Van Mechelen (Psychometrika 73(1):107–124, 2008) addressed these challenges by developing the CLASSI model. A major drawback of CLASSI is that it requires information about the same set of stimuli for all participants (i.e., crossed data), whereas recently a number of data gathering techniques have been proposed in which the set of stimuli differs across participants, yielding nested data. Therefore we present the CLASSI-N model, which extends the CLASSI model to nested data. A simulated annealing algorithm is proposed. The results of a simulation study are discussed as well as an application to data concerning depression.
In this paper, we consider a class of models for two-way matrices with binary entries of 0 and 1. First, we consider Boolean matrix decomposition, conceptualize it as a latent response model (LRM) and, by making use of this conceptualization, generalize it to a larger class of matrix decomposition models. Second, probability matrix decomposition (PMD) models are introduced as a probabilistic version of this larger class of deterministic matrix decomposition models. Third, an algorithm for the computation of the maximum likelihood (ML) and the maximum a posteriori (MAP) estimates of the parameters of PMD models is presented. This algorithm is an EM-algorithm, and is a special case of a more general algorithm that can be used for the whole class of LRMs. And fourth, as an example, a PMD model is applied to data on decision making in psychiatric diagnosis.
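As a small illustration of the deterministic starting point, the sketch below forms the Boolean product of two binary matrices and counts the discrepancies with an observed matrix. The matrix names, the toy values, and the discrepancy count are illustrative assumptions; the probabilistic PMD model and its EM algorithm are not shown.

```python
def boolean_product(a, b):
    """Boolean product of a (I x R) and b (J x R):
    x[i][j] = 1 if a[i][r] = 1 and b[j][r] = 1 for at least one r."""
    return [[int(any(ar and br for ar, br in zip(row, col))) for col in b]
            for row in a]

def discrepancies(x, x_hat):
    """Number of cells in which two equally sized binary matrices differ."""
    return sum(v != w for rx, rh in zip(x, x_hat) for v, w in zip(rx, rh))

# Hypothetical rank-2 decomposition.
a = [[1, 0], [0, 1], [1, 1]]
b = [[1, 0], [0, 1], [1, 1]]
x_hat = boolean_product(a, b)
# x_hat == [[1, 0, 1], [0, 1, 1], [1, 1, 1]]

# An observed matrix that differs from the model in one cell.
observed = [[1, 0, 1], [0, 1, 1], [1, 0, 1]]
errors = discrepancies(observed, x_hat)  # 1
```

A probabilistic version replaces the fixed 0/1 entries of `a` and `b` with probabilities of latent Bernoulli realizations, which is what makes likelihood-based estimation possible.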
A three-way three-mode extension of De Boeck and Rosenberg's (1988) two-way two-mode hierarchical classes model is presented for the analysis of individual differences in binary object × attribute arrays. In line with the two-way hierarchical classes model, the three-way extension represents both the association relation among the three modes and the set-theoretical relations among the elements of each mode. An algorithm for fitting the model is presented and evaluated in a simulation study. The model is illustrated with data on psychiatric diagnosis. Finally, the relation between the model and extant models for three-way data is discussed.
Quite a few studies in the behavioral sciences result in hierarchical time profile data, with a number of time profiles being measured for each person under study. Associated research questions often focus on individual differences in profile repertoire, that is, differences between persons in the number and the nature of profile shapes that show up for each person. In this paper, we introduce a new method, called KSC-N, that parsimoniously captures such differences while neatly disentangling variability in shape and amplitude. KSC-N induces a few person clusters from the data and derives for each person cluster the types of profile shape that occur most for the persons in that cluster. An algorithm for fitting KSC-N is proposed and evaluated in a simulation study. Finally, the new method is applied to emotional intensity profile data.
In this paper, we propose a cluster-MDS model for two-way one-mode continuous rating dissimilarity data. The model aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space. Under the normal distribution assumption, a latent class model is developed in terms of the set of dissimilarities in a maximum likelihood framework. In each iteration, the probability that a dissimilarity belongs to each of the blocks conforming to a partition of the original dissimilarity matrix, and the remaining parameters, are estimated in a simulated annealing based algorithm. A model selection strategy is used to test the number of latent classes and the dimensionality of the problem. Both simulated and classical dissimilarity data are analyzed to illustrate the model.
A least squares algorithm for fitting additive trees to proximity data is described. The algorithm uses a penalty function to enforce the four point condition on the estimated path length distances. The algorithm is evaluated in a small Monte Carlo study. Finally, an illustrative application is presented.
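The four point condition states that, for any four objects, the two largest of the three pairwise sums d(i,j) + d(k,l), d(i,k) + d(j,l), and d(i,l) + d(j,k) are equal. The sketch below checks the condition and computes one possible violation measure; the squared-gap penalty is an assumption made for illustration and is not necessarily the penalty function used in the paper.

```python
from itertools import combinations

def pair_sums(d, i, j, k, l):
    """The three pairwise sums for a quadruple, sorted in ascending order."""
    return sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])

def satisfies_four_point(d, tol=1e-9):
    """True if the two largest sums coincide for every quadruple."""
    return all(abs(s[2] - s[1]) <= tol
               for s in (pair_sums(d, *q)
                         for q in combinations(range(len(d)), 4)))

def four_point_penalty(d):
    """Hypothetical penalty: squared gap between the two largest sums."""
    return sum((s[2] - s[1]) ** 2
               for s in (pair_sums(d, *q)
                         for q in combinations(range(len(d)), 4)))

# Path lengths of a small tree: leaves A, B hang off one internal node
# (branches 1 and 2), leaves C, D off another (branches 3 and 4), and the
# internal nodes are joined by a branch of length 5.
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
# The same matrix with the A-C distance perturbed from 9 to 8.
noisy_d = [row[:] for row in tree_d]
noisy_d[0][2] = noisy_d[2][0] = 8
```

Here `satisfies_four_point(tree_d)` holds, while `noisy_d` violates the condition with a penalty of 1; a penalized least squares fit would trade this term off against the squared error between data and estimated distances.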
Often problems result in the collection of coupled data, which consist of different N-way N-mode data blocks that have one or more modes in common. To reveal the structure underlying such data, an integrated modeling strategy, with a single set of parameters for the common mode(s), that is estimated based on the information in all data blocks, may be most appropriate. Such a strategy implies a global model, consisting of different N-way N-mode submodels, and a global loss function that is a (weighted) sum of the partial loss functions associated with the different submodels. In this paper, such a global model for an integrated analysis of a three-way three-mode binary data array and a two-way two-mode binary data matrix that have one mode in common is presented. A simulated annealing algorithm to estimate the model parameters is described and evaluated in a simulation study. An application of the model to real psychological data is discussed.
This paper presents a new model for binary three-way three-mode data, called Tucker3 hierarchical classes model (Tucker3-HICLAS). This new model generalizes Leenen, Van Mechelen, De Boeck, and Rosenberg's (1999) individual differences hierarchical classes model (INDCLAS). Like the INDCLAS model, the Tucker3-HICLAS model includes a hierarchical classification of the elements of each mode, and a linking structure among the three hierarchies. Unlike INDCLAS, Tucker3-HICLAS (a) does not restrict the hierarchical classifications of the three modes to have the same rank, and (b) allows for more complex linking structures among the three hierarchies. An algorithm to fit the Tucker3-HICLAS model is described and evaluated in an extensive simulation study. An application of the model to hostility data is discussed.
Similarity data can be represented by additive trees. In this model, objects are represented by the external nodes of a tree, and the dissimilarity between objects is the length of the path joining them. The additive tree is less restrictive than the ultrametric tree, commonly known as the hierarchical clustering scheme. The two representations are characterized and compared. A computer program, ADDTREE, for the construction of additive trees is described and applied to several sets of data. A comparison of these results to the results of multidimensional scaling illustrates some empirical and theoretical advantages of tree representations over spatial representations of proximity data.
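The path-length definition of dissimilarity can be illustrated with a short sketch that derives leaf-to-leaf distances from a weighted tree and then tests the ultrametric inequality. This is not the ADDTREE program; the tree, the node names, and the helper functions are hypothetical.

```python
from itertools import combinations

def path_lengths(edges, leaves):
    """Leaf-to-leaf path lengths in a tree given as (node, node, length)."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    def distances_from(source):
        # Depth-first traversal; in a tree each node is reached by one path.
        dist = {source: 0.0}
        stack = [source]
        while stack:
            node = stack.pop()
            for neighbour, w in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + w
                    stack.append(neighbour)
        return dist

    result = {}
    for a in leaves:
        dist = distances_from(a)
        result[a] = {b: dist[b] for b in leaves}
    return result

def is_ultrametric(d, leaves, tol=1e-9):
    """True if the two largest distances are equal in every triple."""
    for a, b, c in combinations(leaves, 3):
        sides = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(sides[2] - sides[1]) > tol:
            return False
    return True

# Hypothetical tree: external nodes A-D, internal nodes u and v.
edges = [("A", "u", 1), ("B", "u", 2), ("u", "v", 5),
         ("C", "v", 3), ("D", "v", 4)]
leaves = ["A", "B", "C", "D"]
d = path_lengths(edges, leaves)
```

The resulting distances form a valid additive tree but fail the ultrametric inequality (for example, the triple A, B, C has sides 3, 9, and 10), which illustrates why the additive tree is the less restrictive model.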
The p-median offers an alternative to centroid-based clustering algorithms for identifying unobserved categories. However, existing p-median formulations typically require data aggregation into a single proximity matrix, resulting in masked respondent heterogeneity. A proposed three-way formulation of the p-median problem explicitly considers heterogeneity by identifying groups of individual respondents that perceive similar category structures. Three proposed heuristics for the heterogeneous p-median (HPM) are developed and then illustrated in a consumer psychology context using a sample of undergraduate students who performed a sorting task of major U.S. retailers, as well as through a Monte Carlo analysis.
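For reference, the standard (single-matrix) p-median objective selects p objects as medians so that the sum of dissimilarities from every object to its nearest median is minimal. The sketch below solves it by exhaustive enumeration, which is feasible only for very small problems; it is not one of the HPM heuristics, and the data are hypothetical.

```python
from itertools import combinations

def p_median(d, p):
    """Exhaustively find the p medians minimizing the total dissimilarity
    from each object to its nearest median. Returns (medians, cost)."""
    n = len(d)
    best_cost, best_medians = None, None
    for medians in combinations(range(n), p):
        cost = sum(min(d[i][m] for m in medians) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_medians = cost, medians
    return best_medians, best_cost

# Hypothetical objects on a line, forming two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in positions] for a in positions]
medians, cost = p_median(d, 2)  # medians (1, 4), cost 4
```

Unlike a centroid, each median is an actual object, so the method needs only the proximity matrix; the heterogeneous version described above would additionally group respondents, each with an individual matrix.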
A Cultural Consensus Theory approach for ordinal data is developed, leading to a new model for ordered polytomous data. The model introduces a novel way of measuring response biases and also measures consensus item values, a consensus response scale, item difficulty, and informant knowledge. The model is extended as a finite mixture model to fit both simulated and real multicultural data, in which subgroups of informants have different sets of consensus item values. The extension is thus a form of model-based clustering for ordinal data. The hierarchical Bayesian framework is utilized for inference, and two posterior predictive checks are developed to verify the central assumptions of the model.
In this paper we investigated two of the most common representations of proximities, two-dimensional Euclidean planes and additive trees. Our purpose was to develop guidelines for comparing these representations, and to discover properties that could help diagnose which representation is more appropriate for a given set of data. In a simulation study, artificial data generated either by a plane or by a tree were scaled using procedures for fitting either a plane (KYST) or a tree (ADDTREE). As expected, the appropriate model fit the data better than the inappropriate model for all noise levels. Furthermore, the two models were roughly comparable: for all noise levels, KYST accounted for plane data about as well as ADDTREE accounted for tree data. Two properties of the data proved useful in distinguishing between the models: the skewness of the distribution of distances, and the proportion of elongated triangles, which measures departures from the ultrametric inequality. Applications of KYST and ADDTREE to some twenty sets of real data, collected by other investigators, showed that most of these data could be classified clearly as favoring either a tree or a two-dimensional representation.
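The two diagnostic properties can be sketched as follows. Skewness is computed as the standardized third moment. The abstract does not define an elongated triangle, so the sketch assumes that a triangle counts as elongated when its middle side is closer in length to its longest side than to its shortest side; that definition, the function names, and the two toy matrices are assumptions made for illustration.

```python
import math
from itertools import combinations

def upper_triangle(d):
    """All pairwise distances d[i][j] with i < j."""
    return [d[i][j] for i, j in combinations(range(len(d)), 2)]

def skewness(values):
    """Standardized third central moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def proportion_elongated(d):
    """Share of triples whose middle side is nearer the longest side
    than the shortest side (assumed definition of 'elongated')."""
    triples = list(combinations(range(len(d)), 3))
    count = 0
    for i, j, k in triples:
        short, middle, long_ = sorted([d[i][j], d[i][k], d[j][k]])
        if long_ - middle < middle - short:
            count += 1
    return count / len(triples)

# Hypothetical tree-generated distances (path lengths in a four-leaf tree).
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
# Hypothetical plane-generated distances (corners of a unit square).
r2 = math.sqrt(2)
plane_d = [
    [0, 1, r2, 1],
    [1, 0, 1, r2],
    [r2, 1, 0, 1],
    [1, r2, 1, 0],
]
```

In this toy case the tree distances are negatively skewed with every triangle elongated, while the plane distances are positively skewed with no elongated triangles; real data would fall between these extremes.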