This chapter moves from regression to methods that focus on the pattern presented by multiple variables, albeit with applications in regression analysis. A strong focus is to find patterns that beg further investigation, and/or replace many variables by a much smaller number that capture important structure in the data. Methodologies discussed include principal components analysis and multidimensional scaling more generally, cluster analysis (the exploratory process that groups “alike” observations) and dendrogram construction, and discriminant analysis. Two sections discuss issues for the analysis of data, such as from high throughput genomics, where the aim is to determine, from perhaps thousands or tens of thousands of variables, which are shifted in value between groups in the data. A treatment of the role of balance and matching in making inferences from observational data then follows. The chapter ends with a brief introduction to methods for multiple imputation, which aims to use multivariate relationships to fill in missing values in observations that are incomplete, allowing them to have at least some role in a regression or other further analysis.
In many applications, dimensionality reduction is important. Uses of dimensionality reduction include visualization, removing noise, and decreasing compute and memory requirements, such as for image compression. This chapter focuses on low-rank approximation of a matrix. There are theoretical models for why big matrices should be approximately low rank. Low-rank approximations are also used to compress large neural network models to reduce computation and storage. The chapter begins with the classic approach to approximating a matrix by a low-rank matrix, using a nonconvex formulation that has a remarkably simple singular value decomposition solution. It then applies this approach to the source localization application via the multidimensional scaling method and to the photometric stereo application. It then turns to convex formulations of low-rank approximation based on proximal operators that involve singular value shrinkage. It discusses methods for choosing the rank of the approximation, and describes the optimal shrinkage method called OptShrink. It discusses related dimensionality reduction methods including (linear) autoencoders and principal component analysis. It applies the methods to learning low-dimensionality subspaces from training data for subspace-based classification problems. Finally, it extends the method to streaming applications with time-varying data. This chapter bridges the classical singular value decomposition tool with modern applications in signal processing and machine learning.
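The two formulations named in the abstract above can be illustrated with a minimal sketch, assuming NumPy. The function names (`low_rank_approx`, `svd_soft_threshold`), the synthetic data, and the threshold value `beta=1.0` are illustrative choices, not taken from the chapter. The first function truncates the singular value decomposition, which gives the best rank-constrained approximation in the Frobenius norm (the Eckart-Young result). The second applies singular value soft-thresholding, the proximal operator of a scaled nuclear norm, as one example of singular value shrinkage. It is not the OptShrink method.

```python
import numpy as np

def low_rank_approx(Y, rank):
    """Best rank-`rank` approximation of Y in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep only the `rank` largest singular values and their singular vectors.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def svd_soft_threshold(Y, beta):
    """Proximal operator of beta * (nuclear norm): shrink every singular value by beta."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - beta, 0.0)  # values below beta are set to zero
    return (U * s_shrunk) @ Vt

# Synthetic example (assumed data): a rank-3 matrix observed with additive noise.
rng = np.random.default_rng(0)
X_true = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
Y = X_true + 0.1 * rng.standard_normal((30, 20))

X_hat = low_rank_approx(Y, rank=3)         # nonconvex, rank-constrained solution
X_svst = svd_soft_threshold(Y, beta=1.0)   # convex, shrinkage-based solution
```

The truncated version needs the rank to be chosen in advance, whereas the shrinkage version needs a threshold, and the rank of its result follows from how many singular values exceed that threshold.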
Is culture the glue that holds the social structures of society together? Or are there “culture wars” that fundamentally divide us? Clearly, the answer is somewhere in the middle, and trying to understand precisely how culture and social structure interrelate to unite or divide remains a core sociological endeavor. Social network analysis alone cannot resolve such an enormous puzzle, but its methods provide important tools for formalizing a jointly structural and cultural approach to studying society. In this chapter, we conclude Part II on Seeing Structure by outlining efforts to see dualities in the connections between structure and culture – that is, to study how enduring patterns of interaction interrelate with shared understandings, tastes, meanings, and other attitudinal measures. We also discuss the structural analysis of meanings themselves and the application of social network techniques to cultural phenomena.
Another way of examining the patterns among objects based on multiple variables is to plot the objects in multidimensional space based on their pairwise dissimilarities. We first describe multidimensional scaling as a very flexible ordination method that can be based on a wide range of dissimilarities. We also introduce cluster analysis based on dissimilarities, where the pattern among objects is represented in a tree-like plot called a dendrogram. We show how to correlate dissimilarities to other continuous and/or grouping variables and fit linear models that treat the dissimilarities as responses modeled against continuous or categorical predictors.
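As a rough illustration of ordination from pairwise dissimilarities, the sketch below implements classical (Torgerson) multidimensional scaling with NumPy. This is only the simplest metric variant and is an assumption on our part: the chapter may well rely on non-metric MDS or other dissimilarity measures. The function name `classical_mds` and the four-point example are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed objects in `n_components` dimensions from a matrix of pairwise distances D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None) # guard against small negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical example: four objects located at the corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(D, n_components=2)
# Distances between the recovered coordinates, for comparison with D.
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered configuration is determined only up to rotation, reflection, and translation, so it is the inter-point distances rather than the coordinates themselves that should be compared with the input.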
This chapter is inspired by work in comparative sociolinguistics and quantitative dialectometry. We use a corpus-based method (Variation-Based Distance and Similarity Modeling – VADIS for short) to quantify the similarity between, and coherence across, the varieties of English under study as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. Key findings include the result that probabilistic grammars are remarkably stable across varieties but that coherence across alternations is not perfect.
Multivariate Analysis focuses on the most essential tools for analyzing compositional and/or multivariate datasets that often emerge when performing geochemical analysis. The chapter starts by introducing groundwater contamination in one of the world’s largest agricultural areas: the Central Valley of California. The goal is to use data science to discover the processes that caused the contamination, whether geogenic or anthropogenic. Knowing these causes helps in deciding on mitigation actions. The reader will take a path of discovery through several protocols of applying data-scientific tools to unmask the processes, including principal component analysis, multivariate outlier detection and factor analysis. The key to using these tools is to understand the compositional nature of geochemical datasets, and how compositions need to be treated appropriately to draw meaningful conclusions, a field termed compositional data analysis. This chapter emphasizes the need for data scientists to work with domain experts.
This chapter presents what I call the meaning-tracks-use argument for the gradualist hypothesis: (1) If the vast majority of competent language users frequently and sincerely use RIGHT and WRONG as gradable concepts, then RIGHT and WRONG are gradable concepts. (2) The antecedent of the first premise is true. (3) Therefore, RIGHT and WRONG are gradable concepts. To support the empirical part of the argument I use the tools of experimental philosophy. Results from three surveys (n = 715, 578, and 182) indicate that respondents use right and wrong as gradable terms to approximately the same extent as color terms, meaning that rightness and wrongness come in degrees roughly as much as colors do. In the largest study, only four percent persistently used right and wrong as non-gradable terms.
This chapter elaborates what you learned in Chapter 7. It points out that despite the fact that many problems have an obvious visual representation, we need to be able to incorporate more abstract cognitive representations in our theory of problem solving. Traditionally, multidimensional scaling (MDS) has been used to infer Euclidean representations of concepts based on judged similarities. Here, after providing an example of how MDS has been used in vision, some cautionary comments are made about what MDS can and cannot provide. Because MDS is usually used to represent clusters of concepts, a formal discussion of clustering is included in this chapter. The chapter continues with two examples: one that is related to clustering in long-term memory and the other related to clustering in short-term memory. In both cases, clustering is used to interpret the mental navigation of memory representations as being analogous to navigation in our physical environment, just as it was when we discussed the TSP. The last section of this chapter illustrates how MDS can be used to explain how TSP tours are produced in the presence of obstacles where obstacles change the pairwise distances and make the distances not Euclidean. MDS can use the pairwise distances around obstacles to produce a Euclidean approximation. Preliminary experimental evidence suggests that this is what the human mind does.
The western European present perfect is subject to substantial crosslinguistic variation. The literature, however, focuses on individual languages or on comparisons of a restricted number of languages. We piece together the puzzle and do so in a data-driven way by comparing the use of the present perfect through a parallel corpus based on the French novel L’Étranger and its translations in Italian, German, Dutch, European Spanish, British English, and Modern Greek. We introduce and showcase Translation Mining, a software suite combining a parallel corpus database with annotation and analysis tools. Translation Mining allows us to generate descriptive statistics of tense use across languages but also to visualize variation through its multidimensional scaling component and to link the variation we find to the underlying data through its integrated setup. We confirm that the present perfect competes with the past and we reveal the fine-grained scalar nature of the variation. To complete the puzzle, we ascertain the dimensions of variation, ranging from lexical and compositional semantics to dynamic semantics and pragmatics.
Chapter 4 analyzes public policy networks, especially in relation to policymaking events. We begin by reviewing key concepts in this field – policy communities, policy events, and event public networks – before presenting a restricted 2-mode perspective on policy communities. Our application is to the US labor policy domain, analyzed with concepts and methods introduced in the preceding chapters: core/periphery models and optimal modularity community analysis. We next extend the application to a less-restricted 3-mode network of private-sector organizations’ interests in events, government organizations’ interests in events, and direct communication ties between (but not within) the private and government organizations. A multidimensional scaling analysis of this 3-mode structure reveals how homogenous and relatively tightly structured this policy field is. By preserving complete multimodal network information, the results both support previous research on event publics and yield a more nuanced understanding of the structural contexts within which policy communities attend to their interests.
We introduce a method for scaling two datasets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives, while recovering the words most associated with each senator’s location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
Men sexually interested in children of a specific combination of maturity and sex tend to show some lesser interest in other categories of persons. Patterns of men's sexual interest across erotic targets' categories of maturity and sex have both clinical and basic scientific implications.
Method
We examined the structure of men's sexual interest in adult, pubescent, and prepubescent males and females using multidimensional scaling (MDS) across four datasets, using three large samples and three indicators of sexual interest: phallometric response to erotic stimuli, sexual offense history, and self-reported sexual attraction. The samples were highly enriched for men sexually interested in children and men accused of sexual offenses.
Results
Results supported a two-dimensional MDS solution, with one dimension representing erotic targets' biological sex and the other dimension representing their sexual maturity. The dimension of sexual maturity placed adults and prepubescent children on opposite ends, and pubescent children intermediate. Differences between men's sexual interest in adults and prepubescent children of the same sex were similar in magnitude to the differences between their sexual interest in adult men and women. Sexual interest in adult men was no more associated with sexual interest in boys than sexual interest in adult women was associated with sexual interest in girls.
Conclusions
Erotic targets' sexual maturity and biological sex play important roles in men's preferences, which are predictive of sexual offending. The magnitude of men's preferences for prepubescent children v. adults of their preferred sex is large.
This chapter analyzes globalization-related conflict in the UNGA and the European Parliament. These two ‘strong publics’ feature debates on issues related to the permeability of borders and are directly tied to important centers of decision-making in global governance. The findings show a powerful cosmopolitan presence in both assemblies. The EP also features more communitarian counter voices. An in-depth analysis of the partisan nature of debate in the EP and the difference between directly elected Members of the EP and appointed European Commissioners lends strength to the hypothesis that electoral accountability strengthens the presence of communitarian voice in supranational arenas. Direct elections and proportional representation appear to increase the presence of communitarians in global governance. This finding implies that cosmopolitan democrats face a difficult trade-off. They can democratize global governance, but it will likely come at the price of less cosmopolitan policies made in international institutions. Alternatively, they can pursue cosmopolitan policies, but only if they limit the democratic accountability of key global governance institutions.
This chapter compares cosmopolitan versus communitarian issue positions by mass publics and elites across our study. We investigate whether there is an attitude gap between elites, who tend to adhere to cosmopolitan positions, and mass publics with more communitarian leanings. Contrasting mass opinion surveys with results from our own elite survey, we show that the mass-elite divide on globalization issues is indeed pervasive and found in all five countries under study. We consider both economic causes in the shape of diverging material interests and cultural ones, the latter pointing towards cultural capital and symbolic boundaries defining transnational cosmopolitan class consciousness. The results align more with the cultural than with the economic explanation. Political elites in the five countries display convergent cosmopolitan positions across issues as varied as international trade, climate change, migration and supranational integration. Mass publics are much more divided on these issues. Also, education alone does not explain the mass-elite gap because the elites are still significantly more cosmopolitan than highly educated members of mass publics, even within the same country.
This chapter compares cosmopolitan vs communitarian issue positions of national, European and global elites. It is important to go beyond the national elite focus since the prototypical members of a cosmopolitan elite are thought to be no longer attached to one national context but to have an entire region or even the ‘global village’ as their point of reference. Our empirical analysis supports this expectation: The positions of European-level elites turn out to be even more strongly cosmopolitan than those of national elites, which indicates that a particularly large gap exists between the cosmopolitanism of European elites and the more communitarian orientation of mass publics. Cultural explanations - measured by embeddedness in transnational networks - have the greatest explanatory power. Those elites who have more transnational contacts and travel experience are more cosmopolitan with regard to trade, immigration and supranational integration. However, economic explanations help us to explain within-elite variance in cosmopolitanism. In particular, we find that business and labour union elites diverge strongly in their positions on international trade and supranational integration.
This chapter maps issue linkage in the public sphere as a key component of cleavage formation. Cleavage coalitions in public debates about globalization are mapped using inductive methods. Multidimensional scaling reveals a powerful globalist coalition in all five countries under study - Germany, Poland, Mexico, Turkey and the USA - which links various globalization-related issues in a general call for open borders. It is opposed by a protectionist coalition arguing against free trade, a nationalist coalition arguing against immigration and a neoliberal coalition that only champions free trade. This confirms that globalization-related conflict is two-dimensional, with conflict over cultural and economic globalization distinct from each other.
We introduce a model that extends the standard vote choice model to encompass text. In our model, votes and speech are generated from a common set of underlying preference parameters. We estimate the parameters with a sparse Gaussian copula factor model that estimates the number of latent dimensions, is robust to outliers, and accounts for zero inflation in the data. To illustrate its workings, we apply our estimator to roll call votes and floor speech from recent sessions of the US Senate. We uncover two stable dimensions: one ideological and the other reflecting Senators’ leadership roles. We then show how the method can leverage common speech in order to impute missing data, recovering reliable preference estimates for rank-and-file Senators given only leadership votes.
We present a statistical approach to data mining and quantitatively evaluating detrital age spectra for sedimentary provenance analyses and palaeogeographic reconstructions. Multidimensional scaling coupled with density-based clustering allows the objective identification of provenance end-member populations and sedimentary mixing processes for a composite crust. We compiled 58 601 detrital zircon U–Pb ages from 770 Precambrian to Lower Palaeozoic shelf sedimentary rocks from 160 publications and applied statistical provenance analysis for the Peri-Gondwanan crust north of Africa and the adjacent areas. We have filtered the dataset to reduce the age spectra to the provenance signal, and compared the signal with age patterns of potential source regions. In terms of provenance, our results reveal three distinct areas, namely the Avalonian, West African and East African–Arabian zircon provinces. Except for the Rheic Ocean separating the Avalonian Zircon Province from Gondwana, the statistical analysis provides no evidence for the existence of additional oceanic lithosphere. This implies a vast and contiguous Peri-Gondwanan shelf south of the Rheic Ocean that is supplied by two contrasting super-fan systems, reflected in the zircon provinces of West Africa and East Africa–Arabia.
In this study we chart the aspectual characteristics of performative utterances in a cross-linguistic sample of sixteen languages on the basis of native-speaker elicitations. We conclude that there is not one single aspectual type (e.g., perfectives) that is systematically reserved for performative contexts. Instead, the aspectual form of performative utterances in a given language is epistemically motivated, in the sense that the language will turn to that aspectual construction which it generally selects to refer to situations that are fully and instantly identifiable as an instance of a given situation type at the time of speaking. We use the method of Multidimensional Scaling to demonstrate this: whatever the exact value of a given aspectual marker, if it is used to mark performatives, then it also commonly features in the expression of states and habits, which have the subinterval property (they can be fully verified based on a random segment), demonstrations, and other special contexts featuring more or less predictable and therefore instantly identifiable events. On the other hand, our study shows that performative contexts do not normally feature progressive aspect, which is dedicated to the expression of events that are not fully and instantly identifiable.