Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
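As a minimal sketch of steps (5) and (6), the Python below scores a composite scale as the mean of its items, reverse-keying selected items first. It also computes Cronbach's alpha, one common index of a composite's internal consistency. The function names, the 1–5 response range, and the column-per-item data layout are assumptions for illustration, not part of the procedure outlined above.

```python
from statistics import pvariance


def score_scale(items, reverse=(), scale_min=1, scale_max=5):
    """Mean item score per respondent; items is a list of columns (one list per item).

    Items whose index is in `reverse` are reverse-keyed as (min + max) - x.
    The 1-5 response range is an assumed default.
    """
    n = len(items[0])
    keyed = [
        [scale_min + scale_max - x for x in item] if j in reverse else list(item)
        for j, item in enumerate(items)
    ]
    return [sum(col[i] for col in keyed) / len(keyed) for i in range(n)]


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(pvariance(item) for item in items)
    # Population variances are used throughout; the n vs. n-1 choice cancels in the ratio.
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))


# Usage: three items answered by four respondents (toy data, already keyed in one direction).
items = [[4, 5, 2, 3], [4, 4, 1, 3], [5, 5, 2, 2]]
print(score_scale(items))
print(round(cronbach_alpha(items), 3))
```

Alpha should be computed after any reverse-keying, so that all items point in the same direction.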
Validated computerized assessments of cognitive functioning are crucial for older individuals and those at risk of cognitive decline. The National Institutes of Health (NIH) Toolbox Cognition Battery (NIHTB-CB) exhibits good construct validity but requires validation in diverse populations and in adults aged 85+. This study uses data from the Assessing Reliable Measurement in Alzheimer’s Disease and cognitive Aging study to explore differences in the factor structure of the NIHTB-CB for adults aged 85 and older, for Black versus White participants, and for participants diagnosed with amnestic Mild Cognitive Impairment (aMCI) versus those who are cognitively normal (CN).
Method:
Subtests from the NACC UDS-3 and the NIHTB-CB were administered to 503 community-dwelling Black and White adults aged 55–99 (367 CN; 136 aMCI). Confirmatory factor analyses were used to investigate the original factor structure of the NIHTB-CB, which forms the basis for the NIHTB-CB Index factor scores.
Results:
Factor analyses for all participants and some participant subsets (aMCI, White, 85+) substantiated the two anticipated factors (Fluid and Crystallized). However, while Black aMCI participants had the expected two-factor structure, for Black CN participants, the List Sorting Working Memory and Picture Sequence tests loaded on the Crystallized factor.
Conclusions:
Findings provide psychometric support for the NIHTB-CB. Differences in factor structure between Black CN individuals and Black aMCI individuals suggest potential instability across levels of cognitive impairment. Future research should explore changes in NIHTB-CB across diagnoses in different populations.
The ICD-11 introduced a new diagnosis of complex post-traumatic stress disorder (CPTSD) defined by disturbances in self-organisation in addition to traditional post-traumatic stress disorder symptoms. The International Trauma Questionnaire (ITQ) is the established measure of this construct and has been validated for use in a variety of populations and languages; however, evidence for the measure's use in Latin America is limited.
Aims
This study sought to validate the factor structure of the Latin American Spanish version of the ITQ in a trauma-exposed sample in Colombia.
Method
Confirmatory factor analysis was used to assess a range of factor models validated previously, including first- and second-order factor models.
Results
Assessment of fit indices demonstrated that a correlated six-factor model comprising re-experiencing, avoidance, sense of threat, affect dysregulation, negative self-concept, and disturbed relationships provided the best fit for these data. Factor loadings for this model were high and statistically significant.
Conclusion
Results concur with prior research validating the use of alternative language versions of the ITQ internationally, and with the theoretical underpinnings of the CPTSD diagnostic category. The ITQ is therefore a valid measure of CPTSD in this Latin American sample. Further validation research is needed in clinical populations in this region.
Today we may have access to large databases, sometimes called Big Data, and for such large datasets simple econometric models will not do. When you have a million people in your database, as insurance firms, telephone providers, or charities may have, and you have collected information on these individuals for many years, you simply cannot summarize these data using a small econometric model with just a few regressors. In this chapter we address diverse options for handling Big Data. We kick off with a discussion of what Big Data is and why it is special. Next, we discuss a few options such as selective sampling, aggregation, nonlinear models, and variable reduction. Methods such as ridge regression, lasso, elastic net, and artificial neural networks are also addressed; these are nowadays described as machine learning methods. We see that with these methods the number of choices rapidly increases, and that reproducibility can decrease. The analysis of Big Data therefore comes at the cost of more analysis and more choices to make and to report.
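Ridge regression is the simplest of the penalized methods mentioned above because it has a closed form. The sketch below (Python with NumPy, an assumed dependency) centers the data so that the intercept is not penalized and then solves (XᵀX + λI)β = Xᵀy. Lasso and elastic net replace or mix the squared (L2) penalty with an absolute-value (L1) penalty and need iterative solvers such as coordinate descent, which are not shown. The choice of λ is itself one of the many analyst choices discussed in the paragraph.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge regression via the closed form; returns (intercept, coefficients)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # centering keeps the intercept out of the penalty
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Usage: larger lam shrinks the coefficient vector toward zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1.0 + X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(size=200)
for lam in (0.0, 10.0, 1000.0):
    print(lam, ridge(X, y, lam)[1].round(3))
```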
Information on the climate, sowing time, harvest, and crop development is essential for defining appropriate strategies for agricultural activities, which helps both producers and the responsible bodies. Paraná, the second largest soybean producer in Brazil, has high climatic variability, which greatly influences planting, harvesting, and crop productivity periods. Therefore, the objective of this study was to regionalize the state of Paraná, considering decennial metrics associated with climate variables and the enhanced vegetation index (EVI) during the soybean cycle. Individual and global analyses of these metrics were performed using multivariate techniques. These analyses were carried out for agricultural scenarios with low, medium, and high precipitation, corresponding to the harvest years 2011/2012, 2013/2014, and 2015/2016, respectively. The scores of the retained factors and the cluster analysis yielded the group profiles, with Group 1 presenting more favourable climatic and agronomic conditions for soybean development in all three harvest years. The opposite occurred for Group 2 (2011/2012 and 2013/2014) and Group 3 (2015/2016). During the soybean reproductive phases (R2–R5), precipitation was inadequate, especially for Group 2 (2011/2012 and 2013/2014), which had a high water deficit that resulted in a drop in soybean productivity. The climatic and agronomic regionalization of Paraná made it possible to identify the regions most suitable for growing soybeans, the effect of climatic conditions on phenological stages, and the variability of soybean productivity across the three harvest years.
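The abstract does not state which clustering algorithm was applied to the retained factor scores, so the sketch below uses k-means purely as an assumed, illustrative choice. Each region (row) is assigned to the nearest group centre in factor-score space, and the centres are recomputed until they stop moving. Initial centres are passed explicitly as row indices so that results are reproducible. The function name and data layout are hypothetical, and empty clusters are not handled.

```python
import numpy as np


def kmeans(scores, init_idx, n_iter=100):
    """Plain k-means on a units x factors score matrix; init_idx picks the starting centres."""
    centers = scores[list(init_idx)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every unit to every centre
        d = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array(
            [scores[labels == k].mean(axis=0) for k in range(len(centers))]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Usage: toy scores on two retained factors for six hypothetical regions.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(pts, init_idx=(0, 3))
print(labels, centers)
```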
The differential diagnosis of psychiatric disorders is relatively challenging for several reasons. In this context, we believe that task-based magnetic resonance imaging (MRI) can serve as a tool for differential diagnosis. The aim of this study was to explore the commonalities in brain activities among individuals with psychiatric disorders and to identify the key brain regions that can distinguish between these disorders.
Methods
The PubMed, MEDLINE, EMBASE, Web of Science, Scopus, PsycINFO, and Google Scholar databases were searched for whole-brain functional MRI studies that compared psychiatric patients and normal controls. The psychiatric disorders included schizophrenia (SCZ), bipolar disorder (BD), major depressive disorder (MDD), obsessive–compulsive disorder, attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD). Studies using go/no-go paradigms were selected, and activation likelihood estimation (ALE) meta-analysis, factor analysis, and regression analysis were then conducted on these studies.
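Activation likelihood estimation treats each reported focus as the centre of a spatial probability distribution. A study's modeled activation (MA) at a voxel is the maximum over that study's foci. The ALE score is the probability that at least one study activates the voxel, ALE = 1 − ∏ᵢ(1 − MAᵢ). The Python sketch below shows only this combination rule. It assumes an unnormalized Gaussian kernel with a fixed width `sigma`; real ALE implementations use sample-size-dependent kernels normalized over a brain template and add permutation-based thresholding.

```python
import math


def modeled_activation(foci, voxel, sigma):
    """Per-study MA value: maximum kernel value over that study's foci.

    Taking the maximum (rather than the sum) keeps a study with several nearby
    foci from counting more than once at the same voxel. The unnormalized
    Gaussian kernel exp(-d^2 / (2 sigma^2)) is a simplifying assumption.
    """
    return max(math.exp(-math.dist(f, voxel) ** 2 / (2 * sigma ** 2)) for f in foci)


def ale(studies, voxel, sigma=1.0):
    """ALE = 1 - prod_i (1 - MA_i): probability that at least one study activates the voxel."""
    prod = 1.0
    for foci in studies:
        prod *= 1.0 - modeled_activation(foci, voxel, sigma)
    return 1.0 - prod


# Usage: two hypothetical studies, each a list of (x, y, z) foci in voxel units.
studies = [[(0, 0, 0), (5, 5, 5)], [(1, 0, 0)]]
print(ale(studies, voxel=(0, 0, 0), sigma=1.5))
```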
Results
A total of 152 studies (108 with patients) were selected, and a consistent pattern emerged: decreased activity in the same brain regions across all six disorders. Factor analysis clustered the six disorders into three pairs: SCZ and ASD, MDD and BD, and ADHD and BD. Furthermore, the heterogeneity between SCZ and ASD was located in the left and right thalamus, and the heterogeneity between MDD and BD was located in the thalamus, insula, and superior frontal gyrus.
Conclusion
The results can lead to a new classification method for psychiatric disorders, benefit differential diagnosis at an early stage, and help to understand the biological basis of psychiatric disorders.
Mainstream cognitive behavioural theory stipulates that clinically significant health anxiety persists over time at least partially due to negatively reinforced health-related behaviours, but there exists no broad and psychometrically valid measure of such behaviours.
Aims:
To draft and evaluate a new self-report scale – the Health Anxiety Behavior Inventory (HABI) – for the measurement of negatively reinforced health anxiety behaviours.
Method:
We drafted the HABI from a pool of 20 candidate items administered in a clinical trial at screening, and before and after cognitive behaviour therapy (n=204). A psychometric evaluation focused on factor structure, internal consistency, convergent and discriminant validity, test–retest reliability, and sensitivity to change.
Results:
Based on factor analysis, the HABI was finalized as a 12-item instrument with a four-dimensional factor structure corresponding to the following scales: (i) bodily preoccupation and checking, (ii) information- and reassurance-seeking, (iii) prevention and planning, and (iv) overt avoidance. Factor inter-correlations were modest. The internal consistency (α=.73–.87) and 2-week test–retest reliability (r=.75–.90) of the scales were adequate. The bodily preoccupation and checking scale and the information- and reassurance-seeking scale were most strongly correlated with the cognitive and emotional components of health anxiety (r=0.41, 0.48) and, to a lesser extent, with depressive symptoms and disability. Change scores on all HABI scales correlated with improvement in the cognitive and emotional components of health anxiety during cognitive behaviour therapy.
Conclusions:
The HABI appears to reliably measure negatively reinforced behaviours commonly seen in clinically significant health anxiety, and might be clinically useful in the treatment of health anxiety.
The Personal Need for Structure (PNS) scale assesses individuals’ tendency to seek out clarity and structured ways of understanding and interacting with their environment. The main aim of this study was to adapt the PNS scale to Spanish and assess its psychometric properties. Two versions of the PNS scale are in use; they differ in the number of dimensions (1 vs. 2) and in the number of items (12 vs. 11, because one version excludes Item 5). Therefore, an additional aim of this study was to compare the two existing versions, addressing the debate over the inclusion of Item 5 and the number of dimensions that make up the PNS scale. Data were collected from a sample of 735 individuals. First, an approach combining exploratory and confirmatory analyses provided evidence that the scale is composed of two related but distinguishable factors: Desire for Structure and Response to the Lack of Structure. Scores on these subscales showed acceptable internal consistency and test-retest reliability. Evidence was found supporting the invariance of the internal structure across sociodemographic variables such as gender and age. Validity evidence was also analyzed by examining relationships with other relevant measures. The results indicated that Item 5 can be excluded without reducing the validity or reliability of the scores, which supports previous research in the literature. In conclusion, the PNS scale was satisfactorily adapted to and validated in Spanish, and its use in this context is recommended.
Cognitive reserve (CR) is typically operationalized as episodic memory residualized on brain health indices. The dimensionality of more generalized models of CR has rarely been examined.
Methods:
In a sample of N = 113 dementia-free older adults (ages 62–86 years at MRI scan; 58.4% women), the domain-specific representation of general cognition (COG) before vs. after residualization on brain indices (brain volume loss, cerebral blood flow, white matter hyperintensities) was compared (i.e., COG vs. CR). COG and CR were assessed by 15 tasks spanning five domains: processing speed, verbal memory, visuospatial memory, fluid reasoning, and vocabulary. Measurement invariance and item-construct representation were tested in a series of structural factor analyses. COG and CR were then examined in relation to 22 risk and protective factors and dementia status at time of death.
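Residualizing cognition on brain indices, as in the operational definition of CR above, amounts to regressing each cognitive score on the brain measures and keeping the residual. Below is a minimal NumPy sketch. It assumes ordinary least squares with an intercept and a participant-by-index covariate matrix; the variable names and simulated data are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS regression on covariates (intercept included)."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Usage with simulated data: three hypothetical brain indices per participant
# (standing in for brain volume loss, cerebral blood flow, white matter hyperintensities).
rng = np.random.default_rng(0)
brain = rng.normal(size=(113, 3))
cognition = brain @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=113)
reserve = residualize(cognition, brain)
```

By construction the residuals have mean zero and are uncorrelated with every brain index, so whatever variance remains cannot be explained linearly by these measures.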
Results:
Item-factor loadings differed such that CR more strongly emphasized fluid reasoning. More years of education, higher occupational class, more hobbies/interests, and fewer difficulties with personal mobility similarly predicted better COG and CR. Only the sub-domain of visuospatial memory (both before and after residualization) was associated with conversion to dementia by end-of-life (r = −.30; p = .01).
Conclusions:
Results provide tentative support for the role of fluid reasoning (intelligence) as a potential compensatory factor for age- and/or neuropathology-related reductions in processing speed and memory. Intellectually stimulating work, efforts to preserve personal mobility, and a diversity of hobbies and interests may attenuate age- and/or pathology-related reductions in cognitive functioning prior to dementia onset.
Different aspects of social relationships (e.g., social network size or loneliness) have been associated with dementia risk, while their overlap and potentially underlying pathways remain largely unexplored. This study therefore aimed to (1) discriminate between different facets of social relationships by means of factor analysis, (2) examine their associations with dementia risk, and (3) assess mediation by depressive symptoms.
Methods
Thirty-six items from questionnaires on social relationships administered in Wave 2 (2004/2005) of the English Longitudinal Study of Ageing (n = 7536) were used for exploratory and confirmatory factor analysis. Factors were then used as predictors in Cox proportional hazards models with dementia up to Wave 9 as the outcome, adjusted for demographics and cardiovascular risk factors. Structural equation modeling tested mediation by depressive symptoms through effect decomposition.
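The effect decomposition behind the mediation analysis can be illustrated, under strong simplifying assumptions, with the linear product-of-coefficients approach. Path a (factor → depressive symptoms) multiplied by path b (depressive symptoms → outcome, adjusting for the factor) gives the indirect effect. The study itself used structural equation modeling with a time-to-dementia outcome. The sketch below swaps in a continuous outcome and plain OLS, for which total = direct + indirect holds exactly. All variables are simulated and hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def mediation(x, m, y):
    """Linear product-of-coefficients decomposition of the effect of x on y via m."""
    a = ols(m, x)[1]                  # path a: exposure -> mediator
    _, c_prime, b = ols(y, x, m)      # direct effect c' and path b
    total = ols(y, x)[1]              # total effect c
    return {"indirect": a * b, "direct": c_prime, "total": total}


# Usage with simulated data: x stands in for a social factor score,
# m for depressive symptoms, y for a continuous outcome (all hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
m = 0.6 * x + rng.normal(size=500)
y = 0.3 * x + 0.5 * m + rng.normal(size=500)
print(mediation(x, m, y))
```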
Results
Factor analyses identified six social factors. Across a median follow-up time of 11.8 years (IQR = 5.9–13.9 years), 501 people developed dementia. Higher factor scores for frequency and quality of contact with children (HR = 0.88; p = 0.021) and more frequent social activity engagement (HR = 0.84; p < 0.001) were associated with lower dementia risk. Likewise, higher factor scores for loneliness (HR = 1.13; p = 0.011) and negative experiences of social support (HR = 1.10; p = 0.047) were associated with higher dementia risk. Mediation analyses showed significant partial mediation by depressive symptoms for all four factors. Additional analyses provided little evidence for reverse causation.
Conclusions
Frequency and quality of social contacts, social activity engagement, and feelings of loneliness are associated with dementia risk and might be suitable targets for dementia prevention programs, partly by lowering depressive symptoms.
Neurocognitive dysfunction is a transdiagnostic finding in psychopathology, but relationships among cognitive domains and general and specific psychopathology dimensions remain unclear. This study aimed to examine associations between cognition and psychopathology dimensions in a large youth cohort.
Method
The sample (N = 9350; age 8–21 years) was drawn from the Philadelphia Neurodevelopmental Cohort. Data from structured clinical interviews were modeled using bifactor confirmatory factor analysis (CFA), resulting in an overall psychopathology (‘p’) factor score and six orthogonal psychopathology dimensions: dysphoria/distress, obsessive-compulsive, behavioral/externalizing, attention-deficit/hyperactivity, phobias, and psychosis. Neurocognitive data were aggregated using correlated-traits CFA into five factors: executive functioning, memory, complex cognition, social cognition, and sensorimotor speed. We examined relationships among specific and general psychopathology dimensions and neurocognitive factors.
Results
The final model showed both overall and specific associations between cognitive functioning and psychopathology, with acceptable fit (CFI = 0.91; TLI = 0.90; RMSEA = 0.024; SRMR = 0.054). Overall psychopathology and most psychopathology dimensions were negatively associated with neurocognitive functioning (phobias [p < 0.0005], behavioral/externalizing [p < 0.0005], attention-deficit/hyperactivity [p < 0.0005], psychosis [p < 0.0005 to p < 0.05]), except for dysphoria/distress and obsessive-compulsive symptoms, which were positively associated with complex cognition (p < 0.05 and p < 0.01, respectively).
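The CFI, TLI, and RMSEA values reported above follow standard formulas. These use the model chi-square (χ²_M, df_M) and, for CFI and TLI, a baseline (independence) model (χ²_B, df_B). The sketch below computes the three indices from those inputs. The chi-square values in the usage line are made up, not taken from this study, and some software uses N rather than N − 1 in the RMSEA denominator. SRMR requires the observed and model-implied correlation matrices and is omitted.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


# Usage with made-up values: model chi2 = 150 on 100 df, baseline chi2 = 1100 on 120 df, N = 101.
print(round(rmsea(150, 100, 101), 3), round(cfi(150, 100, 1100, 120), 3), round(tli(150, 100, 1100, 120), 3))
```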
Conclusion
By modeling a broad range of cognitive and psychopathology domains in a large, diverse sample of youth, we found aspects of neurocognitive functioning shared across clinical phenotypes, as well as domain-specific patterns. Findings support transdiagnostic examination of cognitive performance to parse variability in the link between neurocognitive functioning and clinical phenotypes.
Existing systematic reviews related to advance care planning (ACP) largely focus on specific groups and intervention efficacy or are limited to contextual factors. This research aims to identify the modifiable factors perceived by different users of ACP in healthcare settings and inform healthcare professionals about the factors affecting ACP practice.
Methods
Five English-language databases (ProQuest, PubMed, CINAHL Plus, Scopus, and Medline) and two Chinese-language databases (CNKI and NCL) were searched up to November 2022. Empirical research identifying factors related to ACP in healthcare settings was included. ACP is defined as a discussion process on future end-of-life care. Thematic synthesis was performed on all included studies.
Results
A total of 1871 unique articles were screened; the full texts of 193 were assessed by 4 reviewers, and 45 articles were included for analysis. Twenty-two (54%) studies were qualitative, 15 (33%) were quantitative, and 6 (13%) used mixed methods. Foci varied: 28 (62%) studies focused on a single subject group (patient, family, or physician), 11 (25%) on 2 subject groups (patient and family, or patient and healthcare professional), and 6 (13%) on 3 subject groups (patient, family, and healthcare professional). Among the 17 studies involving more than 1 subject group, only 2 adopted a dyadic lens in analysis. Complex interwoven factors were categorized into (1) intrapersonal factors, (2) interpersonal factors, and (3) socio-environmental factors, with a total of 11 themes: personal belief, emotions, the burden on others, timing, responsiveness, relationship, family dynamics, experience, person taking the lead, culture, and support.
Significance of results
Patients, families, and healthcare professionals are the essential stakeholders of ACP in healthcare settings. Factors are interwoven among the intrapersonal, interpersonal, and socio-environmental dimensions. Research is warranted to examine the dynamic interactions of the 3 essential stakeholders from a multidimensional perspective, and the mechanism by which the factors interweave.
Few studies have examined the psychometric properties of the Connor-Davidson Resilience Scale (CD-RISC) in large adolescent community samples, and their findings differ markedly. This study explores the psychometric properties of the CD-RISC among Spanish adolescents by means of exploratory factor analysis (EFA), Rasch analysis, and measurement invariance (MI) across sex, as well as internal consistency and criterion validity. The sample comprised 463 adolescents (231 girls) aged 12 to 18 years, who completed the CD-RISC and other measures of emotional status and quality of life. The EFA suggested a unidimensional structure for the CD-RISC. Consequently, shorter unidimensional CD-RISC models reported in the literature were explored. Of these, Campbell-Sills and Stein’s CD-RISC-10 showed the soundest psychometric properties, providing adequate item fit and supporting MI and the absence of differential item functioning across sex. Item difficulty levels were biased toward low levels of resilience. Some items showed malfunctioning in the lower response categories. With regard to reliability, categorical omega was .82. Strong associations with health-related quality of life, major depressive disorder symptoms, and emotional symptoms were observed. A weak association was found between resilience and male sex. Campbell-Sills and Stein’s CD-RISC-10 model emerges as the best option for assessing resilience among Spanish adolescents, as already reported in adults. Thus, independently of developmental stage, the core of resilience may reside in hardiness and persistence.
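Omega expresses reliability as the share of total-score variance attributable to the common factor. For a unidimensional model with standardized loadings λᵢ and uniquenesses θᵢ = 1 − λᵢ², ω = (Σλᵢ)² / ((Σλᵢ)² + Σθᵢ). The sketch below computes this linear (McDonald's) omega. The categorical omega reported above is computed from a model for ordinal items and will generally differ, so treat this as an illustration of the idea only. The loadings in the usage line are invented.

```python
def omega_total(loadings, uniquenesses=None):
    """McDonald's omega for a single factor.

    If uniquenesses are not supplied, loadings are assumed standardized so that
    theta_i = 1 - lambda_i**2.
    """
    if uniquenesses is None:
        uniquenesses = [1.0 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Usage with invented standardized loadings for a 10-item scale.
print(round(omega_total([0.55, 0.62, 0.70, 0.48, 0.66, 0.59, 0.71, 0.53, 0.64, 0.60]), 3))
```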
The aim of the present study was to discriminate between distinct types of clay units by applying multivariate statistical techniques, which have seldom been applied to the exploitation of ceramic clays. At the outcrop scale, texturally similar argillaceous or clayey layers of different ceramic types cannot be effectively distinguished, which can result in the misuse and loss of raw materials. Representative samples of clayey raw materials from Cenozoic deposits of central Portugal, with potential use in the manufacture of structural clay products, were first assessed for granulometric, mineralogical, chemical, and technological properties. Based on those properties, a novel statistical approach combining all of them was developed using multivariate statistical techniques, namely factor analysis (FA) and cluster analysis (CA). This approach made it possible to distinguish ceramic suitability and to identify which parameters most influence that suitability. R-mode FA made it feasible to differentiate and group samples based on the most influential variables: the contents of Al2O3, Fe, illite, quartz, feldspars, and K2O. R-mode CA substantiated the FA results in identifying influential variables such as Al2O3, Fe, and illite. Q-mode CA established two main clusters, clayey-silt samples and sandy and/or feldspathic samples; the clayey-silt samples encompassed three sub-clusters. These three sub-clusters match ceramic types with different suitabilities and relate each sample’s stratigraphic setting to the encompassing stratigraphic units. Diagrams relating grain size, oxide contents, mineral contents, and plasticity to ceramic suitability illustrate the CA groupings. An adequate blend of sand and clay for red stoneware (bricks and tiles) manufacture was indicated as a major requirement for most raw materials of the clayey-silt cluster.
Raw materials represented by the sandy and/or feldspathic cluster can either be used to blend with materials that lack sand or to blend with excessively plastic samples.
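The R-mode/Q-mode distinction concerns what is being compared. R-mode analyses relate variables (columns, e.g. oxide and mineral contents) to one another, whereas Q-mode analyses relate samples (rows). The NumPy sketch below uses a hypothetical samples-by-variables matrix. It builds an R-mode correlation matrix and a Q-mode Euclidean distance matrix on standardized variables, which are typical inputs to clustering. The exact similarity measures used in the study are not stated here, so these are assumptions.

```python
import numpy as np


def r_mode_matrix(data):
    """R-mode: correlations among variables (columns of a samples x variables matrix)."""
    return np.corrcoef(data, rowvar=False)


def q_mode_matrix(data):
    """Q-mode: Euclidean distances among samples (rows) after standardizing each variable."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    diff = z[:, None, :] - z[None, :, :]
    return np.linalg.norm(diff, axis=2)


# Usage with a random stand-in for a samples x variables table
# (columns might be, e.g., Al2O3, Fe, illite, quartz; hypothetical here).
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 4))
print(r_mode_matrix(data).shape, q_mode_matrix(data).shape)  # (4, 4) (8, 8)
```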
Farm productivity and social sustainability are essential to realizing the full potential of agro-based value chains. This paper empirically analyzes the impact of formal value chain governance practices on farm productivity and social sustainability in Pakistan's potato industry. A multi-stage sampling method covering 10 villages was employed to examine growers’ motivations for adopting contracts and the effect of contracting on their income and farm employment. The main findings indicate that buyers’ technical assistance and provision of quality inputs are growers’ primary motives for contracting; that non-contracted farms earned 40% less than contracted farms per unit invested; and that contracted farms employed more labor, with better wages and welfare arrangements, than non-contracted farms. The study concludes that formal value chain governance practices significantly affect farm productivity and social sustainability and can spur growth in the agricultural sector in developing countries. The results suggest that any governmental initiative aiming to support formal value chain governance should consider the role that intermediaries play in the value chain and, accordingly, minimize their risks and food losses and improve social outcomes.
The Cognitive Change Index (CCI-20) is a validated questionnaire that assesses subjective cognitive complaints (SCCs) across memory, language, and executive domains. We aimed to: (a) examine the internal consistency and construct validity of the CCI-20 in patients with movement disorders and (b) learn how the CCI-20 corresponds to objective neuropsychological and mood performance in individuals with Parkinson’s disease (PD) or essential tremor (ET) seeking deep brain stimulation (DBS).
Methods:
A total of 216 participants (N = 149 PD; N = 67 ET) underwent neuropsychological evaluation and completed the CCI-20. The proposed domains of the CCI-20 were examined via confirmatory (CFA) and exploratory (EFA) factor analyses. Hierarchical regressions were used to assess the relationships among subjective cognitive complaints, neuropsychological performance, and mood symptoms.
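In a hierarchical regression, predictors are entered in blocks, and the increase in R² (ΔR²) at each step shows what a block adds beyond the earlier ones. Below is a NumPy sketch of that comparison. It assumes OLS and hypothetical blocks (for example, mood measures first, then neuropsychological scores). Significance testing of ΔR² via the F-change statistic is not shown.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def delta_r2(y, block1, block2):
    """R^2 for block 1 alone, for blocks 1 + 2, and the increment from block 2."""
    r2_1 = r_squared(y, block1)
    r2_2 = r_squared(y, np.column_stack([block1, block2]))
    return r2_1, r2_2, r2_2 - r2_1


# Usage with simulated data; the block contents are hypothetical.
rng = np.random.default_rng(0)
mood = rng.normal(size=(100, 2))
neuro = rng.normal(size=(100, 3))
complaints = mood @ np.array([0.5, 0.3]) + 0.4 * neuro[:, 0] + rng.normal(size=100)
print(delta_r2(complaints, mood, neuro))
```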
Results:
The PD and ET groups were similar across neuropsychological, mood, and CCI-20 scores and were therefore combined into one group that was well educated (M = 15.01 ± 2.92), in their mid-60s (M = 67.72 ± 9.33), predominantly male (63%), and non-Hispanic White (93.6%). The previously proposed three-domain CCI-20 model failed to achieve adequate fit. A subsequent EFA revealed two CCI-20 factors: memory and non-memory (p < 0.001; CFI = 0.924). Regressions indicated that apathy and depressive symptoms were associated with greater memory and total cognitive complaints, while poor executive function and anxiety were associated with more non-memory complaints.
Conclusion:
Two distinct dimensions were identified in the CCI-20: memory and non-memory complaints. Non-memory complaints were indicative of worse executive function, consistent with PD and ET cognitive profiles. Mood significantly contributed to all CCI-20 dimensions. Future studies should explore the utility of SCCs in predicting cognitive decline in these populations.
The components or functions derived from an eigenanalysis are linear combinations of the original variables. Principal components analysis (PCA) is a very common method that uses these components to examine patterns among the objects, often in a plot termed an ordination, and identify which variables are driving those patterns. Correspondence analysis (CA) is a related method used when the variables represent counts or abundances. Redundancy analysis and canonical CA are constrained versions of PCA and CA, respectively, where the components are derived after taking into account the relationships with additional explanatory variables. Finally, we introduce linear discriminant function analysis as a way of identifying and predicting membership of objects to predefined groups.
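As a sketch of the eigenanalysis described above, the NumPy code below centers a samples-by-variables matrix, eigendecomposes its covariance matrix, and projects the samples onto the leading eigenvectors to obtain ordination scores. The eigenvectors (loadings) show which variables drive each component. Whether to use the covariance or the correlation matrix (i.e., to standardize first) is an analyst's choice that this sketch leaves to the caller.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA via eigendecomposition of the covariance matrix (samples x variables input)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs[:, :n_components]   # ordination coordinates of the samples
    explained = eigvals / eigvals.sum()             # proportion of variance per component
    return scores, eigvecs[:, :n_components], explained


# Usage with correlated random data (50 objects, 4 variables).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
scores, loadings, explained = pca(data)
print(explained.round(3))
```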
This study aimed to quantify a latent variable for body size (BS) in pigs using five linear body measurements, namely body length (BL), body height (BH), chest width (CW), chest girth (CG), and tube girth (TG), and to identify the single nucleotide polymorphisms (SNPs) and related genes most strongly associated with BS using a genome-wide association study (GWAS) based on genomic best linear unbiased prediction (GBLUP), i.e., GBLUP-GWAS. To perform the GWAS on the latent BS trait, we used a mixed linear model and identified a total of 53 significant SNPs. Additionally, we found that nine genes, including Rho GTPase activating protein 12 (ARHGAP12), transmembrane protein 108 (TMEM108), T-cell lymphoma invasion and metastasis inducing factor 1 (TIAM1), ras homologue gene family member B (RHOB), POU class 4 homeobox 1 (POU4F1), follistatin-related protein 4 (FSTL4), cellular communication network factor 2 (CCN2), beaded filament structural protein 2 (BFSP2) and attractin-like protein 1 (ATRNL1), were associated with the BS trait in pigs. These genes are involved in several biological processes, including the regulation of anatomical structure, morphogenesis, and the regulation of cell size and growth. The results suggest that the identified SNPs and related genes may play important roles in regulating the growth and development of pigs, and that these genes could be promising candidates for further exploration of the mechanisms underlying body size variation. Furthermore, the findings have significant practical implications for enhancing the efficiency and profitability of pig farming through genetic selection.
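GBLUP replaces pedigree relationships with a genomic relationship matrix G computed from SNP genotypes. One widely used construction is VanRaden's first method; it is assumed here, and the study may have used another. It centers genotypes coded 0/1/2 by twice the allele frequency and then scales: G = ZZᵀ / (2 Σⱼ pⱼ(1 − pⱼ)). The sketch shows only this matrix, not the mixed-model fit or the back-solving of SNP effects used in GBLUP-GWAS.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix (VanRaden method 1).

    genotypes: animals x SNPs array coded 0/1/2 (copies of the counted allele).
    Allele frequencies are estimated from the sample itself, an assumption.
    """
    p = genotypes.mean(axis=0) / 2.0          # allele frequency per SNP
    Z = genotypes - 2.0 * p                   # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


# Usage: three animals, three SNPs (toy genotypes).
M = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1]], dtype=float)
print(vanraden_g(M))
```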
In this chapter we present brief discussions of a few statistical topics not covered in earlier chapters. We first cover structural equation models, factor analysis, and path analysis; when fitting regression models in the social sciences, one frequently encounters references to one or more of them. In the second section of the chapter, we address in summary form a few topics already discussed that we believe require some additional attention. For instance, as part of our discussion of ordinary least squares (OLS) regression, we covered regression diagnostics in Chapter 8. But regression diagnostics are not an issue applicable only to OLS regression, so we present a further discussion here. Similarly, we add some commentary to our earlier discussions of survey design (covered in Chapter 10) and multilevel models (covered in Chapter 16).
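Two standard OLS diagnostics are leverage, the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, and Cook's distance, Dᵢ = eᵢ² hᵢᵢ / (p s² (1 − hᵢᵢ)²). Cook's distance measures how much the fitted values shift when observation i is dropped. Below is a minimal NumPy sketch for a model with an intercept. Analogous diagnostics exist for other regression models, but their formulas differ.

```python
import numpy as np


def ols_diagnostics(X, y):
    """Leverage and Cook's distance for an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T   # hat matrix: fitted = H @ y
    leverage = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)                 # residual variance estimate
    cooks = resid ** 2 / (p * s2) * leverage / (1 - leverage) ** 2
    return leverage, cooks


# Usage: simulated line with one contaminated response.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 1 + 2 * x + rng.normal(scale=0.1, size=30)
y[5] += 10
lev, cooks = ols_diagnostics(x, y)
print(int(np.argmax(cooks)))
```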
Multivariate Analysis focuses on the most essential tools for analyzing compositional and/or multivariate data sets that often emerge when performing geochemical analysis. The chapter starts by introducing groundwater contamination in one of the world’s largest agricultural areas: the Central Valley of California. The goal is to use data science to discover the processes that caused the contamination, whether geogenic or anthropogenic. Knowing these causes helps in deciding on mitigation actions. The reader will take a path of discovery through several protocols for applying data-scientific tools to unmask the processes, including principal component analysis, multivariate outlier detection, and factor analysis. The key to using these tools is to understand the compositional nature of geochemical datasets and how compositions need to be treated appropriately to draw meaningful conclusions, a field termed compositional data analysis. This chapter emphasizes the need for data scientists to work with domain experts.
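Compositional data carry only relative information, so analyses such as PCA are usually run after a log-ratio transform. The centered log-ratio (clr) transform divides each part by the geometric mean of the composition and takes logs. The sketch assumes strictly positive parts. Zeros, which are common below detection limits, need a replacement strategy first, and that step is not shown.

```python
import math


def clr(composition):
    """Centered log-ratio transform: log(x_i / geometric mean of all parts).

    Assumes all parts are strictly positive.
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Usage: clr coordinates are unchanged when every part is multiplied by the
# same constant (e.g. fractions vs. percentages).
print(clr([2.0, 30.0, 68.0]))
```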