Transfer learning has been highlighted as a promising framework for increasing the accuracy of data-driven models when data are sparse, specifically by leveraging pretrained knowledge in the training of the target model. The objective of this study is to evaluate whether the number of requisite training samples can be reduced by using various transfer learning models to predict, for example, the chemical source terms of a data-driven reduced-order model (ROM) that represents the homogeneous ignition of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the hydrogen/air mixture in composition space. Artificial neural networks (ANNs) are used to regress the reaction rates of the principal components, and a system of ordinary differential equations is subsequently solved. As the number of training samples in the target task decreases, the ROM fails to predict the ignition evolution of the hydrogen/air mixture. Three transfer learning strategies are then applied to the training of the ANN model with a sparse dataset. The performance of the ROM with a sparse dataset is remarkably enhanced if the training of the ANN model is constrained by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, Parameter control via Partial Initialization and Regularization (PaPIR), whereby the amount of knowledge transferred is systematically adjusted through the initialization and regularization schemes of the ANN model in the target task.
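The idea of a regularization term that controls how much knowledge flows from the source task can be illustrated with a small sketch. It assumes a linear model in place of the ANN and an L2 penalty that pulls the target weights toward the source-task weights (an L2-SP-style penalty); it is not the PaPIR method itself, and the function name `fit_with_transfer` is hypothetical.

```python
import numpy as np

def fit_with_transfer(X, y, w_src, lam):
    """Least squares on (sparse) target data, penalised toward source weights.

    Minimises ||X w - y||^2 + lam * ||w - w_src||^2. The closed-form solution is
    w = (X^T X + lam I)^{-1} (X^T y + lam w_src).
    lam = 0 ignores the source task; a very large lam returns w_src.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * np.asarray(w_src, dtype=float)
    return np.linalg.solve(A, b)

# Hypothetical usage: 5 target samples, 3 features, weights from a source model.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w_src = np.array([1.0, -2.0, 0.5])
w_target = fit_with_transfer(X, y, w_src, lam=1.0)
```

Sweeping `lam` between these extremes is one simple way to trade off fidelity to the few target samples against retention of source knowledge.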
This paper derives from new work on Mesolithic human skeletal material from Strøby Egede, a near-coastal site in eastern Sjælland, and has two foci. The first confirms sex identifications from original work carried out in 1986. The second, and central, focus re-examines comments made by one of us (CM) on the basis of work in 1992, using a new statistical analysis that includes data from the two Strøby Egede adults. In 1998 it was suggested that the Strøby Egede sample more closely resembled Skateholm, on the coast of Skåne in southern Sweden, than Vedbæk-Bøgebakken on Sjælland, fitting lithic patterns noted earlier by Vang Petersen. We revisit the 1998 suggestion below, comparing data from Strøby Egede to those available from southern Scandinavia and Germany, and suggest that the 1998 comment was, in all probability, incorrect. The analysis below suggests overall morphological similarity between individuals in eastern Sjælland and Skåne, while noting the existence of apparent outliers.
Background
Cognitive impairment is prevalent across the schizophrenia spectrum and severely impacts patients' functional outcomes. A global cognitive score that is sensitive to the stages of the spectrum would facilitate the exploration of potential factors involved in cognitive decline.
Methods
First, we performed principal component analysis on cognitive scores from 768 individuals across the schizophrenia spectrum (first-degree relatives of patients, individuals at ultra-high risk, individuals with a first-episode psychosis, and patients with chronic schizophrenia) and from 124 healthy controls. The analysis yielded 10 g-factors as global cognitive scores, which we validated through correlations with intelligence quotient and assessed for sensitivity to the stages of the spectrum using analyses of variance. Second, using the g-factors, we explored potential mechanisms underlying cognitive impairment in the schizophrenia spectrum through correlations with sociodemographic, clinical, and developmental data, and through linear regressions with genotypic data, pooled in meta-analyses.
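A global cognitive score of this kind is commonly taken as the first principal component of standardized test scores. The sketch below shows that construction under stated assumptions: complete data, z-scored tests, and a sign convention that makes the loadings mostly positive. The function name `g_factor` is hypothetical, and the study's exact procedure for deriving its 10 g-factors may differ.

```python
import numpy as np

def g_factor(scores):
    """First-principal-component score of standardized cognitive tests.

    scores: (n_subjects, n_tests) array without missing values.
    Returns (g, loadings, explained_fraction).
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # SVD signs are arbitrary; flip so that higher g means better performance.
    if loadings.sum() < 0:
        loadings = -loadings
    g = z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return g, loadings, explained
```

Validation against IQ, as described above, then amounts to correlating `g` with the IQ measure.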
Results
The g-factors were highly correlated with intelligence quotient and with each other, confirming their validity. They presented significant differences between subgroups along the schizophrenia spectrum. They were positively correlated with educational attainment and the polygenic risk score (PRS) for cognitive performance, and negatively correlated with general psychopathology of schizophrenia, neurodevelopmental load, and the PRS for schizophrenia.
Conclusions
The g-factors appeared to be valid estimators of global cognition, enabling discrimination of cognitive states within the schizophrenia spectrum. Educational attainment and genetics related to cognitive performance may have a positive influence on cognitive functioning, while general psychopathology of schizophrenia, neurodevelopmental load, and genetic liability to schizophrenia may have an adverse impact.
Casuarina equisetifolia L., commonly called whistling pine, is an economically and industrially important tree species of global significance. Although the species is valued for many uses worldwide, efforts towards selection and the design of a robust selection index have been inadequate. A selection process based on quantitative and qualitative traits identified 15 superior trees from the eastern coastal plain of Odisha. These superior trees showed exceptional qualitative and quantitative attributes. Correlation analysis highlighted strong associations between traits such as volume and above-ground biomass (AGB), volume and diameter at breast height (DBH), DBH and AGB, DBH and tree height (TH), crown length (CL) and height, and AGB and height. Principal component analysis emphasized substantial contributions of traits such as DBH, height, CL, crown width, AGB and volume across different clusters. These analyses culminated in a comprehensive selection index integrating both qualitative and quantitative characters, which reached 52.04 and indicated superior performance among specific accessions. The current study provides valuable insights into selection and the design of an optimal selection index for C. equisetifolia, guiding future decisions on optimal wood production and resource management.
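The abstract does not say how the selection index was built. One simple and widely used form is a base index: a weighted sum of standardized trait values. The sketch below assumes that form, with hypothetical weights standing in for the relative importance of traits such as DBH, height or AGB. It is an illustration, not the index reported in the study.

```python
import numpy as np

def base_index(traits, weights):
    """Base selection index: weighted sum of z-scored trait values.

    traits: (n_trees, n_traits) array; weights: (n_traits,) hypothetical
    relative importance of each trait. Higher index = better candidate.
    """
    traits = np.asarray(traits, dtype=float)
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    return z @ np.asarray(weights, dtype=float)

# Hypothetical usage: rank three trees on two traits with equal weights.
index = base_index([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], [1.0, 1.0])
best_tree = int(np.argmax(index))
```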
Background
Access to psychedelic drugs is being liberalized, yet responses to them are highly unpredictable. It is therefore imperative that we improve our ability to predict the nature of the acute psychedelic experience to improve safety and optimize potential therapeutic outcomes. This study sought to validate the ‘Imperial Psychedelic Predictor Scale’ (IPPS), a short, widely applicable, prospective measure intended to be predictive of salient dimensions of the psychedelic experience.
Methods
Using four independent datasets in which the IPPS was completed prospectively – two online surveys of ‘naturalistic’ use (N = 741, N = 836) and two controlled administration datasets (N = 30, N = 28) – we conducted factor analysis, regression, and correlation analyses to assess the construct, predictive, and convergent validity of the IPPS.
Results
Our approach produced a 9-item scale with good internal consistency (Cronbach's α = 0.8) containing three factors: set, rapport, and intention. The IPPS was significantly predictive of ‘mystical’, ‘challenging’, and ‘emotional breakthrough’ experiences. In a controlled administration dataset (N = 28), multiple regression found set and rapport explaining 40% of variance in mystical experience, and simple regression found set explained 16% of variance in challenging experience. In another (N = 30), rapport was related to emotional breakthrough explaining 9% of variance.
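Internal consistency figures such as the reported Cronbach's α follow the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below implements that textbook formula, assuming complete responses. It is not the authors' analysis code.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of summed scale score
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical usage: 3 respondents answering 2 items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```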
Conclusions
Together, these data suggest that the IPPS is predictive of relevant acute features of the psychedelic experience in a broad range of contexts. We hope that this brief 9-item scale will be widely adopted for improved knowledge of psychedelic preparedness in controlled settings and beyond.
The geochemistry of lake sediments provides valuable information on environmental conditions and geochemical processes in polar regions. To characterize geochemical composition and to analyse weathering and provenance, sediments from 26 lakes on six islands of the South Shetland Islands (SSI) and the James Ross Archipelago (JRA) were analysed. In terms of major-element composition, the studied lake sediments correspond mainly to ferruginous mudstones and, to a lesser extent, to mudstones. The weathering indices indicate incipient chemical alteration (Chemical Index of Alteration = 52.6; Plagioclase Index of Alteration = 57.6). The La-Th-Sc plot shows different provenance signatures: SSI lake sediments correspond to oceanic island arcs, whereas those of the JRA show a continental-arc signal with mixed sources. On James Ross Island, lake sediments show continental-arc (inland lakes), oceanic-island-arc (coastal lakes) and intermediate (foreland lakes) signatures. Multi-element analysis indicates that, relative to regional basalts, the sediments are enriched in Ba, Rb, Th, Cs and U (typical of silica-rich rocks) and depleted in Cr and Co owing to weathering of mafic minerals. The geochemical signals identified by principal component analysis allow the sediments to be grouped according to the studied islands and their geomorphological characteristics. This study underlines the importance of knowing geochemical background levels in pristine lake sediments in order to evaluate potential future anthropogenic effects.
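The two weathering indices quoted above have standard definitions in molar proportions: CIA = 100 · Al2O3 / (Al2O3 + CaO* + Na2O + K2O) and PIA = 100 · (Al2O3 − K2O) / (Al2O3 + CaO* + Na2O − K2O). Here CaO* is the CaO held in silicate minerals. The sketch below converts oxide weight percentages to moles and applies those formulas. It assumes the CaO input has already been corrected to CaO*, and the helper names are hypothetical.

```python
# Approximate molar masses (g/mol) of the oxides used by CIA and PIA.
MOLAR_MASS = {"Al2O3": 101.96, "CaO": 56.08, "Na2O": 61.98, "K2O": 94.20}

def to_moles(wt_percent):
    """Convert oxide wt% (dict keyed like MOLAR_MASS) to molar proportions."""
    return {ox: wt_percent[ox] / MOLAR_MASS[ox] for ox in MOLAR_MASS}

def cia(wt_percent):
    """Chemical Index of Alteration; assumes 'CaO' is already silicate CaO*."""
    m = to_moles(wt_percent)
    return 100 * m["Al2O3"] / (m["Al2O3"] + m["CaO"] + m["Na2O"] + m["K2O"])

def pia(wt_percent):
    """Plagioclase Index of Alteration; same CaO* assumption as cia()."""
    m = to_moles(wt_percent)
    return 100 * (m["Al2O3"] - m["K2O"]) / (m["Al2O3"] + m["CaO"] + m["Na2O"] - m["K2O"])
```

Values near 50 correspond to fresh feldspar-bearing rock, and values approaching 100 indicate intense weathering. This is why a CIA of 52.6 is read as incipient alteration.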
Periwinkle (Catharanthus roseus (L.) G. Don) is a vital summer-season perennial semi-shrub and a multipurpose, drought-resilient flower crop of the tropical region of the Indian subcontinent. This industrially important crop is primarily used for borders, bedding and pot culture in landscaping. Information is lacking on the genetics of important traits and their correlation with quantitative characters such as flower yield, and understanding the co-segregation of these traits might be useful in crop improvement. Therefore, the present study used 30 F2 segregating lines of Catharanthus developed from diallel crossing of six genetically dissimilar parents that differed in many traits. The population was phenotyped for 12 morphological traits. Significant positive associations were recorded between days to flowering and plant height (0.753**) and between leaf area and number of branches (0.463**). Flowers per plant showed a significant positive correlation with all attributes except flower diameter (−005). Path coefficient analysis showed that only two traits, number of seeds per follicle (0.357) and corolla tube length (0.308), exerted significant positive direct effects on flower yield per plant. Principal component analysis showed that the first three components accounted for a cumulative variability of 70.1%. The dispersion of F2 plants in the biplot is consistent with our prior reports that the six inbred lines were genetically diverse and quite different for the characters under study. The current research might be useful in future breeding programmes for selection and hybridization of periwinkle.
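Path coefficient analysis splits each trait's correlation with yield into a direct effect and indirect effects through the other traits. In the usual formulation, the direct effects p solve R_xx p = r_xy, where R_xx is the correlation matrix among the traits and r_xy holds their correlations with yield. The indirect effect of trait i via trait j is r_ij · p_j. The sketch below assumes that standard formulation and a non-singular R_xx. The function name is hypothetical.

```python
import numpy as np

def path_coefficients(r_xx, r_xy):
    """Direct effects, indirect effects and residual effect from correlations.

    r_xx: (k, k) correlation matrix among predictor traits.
    r_xy: (k,) correlations of each trait with the response (e.g. flower yield).
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    direct = np.linalg.solve(r_xx, r_xy)        # p solving R_xx p = r_xy
    indirect = r_xx * direct[None, :]           # indirect[i, j] = r_ij * p_j
    np.fill_diagonal(indirect, 0.0)             # diagonal would be the direct effect
    r_squared = direct @ r_xy                   # coefficient of determination
    residual = np.sqrt(max(0.0, 1.0 - r_squared))
    return direct, indirect, residual
```

By construction, each trait's direct effect plus its row of indirect effects reproduces its correlation with the response.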
In many applications, dimensionality reduction is important. Uses of dimensionality reduction include visualization, noise removal, and reducing compute and memory requirements, for example in image compression. This chapter focuses on low-rank approximation of a matrix. There are theoretical models that explain why big matrices should be approximately low rank. Low-rank approximations are also used to compress large neural network models to reduce computation and storage. The chapter begins with the classic approach to approximating a matrix by a low-rank matrix, using a nonconvex formulation that has a remarkably simple singular value decomposition solution. It then applies this approach to the source localization application via the multidimensional scaling method and to the photometric stereo application. It then turns to convex formulations of low-rank approximation based on proximal operators that involve singular value shrinkage. It discusses methods for choosing the rank of the approximation, and describes the optimal shrinkage method called OptShrink. It discusses related dimensionality reduction methods including (linear) autoencoders and principal component analysis. It applies the methods to learning low-dimensional subspaces from training data for subspace-based classification problems. Finally, it extends the method to streaming applications with time-varying data. This chapter bridges the classical singular value decomposition tool with modern applications in signal processing and machine learning.
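The two approaches contrasted above can be sketched with a single SVD each. Keeping the top k singular triplets gives the best rank-k approximation in the Frobenius norm (the Eckart–Young result behind the nonconvex formulation). Soft-thresholding every singular value by β is the proximal operator of β times the nuclear norm, which is the shrinkage used in the convex formulation. The sketch below uses hypothetical function names and is not code from the chapter.

```python
import numpy as np

def low_rank_approx(Y, k):
    """Best rank-k approximation of Y in Frobenius norm via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def svd_soft_threshold(Y, beta):
    """argmin_X 0.5*||X - Y||_F^2 + beta*||X||_*: shrink singular values by beta."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - beta, 0.0)   # singular values below beta become zero
    return (U * s_shrunk) @ Vt
```

Truncation makes a hard rank decision. Shrinkage instead lets the rank emerge from β, which connects to the rank-selection and OptShrink discussion.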
Obesity is a multifactorial pathophysiological condition involving imbalances in biochemical, immunochemical, redox status and genetic parameters. We aimed to estimate the association of relative leucocyte telomere length (rLTL), a biomarker of cellular ageing, with metabolic and redox status biomarkers in a group of obese and lean children. The study included 110 obese and 42 lean children and adolescents of both sexes. The results suggested that rLTL was significantly shorter in the obese than in the lean group (P < 0·01). Negative correlations of rLTL with total oxidant status (TOS) (Spearman’s ρ = –0·365, P < 0·001) and with C-reactive protein (Spearman’s ρ = –0·363, P < 0·001) were observed. Principal component analysis (PCA) extracted three distinct factors (i.e. principal components), termed the prooxidant factor (35 % of total variability), the antioxidant factor (30 % of total variability) and the lipid antioxidant – biological ageing factor (12 % of total variability). According to logistic regression analysis, the most important predictor of BMI > 30 kg/m2 was the PCA-derived antioxidant factor score (OR: 1·66, 95 % CI 1·05–2·6, P = 0·029). PCA confirmed the importance of obesity-related oxidative stress in biological ageing, driven by increased prooxidants and exhausted antioxidants, and gave clear signs of the depth of disturbed cellular homoeostasis even before the occurrence of any overt disease.
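Spearman's ρ, used above for the rLTL correlations, is the Pearson correlation of the ranks of the two variables. The minimal sketch below assumes there are no tied values. With ties, average ranks are needed, and `scipy.stats.spearmanr` handles that case.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation as Pearson correlation of ranks (no ties assumed)."""
    rank_x = np.argsort(np.argsort(x))   # 0-based ranks of x
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Hypothetical usage: a perfectly decreasing monotone relationship gives rho = -1.
rho = spearman_rho([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
```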
The ability to provide adequate nutrition is considered a key factor in evaluating the sustainability of foods and diets. Nutrient indices are used as functional units (FU) in life cycle assessment of foods to include nutritional performance in the environmental assessment of a product. Several general and food-group-specific nutrient indices exist but many lack validation, particularly when used as FU. In addition, the nutrient selection strategies and reference units for nutrient intake can vary considerably among studies. To validate intake-based product-group-specific nutrient indices previously developed for protein (NR-FIprot) and carbohydrate (NR-FIcarb) foods and for fruits and vegetables (NR-FIveg), we applied principal component analysis to investigate correlations between nutrients in foods and dishes representing a typical Finnish diet. The reference amounts for meal components were based on a plate model that reflected Finnish dietary recommendations. The portion sizes for the different food groups were anchored at 100 g, 135 g and 350 g for proteins, carbohydrates and fruits/vegetables, respectively. Statistical modelling largely validated the NR-FI indices, highlighting protein foods as sources of niacin, vitamin B12 and Se, carbohydrate foods as sources of Mg, Fe and phosphorus, and fruits/vegetables as sources of potassium, vitamin K, vitamin C, fibre and thiamine. However, in contrast to the intake-based approach applied in NR-FIprot, the dietary recommendation-based validation process suggested that fruits and vegetables should be favoured as sources of riboflavin and vitamin B6.
Drought is a major abiotic stress worldwide that drastically limits chickpea yield. Low heritability and high genotype × environment interaction make trait-based breeding an unreliable approach. This study aimed to identify drought-tolerant lines by evaluating yield-based selection indices in a recombinant inbred line (RIL) population derived from an interspecific cross between the drought-tolerant genotype GPF 2 (Cicer arietinum L.) and the drought-sensitive accession ILWC 292 (C. reticulatum) at two locations in India (Ludhiana and Faridkot). Six yield-based selection indices were calculated, and significant variation for these indices was observed among the RILs and their parents at both locations. A holistic approach combining association analysis and principal component analysis identified drought tolerance index, mean productivity, geometric mean productivity and harmonic mean productivity as key selection indices that could be used for indirect selection of drought-tolerant lines. Overall, these approaches identified 15 promising RILs for use in the chickpea breeding programme to develop drought-tolerant cultivars.
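Three of the productivity indices named above have conventional definitions in terms of a line's yield under non-stress (Yp) and drought-stress (Ys) conditions. Mean productivity (MP) is their arithmetic mean, geometric mean productivity (GMP) is their geometric mean, and harmonic mean productivity (HM) is their harmonic mean. The sketch below assumes those conventional formulas. The drought tolerance index is omitted because its definition varies between studies and the abstract does not state which one was used.

```python
import math

def yield_indices(yp, ys):
    """Productivity-based indices from non-stress (yp) and stress (ys) yields."""
    return {
        "MP": (yp + ys) / 2,              # mean productivity
        "GMP": math.sqrt(yp * ys),        # geometric mean productivity
        "HM": 2 * yp * ys / (yp + ys),    # harmonic mean productivity
    }

# Hypothetical usage: a line yielding 4.0 t/ha without stress and 1.0 t/ha under drought.
indices = yield_indices(4.0, 1.0)
```

GMP and HM penalise a large gap between Yp and Ys more than MP does, which is why they are often preferred for identifying lines that perform well in both environments.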
Ulcerative colitis (UC) is a chronic inflammatory disease involving the colon and rectum. One of the most modifiable environmental factors affecting UC severity is the patient’s dietary pattern. Although the role of dietary patterns in UC aetiology has been investigated previously, their relationship with disease severity has not yet been elucidated. This study examined the association between UC patients’ dietary patterns and disease severity. This cross-sectional study was conducted in 340 UC patients. Dietary intake was assessed using an FFQ. Foods were grouped into twenty-five categories based on the similarity of their nutrient composition, and these categories were analysed using factor analysis. A simple clinical colitis activity index was used to determine disease severity. Three dietary patterns were identified by the factor analysis: healthy, unhealthy and Western. After adjusting for potential confounding factors, patients in the highest tertile of the healthy dietary pattern were 92 % less likely to have severe UC than those in the lowest tertile (OR: 0·08; 95 % CI: 0·03, 0·22). Also, those in the highest tertile of the Western dietary pattern were 3·86 times more likely to have severe UC than those in the lowest tertile (OR: 3·86; 95 % CI: 1·86, 8·00). After controlling for confounding variables, the unhealthy dietary pattern was not associated with an increased risk of severe UC. Our data indicate a beneficial role of a healthy dietary pattern in ameliorating disease severity in UC patients. More studies, especially prospective cohort studies, are needed to confirm this association.
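Odds ratios such as those above are typically obtained by exponentiating a logistic-regression coefficient β, with a Wald 95 % interval of exp(β ± 1.96 · SE). The sketch below shows that conversion and the percentage change in odds. Under this reading, an OR of 0·08 corresponds to 92 % lower odds, which is strictly a statement about odds rather than probability. The function names and inputs are hypothetical, not values from the study.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0

# Hypothetical usage: OR of 0.08 -> about a 92 % reduction in the odds.
change = percent_change_in_odds(0.08)
```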
Objective:
To explore dietary patterns in relation to periodontitis and number of teeth.
Design:
A cross-sectional study.
Setting:
We used data from the seventh survey of the Tromsø Study in Norway, 2015–2016. Three periodontitis groups were compared: (i) no periodontitis/slow bone loss; (ii) moderate bone loss; and (iii) rapid bone loss. Number of teeth was categorised as 25–28, 20–24 and ≤ 19. Dietary patterns were identified by principal component analysis. Multiple logistic regression was applied to examine associations between tertiles of dietary pattern scores and periodontitis, and between these same tertiles and number of teeth.
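The exposure and outcome coding described above can be sketched as follows. The sketch assigns each participant to a tertile of a dietary-pattern score using sample quantiles, and maps the number of teeth onto the three categories in the text. The function names are hypothetical, and the study may have handled values that fall exactly on a cut point differently.

```python
import numpy as np

def tertile(scores):
    """Return 1, 2 or 3 for each dietary-pattern score using sample tertile cut points."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, [1 / 3, 2 / 3])
    return np.digitize(scores, cuts) + 1

def teeth_category(n_teeth):
    """Number-of-teeth categories used in the text: 25-28, 20-24 and <=19."""
    if n_teeth >= 25:
        return "25-28"
    if n_teeth >= 20:
        return "20-24"
    return "<=19"
```

The tertile indicators then enter the multiple logistic regression as categorical exposures, with tertile 1 as the reference.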
Participants:
1487 participants (55·5 % women) aged 40–79 years who were free of major chronic diseases, attended an oral health examination and completed a FFQ.
Results:
Four dietary patterns were identified, which explained 24 % of the total variability in food intake: fruit and vegetables, Westernised, meat/fish and potatoes, and refined grain and dessert. The fruit and vegetables pattern was inversely associated with periodontitis characterised by rapid bone loss when compared with no periodontitis/slow bone loss (OR tertile 3 v. 1 0·49, 95 % CI: 0·25, 0·98). Participants in the highest tertile of the refined grain and dessert pattern (tertile 3 v. 1) had 2·38- and 3·52-fold increased odds of having ≤ 19 teeth rather than 20–24 and 25–28 teeth, respectively.
Conclusion:
Of the four identified dietary patterns, only the fruit and vegetables pattern was negatively associated with advanced periodontitis. A clearer positive association was observed between the refined grain and dessert pattern and having fewer teeth (≤ 19 teeth).
While maternal at-risk drinking is associated with children's emotional and behavioral problems, there is a paucity of research that properly accounts for genetic confounding and gene–environment interplay. Therefore, it remains uncertain what mechanisms underlie these associations. We assess the moderation of associations between maternal at-risk drinking and childhood emotional and behavioral problems by common genetic variants linked to environmental sensitivity (genotype-by-environment [G × E] interaction) while accounting for shared genetic risk between mothers and offspring (GE correlation).
Methods
We use data from 109 727 children born to 90 873 mothers enrolled in the Norwegian Mother, Father, and Child Cohort Study. Women self-reported their alcohol consumption and reported their children's emotional and behavioral problems when the children were 1.5, 3, and 5 years old. We included child polygenic scores (PGSs) for traits linked to environmental sensitivity as moderators.
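Moderation of an exposure effect by a polygenic score is usually tested with a product term. In the model outcome = b0 + b1·E + b2·PGS + b3·(E × PGS), a non-zero b3 is the G × E interaction. The minimal ordinary-least-squares sketch below assumes a single exposure E and a single PGS. It deliberately omits the measured and unmeasured confounding adjustments, repeated measures and family structure that the study accounts for, and the function name is hypothetical.

```python
import numpy as np

def fit_gxe(y, exposure, pgs):
    """OLS estimates of (b0, b1, b2, b3) in y = b0 + b1*E + b2*PGS + b3*E*PGS."""
    exposure = np.asarray(exposure, dtype=float)
    pgs = np.asarray(pgs, dtype=float)
    X = np.column_stack([np.ones_like(exposure), exposure, pgs, exposure * pgs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # coef[3] is the G x E interaction estimate
```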
Results
Associations between maternal drinking and child emotional (β1 = 0.04 [95% confidence interval (CI) 0.03–0.05]) and behavioral (β1 = 0.07 [0.06–0.08]) outcomes attenuated after controlling for measured confounders and were almost zero when we accounted for unmeasured confounding (emotional: β1 = 0.01 [0.00–0.02]; behavioral: β1 = 0.01 [0.00–0.02]). We observed no moderation of these adjusted exposure effects by any of the PGS.
Conclusions
The lack of strong evidence for G × E interaction may indicate that the mechanism is not implicated in this kind of intergenerational association. It may also reflect insufficient power or the relatively benign nature of the exposure in this sample.
Efficiently distinguishing among Syzygium cumini L. Skeels (jamun) accessions is of practical significance for selection. This study focused on 15 superior jamun genotypes from the North Western Indian Himalayas, selected for key horticultural traits. Drawn from a pool of 82 collected genotypes and assessed over two consecutive years (2019 and 2020), these genotypes were evaluated morphologically in a randomized block design with three replications. Random amplified polymorphic DNA (RAPD) and inter simple sequence repeat (ISSR) markers were also used for molecular analysis. Substantial variation was found among genotypes in both morphological traits and fruit biochemistry. Notably, tree 43 performed well across multiple horticultural traits, including fruit weight, fruit length, pulp weight, pulp-to-seed ratio and pulp percentage. In contrast, tree 49 had high levels of total soluble solids, total sugar and reducing sugar. While principal component analysis and cluster analysis revealed modest genetic variability, RAPD and ISSR markers revealed pronounced polymorphism at the molecular level. Agglomerative hierarchical clustering grouped the genotypes into five distinct clusters: cluster I comprised two genotypes, cluster II five, and the largest group, cluster III, six, while clusters IV and V each contained a single genotype (trees 43 and 54, respectively). In the molecular analysis, UPGMA clustering yielded two main clusters, highlighting the notable similarity between trees 49 and 52, whereas trees 40, 43, 44 and 48 stood apart. The observed genetic diversity is a valuable resource with substantial potential for diverse breeding programmes, and these genetic variations underscore the richness of the studied population as an asset for focused future work.
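RAPD and ISSR bands are usually scored as a presence/absence matrix, and UPGMA clustering is then run on a similarity coefficient. The sketch below assumes the Jaccard coefficient, which is common for dominant markers but may not be the one used in the study. It computes the pairwise similarity matrix. UPGMA corresponds to average linkage, for example `scipy.cluster.hierarchy.linkage(..., method="average")` applied to the condensed 1 − J distances.

```python
import numpy as np

def jaccard_similarity(bands):
    """Pairwise Jaccard similarity for a (genotypes x loci) 0/1 band matrix.

    J(a, b) = bands shared by a and b / bands present in a or b.
    Two genotypes with no bands at all are treated as identical (J = 1).
    """
    B = np.asarray(bands, dtype=bool).astype(int)
    shared = B @ B.T                              # co-present bands for each pair
    present = B.sum(axis=1)
    union = present[:, None] + present[None, :] - shared
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(union > 0, shared / union, 1.0)
    return sim
```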
The lack of excellent wheat germplasm resources on the Qinghai-Tibet Plateau has led to a gradual decrease in genetic diversity and an increasingly narrow genetic background in wheat grown in this region. Rational use of excellent genes from wheat relatives is important to increase genetic diversity, broaden the genetic base and achieve high yield and quality in common wheat. The objective of this study was to use principal component and cluster analyses of 13 important agronomic traits of 44 Polish wheat varieties over 3 years and comprehensively evaluate them to screen for excellent germplasm resources, thus providing the basic material for broadening the genetic base of Qinghai-Tibet Plateau wheat germplasm resources.
In this chapter, we introduce the design of statistical anomaly detectors. We discuss the types of data encountered in practice: continuous, discrete categorical, and discrete ordinal features. We then discuss how to model such data, in particular how to form a null model for statistical anomaly detection, with emphasis on mixture densities. The EM algorithm is developed for estimating the parameters of a mixture density. K-means is a specialization of EM for Gaussian mixtures. The Bayesian information criterion (BIC), widely used for estimating the number of components in a mixture density, is discussed and developed. We also discuss parsimonious mixtures, which economize on the number of model parameters in a mixture density (by sharing parameters across components). These models allow BIC to obtain accurate model order estimates even when the feature dimensionality is huge and the number of data samples is small (a case where BIC applied to traditional mixtures grossly underestimates the model order). Key performance measures are discussed, including the true positive rate, the false positive rate, the receiver operating characteristic (ROC) and the associated area under the curve (ROC AUC). The density models are used in attack detection defenses in Chapters 4 and 13. The detection performance measures are used throughout the book.
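The interplay between EM and BIC can be shown on one-dimensional data. EM alternates an E-step, which computes component responsibilities, with an M-step, which re-estimates weights, means and variances. BIC = −2 log L + p ln n then compares fits with different numbers of components, where p = 3k − 1 free parameters for a 1-D mixture with k components. The sketch below is a minimal illustration with hypothetical function names and no safeguard against empty components. It is not the chapter's implementation, and it does not cover the parsimonious mixtures discussed above.

```python
import numpy as np

def _log_joint(x, weights, means, variances):
    # log(w_j * N(x_i | mu_j, var_j)) for every sample i and component j
    return (np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)

def gmm_em_1d(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture by EM; return (weights, means, variances, loglik)."""
    x = np.asarray(x, dtype=float)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        log_p = _log_joint(x, weights, means, variances)
        r = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        weights = nk / x.size
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((r * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-6)
    loglik = np.logaddexp.reduce(_log_joint(x, weights, means, variances), axis=1).sum()
    return weights, means, variances, loglik

def bic(loglik, k, n):
    """BIC for a 1-D k-component mixture: p = k means + k variances + (k - 1) weights."""
    return -2.0 * loglik + (3 * k - 1) * np.log(n)
```

The model order with the lowest BIC is selected. On clearly bimodal data, k = 2 should beat k = 1 despite its extra parameters.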
Multivariate Analysis focuses on the most essential tools for analyzing the compositional and/or multivariate data sets that often emerge from geochemical analysis. The chapter starts by introducing groundwater contamination in one of the world’s largest agricultural areas: the Central Valley of California. The goal is to use data science to discover the processes, whether geogenic or anthropogenic, that caused the contamination. Knowing these causes aids in deciding on mitigation actions. The reader will take a path of discovery through several protocols for applying data-scientific tools to unmask the processes, including principal component analysis, multivariate outlier detection and factor analysis. The key to using these tools is to understand the compositional nature of geochemical datasets and how compositions need to be treated appropriately to draw meaningful conclusions, a field termed compositional data analysis. This chapter emphasizes the need for data scientists to work with domain experts.
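A standard first step in compositional data analysis is to move compositions out of the simplex before applying tools such as PCA. The centred log-ratio (clr) transform does this by taking logs and subtracting each sample's mean log. The sketch below assumes strictly positive parts, so zeros or below-detection values would need replacing first. The function name is illustrative and not taken from the chapter.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform of strictly positive compositions (one sample per row)."""
    X = np.asarray(compositions, dtype=float)
    log_x = np.log(X)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical usage: two samples of three parts (e.g. concentrations in mg/L).
Z = clr([[1.0, 2.0, 7.0], [3.0, 3.0, 4.0]])
```

Because clr depends only on ratios between parts, it is unaffected by the units or closure constant of each sample. PCA on clr-transformed data therefore avoids the spurious correlations that arise from the constant-sum constraint.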
This chapter builds on the discussion of product shipping in the previous chapter by introducing a different sort of product: commodity or semi-luxury goods (in the words of Lin Foxhall), which were transported in ceramic amphoras that were also loaded onto ships. The distribution of pottery across various sanctuaries and urban sites is considered to make the point that certain sites ‘specialised’ in particular products, and that there may be evidence for Greeks selecting certain products for import or export. This element of choice indicates that a wide body of economic knowledge, not immediately materially visible, circulated in the Greek world. Spatial network modelling is conducted for this dataset too, revealing shapes similar to those from the previous chapter and making the case for possible ‘piggy-backing’ of goods shipped from similar production sites to points of consumption.
This article analyzes raw driving data of passenger cars in the city of Semnan, Iran, with the objective of understanding the impact of traffic conditions at different times of day (morning, noon, evening, and night). Two cars, the Toyota Prius and the Peugeot Pars (or the IKCO Persia), were used, and the vehicles' speed, longitude, latitude, and altitude were recorded. The data were collected over one week (July 21–28, 2022), covering 670 km (13 h), using a Global Positioning System application, and are presented for both cars. In addition, fuel consumption and average speed data were collected from the Electronic Control Unit of the Prius. Finally, a sensitivity analysis of the raw-data features was performed using principal component analysis.
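A PCA-based sensitivity analysis of features such as speed, latitude, longitude and altitude typically looks at how much variance each component explains and how strongly each feature loads on the leading components. The sketch below assumes the features are standardized first, because they are measured in different units. The function name is hypothetical, and the article's exact sensitivity measure is not specified here.

```python
import numpy as np

def pca(features):
    """PCA on standardized features.

    features: (n_samples, n_features) array, e.g. columns for speed, latitude,
    longitude and altitude. Returns (explained_variance_ratio, loadings), where
    row i of loadings gives each feature's weight on principal component i.
    """
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return explained, vt
```

Features with large absolute loadings on the high-variance components are the ones to which the overall data variability is most sensitive.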