Chapter 16 presents the first of the two real-life multivariate biomarker discovery studies included in the book. The goal of this study – which implements the method presented in Chapters 14 and 15 – is to identify the essential gene expression patterns and a multivariate biomarker common to multiple types of cancer. This study is based on the TCGA RNA-Seq data of 3,528 patients and 20,530 gene expression variables; the data represent five tumor types of five different tissues. A parsimonious multivariate biomarker (consisting of ten genes) with high sensitivity and specificity has been identified.
This chapter moves from regression to methods that focus on the pattern presented by multiple variables, albeit with applications in regression analysis. A strong focus is on finding patterns that beg further investigation, and/or replacing many variables by a much smaller number that capture important structure in the data. Methodologies discussed include principal components analysis and multidimensional scaling more generally, cluster analysis (the exploratory process that groups “alike” observations) and dendrogram construction, and discriminant analysis. Two sections discuss issues for the analysis of data, such as from high throughput genomics, where the aim is to determine, from perhaps thousands or tens of thousands of variables, which are shifted in value between groups in the data. A treatment of the role of balance and matching in making inferences from observational data then follows. The chapter ends with a brief introduction to methods for multiple imputation, which aims to use multivariate relationships to fill in missing values in observations that are incomplete, allowing them to have at least some role in a regression or other further analysis.
The clay mineral distributions in fault gouges from shear zones in several slates, phyllites, mica schists, and gneisses of the Eastern Alps were statistically analyzed for consistencies in their occurrence. Discriminant analyses suggested significant groupings of the most common minerals: illite, smectite, kaolinite, and chlorite. The clay mineral distributions in the fault gouges appeared to be related to regional geological units. No relationship, however, was found with the piles of nappes of the Alps. The influence of the mineralogical composition of the parent rock on the clay mineral assemblages appeared to be minor, but the shear behavior of the parent rocks, which is mainly a function of rock strength, was found to control the formation of the clay minerals. In hard rocks (e.g., gneisses), solution transfer at an early stage of the shear process was apparently extensive enough to favor kaolinite formation. As shearing continued, the rate of solution transfer gradually decreased and favored the formation of smectite. In softer rocks (e.g., phyllites), the extent of solution transfer during the shear process was less than in the gneisses and generated an environment that favored smectite formation, even during the early stages of shearing.
Use of a discriminant analysis has verified and grouped three suspected varieties of kaolinite found in kaolin-rich clay strata of late Paleocene to early Eocene age across north-central Mississippi. Initial identification of each type of kaolinite was based on clay-texture characteristics observed on scanning electron micrographs and the differences in pattern configurations of X-ray diffractograms. The discriminant function used for data treatment clearly segregated and grouped each variety. The discrimination variables were found to be the Hinckley index and, to a lesser extent, the Si⁴⁺ content relative to the Al³⁺ content.
The oldest variety is the Blue Mountain clay, composed of preserved hexagonal plates usually clustered into booklets with a vermiform texture. The Ashland variety, stratigraphically younger than the Blue Mountain clay, appears to have been derived from the erosion of the Blue Mountain clay. The Ashland cannot be recognized by any type of diagnostic texture, as it is made up of individual plates that have been corroded and abraded to the point where a hexagonal outline can no longer be recognized. The Sardis variety is the stratigraphically youngest of the three varieties and is at least a second, or possibly a third generation detrital product. The Sardis clay can be recognized by a distinct “ribbon” or “swirl” texture commonly found in ball clays.
Data from this study are not sufficient for complete petrogenetic interpretation. However, speculation on possible differences in depositional environments and modes of deposition can be based on the data at hand. The Blue Mountain variety is considered from previous studies to be primary. The Ashland variety is probably a first generation alluvial clay. The Sardis variety appears to be a multiple generation, detrital product that accumulated as part of overbank swamp deposits.
Multivariate analysis of variance and discriminant analysis were used to establish the crystal chemistry of several Al-rich smectites. The statistical analyses were carried out on 78 samples taken from the literature which were classified on the basis of their physicochemical properties. A strong discrimination exists between beidellites and montmorillonites, ‘non-ideal’ montmorillonites and ‘ideal’ montmorillonites, and Wyoming-type and Cheto-type montmorillonites. Of the Cheto-type montmorillonites, the Tatatilla-type samples are strongly discriminated, whereas the distinction between Chambers- and Otay-types is not strong. AlIV, AlVI, Fe, Mg, and Ca are generally important discriminating variables, whereas the tetrahedral portion of the layer charge, commonly used as a discriminating factor among these minerals, is only moderately significant.
The unsolved systematics of the genus Cardiomya has led to a sequence of astonishing identification mistakes. This scenario is a result of the rarity of specimens and, more importantly, the lack of knowledge about which characters are relevant to the genus taxonomy. In this study, we developed a method based on standard linear discriminant analysis to identify the smallest number of morphological characters that efficiently distinguish individuals at the species level of Brazilian Cardiomya. Starting from 29 morphometric measurements obtained from photographed Cardiomya shells, we were able to reduce the set to only five characters: the dorsal inflection of the rostrum, the distance from the posteriormost rib end to the umbonal posterior margin, and the distance from the central point of the valve to the anterior margin at 45°, 15° and −30° angles. Surprisingly, all these characters are related to the shell outline and not the ornamentation, which is a remarkable character in Cardiomya. We performed a one-way ANOVA with post-hoc Tukey HSD test specifically using the total number of ribs to verify its discriminant power in species identification. Our analysis demonstrated that the number of ribs does not show a significant difference between the analysed species.
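For readers unfamiliar with the one-way ANOVA mentioned above, the F statistic it rests on can be sketched as follows. This is a minimal illustration, not the authors' analysis: the rib counts are hypothetical, and the Tukey HSD step and the p-value (which need the studentized range and F distributions) are left out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                                # number of groups (species)
    n = sum(len(g) for g in groups)                # total number of observations
    grand = mean(v for g in groups for v in g)     # grand mean over all values
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical rib counts for three species (illustrative values only).
rib_counts = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(rib_counts)
```

A small F statistic, relative to the F distribution with (k − 1, n − k) degrees of freedom, is what a result of "no significant difference" corresponds to.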
Compositional data for 464 clay minerals (2:1 type) were analyzed by statistical techniques. The objective was to understand the similarities and differences between the groups and subgroups and to evaluate statistically clay mineral classification in terms of chemical parameters. The statistical properties of the distributions of total layer charge (TLC), K, VIAl, VIMg, octahedral charge (OC) and tetrahedral charge (TC) were initially evaluated. Critical-difference (P = 1%) comparisons of individual characteristics show that all the clay micas (illite, glauconite and celadonite) differ significantly from all the smectites (montmorillonite, beidellite, nontronite and saponite) only in their TLC and K levels; they cannot be distinguished by their VIAl, VIMg, TC or OC values which reveal no significant differences between several minerals.
Linear discriminant analysis using equal priors was therefore performed to analyze the combined effect of all the chemical parameters. Using six parameters [TLC, K, VIAl, VIMg, TC and OC], eight mineral groups could be derived, corresponding to the three clay micas, four smectites (mentioned above) and vermiculite. The fit between predicted and experimental values was 88.1%. Discriminant analysis using two parameters (TLC and K) resulted in classification into three broad groups corresponding to the clay micas, smectites and vermiculites (87.7% fit). Further analysis using the remaining four parameters resulted in subgroup-level classification with an 85–95% fit between predicted and experimental results. The three analyses yielded Mahalanobis D² distances, which quantify chemical similarities and differences between the broad groups, within members of a subgroup and also between the subgroups. Classification functions derived here can be used as an aid for classification of 2:1 minerals.
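To make the Mahalanobis D² statistic concrete, the sketch below computes it between two groups described by two variables, using the pooled within-group covariance that linear discriminant analysis assumes. It is an illustration under stated assumptions, not the authors' procedure: the function names are invented here and the coordinates are hypothetical, not mineral data from the study.

```python
from statistics import mean

def pooled_covariance(groups):
    """Pooled within-group covariance (sxx, sxy, syy) for groups of (x, y) pairs."""
    sxx = sxy = syy = 0.0
    dof = 0
    for g in groups:
        mx = mean(p[0] for p in g)
        my = mean(p[1] for p in g)
        for x, y in g:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(g) - 1          # each group contributes n - 1 degrees of freedom
    return sxx / dof, sxy / dof, syy / dof

def mahalanobis_d2(group_a, group_b):
    """Squared Mahalanobis distance between the centroids of two groups."""
    sxx, sxy, syy = pooled_covariance([group_a, group_b])
    dx = mean(p[0] for p in group_a) - mean(p[0] for p in group_b)
    dy = mean(p[1] for p in group_a) - mean(p[1] for p in group_b)
    det = sxx * syy - sxy ** 2
    # d^T S^-1 d, with the 2x2 inverse written out explicitly.
    return (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det

# Hypothetical two-variable groups (illustrative values only).
group_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
group_b = [(3, 0), (5, 0), (3, 2), (5, 2)]
d2 = mahalanobis_d2(group_a, group_b)
```

Larger D² values indicate groups whose centroids are further apart relative to the within-group scatter, which is how the statistic expresses chemical dissimilarity.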
During a survey of soil nematodes in Iran, a population of a species belonging to the order Mononchida was recovered. The new species, Paramylonchulus iranicus sp. n., is characterized by body length (1292–1535 μm in females and 1476–1670 μm in males), c (20.2–29.0 in females and 19.9–27.4 in males), buccal cavity length (23.0–26.0 μm), post-vulval uterine sac length (135–162 μm), spicule length (46.0–50.0 μm), gubernaculum length (8.0–11.0 μm), and tail length (49.0–70.0 μm in females and 55.0–73.0 μm in males). Canonical discriminant analysis clearly separated P. iranicus sp. n. from closely related species of Paramylonchulus based on the important morphometric characters of females and males. A molecular study of the 18S rDNA region of P. iranicus sp. n. places this population in a well-supported clade with other species of the genus.
The aim of the present study was to evaluate the physiological and morphological parameters of pregnant does for early prediction of prenatal litter size. In total, 33 does were screened using ultrasonography and categorized into three groups: does bearing twins (n = 12), does bearing a single fetus (n = 12), and non-pregnant does (n = 9). Rectal temperature in °F (RT) and respiration rate (RR) were recorded as physiological parameters, and abdominal girth in cm (AG) and udder circumference in cm (UC) as morphological parameters, at four gestation times: 118, 125, 132 and 140 days. In addition, age (years) and weight at service (kg) were used. The statistical analyses included analysis of variance (ANOVA) and linear discriminant analysis (LDA). The results indicated that the groups differed significantly (P < 0.05) in morphological parameters at each gestation time, with AG and UC highest in does bearing twins, followed by does bearing a single fetus and non-pregnant does. However, neither physiological parameter was significantly associated (P > 0.05) with litter size group. The studied parameters also showed increasing trends over gestation time in the single and twin fetus categories, but were on par among non-pregnant does. The LDA revealed that the estimated function based on age, weight at service, RR, RT, AG and UC achieved accuracy, sensitivity and specificity ranging from 75.00 to 91.70% at different gestation times. It was concluded that, using an estimated function, future pregnant does may be identified in advance for single or twin litter size with greater accuracy.
In this work we report a lipidomics approach to study the effects of two diet systems on the composition of ovine milk. Milk from two groups of Sarda sheep grazing on 40% (P40) and 60% (P60) of pasture was analyzed by a UHPLC-QTOF-MS analytical platform and the data were submitted to multivariate statistical analysis. Pairwise partial least squares discriminant analysis of the lipid profile data was carried out to classify samples and to find discriminant lipids. The two dietary groups were characterized by differences in triacylglycerol, phosphocholine and phosphatidylethanolamine levels. Discriminants of the P40 group were TG and PC containing saturated medium chain FA in their backbone, thus suggesting greater de novo fatty acid synthesis in the mammary gland. On the other hand, the P60 group was characterized by TG and PC formed by unsaturated long chain FA originating from the diet or from lipid mobilization.
The process of discriminant analysis has been applied to major and trace elements in igneous and sedimentary rocks to seek to identify the original tectonic setting in which the rocks formed. A ‘training set’ of data from known environments is used to construct a discrimination diagram which is then used with data from unknown sources. Normally, the discrimination diagrams are based upon immobile trace element data and they have been applied predominantly to mafic igneous rocks, although there are also applications to felsic rocks and sediments. In the past, diagrams of this type have been used indiscriminately and here a robust approach is advocated for statistical analysis. Some of the diagrams presented are based upon elemental data, while others are based upon calculated discriminant functions and require some specific pre-calculation. Diagrams of this type are paradoxically accurate and at the same time geochemically opaque.
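The training-set workflow described above can be sketched with a two-class Fisher linear discriminant on two variables. This is a minimal sketch under assumptions of equal priors and a shared covariance matrix; the function names, the class labels ("arc", "rift") and all coordinates are hypothetical and do not come from any published discrimination diagram.

```python
from statistics import mean

def fit_lda_2d(class_a, class_b):
    """Fit a two-class linear discriminant; return weights w and cut-off c."""
    ma = (mean(p[0] for p in class_a), mean(p[1] for p in class_a))
    mb = (mean(p[0] for p in class_b), mean(p[1] for p in class_b))
    sxx = sxy = syy = 0.0
    for pts, m in ((class_a, ma), (class_b, mb)):
        for x, y in pts:
            sxx += (x - m[0]) ** 2
            sxy += (x - m[0]) * (y - m[1])
            syy += (y - m[1]) ** 2
    dof = len(class_a) + len(class_b) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof   # pooled covariance
    det = sxx * syy - sxy ** 2
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = S^-1 (mean_a - mean_b), with the 2x2 inverse written out.
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # Equal priors: the boundary passes through the midpoint of the two means.
    c = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, c

def classify(point, w, c, labels=("arc", "rift")):
    """Assign a sample of unknown origin to one of the two training classes."""
    score = w[0] * point[0] + w[1] * point[1] - c
    return labels[0] if score > 0 else labels[1]

# Hypothetical training set of known settings (illustrative values only).
arc = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.2), (1.2, 1.8)]
rift = [(4.0, 5.0), (4.5, 5.5), (5.0, 5.2), (4.2, 4.8)]
w, c = fit_lda_2d(arc, rift)
unknown_1 = classify((1.1, 2.1), w, c)
unknown_2 = classify((4.8, 5.1), w, c)
```

The weights `w` play the role of a calculated discriminant function: they classify well but, as the abstract notes, are not directly interpretable in geochemical terms.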
The statistical analysis of geochemical data employs the main statistical techniques of averaging, probability distributions, correlation, regression, multivariate analysis and discriminant analysis. A particular problem with major element geochemical data is that it is constrained; that is, the compositions sum to 100% and the data are ‘closed’. A related problem arises when ternary plots are used to display geochemical data. Techniques are described to accommodate the problems associated with compositional data which include log-ratio conversions and the biplot diagram. Further statistical problems arise in the area of ratio correlation as advocated in Pearce element ratio diagrams, which is not recommended. Applications to trace elements and radiogenic isotope correlations are discussed. The details of discriminant analysis are outlined as a prelude to a more detailed discussion of tectonic discrimination diagrams considered in Chapter 5.
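One common log-ratio conversion for closed data is the centred log-ratio (clr) transform, sketched below. The abstract does not say which log-ratio variant the chapter favours, so the choice of clr is an assumption, and the three-part composition is hypothetical.

```python
import math

def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    if any(v <= 0 for v in composition):
        raise ValueError("clr is undefined for zero or negative parts")
    logs = [math.log(v) for v in composition]
    centre = sum(logs) / len(logs)     # log of the geometric mean of the parts
    return [value - centre for value in logs]

# Hypothetical three-part composition in wt% (illustrative values only).
percent = [50.0, 30.0, 20.0]
fractions = [0.5, 0.3, 0.2]
clr_percent = clr(percent)
clr_fractions = clr(fractions)
```

Because each part is divided by the geometric mean before taking logs, the result does not depend on whether the composition is expressed in percent or as fractions, which removes the constant-sum constraint from the analysis.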
This textbook is a complete rewrite and expansion of Hugh Rollinson's highly successful 1993 book Using Geochemical Data: Evaluation, Presentation, Interpretation. Rollinson and Pease's new book covers the explosion in geochemical thinking over the past three decades, as new instruments and techniques have come online. It provides a comprehensive overview of how modern geochemical data are used in the understanding of geological and petrological processes. It covers major element, trace element, and radiogenic and stable isotope geochemistry. It explains the potential of many geochemical techniques, provides examples of their application, and emphasizes how to interpret the resulting data. Additional topics covered include the critical statistical analysis of geochemical data, current geochemical techniques, effective display of geochemical data, and the application of data in problem solving and identifying petrogenetic processes within a geological context. It will be invaluable for all graduate students, researchers, and professionals using geochemical techniques.
This chapter compares two major families of ordination methods, the unconstrained and constrained ordination. We start by describing the tasks achieved with the help of unconstrained ordination and illustrate how to interpret the resulting ordination diagrams. The methods of constrained ordination allow us to build and test statistical models describing the effects of predictors (such as environmental descriptors) on multivariate response data (such as the composition of biotic communities). We discuss linear discriminant analysis separately, which aims to use a set of numerical variables to predict the membership of observations in a priori defined classes. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, in this case employing the vegan package.
Resistant starch can alter the intestinal nutrient availability and bulk of digesta, thereby modulating the substrate available for microbial metabolic activity along the gastrointestinal tract. This study elucidated the effect of transglycosylated starch (TGS) on the retention of digesta in the upper digestive tract, ileal flow and hindgut disappearance of nutrients, and subsequent bacterial profiles in pigs. Fourteen ileal-cannulated growing pigs were fed either the TGS or control (CON) diet in a complete crossover design. Each period consisted of a 10-d adaptation to the diets, followed by 3-d collection of faeces and ileal digesta. Consumption of TGS decreased the retention of digesta in the stomach and small intestine, and increased ileal DM, starch, Ca and P flow, leading to enhanced starch fermentation in the hindgut compared with CON-fed pigs. TGS increased ileal and faecal total SCFA, especially ileal and faecal acetate and faecal butyrate. Gastric retention time positively correlated to Klebsiella, which benefitted together with Selenomonas, Lactobacillus, Mitsuokella and Coriobacteriaceae from TGS feeding and ileal starch flow. Similar relationships existed in faeces with Coriobacteriaceae, Veillonellaceae and Megasphaera benefitting most, either directly or indirectly via cross-feeding, from TGS residuals in faeces. TGS, in turn, depressed genera within Ruminococcaceae, Clostridiales and Christensenellaceae compared with the CON diet. The present results demonstrated distinct ileal and faecal bacterial community and metabolite profiles in CON- and TGS-fed pigs, which were modulated by the type of starch, intestinal substrate flow and retention of digesta in the upper digestive tract.
Objective
To develop a new predictive equation for fat mass percentage (%FM) based on anthropometric measurements and to assess its ability to discriminate between obese and non-obese individuals.
Design
Cross-sectional study.
Setting
Mexican adults.
Participants
Adults (n 275; 181 women) aged 20–63 years with BMI between 17·4 and 42·4 kg/m².
Results
Thirty-seven per cent of our sample was obese using %FM measured by air-displacement plethysmography (BOD POD®; Life Measurement Instruments). The fat mass was computed from the difference between weight and fat-free mass (FFM). FFM was estimated using an equation obtained previously in the study from weight, height and sex of the individuals. The %FM estimated from the obtained FFM showed a sensitivity of 90·3 (95 % CI 86·8, 93·8) % and a specificity of 58·0 (95 % CI 52·1, 63·8) % in the diagnosis of obesity. Ninety-three per cent of participants with obesity and 65 % of participants without obesity were correctly classified.
Conclusions
The anthropometry-based equation obtained in the present study could be used as a screening tool in clinical and epidemiological studies not only to estimate the %FM, but also to discriminate the obese condition in populations with similar characteristics to the participant sample.
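For reference, sensitivity and specificity of a screening rule such as the one reported above are computed from paired reference and predicted labels, as in this minimal sketch. The labels are hypothetical and are not the study's data; `True` is assumed to mean "obese".

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # obese, flagged obese
    fn = sum(1 for a, p in pairs if a and not p)      # obese, missed
    tn = sum(1 for a, p in pairs if not a and not p)  # non-obese, cleared
    fp = sum(1 for a, p in pairs if not a and p)      # non-obese, flagged obese
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: reference method vs. equation-based screening.
reference = [True, True, True, True, False, False, False, False, False, False]
screening = [True, True, True, False, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(reference, screening)
```

A rule with high sensitivity and moderate specificity, as in the abstract, misses few obese individuals but flags a fair number of non-obese ones, which suits a screening role rather than a diagnostic one.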
The aim of this study was to analyse the morphology and allometry of larvae belonging to five potamodromous species. Five breeding species belonging to the order Characiformes [Salminus brasiliensis (Cuvier, 1816), Leporinus steindachneri, Eigenmann, 1907, Prochilodus lineatus (Valenciennes, 1837), Prochilodus vimboides (Kner, 1859) and Brycon insignis, Steindachner, 1877] were used to obtain larvae samples during the pre-flexing, post-flexing, and juvenile developmental stages. When we observed the degree-hour (DH) amplitude time values, we found three developmental groups based on allometry and morphometrics within the period between the pre-flexing and post-flexing phases. Group 1 consists of the species S. brasiliensis and B. insignis, Group 2 consists of P. lineatus and P. vimboides, and Group 3 consists of L. steindachneri. Group 1 requires less development time and has more slender larvae. Group 2 has a moderate development time and larvae with a more rounded shape. Group 3 presents a greater development time and an intermediate larval morphology. It was possible to classify the larvae through cross-validated discriminant analyses based on seven morphometric variables with 90% accuracy in B. insignis, 83% in L. steindachneri, 91% in P. lineatus, 80% in P. vimboides, and 96% in S. brasiliensis. These results indicate larval characteristics that can be used for the taxonomic identification of the ichthyoplankton.
Italian ryegrass is a major weed problem in wheat production worldwide. Field studies were conducted at Fayetteville, AR, to assess morphological characteristics of ryegrass accessions from Arkansas and differences among other Lolium spp.: Italian, rigid, poison, and perennial ryegrass. Plant height, plant growth habit, plant stem color, and node color were recorded every 2 wk until maturity. The number of tillers per plant, spikes per plant, and seeds per plant were recorded at maturity. All ryegrass accessions from Arkansas were identified as Italian ryegrass, which had erect to prostrate growth habit, green to red stem color, green to red nodes, glume (10 mm) shorter than spikelet (19 mm), and medium seed size (5 to 7 mm) with 1 to 3 mm awns. However, significant variability in morphological characteristics was found among Arkansas ryegrass accessions. When Lolium species at the seedling stage (1- to 2-wk-old plants) were compared, poison ryegrass was characterized as having a large main-stem diameter and wide droopy leaves, whereas perennial ryegrass exhibited a short and a very narrow leaf blade. These two can be distinguished from Italian and rigid ryegrass, which have leaf blades wider than perennial ryegrass but narrower than poison ryegrass. Italian and rigid ryegrass are difficult to distinguish at the seedling stage but are distinct at the reproductive stage. At maturity, Italian ryegrass and poison ryegrass seeds are awned, but perennial and rigid ryegrass seeds are awnless. Poison ryegrass awns were at least 4-fold longer than Italian ryegrass awns. Perennial ryegrass flowered 3 wk later than the other species. Poison ryegrass glumes were longer than the spikelets, whereas Italian ryegrass glumes were shorter than the spikelets. Morphological traits indicate that some Italian ryegrass populations are potentially more competitive and more fecund than others.
Field experiments were conducted to evaluate the potential of hyperspectral reflectance data collected with a hand-held spectroradiometer to discriminate soybean intermixed with pitted morningglory and weed-free soybean in conventional till and no-till plots containing rye, hairy vetch, or no cover crop residue. Pitted morningglory was in the cotyledon to six-leaf growth stage. Seven 50-nm spectral bands (one ultraviolet, two visible, four near-infrared) derived from each hyperspectral reflectance measurement were used as discrimination variables. Pitted morningglory plant size had more influence on discriminant capabilities than tillage or cover crop residue systems. Across all tillage and residue systems, discrimination accuracy was 71 to 95%, depending on the size of pitted morningglory plants at the time of data acquisition. The versatility of the seven 50-nm bands was tested by using a discriminant model developed for one experiment location to test discriminant capabilities for the other experiment, with discrimination accuracy across all tillage and residue systems of 55 to 73%, depending on pitted morningglory plant size.
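One plausible way to derive fixed-width bands from a hyperspectral measurement is to average the reflectance readings that fall inside each band, as sketched below. The abstract does not state how the seven 50-nm bands were derived or where they start, so both the averaging and the band limits used here are assumptions, and the spectrum is hypothetical.

```python
def band_means(wavelengths_nm, reflectance, band_starts_nm, width_nm=50):
    """Average reflectance within each [start, start + width) band."""
    means = []
    for start in band_starts_nm:
        values = [r for w, r in zip(wavelengths_nm, reflectance)
                  if start <= w < start + width_nm]
        if not values:
            raise ValueError(f"no readings in band starting at {start} nm")
        means.append(sum(values) / len(values))
    return means

# Hypothetical spectrum sampled every 10 nm (illustrative values only).
wavelengths = list(range(400, 500, 10))
spectrum = [i / 100 for i in range(10)]
features = band_means(wavelengths, spectrum, band_starts_nm=[400, 450])
```

Each band mean then serves as one discrimination variable, so a full measurement is reduced to a short feature vector before the discriminant model is fitted.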
Field experiments were conducted in 1999 at Stoneville, MS, to determine the potential of multispectral imagery for late-season discrimination of weed-infested and weed-free soybean. Plant canopy composition for soybean and weeds was estimated after soybean or weed canopy closure. Weed canopy estimates ranged from 30 to 36% for all weed-infested soybean plots, and weeds present were browntop millet, barnyardgrass, and large crabgrass. In each experiment, data were collected for the green, red, and near-infrared (NIR) spectrums four times after canopy closure. The red and NIR bands were used to develop a normalized difference vegetation index (NDVI) for each plot, and all spectral bands and NDVI were used as classification features to discriminate between weed-infested and weed-free soybean. Spectral responses for all bands and NDVI were often higher in weed-infested soybean than in weed-free soybean. Weed infestations were discriminated from weed-free soybean with at least 90% accuracy. Discriminant analysis models formed from one image were 78 to 90% accurate in discriminating weed infestations for other images obtained from the same and other experiments. Multispectral imagery has the potential for discriminating late-season weed infestations across a range of crop growth stages by using discriminant models developed from other imagery data sets.
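The NDVI referred to above is conventionally defined as (NIR − red) / (NIR + red). A minimal sketch follows; the reflectance values are hypothetical and are not measurements from the experiments.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and NIR reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when red + NIR is zero")
    return (nir - red) / total

# Hypothetical plot reflectances (illustrative values only).
dense_canopy = ndvi(red=0.1, nir=0.5)
bare_surface = ndvi(red=0.2, nir=0.2)
```

The index ranges from −1 to 1 for non-negative reflectances, with higher values indicating denser green vegetation, so it can be used alongside the individual bands as one more classification feature.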