In practice, a first-try econometric model may turn out to be inappropriate because it violates one or more of the key assumptions needed to obtain valid results. When there is something wrong with the variables, such as measurement error or strong collinearity, it may be better to modify the estimation method or change the model. In the present chapter we deal with endogeneity, which can, for example, be caused by measurement error, and which implies that one or more regressors are correlated with the unknown error term. This is of course not immediately visible, because the errors are not known beforehand and are estimated jointly with the unknown parameters. Endogeneity can thus arise when a regressor is measured with error and, as we will see, when the data are aggregated at too low a frequency. Another issue is multicollinearity, in which the regressors are so strongly correlated that it is difficult to disentangle (the statistical significance of) their separate effects. This certainly holds for levels and squares of the same variable. Finally, we deal with the interpretation of model outcomes.
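The link between measurement error and endogeneity can be illustrated with a small simulation. The sketch below is not taken from the chapter; it assumes a single regressor, a true slope of 2, and a measurement error with the same variance as the regressor, in which case the least-squares slope is expected to shrink towards roughly half its true value. All names (`ols_slope`, `x_obs`, and so on) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
beta = 2.0  # assumed true slope

x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [beta * x + rng.gauss(0, 1) for x in x_true]

# The analyst only observes x with additive noise of variance 1.
x_obs = [x + rng.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)  # regressor measured without error
slope_noisy = ols_slope(x_obs, y)   # regressor measured with error

print(f"slope with true x:     {slope_clean:.3f}")
print(f"slope with observed x: {slope_noisy:.3f}")
```

The observed regressor contains the measurement error, and that same error ends up in the regression disturbance, so regressor and error term are correlated. Under these assumptions the slope is attenuated by the factor var(x) / (var(x) + var(noise)) = 1/2.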
We may now have access to large databases, sometimes referred to as Big Data, and for such large datasets simple econometric models will not do. When you have a million people in your database, as insurance firms, telephone providers, or charities may have, and you have collected information on these individuals for many years, you simply cannot summarize these data using a small econometric model with just a few regressors. In this chapter we address diverse options for how to handle Big Data. We kick off with a discussion about what Big Data is and why it is special. Next, we discuss a few options such as selective sampling, aggregation, nonlinear models, and variable reduction. Methods such as ridge regression, lasso, elastic net, and artificial neural networks are also addressed; these are nowadays described as machine learning methods. We see that with these methods the number of choices rapidly increases, and that reproducibility can suffer. The analysis of Big Data therefore comes at the cost of more analysis and of more choices to make and to report.
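As an illustration of one of the methods listed above, the following sketch computes ridge regression estimates from the closed-form expression (X'X + λI)⁻¹X'y on centered data. It is a minimal example under stated assumptions, not the chapter's own implementation: the simulated data, the penalty value of 10, and the function name `ridge_fit` are all hypothetical, and in practice the regressors would usually also be scaled and the penalty chosen by cross-validation.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam * I)^{-1} X'y for centered data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
# Make the second regressor nearly identical to the first (strong collinearity).
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)
y = X[:, 0] + rng.normal(size=n)

# Center regressors and response so that no intercept is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

b_ols = ridge_fit(X, y, 0.0)     # lam = 0 reproduces ordinary least squares
b_ridge = ridge_fit(X, y, 10.0)  # lam > 0 shrinks the coefficient vector

print("norm of OLS coefficients:  ", np.linalg.norm(b_ols))
print("norm of ridge coefficients:", np.linalg.norm(b_ridge))
```

The penalty value is exactly the kind of extra choice the text refers to: different values of `lam` give different coefficient estimates, so the value used has to be reported for the analysis to be reproducible.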
The components or functions derived from an eigenanalysis are linear combinations of the original variables. Principal components analysis (PCA) is a very common method that uses these components to examine patterns among the objects, often in a plot termed an ordination, and identify which variables are driving those patterns. Correspondence analysis (CA) is a related method used when the variables represent counts or abundances. Redundancy analysis and canonical CA are constrained versions of PCA and CA, respectively, where the components are derived after taking into account the relationships with additional explanatory variables. Finally, we introduce linear discriminant function analysis as a way of identifying and predicting membership of objects to predefined groups.
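The statement that components are linear combinations of the original variables can be made concrete with a short sketch of PCA through an eigenanalysis of the covariance matrix. The data below are simulated and the function name `pca` is hypothetical; the sketch assumes the variables are measured on comparable scales, otherwise the correlation matrix would normally be used instead.

```python
import numpy as np


def pca(X):
    """PCA via eigenanalysis of the sample covariance matrix.

    Returns eigenvalues (component variances, largest first), the
    eigenvectors (loadings, one column per component), and object scores.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort components by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # linear combinations of variables
    return eigvals, eigvecs, scores


rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing      # 100 objects, 3 correlated variables

eigvals, loadings, scores = pca(X)
explained = eigvals / eigvals.sum()
print("proportion of variance explained:", np.round(explained, 3))
```

Plotting the first two columns of `scores` against each other gives the ordination of the objects, and the corresponding columns of `loadings` indicate which variables contribute most to each axis.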
Relatively little is known about how the diet of chronically undernourished children may impact cardiometabolic biomarkers. The objective of this exploratory study was to characterise relationships between dietary patterns and the cardiometabolic profile of 153 3–5-year-old Peruvian children with a high prevalence of chronic undernutrition. We collected monthly dietary recalls from children when they were 9–24 months old. At 3–5 years, additional dietary recalls were collected, and blood pressure, height, weight, subscapular skinfolds and fasting plasma glucose, insulin and lipid profiles were assessed. Nutrient intakes were expressed as average density per 100 kcals (i) from 9 to 24 months and (ii) at follow-up. The treelet transform and sparse reduced rank regression (RRR) were used to summarize nutrient intake data. Linear regression models were then used to compare these factors to cardiometabolic outcomes and anthropometry. Linear regression models adjusting for subscapular skinfold-for-age Z-scores (SSFZ) were then used to test whether observed relationships were mediated by body composition. 26 % of children were stunted at 3–5 years old. Both treelet transform and sparse RRR-derived child dietary factors are related to protein intake and associated with total cholesterol and SSFZ. Associations between dietary factors and insulin were attenuated after adjusting for SSFZ, suggesting that body composition mediated these relationships. Dietary factors in early childhood, influenced by protein intake, are associated with cholesterol profiles, fasting glucose and body fat in a chronically undernourished population.
Reading difficulties are prevalent worldwide, including in economically developed countries, and are associated with low academic achievement and unemployment. Longitudinal studies have identified several early childhood predictors of reading ability, but studies frequently lack genotype data that would enable testing of predictors with heritable influences. The National Child Development Study (NCDS) is a UK birth cohort study containing direct reading skill variables at every data collection wave from age 7 years through to adulthood with a subsample (final n = 6431) for whom modern genotype data are available. It is one of the longest running UK cohort studies for which genotyped data are currently available and is a rich dataset with excellent potential for future phenotypic and gene-by-environment interaction studies in reading. Here, we carry out imputation of the genotype data to the Haplotype Reference Panel, an updated reference panel that offers greater imputation quality. Guiding phenotype choice, we report a principal components analysis of nine reading variables, yielding a composite measure of reading ability in the genotyped sample. We include recommendations for use of composite scores and the most reliable variables for use during childhood when conducting longitudinal, genetically sensitive analyses of reading ability.
As the world’s population is ageing, improving the physical performance (PP) of the older population is becoming important. Although diets are fundamental to maintaining and improving PP, few studies have addressed the role of these factors in adults aged ≥ 85 years, and none have been conducted in Asia. This study aimed to determine the dietary patterns (DP) and examine their relationship with PP in this population.
Design:
This cross-sectional study (Kawasaki Aging and Wellbeing Project) estimated food consumption using a brief-type self-administered diet history questionnaire. The results were adjusted for energy after aggregating into thirty-three groups, excluding possible over- or underestimation. Principal component analysis was used to identify DP, and outcomes included hand grip strength (HGS), timed up-and-go test, and usual walking speed.
Setting:
This study was conducted across several hospitals in Kawasaki city.
Participants:
In total, 1026 community-dwelling older adults (85–89 years) were enrolled.
Results:
Data of 1000 participants (median age: 86·9 years, men: 49·9 %) were included in the analysis. Three major DP (DP1: various foods, DP2: red meats and coffee, DP3: bread and processed meats) were identified. The results of multiple regression analysis showed that the trend of DP2 was negatively associated with HGS (B = –0·35; 95 % CI –0·64, –0·06).
Conclusions:
This study suggests a negative association between HGS and DP characterised by red meats and coffee in older adults aged ≥ 85 years in Japan.
In this chapter, we study risks associated with movements of interest rates in financial markets. We begin with a brief discussion of the term structure of interest rates. We then discuss commonly used interest rate sensitive securities. This is followed by the study of different measures of sensitivity to interest rates, including duration and convexity. We consider mitigating interest rate risk through hedging and immunization. Finally, we take a more in-depth look at the drivers of interest rate term structure dynamics.
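The sensitivity measures mentioned above can be sketched for a bond with fixed cash flows. The example below is illustrative only and assumes annual compounding and a flat yield; the function names and the 3-year 5 % coupon bond are hypothetical and not taken from the chapter.

```python
def bond_measures(cashflows, y):
    """Price, Macaulay duration, modified duration and convexity.

    cashflows: list of (time_in_years, amount) pairs
    y: yield per year, annually compounded (an assumption of this sketch)
    """
    price = sum(cf / (1 + y) ** t for t, cf in cashflows)
    macaulay = sum(t * cf / (1 + y) ** t for t, cf in cashflows) / price
    modified = macaulay / (1 + y)
    convexity = sum(
        t * (t + 1) * cf / (1 + y) ** (t + 2) for t, cf in cashflows
    ) / price
    return price, macaulay, modified, convexity


def approx_price(price, modified, convexity, dy):
    """Second-order approximation of the price after a yield change dy."""
    return price * (1 - modified * dy + 0.5 * convexity * dy ** 2)


# Hypothetical 3-year bond, face value 100, 5 % annual coupon, yield 5 %.
bond = [(1, 5.0), (2, 5.0), (3, 105.0)]
price, d_mac, d_mod, conv = bond_measures(bond, 0.05)

# Compare the approximation with full repricing after a 1 % rise in yield.
exact_after = bond_measures(bond, 0.06)[0]
approx_after = approx_price(price, d_mod, conv, 0.01)

print(f"price {price:.4f}, modified duration {d_mod:.4f}, convexity {conv:.4f}")
print(f"repriced at 6 %: {exact_after:.4f}, approximation: {approx_after:.4f}")
```

Duration gives the first-order (linear) price response to a yield change, and convexity adds the second-order correction, which is why the approximation tracks full repricing closely for a small change in yield.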
Dietary pattern analysis is typically based on dimension reduction and summarises the diet with a small number of scores. We assess ‘joint and individual variance explained’ (JIVE) as a method for extracting dietary patterns from longitudinal data that highlights elements of the diet that are associated over time. The Auckland Birthweight Collaborative Study, in which participants completed an FFQ at ages 3·5 (n 549), 7 (n 591) and 11 (n 617), is used as an example. Data from each time point are projected onto the directions of shared variability produced by JIVE to yield dietary patterns and scores. We assess the ability of the scores to predict future BMI and blood pressure measurements of the participants and make a comparison with principal component analysis (PCA) performed separately at each time point. The diet could be summarised with three JIVE patterns. The patterns were interpretable, with the same interpretation across age groups: a vegetable and whole grain pattern, a sweets and meats pattern and a cereal v. sweet drinks pattern. The first two PCA-derived patterns were similar across age groups and similar to the first two JIVE patterns. The interpretation of the third PCA pattern changed across age groups. Scores produced by the two techniques were similarly effective in predicting future BMI and blood pressure. We conclude that when data from the same participants at multiple ages are available, JIVE provides an advantage over PCA by extracting patterns with a common interpretation across age groups.
The present study investigated the association between dietary patterns and hypertension applying the Chinese Dietary Balance Index-07 (DBI-07).
Design:
A cross-sectional study on adult nutrition and chronic disease in Inner Mongolia. Dietary data were collected using 24 h recall over three consecutive days and weighing method. Dietary patterns were identified using principal components analysis. Generalized linear models and multivariate logistic regression models were used to examine the associations between DBI-07 and dietary patterns, and between dietary patterns and hypertension.
Setting:
Inner Mongolia (n 1861).
Participants:
A representative sample of adults aged ≥18 years in Inner Mongolia.
Results:
Four major dietary patterns were identified: ‘high protein’, ‘traditional northern’, ‘modern’ and ‘condiments’. Generalized linear models showed higher factor scores in the ‘high protein’ pattern were associated with lower DBI-07 (βLBS = −1·993, βHBS = −0·206, βDQD = −2·199; all P < 0·001); the opposite in the ‘condiments’ pattern (βLBS = 0·967, βHBS = 0·751, βDQD = 1·718; all P < 0·001). OR for hypertension in the highest quartile of the ‘high protein’ pattern compared with the lowest was 0·374 (95 % CI 0·244, 0·573; Ptrend < 0·001) in males. OR for hypertension in the ‘condiments’ pattern was 1·663 (95 % CI 1·113, 2·483; Ptrend < 0·001) in males, 1·788 (95 % CI 1·155, 2·766; Ptrend < 0·001) in females.
Conclusions:
Our findings suggested a higher-quality dietary pattern evaluated by DBI-07 was related to decreased risk for hypertension, whereas a lower-quality dietary pattern was related to increased risk for hypertension in Inner Mongolia.
Tognini-Bonelli (2001) made the following distinction between corpus-based and corpus-driven studies. While corpus-based studies start with pre-existing theories which are tested using corpus data, in corpus-driven studies the hypothesis is derived by examination of the corpus evidence. This chapter will give an overview of the two different families of statistical tests which are suited for these two approaches. For corpus-based approaches, we use more traditional statistics, such as the t-test or ANOVA, which return a p-value to tell us to what extent we should accept or reject the initial hypothesis. Multi-level modelling (also known as mixed modelling) is a new technique which shows considerable promise for corpus-based studies, and will also be described here and used to analyse the ENNTT subset of the Europarl corpus. Multi-level modelling is useful for the examination of hierarchically structured or “nested” data, where for example translations may be “nested” together in a class if they have the same language of origin. A multi-level model takes account of both the variation between individual translations and the variation between classes. For example, we might expect the scores (such as vocabulary richness or readability scores) of two translations in the same class to be more similar to each other than two translations in different classes.
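The idea that scores within a class resemble each other more than scores across classes can be quantified with the intraclass correlation, the share of total variance that lies between classes. The sketch below is not a full multi-level model fit; it uses the classical one-way ANOVA estimator for balanced classes, and the class labels and scores are invented purely for illustration.

```python
def intraclass_correlation(groups):
    """One-way ANOVA estimate of the intraclass correlation, ICC(1).

    groups: dict mapping a class label to a list of scores.
    Assumes every class contains the same number of scores (balanced data).
    """
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized classes")
    k = sizes.pop()                      # scores per class
    g = len(groups)                      # number of classes
    all_scores = [s for v in groups.values() for s in v]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-class and within-class sums of squares.
    ss_between = sum(
        k * (sum(v) / k - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (s - sum(v) / k) ** 2 for v in groups.values() for s in v
    )
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented readability scores for translations grouped by source language.
scores_by_source = {
    "source_language_A": [1.0, 2.0],
    "source_language_B": [5.0, 6.0],
}
icc = intraclass_correlation(scores_by_source)
print(f"intraclass correlation: {icc:.3f}")
```

A value close to 1 indicates that most of the variation lies between classes, which is the situation in which ignoring the nesting would be most misleading and a multi-level model is most useful.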
To describe the relationship between adherence to distinct dietary patterns and nutrition literacy.
Design:
We identified distinct dietary patterns using principal covariates regression (PCovR) and principal components analysis (PCA) from the Diet History Questionnaire II. Nutrition literacy was assessed using the Nutrition Literacy Assessment Instrument (NLit). Cross-sectional relationships between dietary pattern adherence and global and domain-specific NLit scores were tested by multiple linear regression. Mean differences in diet pattern adherence among three predefined nutrition literacy performance categories were tested by ANOVA.
Setting:
Metropolitan Kansas City, USA.
Participants:
Adults (n 386) with at least one of four diet-related diseases.
Results:
Three diet patterns of interest were derived: a PCovR prudent pattern and PCA-derived Western and Mediterranean patterns. After controlling for age, sex, BMI, race, household income, education level and diabetes status, PCovR prudent pattern adherence positively related to global NLit score (P < 0·001, β = 0·36), indicating more intake of prudent diet foods with improved nutrition literacy. Validating the PCovR findings, PCA Western pattern adherence inversely related to global NLit (P = 0·003, β = −0·13) while PCA Mediterranean pattern positively related to global NLit (P = 0·02, β = 0·12). Using predefined cut points, those with poor nutrition literacy consumed more foods associated with the Western diet (fried foods, sugar-sweetened beverages, red meat, processed foods) while those with good nutrition literacy consumed more foods associated with prudent and Mediterranean diets (vegetables, olive oil, nuts).
Conclusions:
Nutrition literacy predicted adherence to healthy/unhealthy diet patterns. These findings warrant future research to determine if improving nutrition literacy effectively improves eating patterns.
Data on the combination of foods consumed simultaneously at specific eating occasions are scarce, primarily due to a lack of assessment tools. We applied a recently developed meal coding system to multiple-day dietary intake data for assessing its ability to estimate food and nutrient intakes and characterise meal-based dietary patterns in the Japanese context. A total of 242 Japanese adults completed sixteen non-consecutive-day weighed dietary records, including 14 734 eating occasions (3788 breakfasts, 3823 lunches, 3856 dinners and 3267 snacks). Common food group combinations were identified by meal type to identify a range of generic meals. Dietary intake was calculated on the basis of not only the standard food composition database but also the substituted generic meal database. In total, eighty generic meals (twenty-three breakfasts, twenty-one lunches, twenty-four dinners and twelve snacks) were identified. The Spearman correlation coefficients between food group intakes calculated based on the standard food composition database and the substituted generic meal database ranged from 0·26 to 0·85 (median 0·69). The corresponding correlations for nutrient intakes ranged from 0·17 to 0·82 (median 0·61). A total of eleven meal patterns were established using principal components analysis, and these accounted for 39·1 % of total meal variance. Considerable variation in patterns was seen in meal type inclusion and choice of staple foods (bread, rice and noodles) and drinks, and also in meal constituents. In conclusion, this study demonstrated the usefulness of a meal coding system for assessing habitual diet, providing a scientific basis towards the development of simple meal-based dietary assessment tools.
There is evidence to suggest that individual components of dietary intake are associated with depressive symptoms. Studying the whole diet, through dietary patterns, has become popular as a way of overcoming intercorrelations between individual dietary components; however, there are conflicting results regarding associations between dietary patterns and depressive symptoms. We examined the associations between dietary patterns extracted using principal component analysis and depressive symptoms, taking account of potential temporal relationships.
Design
Depressive symptoms in parents were assessed using the Edinburgh Postnatal Depression Scale (EPDS) when the study child was 3 and 5 years of age. Scores >12 were considered indicative of the presence of clinical depressive symptoms. Diet was assessed via FFQ when the study child was 4 years of age.
Setting
Longitudinal population-based birth cohort.
Subjects
Mothers and fathers taking part in the Avon Longitudinal Study of Parents and Children when their study child was 3–5 years old.
Results
Unadjusted results suggested that increased scores on the ‘processed’ and ‘vegetarian’ patterns in women and the ‘semi-vegetarian’ pattern in men were associated with having EPDS scores ≥13. However, after adjustment for confounders all results were attenuated. This was the case for all those with available data and when considering a sub-sample who were ‘disease free’ at baseline.
Conclusions
We found no association between dietary patterns and depressive symptoms after taking account of potential confounding factors and the potential temporal relationship between them. This suggests that previous studies reporting positive associations may have suffered from reverse causality and/or residual confounding.
Leaf cuticle micromorphology has been cited as an important set of taxonomic characters in gymnosperms, but previous studies have largely been based on small sample sizes. The premise of this study was to understand whether external factors affect cuticular micromorphology of Podocarpaceae. Two example species, Prumnopitys andina and Podocarpus salignus, were studied. Of 21 sampled characters, nine (c.43% of the total) were visually assessed as being moderately reliable or highly reliable for taxonomic discrimination for both species, with an additional six (c.29%) being moderately reliable or highly reliable for only one or other of the example species, and six characters (c.29%) unreliable for both. Seven of the most variable stomatal characters were selected for further analysis to establish whether environmental factors affect them. The relationship between these seven stomatal characters, the environment and climate was analysed using the R ‘vegan’ package and climate data gathered from WorldClim. Our results showed that both species had larger stomata in moist and shady conditions, and a higher density of (smaller) stomata in sunny and drier conditions. An additional novel finding was the presence of stomata on the adaxial leaf surface in 46% of samples of Prumnopitys andina: the first record of adaxial stomata in this species, highlighting the necessity of studying multiple samples of a given species. In conclusion, these results indicate that larger sample sizes than have hitherto been employed in cuticle micromorphological studies are necessary to fully document the amount of phenotypic variation that exists.
Knowledge of the influence of environmental factors on weed populations is important in developing sustainable turfgrass management practices. Studies were conducted to evaluate the relationship of green and false-green kyllinga population densities with elevation and edaphic factors in turfgrass systems. Studies were conducted on five different golf courses in North Carolina, three affected by green kyllinga, and two affected by false-green kyllinga. According to Spearman correlation coefficients, both green and false-green kyllinga were correlated with increasing soil volumetric water content, whereas correlation of other edaphic variables varied among sites and species. Stepwise logistic regression confirmed the correlation of volumetric water with green kyllinga presence, but model components varied among sites for false-green kyllinga. Increasing green kyllinga populations correlated with increasing soil sodium; however, sodium did not reach a level believed to be detrimental to turfgrass growth. No other variables correlated with green or false-green kyllinga across all sites. We hypothesized that the lack of significant correlations was due to the overall influence of relative elevation on edaphic variables. According to principal components analysis (PCA), relative elevation had a profound impact on the measured edaphic variables at all sites. However, results of PCA at one site differed sharply from other sites. Results from that site demonstrate the potentially strong effects of management practices to alter edaphic trends normally observed with topography.
Field pea seed from bin cleaning operations stored overwinter on nearby cropland was observed to correlate with weed and crop growth suppression for up to three subsequent years. To explore the phenomenon more explicitly, plant growth suppression trials were undertaken with soil sampled 18 mo apart from two locations that had contained field pea seed residues. Test plant species grown in the residue-affected and nearby residue-free soils were compared in greenhouse experiments. Germination was either fully inhibited or emergence was delayed by more than one week. Dry matter accumulation of test species grown in residue-affected soil was significantly reduced compared to dry matter of these test species grown in residue-free soil (P < 0.0001). Canola and field pea were inhibited more than wheat and green foxtail over both years. Greenhouse trials also revealed that germination of wild oat was inhibited in the residue-affected soils, although wheat and grassy weeds were less suppressed than dicots overall. Significant reductions of weed species diversity and abundance were associated with residue-affected soils (P < 0.0001) when compared to residue-free soils using multi-response permutation procedures. In bioassays in sterile media, germination of wheat and canola seed was inhibited by aqueous extracts of weathered pea seeds or extracts of the residue-affected soil. An allelopathic response was proposed to explain the above results, indicating a need for further research on this system. Weed management strategies could be developed with field pea seed residues to provide innovative weed control techniques.
The genetic variation of quackgrass as a species and the array of environments in which it is found indicate that selection in these different environments could lead to differentiation among quackgrass populations. Yet, a highly diverse environment might not promote the genetic divergence of quackgrass if it poses contradictory selection pressures. To assess the extent of divergence among quackgrass populations, this study compared the morphology of populations of quackgrass for 1 yr in Rosemount, MN, in a “common garden” study. The quackgrass was initially collected from three different farming systems in southeast Minnesota: corn–soybean (CS), oats–hay–corn (OHC), and permanent pasture (PP). The systems represent pasture or arable land and differ in cropping rotations and levels of disturbance. Although no differences among farming systems were detected in multivariate or univariate comparisons, a significant farming system effect was detected between CS and PP systems when the most diversified system, OHC, was excluded from the analysis. Consistent with this result, a principal components analysis suggested that plants from two of the three farming systems exemplified contrasting modes of perennial plant growth. Relative to each other, the CS plants showed more features of the “guerrilla” growth mode (longer intra-ramet distances, sparse, large patches), whereas PP plants showed more “phalanx” mode features (short intra-ramet distances, dense, smaller patches). Plants from the most diversified system, OHC, did not fit into either growth form, and for this farming system, the variation among populations was the highest. The results suggest that the CS and PP systems selected for distinct growth forms, whereas the diversified OHC system did not. This is consistent with the hypothesis that diversification of a farming system and weed management decreases the risk of evolution of a weed population highly adapted to control measures used in that farming system.
Microtubule analysis is of significant value for a better understanding of normal and pathological cellular processes. Although immunofluorescence microscopic techniques have proven useful in the study of microtubules, comparative results commonly rely on a descriptive and subjective visual analysis. We developed an objective and quantitative method based on image processing and analysis of fluorescently labeled microtubular patterns in cultured cells. We used a multi-parameter approach by analyzing four quantifiable characteristics to compose our quantitative feature set. Then we interpreted specific changes in the parameters and revealed the contribution of each feature set using principal component analysis. In addition, we verified that different treatment groups could be clearly discriminated using principal components of the multi-parameter model. High predictive accuracy of four commonly used multi-classification methods confirmed our method. These results demonstrated the effectiveness and efficiency of our method in the analysis of microtubules in fluorescence images. Application of the analytical methods presented here provides information concerning the organization and modification of microtubules, and could aid in the further understanding of structural and functional aspects of microtubules under normal and pathological conditions.
To examine the association between cardiorespiratory fitness and dietary patterns in adolescents.
Design
Food choice was assessed using the validated New Zealand Adolescent FFQ. Principal components analysis was used to determine dietary patterns. Trained research assistants measured participants’ height and body mass. Cardiorespiratory fitness was assessed in a subset of participants using the multistage 20 m shuttle run. The level and stage were recorded, and the corresponding VO2max was calculated. Differences in mean VO2max according to sex and BMI were assessed using t tests, while associations between cardiorespiratory fitness and dietary patterns were examined using linear regression analyses adjusted for age, sex, school attended, socio-economic deprivation and BMI.
Setting
Secondary schools in Otago, New Zealand.
Subjects
Students (n 279) aged 14–18 years who completed an online lifestyle survey during a class period.
Results
Principal components analysis produced three dietary patterns: ‘Treat Foods’, ‘Fruits and Vegetables’ and ‘Basic Foods’. The 279 participants who provided questionnaire data and completed cardiorespiratory fitness testing had a mean age of 15·7 (sd 0·9) years. Mean VO2max was 45·8 (sd 6·9) ml/kg per min. The ‘Fruits and Vegetables’ pattern was positively associated with VO2max in the total sample (β=0·04; 95 % CI 0·02, 0·07), girls (β=0·06; 95 % CI 0·03, 0·10) and boys (β=0·03; 95 % CI 0·01, 0·05).
Conclusions
These results indicate that increase in cardiorespiratory fitness was associated with a healthier dietary pattern, suggesting both should be targeted as part of a global lifestyle approach. Longitudinal studies are needed to confirm this association in relation to health outcomes in New Zealand adolescents.
Despite differences in obesity and ill health between urban and rural areas in the UK being well documented, very little is known about differences in dietary patterns across these areas. The present study aimed to examine whether urban/rural status is associated with dietary patterns in a population-based UK cohort study of children.
Design
Dietary patterns were obtained using principal components analysis and cluster analysis of 3 d diet records collected from children at 10 years of age. Rurality was obtained from the 2001 UK Census urban/rural indicator at the time of dietary assessment. General linear models were used to examine the relationship between rurality and dietary pattern scores from principal components analysis; multinomial logistic regression was used to assess the association between rurality and dietary clusters.
Setting
The Avon Longitudinal Study of Parents and Children (ALSPAC), South West England.
Subjects
Children (n 5677) aged 10 years (2817 boys and 2860 girls).
Results
After adjustment, increases in rurality were associated with increased scores on the ‘health awareness’ dietary pattern (β=0·35; 95 % CI 0·14, 0·56; P<0·001 for the most rural compared with the most urban group) and lower scores on the ‘packed lunch/snack’ dietary pattern (β=−0·39; 95 % CI −0·59, −0·19; P<0·001 for the most rural compared with the most urban group). The odds ratio for participants being in the ‘healthy’ compared with the ‘processed’ dietary cluster for the most rural areas was 1·61 (95 % CI 1·05, 2·49; P=0·02) compared with those in the most urban areas.
Conclusions
There is evidence to suggest that differences exist in dietary patterns between rural and urban areas. Similar results were found using two different methods of dietary pattern analysis, showing that children residing in rural households were more likely to consume healthier diets than those in urban households.