Our study aimed to develop and validate a nomogram to assess talaromycosis risk in hospitalized HIV-positive patients. Prediction models were built using data from a multicentre retrospective cohort study in China. On the basis of the inclusion and exclusion criteria, we collected data from 1564 hospitalized HIV-positive patients in four hospitals from 2010 to 2019. Inpatients were randomly assigned to the training or validation group at a 7:3 ratio. To identify potential risk factors for talaromycosis in HIV-infected patients, univariate and multivariate logistic regression analyses were conducted. Multivariate logistic regression identified ten variables as independent risk factors for talaromycosis in HIV-infected individuals, and a nomogram was developed from these results. For user convenience, a web-based nomogram calculator was also created. The nomogram demonstrated excellent discrimination in both the training and validation groups (area under the receiver operating characteristic curve, AUC = 0.883 and 0.889, respectively) and good calibration. Clinical impact curve (CIC) analysis and decision curve analysis (DCA) confirmed the clinical utility of the model. This simple, practical, and quantitative tool can help clinicians predict talaromycosis risk in HIV-infected patients and implement appropriate interventions accordingly.
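The abstract does not publish the fitted model, so the sketch below only illustrates two generic steps it mentions: a random 7:3 training/validation split and the AUC used to report discrimination. The function names and toy scores are hypothetical, and AUC is computed through its rank (Mann-Whitney) interpretation rather than by tracing the ROC curve.

```python
import random


def split_70_30(records, seed=0):
    """Randomly assign records to a training (70%) and a validation (30%) set."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case receives a
    higher predicted risk than a randomly chosen negative case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up predicted risks (not data from the study).
train, valid = split_70_30(range(10))
print(len(train), len(valid))                    # 7 3
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))   # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation gives the same value faster on large cohorts.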
This chapter examines the conceptualization and measurement of contact phenomena in the context of bilingualism across various languages. The goal of the chapter is to account for various phonetic contact phenomena in sociolinguistic analysis and to provide context for a discussion of quantitative methodologies in sociophonetic contact linguistics. More specifically, the chapter provides a detailed account of global phenomena in modern natural speech contexts, as well as an up-to-date examination of quantitative methods in sociolinguistics. The first section provides background on theoretical concepts important to understanding sociophonetic contact in the formation of sound systems. The following sections focus on several key social factors that play a major part in the sociolinguistic approach to bilingual phonetics and phonology, including language dominance and age of acquisition at the segmental and suprasegmental levels, as well as language attitudes and perception, and typical quantitative methods used in sociolinguistics.
Taking a simplified approach to statistics, this textbook teaches students the skills required to conduct and understand quantitative research. It provides basic mathematical instruction without compromising on analytical rigor, covering the essentials of research design; descriptive statistics; data visualization; and statistical tests including t-tests, chi-squares, ANOVAs, Wilcoxon tests, OLS regression, and logistic regression. Step-by-step instructions with screenshots are used to help students master the use of the freely accessible software R Commander. Ancillary resources include a solutions manual and figure files for instructors, and datasets and further guidance on using STATA and SPSS for students. Packed with examples and drawing on real-world data, this is an invaluable textbook for both undergraduate and graduate students in public administration and political science.
Access to waste management services is crucial for urban sustainability, impacting public health, environmental well-being, and overall quality of life. This study applies logistic regression analysis to survey data collected from 1,032 household heads residing in Nouakchott, the capital of Mauritania. The survey investigated key household factors that determine access to waste management services. The findings reveal a significant interplay among waste service provision, the presence of cisterns, housing type and size, and access to electricity. Service access shows clear socioeconomic disparities, with poorer housing formats such as shacks receiving substandard services. In contrast, areas with robust electrification report better service access, although inconsistencies remain amid power outages. The research highlights the challenges faced by the Nouakchott municipality, particularly rapid growth and inadequate infrastructure, which hinder waste management efficiency. Overall, the results not only illuminate Nouakchott’s unique challenges in service provision but also propose actionable recommendations for a sustainable urban future. These recommendations aim to inform and guide targeted policies for improving living conditions and environmental sustainability in urban Mauritania.
Many of the preceding chapters involved optimization formulations: linear least squares, Procrustes, low-rank approximation, multidimensional scaling. All of these have analytical solutions, like the pseudoinverse for minimum-norm least squares problems and the truncated singular value decomposition for low-rank approximation. But often we need iterative optimization algorithms, for example if no closed-form minimizer exists, or if the analytical solution requires too much computation and/or memory (e.g., singular value decomposition for large problems). To solve an optimization problem via an iterative method, we start with some initial guess, and the algorithm then produces a sequence that hopefully converges to a minimizer. This chapter describes the basics of gradient-based iterative optimization algorithms, including preconditioned gradient descent (PGD) for the linear LS problem. PGD uses a fixed step size, whereas preconditioned steepest descent uses a line search to determine the step size. The chapter then considers gradient descent and accelerated versions for general smooth convex functions. It applies gradient descent to a machine learning application: binary classification via logistic regression. Finally, it summarizes stochastic gradient descent.
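As a minimal sketch of fixed-step PGD for linear LS, the code below assumes a diagonal (Jacobi) preconditioner and a step size of 1/n; neither choice is taken from the text. A preconditioned steepest-descent variant would instead recompute the step at each iteration by a line search.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def pgd_least_squares(A, y, iters=500):
    """Preconditioned gradient descent with a fixed step for min_x 0.5 * ||A x - y||^2.

    Assumed preconditioner: P = diag(A^T A)^{-1} (Jacobi scaling of the columns).
    Assumed step: 1/n. After symmetric Jacobi scaling the Hessian has unit diagonal,
    so its largest eigenvalue is at most its trace n, and a step of 1/n lies safely
    below the 2 / lambda_max stability limit (A must have full column rank).
    """
    m, n = len(A), len(A[0])
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # diag(A^T A)
    step = 1.0 / n
    x = [0.0] * n                                                      # initial guess
    for _ in range(iters):
        r = [ax - yi for ax, yi in zip(matvec(A, x), y)]               # residual A x - y
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # A^T (A x - y)
        x = [xj - step * gj / dj for xj, gj, dj in zip(x, grad, col_sq)]
    return x


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.0]
x_hat = pgd_least_squares(A, y)   # approaches the exact LS solution [1.0, 1.0]
```

Here the normal equations give A^T A = [[2, 1], [1, 5]] and A^T y = [3, 6], whose solution is [1, 1], so the iterates can be checked against a known minimizer.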
It was January 28, 1986. While the world was watching, just 73 seconds after take-off, the Challenger Space Shuttle exploded, killing all seven astronauts on board. The crew included the teacher Christa McAuliffe, who would have lectured schoolchildren from space. An important factor that contributed to the disaster was the extremely low temperature at launch. “Extreme” here means “well below temperatures experienced at previous launches”. In this chapter, we give a short overview of the errors that contributed to the explosion. These errors range from purely managerial errors to technical and statistical errors. Our discussion includes a statistical analysis of the malfunctioning of so-called rubber O-rings as a function of temperature at launch. As a prime example of effective risk communication, we also recall the press conference at which physics Nobel laureate Richard Feynman made his famous “piece-of-rubber-in-ice-water” presentation, which exposed the cause of the accident with great clarity.
The germination percentage (GP) is commonly employed to estimate the viability of a seed population. Statistical methods such as analysis of variance (ANOVA) and logistic regression are frequently used to analyse GP data. While ANOVA has a long history of usage, logistic regression is considered more suitable for GP data due to its binomial nature. However, both methods have inherent issues that require attention. In this study, we address previously unexplored challenges associated with these methods and propose the utilization of a likelihood ratio test as a solution. We demonstrate the advantages of employing the likelihood ratio test for GP data analysis through simulations and real data analysis.
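The abstract does not spell out the test setup, so the following is only a minimal sketch under a simplifying assumption: two seed lots, each summarized as a binomial count of germinated seeds, compared with a likelihood ratio test against a chi-square distribution with one degree of freedom. It ignores replicate structure and overdispersion, and the counts are hypothetical.

```python
import math


def _binom_loglik(g, n, p):
    """Binomial log-likelihood without the constant term; uses 0 * log(0) = 0."""
    ll = 0.0
    if g > 0:
        ll += g * math.log(p)
    if n - g > 0:
        ll += (n - g) * math.log(1.0 - p)
    return ll


def lrt_two_lots(g1, n1, g2, n2):
    """LRT of H0: both seed lots share one germination probability.
    g = germinated seeds, n = seeds sown. Returns (statistic, p-value)."""
    p1, p2 = g1 / n1, g2 / n2            # MLEs under the alternative
    p0 = (g1 + g2) / (n1 + n2)           # pooled MLE under H0
    stat = 2.0 * (_binom_loglik(g1, n1, p1) + _binom_loglik(g2, n2, p2)
                  - _binom_loglik(g1, n1, p0) - _binom_loglik(g2, n2, p0))
    # Survival function of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


stat, p = lrt_two_lots(45, 50, 30, 50)   # hypothetical: 45/50 vs 30/50 germinated
```

With more than two lots the same statistic applies with k - 1 degrees of freedom, but then a general chi-square survival function is needed instead of the `erfc` shortcut.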
Alternating Dat-Nom/Nom-Dat verbs in Icelandic are notorious for instantiating two diametrically opposed argument structures: the Dat-Nom and the Nom-Dat construction. We conduct a systematic study of the relevant verbs to uncover the factors steering the alternation. The study compares 15 verbs: five alternating verbs and, as controls, five Nom-Dat verbs and five non-alternating Dat-Nom verbs. Our findings show that, when both arguments are full NPs, four of the five alternating verbs instantiate the Nom-Dat construction 54% of the time and the Dat-Nom construction 46% of the time on average. However, in configurations with a nominative pronoun, the Nom-Dat construction takes precedence over the Dat-Nom construction. Also, for the double-NP configuration, a logistic regression analysis identifies indefiniteness and length as two key predictors, apart from nominative case marking. We demonstrate that the latter systematically correlates with discourse prominence, which, upon closer inspection, correlates with topicality.
Chapter 3 demonstrates how the mathematics of turning Ordinary Least Squares (OLS) regression inside out can be generalized to Generalized Linear Models (GLM) including logistic, Poisson, negative binomial, random intercept, and fixed effects models.
As mentioned in the previous chapter, the perceptron does not perform smooth updates during training, which may slow down learning or cause it to miss good solutions entirely in real-world situations. In this chapter, we will discuss logistic regression, a machine learning algorithm that elegantly addresses this problem. We also extend vanilla logistic regression, which was designed for binary classification, to handle multiclass classification. Through logistic regression, we introduce the concept of a cost function (i.e., the function we aim to minimize during training) and gradient descent, the algorithm that implements this minimization procedure.
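To make the cost-function and gradient-descent ideas concrete, here is a minimal sketch of multiclass (softmax) logistic regression trained by batch gradient descent on the mean cross-entropy cost. It is not the chapter's code: the learning rate, epoch count, helper names, and toy data are all assumptions chosen for illustration.

```python
import math


def softmax(z):
    """Convert class scores to probabilities (max subtracted for numerical stability)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W, x):
    """Linear score w_k . x for each class k."""
    return [sum(w * xi for w, xi in zip(Wk, x)) for Wk in W]


def cost(W, X, y):
    """Mean cross-entropy cost over the training set."""
    return -sum(math.log(softmax(scores(W, x))[label]) for x, label in zip(X, y)) / len(X)


def train_softmax_regression(X, y, n_classes, lr=0.2, epochs=1000):
    """Batch gradient descent on the mean cross-entropy cost.
    X: feature vectors that already include a constant 1 for the bias;
    y: integer labels in range(n_classes)."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]          # one weight vector per class
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = softmax(scores(W, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)   # d(cost)/d(score_k)
                for j in range(d):
                    grad[k][j] += err * x[j] / n
        W = [[w - lr * g for w, g in zip(Wk, Gk)] for Wk, Gk in zip(W, grad)]
    return W


# Toy, made-up data: three clusters in 2-D; the last feature is the bias term.
X = [[0.0, 0.0, 1.0], [0.5, 0.2, 1.0], [3.0, 0.0, 1.0],
     [2.8, 0.3, 1.0], [0.0, 3.0, 1.0], [0.2, 2.7, 1.0]]
y = [0, 0, 1, 1, 2, 2]
W = train_softmax_regression(X, y, n_classes=3)
print(cost([[0.0] * 3] * 3, X, y), cost(W, X, y))   # starts at log(3) ≈ 1.0986, then lower
```

With two classes, the softmax over two scores equals a sigmoid of their difference, so this reduces to ordinary binary logistic regression. Unlike the perceptron's all-or-nothing update, every example contributes a graded error p_k - y_k to each step.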
Climate models are primary tools for investigating processes in the climate system, projecting future changes, and informing decision makers. The latest generation of models provides increasingly complex and realistic representations of the real climate system, while there is also growing awareness that not all models produce equally plausible or independent simulations. Therefore, many recent studies have investigated how models differ from observed climate and how model dependence affects model output similarity, typically drawing on climatological averages over several decades. Here, we show that temperature maps of individual days drawn from datasets never used in training can be robustly identified as “model” or “observation” using the CMIP6 model archive and four observational products. An important exception is a prototype storm-resolving simulation from ICON-Sapphire, which cannot be unambiguously assigned to either category. These results highlight that persistent differences between simulated and observed climate already emerge at short timescales, but very high-resolution modeling efforts may be able to overcome some of these shortcomings. Moreover, temporally out-of-sample test days can be assigned their dataset name with up to 83% accuracy. Misclassifications occur mostly between models developed at the same institution, suggesting that effects of shared code, previously documented only for climatological timescales, already emerge at the level of individual days. Our results thus demonstrate that machine learning classifiers, once trained, can overcome the need for several decades of data to evaluate a given model. This opens up new avenues to test model performance and independence on much shorter timescales.
Elephant ranges in Asia overlap with human-use areas, leading to frequent and often negative two-way interactions, a fraction of which result in human fatalities. Minimizing such negative interactions requires a mechanistic understanding of their patterns and underlying processes. In Chhattisgarh (India), a rewilding population of 250–300 elephants that recently expanded its range from neighbouring states through dispersal has been causing the loss of more than 60 human lives annually. Using logistic regression models, we examined the influences of eight plausible predictors of the occurrence of elephant-related human fatality incidents. We found that 70% of incidents occurred in areas with high-intensity habitat use by elephants; the other 30% were in areas of intermediate and sporadic elephant habitat use. The probability of human fatalities was high along the roads connecting settlements and in areas with frequent house break-ins by elephants, and this probability was also affected by the spatial geometry of forest patches. Immediate practical options to minimize fatal interactions include community-based early-warning systems and the use of portable barriers around settlements. Judicious landscape-level land-use planning aimed at maintaining the resilience of remnant intact elephant habitats will be critical to preventing the dispersal of elephants into suboptimal habitats, which can create complex conflict situations.
Consider the problem of determining the Bayesian credibility mean $E(X_{n+1}|X_1,\cdots, X_n),$ where the random claims $X_1,\cdots, X_n,$ given the parameter vector $\boldsymbol{\Psi},$ are sampled from a K-component mixture family of distributions whose members are the union of different families of distributions. This article begins by deriving a recursive formula for such a Bayesian credibility mean. Moreover, under the assumption that additional information $Z_{i,1},\cdots,Z_{i,m}$ can be used to determine probabilistically whether a random claim $X_i$ belongs to a given population (or distribution), the above recursive formula simplifies to an exact Bayesian credibility mean whenever all components of the mixture distribution belong to exponential families of distributions. For a situation where a 2-component mixture family of distributions is an appropriate choice for data modelling, the article uses the logistic regression model to show how such additional information can be employed to derive a Bayesian credibility model, called the Logistic Regression Credibility model, for a finite mixture of distributions. The Logistic Regression Credibility (LRC) model is then compared with its competitor, the Regression Tree Credibility (RTC) model. More precisely, the article shows that, under the squared error loss function, the LRC’s risk function dominates the RTC’s risk function at least in an interval around $0.5.$ Several examples illustrate the practical application of our findings.
Under supervised learning, when the output variable is discrete or categorical instead of continuous, one has a classification problem instead of a regression problem. Several classification methods are covered: linear discriminant analysis, logistic regression, naive Bayes classifier, K-nearest neighbours, extreme learning machine classifier and multi-layer perceptron classifier. In classification, the cross-entropy objective function is often used in place of the mean squared error function.
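One common argument for preferring cross-entropy in classification (not necessarily the one the text gives) concerns the gradient with respect to the logit of a sigmoid output. The minimal binary sketch below assumes p = sigmoid(z) and compares the two losses when the model is confidently wrong.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_wrt_logit(z, y):
    """Gradients of two losses with respect to the logit z, for p = sigmoid(z).

    Cross-entropy  L = -[y log p + (1 - y) log(1 - p)]  gives  dL/dz = p - y
    Squared error  L = 0.5 * (p - y) ** 2               gives  dL/dz = (p - y) * p * (1 - p)
    """
    p = sigmoid(z)
    return p - y, (p - y) * p * (1.0 - p)


# A confidently wrong prediction: true label 1, logit -8 (p ≈ 0.0003).
ce_grad, mse_grad = grad_wrt_logit(-8.0, 1)
print(ce_grad, mse_grad)   # cross-entropy gradient ≈ -1, squared-error gradient ≈ -0.0003
```

The extra factor p(1 - p) makes the squared-error gradient vanish exactly where the model most needs correcting, while the cross-entropy gradient stays close to its maximum size.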
Acquired immune deficiency syndrome (AIDS) has emerged as a serious public health problem across the world. Knowledge about HIV/AIDS is the cornerstone of prevention and treatment. Research is needed to explore attitudes and the effect of demographic, geographic, socioeconomic, and media exposure factors on men's knowledge about HIV in Pakistan. In this study, the latest secondary data from the Pakistan Demographic and Health Survey 2017-18 are used. Sample results show that the majority of respondents (70%) have knowledge about AIDS. Regression modelling reveals that men's knowledge about HIV/AIDS is associated with age, place of residence, educational level, wealth index, ethnicity, and media exposure. Men aged 35-39, with higher education, of Pukhtoon ethnicity, exposed to mass media daily, and belonging to the richest wealth quintile have high knowledge of HIV/AIDS. For example, the regression model predicts that a man aged 35-39 from Islamabad who lives in an urban area, has higher education, is of Pukhtoon ethnicity, is the head of the household, belongs to the richest quintile, works in a professional occupation, and is exposed to mass media daily would have a 97% probability of having knowledge of HIV/AIDS. Nevertheless, further efforts are needed to increase men's knowledge of HIV/AIDS.
Previous research has established that higher levels of trait Honesty-Humility (HH) are associated with less dishonest behavior in cheating paradigms. However, only imprecise effect size estimates of this HH-cheating link are available. Moreover, evidence is inconclusive on whether other basic personality traits from the HEXACO or Big Five models are associated with unethical decision making and whether such effects have incremental validity beyond HH. We address these issues in a highly powered reanalysis of 16 studies assessing dishonest behavior in an incentivized, one-shot cheating paradigm (N = 5,002). For this purpose, we rely on a newly developed logistic regression approach for the analysis of nested data in cheating paradigms. We also test theoretically derived interactions of HH with other basic personality traits (i.e., Emotionality and Conscientiousness) and situational factors (i.e., the baseline probability of observing a favorable outcome) as well as the incremental validity of HH over demographic characteristics. The results show a medium to large effect of HH (odds ratio = 0.53), which was independent of other personality, situational, or demographic variables. Only one other trait (Big Five Agreeableness) was associated with unethical decision making, although it failed to show any incremental validity beyond HH.
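An odds ratio is easy to misread as a ratio of probabilities, so the arithmetic behind "odds ratio = 0.53" may help. This minimal sketch assumes a hypothetical 50% baseline probability of cheating; the abstract states neither the baseline rate nor the scale of the HH unit to which the odds ratio refers.

```python
def apply_odds_ratio(baseline_prob, odds_ratio, units=1.0):
    """Probability after the predictor changes by `units`, under a logistic model
    in which each one-unit increase multiplies the odds by `odds_ratio`."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio ** units
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: a 50% probability of cheating (not a figure from the study).
print(apply_odds_ratio(0.5, 0.53))   # 0.53 / 1.53 ≈ 0.346
```

At a 50% baseline the odds of 1 become 0.53, so the probability drops to about 35% rather than to 53% of 50%; at other baselines the change in probability differs even though the odds ratio is the same.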
The authors apply logistic regression, multinomial regression, classification trees and random forests to a ternary outcome variable: the variation between the ’s-genitive, the of-genitive and functionally equivalent noun + noun combinations. The statistical approaches discussed fall into regression models on the one hand and classification trees on the other. Specifically, as an alternative to successive binomial regression analyses, the authors implement a multinomial model, which can analyse the entire dataset with three outcome categories simultaneously. Further, a basic classification tree is calculated alongside a more complex (and more robust) random forest. The chapter not only weighs the advantages and shortcomings of all four models but also explicates the different rationales and interpretations that come with them. As a major insight, it emerges that the nature of the dataset, the analytic purpose and the statistical model are interdependent and condition each other in several non-trivial respects.
Corpus linguistics continues to be a vibrant methodology applied across highly diverse fields of research in the language sciences. With the current steep rise in corpus sizes, computational power, statistical literacy and multi-purpose software tools, and inspired by neighbouring disciplines, approaches have diversified to an extent that calls for an intensification of the accompanying critical debate. Bringing together a team of leading experts, this book follows a unique design, comparing advanced methods and approaches current in corpus linguistics, to stimulate reflective evaluation and discussion. Each chapter explores the strengths and weaknesses of different datasets and techniques, presenting a case study and allowing readers to gauge methodological options in practice. Contributions also provide suggestions for further reading, and data and analysis scripts are included in an online appendix. This is an important and timely volume, and will be essential reading for any linguist interested in corpus-linguistic approaches to variation and change.
Although there has been significant research on the relationship between alcohol consumption and demographic and psychological influences, this research has not considered the effect of social influence among older drinkers or whether these effects differ between men and women. One aspect of social influence is social capital. The aim of this paper is to examine whether relational and cognitive social capital are associated with higher or lower risk of alcohol use among adults aged 50 years or older and to assess the extent to which this relationship differs between men and women. To investigate this, data were collected from a cross-sectional questionnaire survey of adults over the age of 50 in the United Kingdom who were recruited from general practitioners. The sample consisted of 9,984 individuals whose mean age was 63.87 years. From these data, we developed proxy measures of social capital and associated these with the respondent's level of alcohol consumption as measured on the Alcohol Use Disorders Identification Test (AUDIT-10) scale. In the sample, just over 20 per cent reported increasing-risk drinking or alcohol dependency. Using two expressions of social capital, relational (social relationships) and cognitive (knowledge acquisition and understanding), we found that greater levels of both are associated with a reduced likelihood of higher-risk drinking. Being female had no significant effect when combined with relational capital but did have a significant effect when combined with cognitive capital. It is argued that interventions to enhance social relations among older people, together with education to help them understand alcohol risks, would help protect older people from the damaging effects of excessive alcohol consumption.
In this introductory chapter, we outline the ways in which various problems in data analysis can be formulated as optimization problems. Specifically, we discuss least squares problems, problems in matrix optimization (particularly those involving low-rank matrices), linear and kernel support vector machines, binary and multiclass logistic regression, and deep learning. We also outline the scope of the remainder of the book.
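As one example of such a formulation, binary logistic regression can be written as a smooth convex minimization problem. The sketch below assumes labels $y_i \in \{-1, +1\}$, feature vectors $x_i \in \mathbb{R}^d$, and an optional $\ell_2$ penalty with weight $\lambda \ge 0$; these conventions are illustrative rather than taken from the text.

```latex
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;
  \frac{1}{n} \sum_{i=1}^{n} \log\!\Bigl(1 + \exp\bigl(-y_i\,(w^\top x_i + b)\bigr)\Bigr)
  + \frac{\lambda}{2}\,\lVert w \rVert_2^2,
  \qquad y_i \in \{-1, +1\}
```

Each term is the negative log-likelihood of observation $i$ under $P(y_i \mid x_i) = 1 / (1 + \exp(-y_i (w^\top x_i + b)))$, so minimizing the objective with $\lambda = 0$ is maximum-likelihood estimation.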