If the results of a study reveal an interesting association between an exposure and a health outcome, there is a natural tendency to assume that it is real. (Note: we are considering whether two things are associated. This does not imply that one causes the other to occur.) However, before we can even contemplate this possibility we have to try to rule out other possible explanations for the results. There are three main ‘alternative explanations’ that we have to consider whenever we analyse epidemiological data or read the reports of others, whatever the study design; namely, could the results be due to chance, bias or error, or confounding? We discuss the first of these, chance, in this chapter and cover bias and confounding in Chapters 7 and 8, respectively.
Taking a simplified approach to statistics, this textbook teaches students the skills required to conduct and understand quantitative research. It provides basic mathematical instruction without compromising on analytical rigor, covering the essentials of research design; descriptive statistics; data visualization; and statistical tests including t-tests, chi-square tests, ANOVA, Wilcoxon tests, OLS regression, and logistic regression. Step-by-step instructions with screenshots help students master the freely accessible software R Commander. Ancillary resources include a solutions manual and figure files for instructors, and datasets and further guidance on using STATA and SPSS for students. Packed with examples and drawing on real-world data, this is an invaluable textbook for both undergraduate and graduate students in public administration and political science.
Analysis of Variance (ANOVA) is a commonly used test in public administration research when the dependent variable is measured at the interval level and the independent variable is measured at the nominal level with more than two categories. The chapter covers when and how to use ANOVA, along with the assumptions of the ANOVA test. Conducting ANOVA in the R Commander and interpreting the output and statistical significance of the test are the main foci of the chapter.
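As a rough illustration of the kind of command R Commander builds from its menus, here is a minimal base-R sketch of a one-way ANOVA; the variable names and values are hypothetical, not taken from the book.

```r
# Hypothetical data: job satisfaction (interval) across three agency types (nominal)
satisfaction <- c(6.1, 5.8, 7.2, 6.5, 4.9, 5.3, 4.7, 5.1, 7.8, 8.0, 7.4, 7.9)
agency <- factor(rep(c("Federal", "State", "Local"), each = 4))

# Fit the one-way ANOVA model and inspect the F-statistic and p-value
model <- aov(satisfaction ~ agency)
summary(model)
```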
Students are introduced to the logic, foundations, and basics of statistical inference. The need for samples is discussed first, and then how samples can be used to make inferences about the larger population. The normal distribution is then discussed, along with Z-scores, to illustrate basic probability and the logic of statistical significance.
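A brief worked illustration of a Z-score and its tail probability in base R (the numbers here are invented for the example):

```r
# Hypothetical example: how unusual is a score of 74 in a population
# with mean 70 and standard deviation 5?
x <- 74; mu <- 70; sigma <- 5

z <- (x - mu) / sigma     # Z-score: distance from the mean in SD units
p_above <- 1 - pnorm(z)   # probability of a value at least this large (upper tail)

z; p_above
```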
This chapter focuses on the most common statistical tests used when the dependent variable is measured at the interval level and the independent variable is nominal with two categories. The one-sample t-test is introduced first, so that students can understand how the mean of the dependent variable is compared to a fixed value. Then the independent samples t-test is discussed, followed by the dependent samples t-test, used when the unit of analysis is compared over time, especially in a pre-post setting. All of these tests are also illustrated with R Commander instruction and interpretation of the t-statistic and statistical significance.
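The three tests map onto three variants of one base-R call; this is a sketch with made-up data, not the chapter's own example:

```r
# Hypothetical interval-level outcomes
groupA <- c(12, 15, 11, 14, 13, 16)
groupB <- c(14, 18, 13, 17, 16, 19)
before <- c(12, 15, 11, 14, 13, 16)
after  <- c(14, 17, 12, 16, 15, 18)

t.test(groupA, mu = 13)                   # one-sample: compare a mean to a fixed value
t.test(groupA, groupB, var.equal = FALSE) # independent samples (Welch) t-test
t.test(after, before, paired = TRUE)      # dependent samples: pre-post comparison
```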
This chapter is devoted to extensive instruction regarding bivariate regression, also known as ordinary least squares (OLS) regression. Students are presented with a scatterplot of data with a best-fitting line drawn through it. They are instructed on how to calculate the equation of this line (the least squares line) by hand and with the R Commander. Interpretation of the statistical output of the y-intercept, beta coefficient, and R-squared value is discussed. Statistical significance of the beta coefficient and its implications for the relationship between an independent and dependent variable are described. Finally, the use of the regression equation for prediction is illustrated.
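In base R the least squares line, its significance, and prediction from the fitted equation look roughly like this; the variables are hypothetical stand-ins:

```r
# Hypothetical bivariate data: program spending (x) and an outcome score (y)
spending <- c(2.1, 3.4, 1.8, 4.2, 3.0, 2.7, 3.9, 4.5)
outcome  <- c(55, 63, 51, 70, 61, 58, 68, 72)

fit <- lm(outcome ~ spending)  # least squares line: y-intercept and beta coefficient
summary(fit)                   # coefficient estimates, significance, R-squared

# Prediction from the regression equation for a new value of x
predict(fit, newdata = data.frame(spending = 3.5))
```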
The final chapter of the textbook covers logistic regression, a statistical test used when the dependent variable is dichotomous or binary. OLS regression should not be used when the dependent variable is binary. The first discussion focuses on the limitations of OLS in this situation. The logit equation is presented, and then steps for conducting a logistic regression in the R Commander are explained. Interpretation of the logistic regression output using odds ratios, percent change in odds, and predicted probabilities is discussed. Applied examples are used to better illustrate when to use logistic regression.
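A minimal base-R sketch of the three interpretive quantities the chapter names (odds ratios, percent change in odds, predicted probabilities); the data are invented for illustration:

```r
# Hypothetical binary outcome (1 = program adopted) and an interval predictor
adopted <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1)
budget  <- c(1.2, 0.8, 1.5, 2.9, 1.1, 3.4, 2.7, 3.1, 2.5, 1.4, 3.0, 2.2)

fit <- glm(adopted ~ budget, family = binomial)  # logit model
summary(fit)

exp(coef(fit))              # odds ratios
(exp(coef(fit)) - 1) * 100  # percent change in odds per unit of the predictor

# Predicted probability of adoption at a chosen budget level
predict(fit, newdata = data.frame(budget = 2.0), type = "response")
```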
The chi-square test is used when both the dependent and independent variables are measured at the nominal level. The first step to running a chi-square test is to construct a contingency table. Students are instructed on how to do so by hand and with the R Commander. Assumptions of the chi-square test follow. Running the chi-square test in the R Commander is then discussed, along with interpretation and statistical significance. The chapter concludes with limitations of the chi-square test.
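In base R the contingency table and test take a few lines; the table below is hypothetical:

```r
# Hypothetical contingency table: region (rows) by policy support (columns)
tab <- matrix(c(30, 20,
                25, 35,
                15, 25),
              nrow = 3, byrow = TRUE,
              dimnames = list(Region = c("North", "South", "West"),
                              Support = c("Yes", "No")))

chisq.test(tab)           # chi-square statistic, degrees of freedom, p-value
chisq.test(tab)$expected  # inspect expected counts to check the test's assumptions
```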
When the effect of one independent variable on the dependent variable is conditional on the values of another independent variable, we have an interactive relationship: the effect of one variable changes across the values of a second independent variable. This chapter provides examples of interactive relationships and how to model them using an interaction term in a linear regression. Attention is given to how to interpret interaction terms in linear regression and their statistical significance, both for interactions with interval-level variables and with dummy variables. Marginal effects graphs are illustrated to further explain interactive relationships.
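A compact sketch of an interaction between an interval variable and a dummy in base R; the scenario and names are hypothetical:

```r
# Hypothetical data: effect of training hours on performance, conditional on sector
set.seed(1)
hours  <- runif(60, 0, 40)
sector <- factor(sample(c("Public", "Private"), 60, replace = TRUE))
perf   <- 50 + 0.5 * hours + 5 * (sector == "Public") +
          0.4 * hours * (sector == "Public") + rnorm(60, sd = 4)

# hours * sector expands to hours + sector + hours:sector (the interaction term)
fit <- lm(perf ~ hours * sector)
summary(fit)  # the interaction coefficient tests whether the slopes differ by sector
```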
The chapter begins with an applied example describing the limitations of bivariate regression and the need to include multiple independent variables in a regression model to explain the dependent variable. The logic of multivariate regression is discussed as it compares to bivariate regression. Running a multivariate regression in the R Commander and interpretation of the results are the main foci of the chapter, with particular attention to the beta coefficients, y-intercept, and adjusted R-squared. Generating the multivariate regression equation from the R Commander output is covered, along with engaging in prediction using this equation. Finally, interpretation of dummy independent variables in a multivariate regression is covered.
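A base-R sketch of a multivariate model with a dummy variable and prediction from the fitted equation; all names and values are invented for illustration:

```r
# Hypothetical data with two interval predictors and a dummy variable
set.seed(2)
n <- 80
experience <- runif(n, 0, 20)
education  <- runif(n, 12, 20)
urban      <- rbinom(n, 1, 0.5)  # dummy: 1 = urban, 0 = rural
salary     <- 20 + 1.5 * experience + 2 * education + 4 * urban + rnorm(n, sd = 5)

fit <- lm(salary ~ experience + education + urban)
summary(fit)  # beta coefficients, y-intercept, adjusted R-squared

# Prediction from the multivariate regression equation
predict(fit, newdata = data.frame(experience = 10, education = 16, urban = 1))
```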
The chapter covers the use of ordinal dependent variables, such as Likert-scale measures, in research hypotheses. The Wilcoxon Rank Sum test is described using a public administration example. Students learn how to conduct the rank sum test by hand and with the R Commander. Interpretation and statistical significance are the foci of the R Commander output. The Wilcoxon Signed Rank test is then explained, along with how it differs from the Rank Sum test. Instructions for conducting and interpreting the Signed Rank test in the R Commander are included.
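Both tests come from the same base-R function; this sketch uses made-up Likert responses:

```r
# Hypothetical Likert-scale responses (ordinal) from two independent groups
groupA <- c(3, 4, 2, 5, 3, 4, 3)
groupB <- c(4, 5, 5, 3, 4, 5, 4)

wilcox.test(groupA, groupB)            # Wilcoxon Rank Sum test (independent groups)

# Same units measured before and after an intervention
pre  <- c(2, 3, 3, 4, 2, 3)
post <- c(3, 4, 3, 5, 3, 4)
wilcox.test(post, pre, paired = TRUE)  # Wilcoxon Signed Rank test (paired)
```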
Chapter 4 explores methods for turning model variance inside out. We discuss the logic and limitations of decomposing variance to the level of cases and introduce two distinct approaches to obtaining case-level contributions to the model variance: the squared residuals approach and the leave-one-out approach.
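The two approaches can be sketched in a few lines of R. This is only a toy illustration under the assumption that "model variance" refers to the residual variance of a fitted regression; it does not reproduce the chapter's own definitions or code.

```r
# Toy linear model to decompose
set.seed(3)
d <- data.frame(x = rnorm(50))
d$y <- 1 + 2 * d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

# Squared-residuals approach: each case's share of the total squared residuals
contrib_sq <- resid(fit)^2 / sum(resid(fit)^2)

# Leave-one-out approach: change in residual variance when each case is dropped
sigma2_full <- summary(fit)$sigma^2
contrib_loo <- sapply(seq_len(nrow(d)), function(i) {
  sigma2_full - summary(lm(y ~ x, data = d[-i, ]))$sigma^2
})

head(cbind(contrib_sq, contrib_loo))
```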
At the core of epidemiology is the use of quantitative methods to study health, and how it may be improved, in populations. It is important to note that epidemiology concerns not only the study of diseases but also of all health-related events. Rational health-promoting public policies require a sound understanding of causation. The epidemiological analysis of a disease or activity from a population perspective is vital in order to be able to organize and monitor effective preventive, curative and rehabilitative services. All health professionals and health-service managers need an awareness of the principles of epidemiology. They need to go beyond questions relating to individuals to challenging fundamental questions such as ‘Why did this person get this disease at this time?’, ‘Is the occurrence of the disease increasing and, if so, why?’ and ‘What are the causes or risk factors for this disease?’
For this book, we assume you’ve had an introductory statistics or experimental design class already! This chapter is a mini refresher of some critical concepts we’ll be using and lets you check that you understand them correctly. The topics include predictor and response variables; the common probability distributions that biologists encounter in their data; and the common techniques, particularly ordinary least squares (OLS) and maximum likelihood (ML), for fitting models to data and estimating effects, including their uncertainty. You should be familiar with confidence intervals and understand what hypothesis tests and P-values do and don’t mean. You should recognize that we use data to decide, but these decisions can be wrong, so you need to understand the risk of missing important effects and the risk of falsely claiming an effect. Decisions about what constitutes an “important” effect are central.
Many dependent variables analyzed in the social sciences are not continuous, but are dichotomous, with a yes/no response. A dichotomous dependent variable takes on only two values; the value 1 represents yes, and the value 0, no. The independent variables in the regression model are then used to predict whether the subjects fall into one of the two dependent variable categories. In this chapter we discuss the modeling of a dichotomous dependent variable and show why ordinary least squares regression is not appropriate. We discuss the logistic regression model. We fit a logistic regression equation and address several statistical concepts and issues: log likelihoods, the likelihood ratio chi-squared statistic, Pseudo R2, model adequacy, and statistical significance. We then discuss the interpretation of logit coefficients, odds ratios, standardized logit coefficients, and standardized odds ratios. We show how to use “margins” in the interpretation of logit models with predicted probabilities. The last sections deal with testing and evaluating nested logit models, and with comparing logit models with probit models.
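Although this chapter's own examples rely on other software (e.g., "margins"), the nested-model comparison it describes can be sketched in base R; the data and variable names below are hypothetical:

```r
# Hypothetical binary outcome with two candidate predictors
set.seed(4)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.5 * x2))

m1 <- glm(y ~ x1,      family = binomial)  # restricted model
m2 <- glm(y ~ x1 + x2, family = binomial)  # full model (m1 is nested in m2)

# Likelihood ratio chi-squared test comparing the nested models
anova(m1, m2, test = "Chisq")

exp(coef(m2))  # odds ratios for the full model
predict(m2, newdata = data.frame(x1 = 0, x2 = 0), type = "response")  # predicted probability
```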
A major concern in the social sciences is understanding and explaining the relationship between two variables. We showed in Chapter 5 how to address this issue using tabular presentations. In this chapter we show how to address the issue statistically via regression and correlation. We first cover the two concepts of regression and correlation. We then turn to the issue of statistical inference and ways of evaluating the statistical significance of our results. Since most social science research is undertaken using sample data, we need to determine whether the regression and correlation coefficients we calculate using the sample data are statistically significant in the larger population from which the sample data were drawn.
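A minimal base-R sketch of the two statistics and their significance tests, with invented sample data:

```r
# Hypothetical sample data on two variables
income    <- c(32, 45, 51, 28, 60, 39, 47, 55, 36, 42)
education <- c(12, 16, 18, 11, 20, 14, 16, 19, 13, 15)

cor.test(education, income)      # correlation coefficient with a significance test
summary(lm(income ~ education))  # regression slope and its statistical significance
```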
P-values are measures of random error only: they quantify how likely results at least as extreme as those observed would be if chance alone were at work. They are meant to be used mainly in RCTs, where randomization removes confounding bias. In observational studies they are often misused, because a small p-value cannot rule out confounding or bias as alternative explanations.
A one-sample t-test is an NHST procedure that is appropriate when a z-test cannot be performed because the population standard deviation is unknown. The one-sample t-test follows all eight steps of the z-test, but requires modifications to accommodate the unknown population standard deviation. First, the formulas that used σy now use the estimated population standard deviation based on sample data instead. Second, degrees of freedom must be calculated. Finally, t-tests use a new probability distribution called a t-distribution.
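The modifications can be made concrete in a few lines of R, computing the statistic by hand and checking it against the built-in test; the sample values are invented:

```r
# Hypothetical sample; test H0: population mean = 50
y <- c(52, 48, 55, 51, 49, 53, 50, 54)

t_stat <- (mean(y) - 50) / (sd(y) / sqrt(length(y)))  # s replaces the unknown sigma
df     <- length(y) - 1                               # degrees of freedom
p      <- 2 * pt(-abs(t_stat), df)                    # two-tailed p from the t-distribution

c(t = t_stat, df = df, p = p)
t.test(y, mu = 50)  # the built-in test reproduces the same result
```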
This chapter also explains more about p-values. First, when p is lower than α, the null hypothesis is always rejected. Second, when p is higher than α, the null hypothesis is always retained. Conversely, knowing whether the null hypothesis was retained or rejected at a given α tells us whether p was larger or smaller than α. This chapter also discusses confidence intervals (CIs), which are a range of plausible values for a population parameter. CIs vary in width depending on the confidence level, which the researcher chooses; the 95% CI is most common in social science research. Finally, one-sample t-tests can be used to compare the sample mean to any hypothesized value, not only the population mean.
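A short R illustration of how the chosen confidence level changes the CI's width, reusing the hypothetical sample from above:

```r
# Width of the confidence interval depends on the level the researcher chooses
y <- c(52, 48, 55, 51, 49, 53, 50, 54)

t.test(y, mu = 50, conf.level = 0.95)$conf.int  # conventional 95% CI
t.test(y, mu = 50, conf.level = 0.99)$conf.int  # higher confidence, wider interval
```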
Dr Nick Martin has made enormous contributions to the field of behavior genetics over the past 50 years. Of his many seminal papers that have had a profound impact, we focus on his early work on the power of twin studies. He was among the first to recognize the importance of sample size calculation before conducting a study to ensure sufficient power to detect the effects of interest. The elegant approach he developed, based on the noncentral chi-squared distribution, has been adopted by subsequent researchers for other genetic study designs, and today remains a standard tool for power calculations in structural equation modeling and other areas of statistical analysis. The present brief article discusses the main aspects of his seminal paper, and how it led to subsequent developments, by him and others, as the field of behavior genetics evolved into the present era.