Mediation analysis practices in social and personality psychology would benefit from the integration of practices from statistical mediation analysis, which is currently commonly implemented in social and personality psychology, and causal mediation analysis, which is not frequently used in psychology. In this chapter, I briefly describe each method on its own, then provide recommendations for how to integrate practices from each method to simultaneously evaluate statistical inference and causal inference as part of a single analysis. At the end of the chapter, I describe additional areas of recent development in mediation analysis that social and personality psychologists should also consider adopting in order to improve the quality of inference in their mediation analyses: latent variables and longitudinal models. Ultimately, this chapter is meant to be a gentle introduction to causal inference in the context of mediation, with very practical recommendations for how one can implement these practices in one’s own research.
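As a concrete reference point, the Python sketch below illustrates the basic statistical mediation estimate via the product of coefficients; the simulated data and variable names X, M, and Y are assumptions for illustration, and a full causal mediation analysis would add explicit identification assumptions and sensitivity checks beyond this.

```python
# Minimal statistical mediation sketch (product of coefficients) on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=n)                       # treatment / predictor
M = 0.5 * X + rng.normal(size=n)             # mediator
Y = 0.4 * M + 0.2 * X + rng.normal(size=n)   # outcome

a = sm.OLS(M, sm.add_constant(X)).fit().params[1]                 # X -> M path
fit_y = sm.OLS(Y, sm.add_constant(np.column_stack([X, M]))).fit()
c_prime, b = fit_y.params[1], fit_y.params[2]                     # direct effect, M -> Y path

print(f"indirect effect (a*b) = {a * b:.3f}, direct effect (c') = {c_prime:.3f}")
```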
This chapter is devoted to extensive instruction regarding bivariate regression, also known as ordinary least squares regression (OLS). Students are presented with a scatterplot of data with a best-fitting line drawn through it. They are instructed on how to calculate the equation of this line (least squares line) by hand and with the R Commander. Interpretation of the statistical output of the y-intercept, beta coefficient, and R-squared value is discussed. Statistical significance of the beta coefficient and its implications for the relationship between an independent and dependent variable are described. Finally, the use of the regression equation for prediction is illustrated.
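For readers who want to see the hand calculation in code, here is a minimal sketch in Python (the chapter itself uses the R Commander; the sample data below are assumptions):

```python
# Least-squares line, R-squared, and prediction, computed from first principles.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

y_hat = intercept + slope * x
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.3f}")
print("prediction at x = 6:", intercept + slope * 6)   # using the equation for prediction
```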
Many applications require solving a system of linear equations 𝑨𝒙 = 𝒚 for 𝒙 given 𝑨 and 𝒚. In practice, often there is no exact solution for 𝒙, so one seeks an approximate solution. This chapter focuses on least-squares formulations of this type of problem. It briefly reviews the 𝑨𝒙 = 𝒚 case and then motivates the more general 𝑨𝒙 ≈ 𝒚 cases. It then focuses on the over-determined case where 𝑨 is tall, emphasizing the insights offered by the SVD of 𝑨. It introduces the pseudoinverse, which is especially important for the under-determined case where 𝑨 is wide. It describes alternative approaches for the under-determined case such as Tikhonov regularization. It introduces frames, a generalization of unitary matrices. It uses the SVD analysis of this chapter to describe projection onto a subspace, completing the subspace-based classification ideas introduced in the previous chapter, and also introduces a least-squares approach to binary classifier design. It introduces recursive least-squares methods that are important for streaming data.
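A small numerical sketch of these ideas (the matrix 𝑨 and vector 𝒚 below are arbitrary illustrative arrays, not taken from the chapter):

```python
# Least-squares via the SVD, the pseudoinverse, and a Tikhonov-regularized alternative.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))        # tall (over-determined) case
y = rng.normal(size=8)

# Over-determined case: minimize ||Ax - y||_2 using the SVD of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_ls = Vt.T @ ((U.T @ y) / s)

# Same solution via the pseudoinverse (which also covers the wide, under-determined case).
x_pinv = np.linalg.pinv(A) @ y

# Tikhonov regularization: minimize ||Ax - y||^2 + lam * ||x||^2.
lam = 0.1
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

print(np.allclose(x_ls, x_pinv), x_tik)
```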
In this chapter we cover clustering and regression, looking at two traditional machine learning methods: k-means and linear regression. We first briefly discuss how to implement these methods in a non-distributed manner, and then carefully analyze their bottlenecks when manipulating big data. This enables us to design global-based solutions based on the DataFrame API of Spark. The key focus is on the principles for designing solutions effectively. Nevertheless, some of the challenges in this chapter are to investigate tools from Spark to speed up the processing even further. k-means is an example of an iterative algorithm and illustrates how to exploit caching in Spark; we analyze its implementation with both the RDD and DataFrame APIs. For linear regression, we first implement the closed form, which involves numerous matrix multiplications and outer products, to simplify the processing in big data. Then, we look at gradient descent. These examples give us the opportunity to expand on the principles of designing a global solution, and also allow us to show how knowing the underlying platform well, Spark in this case, is essential to truly maximize performance.
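The non-distributed baselines look roughly like the following NumPy sketch (simulated data; the Spark RDD/DataFrame versions would distribute these same computations):

```python
# Non-distributed baselines: closed-form and gradient-descent linear regression,
# plus one k-means iteration (assignment then centroid update).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=200)

# Closed form: normal equations built from matrix products.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same squared-error loss.
w, lr = np.zeros(2), 0.01
for _ in range(500):
    w -= lr * (2.0 / len(y)) * X.T @ (X @ w - y)

# One k-means iteration: assign each point to its nearest centroid, then update centroids.
centroids = X[:3].copy()
labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
centroids = np.array([X[labels == k].mean(axis=0) for k in range(3)])

print(w_closed, w, centroids.shape)
```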
In this chapter, we introduce some of the more popular ML algorithms. Our objective is to present the basic concepts and main ideas, show how to use these algorithms in Matlab, and offer some examples. In particular, we discuss essential concepts in feature engineering and how to apply them in Matlab. Support vector machines (SVM), K-nearest neighbor (KNN), linear regression, the Naïve Bayes algorithm, and decision trees are introduced, and the fundamental underlying mathematics is explained while using Matlab’s corresponding Apps to implement each of these algorithms. A special section on reinforcement learning is included, detailing the key concepts and basic mechanism of this third ML category. In particular, we showcase how to implement reinforcement learning in Matlab as well as make use of some of the Python libraries available online, and show how to use reinforcement learning for controller design.
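The chapter's implementations use Matlab's Apps; as a rough cross-reference only (not the chapter's code), the same workflow of feature scaling plus classification looks like this in Python/scikit-learn:

```python
# KNN and a linear SVM with simple feature scaling, on a built-in dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)
svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_tr, y_tr)

print("KNN accuracy:", knn.score(X_te, y_te))
print("SVM accuracy:", svm.score(X_te, y_te))
```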
Regression models with log-transformed dependent variables are widely used by social scientists to investigate nonlinear relationships between variables. Unfortunately, this transformation complicates the substantive interpretation of estimation results and often leads to incomplete and sometimes even misleading interpretations. We focus on one valuable but underused method, the presentation of quantities of interest such as expected values or first differences on the original scale of the dependent variable. The procedure to derive these quantities differs in seemingly minor but critical aspects from the well-known procedure based on standard linear models. To improve empirical practice, we explain the underlying problem and develop guidelines that help researchers to derive meaningful interpretations from regression results of models with log-transformed dependent variables.
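To make the retransformation issue concrete, the sketch below compares naive exponentiation of the linear prediction with one common correction (Duan's smearing estimator) on simulated data; whether this matches the authors' exact procedure is not claimed, and the data and scenarios are assumptions.

```python
# Expected values on the original scale after a log-DV regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=1000)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.6, size=1000))

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
x0 = sm.add_constant(np.array([0.5, 1.5]))          # two scenarios of interest

naive = np.exp(fit.predict(x0))                      # biased downward for E[y | x]
smear = np.exp(fit.predict(x0)) * np.mean(np.exp(fit.resid))   # smearing correction

print("naive:", naive, "corrected:", smear)
print("first difference (corrected):", smear[1] - smear[0])
```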
Simple linear regression is extended to multiple linear regression (for multiple predictor variables) and to multivariate linear regression (for multiple response variables). Regression with circular data and/or categorical data is covered. How to select predictors and how to avoid overfitting with techniques such as ridge regression and lasso are followed by quantile regression. The assumption of Gaussian noise or residuals is removed in generalized least squares, with applications to optimal fingerprinting in climate change.
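A minimal sketch of the two shrinkage methods mentioned here, on simulated data with only two truly relevant predictors (data and penalty strengths are illustrative assumptions):

```python
# Ridge (L2) shrinks all coefficients; lasso (L1) can set some exactly to zero.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=100)   # only 2 real predictors

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefs:", np.round(ridge.coef_, 2))
print("lasso coefs:", np.round(lasso.coef_, 2))   # many exactly zero
```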
When analyzing data, researchers are often less interested in the parameters of statistical models than in functions of these parameters such as predicted values. Here we show that Bayesian simulation with Markov-Chain Monte Carlo tools makes it easy to compute these quantities of interest with their uncertainty. We illustrate how to produce customary and relatively new quantities of interest such as variable importance ranking, posterior predictive data, difficult marginal effects, and model comparison statistics to allow researchers to report more informative results.
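The core idea, computing a quantity of interest over posterior draws, looks like the sketch below. For brevity the draws come from a large-sample normal approximation to the posterior rather than an actual MCMC run (an assumption of this sketch); with genuine MCMC output the last three lines are the same.

```python
# A first difference with uncertainty, summarized over simulated posterior draws.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = 1.0 + 0.8 * x + rng.normal(size=300)

fit = sm.OLS(y, sm.add_constant(x)).fit()
draws = rng.multivariate_normal(fit.params, fit.cov_params(), size=5000)

# Quantity of interest: predicted value at x = 1 minus predicted value at x = 0.
qoi = draws @ np.array([1.0, 1.0]) - draws @ np.array([1.0, 0.0])
print("first difference:", qoi.mean(), "95% interval:", np.percentile(qoi, [2.5, 97.5]))
```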
We offer methods to analyze the “differentially private” Facebook URLs Dataset which, at over 40 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias—including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of nonconfidential data but with appropriately larger standard errors.
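The flavor of such corrections can be seen in the textbook errors-in-variables adjustment for a single regressor with known-variance additive noise, sketched below on simulated data. This is only a generic illustration of why known noise variance permits bias correction; it is not the paper's estimator and makes no claim about the URLs dataset.

```python
# Attenuation correction when a regressor carries additive noise of known variance.
import numpy as np

rng = np.random.default_rng(6)
n, sigma_noise = 100_000, 1.0
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)
x_obs = x_true + rng.normal(scale=sigma_noise, size=n)   # calibrated noise added

b_naive = np.cov(x_obs, y)[0, 1] / np.var(x_obs)          # attenuated toward zero
reliability = (np.var(x_obs) - sigma_noise**2) / np.var(x_obs)
b_corrected = b_naive / reliability

print(f"naive: {b_naive:.3f}  corrected: {b_corrected:.3f}  truth: 2.0")
```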
Analysis of various data sets can be accomplished using techniques based on least-squares methods. For example, linear regression of data determines the best-fit line to the data via a least-squares approach. The same is true for polynomial and regression methods using other basis functions. Curve fitting is used to determine the best-fit line or curve to a particular set of data, while interpolation is used to determine a curve that passes through all of the data points. Polynomial and spline interpolation are discussed. State estimation is covered using techniques based on least-squares methods.
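The distinction between fitting and interpolation is easy to see in code (the sample points below are assumptions for illustration):

```python
# Curve fitting (best-fit polynomial, need not pass through the points)
# versus interpolation (cubic spline, passes through every point).
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.8, 5.2, 9.9, 17.3])

coeffs = np.polyfit(x, y, deg=2)       # least-squares quadratic fit
spline = CubicSpline(x, y)             # interpolant through all points

x_new = 2.5
print("polynomial fit at 2.5:", np.polyval(coeffs, x_new))
print("spline interpolation at 2.5:", spline(x_new))
```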
Many researchers use an ordinal scale to quantitatively measure and analyze concepts. Theoretically valid empirical estimates are robust in sign to any monotonic increasing transformation of the ordinal scale. This presents challenges for the point-identification of important parameters of interest. I develop a partial identification method for testing the robustness of empirical estimates to a range of plausible monotonic increasing transformations of the ordinal scale. This method allows for the calculation of plausible bounds around effect estimates. I illustrate this method by revisiting analysis by Nunn and Wantchekon (2011, American Economic Review, 101, 3221–3252) on the slave trade and trust in sub-Saharan Africa. Supplemental illustrations examine results from (i) Aghion et al. (2016, American Economic Review, 106, 3869–3897) on creative destruction and subjective well-being and (ii) Bond and Lang (2013, The Review of Economics and Statistics, 95, 1468–1479) on the fragility of the black–white test score gap.
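A toy version of the robustness exercise, scanning a one-parameter family of monotone increasing (power) transformations of the scale and reporting the range of estimates, is sketched below; the simulated data and this particular family of transformations are assumptions, and the paper's partial identification method covers a more general class.

```python
# Bounds on a group difference in an ordinal outcome across monotone rescalings.
import numpy as np

rng = np.random.default_rng(7)
group = rng.integers(0, 2, size=1000)
ordinal = np.clip(np.round(2.5 + 0.4 * group + rng.normal(size=1000)), 1, 5)

estimates = []
for lam in np.linspace(0.2, 5.0, 50):      # y -> y**lam is monotone increasing on [1, 5]
    y_t = ordinal ** lam
    estimates.append(y_t[group == 1].mean() - y_t[group == 0].mean())

print("bounds on the group difference:", min(estimates), max(estimates))
```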
Research on gender differences in language use previously focused mainly on affluent, especially Western societies. The present chapter extends this research to acrolectal Indian English, a postcolonial variety of English, investigating how the use of intensifiers (e.g. very, really) is affected not only by the speakers’ gender, but also their age, the gender of the other speakers in the conversation and the formality of the context. Results show some parallels with Western varieties of English, in particular a tendency for women to use more intensifiers than men in informal contexts. However, Indian women modify their usage of intensifiers with respect to the formality of the context more than British women and men, while Indian men do so less than British women and men. In mixed-sex conversations, Indian women also converge with Indian men in their intensifier usage, while neither British women nor men do so. The more flexible use of intensifiers by Indian women may be a response to societal expectations regarding their linguistic behaviour, in order to avoid censure by society. British women likewise continue to be affected by such constraints, but much less so, while the linguistic behaviour of Indian and British men is subject to less criticism.
This chapter is not about one particular method (or a family of methods). Instead, it provides a set of tools useful for better pattern recognition, especially for real-world applications. They include the definition of distance metrics, vector norms, a brief introduction to the idea of distance metric learning, and power mean kernels (a family of useful metrics). We also show by example that proper normalization of the data is essential, and introduce a few data normalization and transformation methods.
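A short sketch of why normalization matters for distance computations (the feature values are illustrative assumptions):

```python
# Z-score normalization and two distance measures; scaling changes which points are "close".
import numpy as np

X = np.array([[180.0, 0.02],        # features on very different scales
              [175.0, 0.80],
              [160.0, 0.05]])

Xz = (X - X.mean(axis=0)) / X.std(axis=0)        # z-score normalization

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_distance(a, b):
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("raw Euclidean 0-1:", euclidean(X[0], X[1]))        # dominated by the large-scale feature
print("normalized Euclidean 0-1:", euclidean(Xz[0], Xz[1]))
print("normalized cosine 0-1:", cosine_distance(Xz[0], Xz[1]))
```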
The focus of this study is to monitor the effect of the lockdown due to the coronavirus disease (COVID-19) pandemic on various air pollutants and to identify the ones that affect COVID-19 fatalities, so that measures to control the pollution can be enforced.
Methods:
Various machine learning techniques, namely Decision Trees, Linear Regression, and Random Forest, have been applied to correlate air pollutants and COVID-19 fatalities in Delhi. Furthermore, a comparison of the concentrations of various air pollutants and the air quality index during the lockdown period with those of the last two years, 2018 and 2019, is presented.
Results:
From the experimental work, it has been observed that the pollutants ozone and toluene have increased during the lockdown period. It has also been deduced that the pollutants that may impact the mortalities due to COVID-19 are ozone, NH3, NO2, and PM10.
Conclusions:
The novel coronavirus has led to environmental restoration due to lockdown. However, there is a need to impose measures to control ozone pollution, as there has been a significant increase in its concentration and it also impacts the COVID-19 mortality rate.
There is evidence indicating that using the current UK energy feeding system to ration the present sheep flocks may underestimate their nutrient requirements. The objective of the present study was to address this issue by developing updated maintenance energy requirements for the current sheep flocks and evaluating if these requirements were influenced by a range of dietary and animal factors. Data (n = 131) used were collated from five experiments with sheep (5 to 18 months old and 29.0 to 69.8 kg BW) undertaken at the Agri-Food and Biosciences Institute of the UK from 2013 to 2017. The trials were designed to evaluate the effects of dietary type, genotype, physiological stage and sex on nutrient utilization and energetic efficiencies. Energy intake and output data were measured in individual calorimeter chambers. Energy balance (Eg) was calculated as the difference between gross energy intake and a sum of fecal energy, urine energy, methane energy and heat production. Data were analysed using the restricted maximum likelihood analysis to develop the linear relationship between Eg or heat production and metabolizable energy (ME) intake, with the effects of a range of dietary and animal factors removed. The net energy (NEm) and ME (MEm) requirements for maintenance derived from the linear relationship between Eg and ME intake were 0.358 and 0.486 MJ/kg BW0.75, respectively, which are 40% to 53% higher than those recommended in energy feeding systems currently used to ration sheep in the USA and the UK. Further analysis of the current dataset revealed that concentrate supplement, sire type or physiological stage had no significant effect on the derived NEm values. However, female lambs had a significantly higher NEm (0.352 v. 0.306 or 0.288 MJ/kg BW0.75) or MEm (0.507 v. 0.441 or 0.415 MJ/kg BW0.75) than those for male or castrated lambs. The present results indicate that using present energy feeding systems in the UK developed over 40 years ago to ration the current sheep flocks could underestimate maintenance energy requirements. There is an urgent need to update these systems to reflect the higher metabolic rates of the current sheep flocks.
Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We relax assumptions by allowing for “linear contextual effects,” which previous works have regarded as plausible but avoided due to nonidentification, a problem we sidestep by deriving bounds instead of point estimates. In this way, we offer a conceptual framework to improve on the Duncan–Davis bound, derived more than 65 years ago. To study the effectiveness of our approach, we collect and analyze 8,430 $2\times 2$ EI datasets with known ground truth from several sources—thus bringing considerably more data to bear on the problem than the existing dozen or so datasets available in the literature for evaluating EI estimators. For the 88% of real data sets in our collection that fit a proposed rule, our approach reduces the width of the Duncan–Davis bound, on average, by about 44%, while still capturing the true district-level parameter about 99% of the time. The remaining 12% revert to the Duncan–Davis bound.
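For reference, the classic Duncan–Davis calculation for a $2\times 2$ EI table is a few lines of code; the precinct numbers below are illustrative, and the paper's tighter bounds based on linear contextual effects are not reproduced here.

```python
# Duncan-Davis (method-of-bounds) for a 2x2 ecological inference table:
# given each precinct's group share x and overall outcome rate t, the
# group-specific rate beta is bounded without any model.
import numpy as np

x = np.array([0.30, 0.55, 0.80])   # share of the precinct in the group of interest
t = np.array([0.45, 0.50, 0.62])   # overall outcome rate in each precinct

lower = np.maximum(0.0, (t - (1 - x)) / x)
upper = np.minimum(1.0, t / x)

for xi, lo, hi in zip(x, lower, upper):
    print(f"x = {xi:.2f}: beta in [{lo:.2f}, {hi:.2f}]")
```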
Tognini-Bonelli (2001) made the following distinction between corpus-based and corpus-driven studies: while corpus-based studies start with pre-existing theories which are tested using corpus data, in corpus-driven studies the hypothesis is derived by examination of the corpus evidence. This chapter will give an overview of the two different families of statistical tests which are suited to these two approaches. For corpus-based approaches, we use more traditional statistics, such as the t-test or ANOVA, which return a value called a p-value to tell us to what extent we should accept or reject the initial hypothesis. Multi-level modelling (also known as mixed modelling) is a newer technique which shows considerable promise for corpus-based studies, and will also be described here and used to analyse the ENNTT subset of the Europarl corpus. Multi-level modelling is useful for the examination of hierarchically structured or “nested” data, where for example translations may be “nested” together in a class if they have the same language of origin. A multi-level model takes account both of the variation between individual translations and of the variation between classes. For example, we might expect the scores (such as vocabulary richness or readability scores) of two translations in the same class to be more similar to each other than those of two translations in different classes.
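A minimal mixed-model sketch in Python's statsmodels (the chapter itself is not tied to this tool): translation-level scores nested within source-language classes, with a random intercept per class. The data, column names, and class labels below are assumptions for illustration.

```python
# Random-intercept model: score variation within and between language-of-origin classes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
classes = np.repeat(["de", "fr", "es", "pl"], 50)               # language-of-origin classes
class_effect = {"de": 0.3, "fr": -0.1, "es": 0.2, "pl": -0.4}
df = pd.DataFrame({
    "origin_class": classes,
    "richness": [class_effect[c] + rng.normal(scale=0.5) for c in classes],
})

model = smf.mixedlm("richness ~ 1", df, groups=df["origin_class"]).fit()
print(model.summary())   # fixed intercept plus between-class variance
```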
Review of correlation and simple linear regression. Introduction to lagged (cross-)correlation for identifying recurrent and periodic features in common between pairs of time-series, and as statistical evidence of possible causal relationships. Introduction to (lagged) autocorrelation for identifying recurrent and periodic features in time-series. Use of correlation and simple linear regression for statistical comparison of time-series to reference datasets, with a focus on periodic (sinusoidal) reference datasets. Interpretation of statistical effect-size and significance (p-value).
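A small sketch of lagged cross-correlation and autocorrelation on a synthetic sinusoidal pair of series (with a known 5-step lag and 25-step period, both assumptions for illustration):

```python
# Lagged cross-correlation between two series and autocorrelation of one series,
# computed as Pearson correlations at each shift.
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(200)
a = np.sin(2 * np.pi * t / 25) + 0.3 * rng.normal(size=200)
b = np.roll(a, 5)                            # b lags a by 5 steps

def lagged_corr(x, y, lag):
    if lag > 0:
        return np.corrcoef(x[:-lag], y[lag:])[0, 1]
    return np.corrcoef(x, y)[0, 1]

cross = [lagged_corr(a, b, k) for k in range(0, 20)]
auto = [lagged_corr(a, a, k) for k in range(0, 30)]

print("best cross-correlation lag:", int(np.argmax(cross)))            # expect about 5
print("autocorrelation peak near the period:", int(np.argmax(auto[10:])) + 10)  # expect about 25
```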
We study the problem of choosing the best subset of $p$ features in linear regression, given $n$ observations. This problem naturally involves two objective functions: minimizing the amount of bias and minimizing the number of predictors. The existing approaches transform the problem into a single-objective optimization problem. We explain the main weaknesses of existing approaches and, to overcome their drawbacks, propose a bi-objective mixed integer linear programming approach. A computational study shows the efficacy of the proposed approach.
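To make the two objectives concrete, the sketch below enumerates, for a tiny simulated problem, the best achievable residual sum of squares at each subset size, tracing out the trade-off frontier by brute force; this enumeration is only illustrative, and the paper's mixed integer linear programming approach is what makes larger $p$ tractable.

```python
# Brute-force bias-versus-sparsity frontier for best-subset selection with tiny p.
import itertools
import numpy as np

rng = np.random.default_rng(10)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

def best_rss(k):
    """Smallest residual sum of squares over all subsets of size k."""
    rss = []
    for cols in itertools.combinations(range(p), k):
        Xk = X[:, list(cols)]
        beta, res, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        rss.append(res[0] if res.size else np.sum((y - Xk @ beta) ** 2))
    return min(rss)

for k in range(1, p + 1):
    print(f"subset size {k}: best RSS = {best_rss(k):.2f}")   # the trade-off frontier
```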