Book contents
- Data and Methods in Corpus Linguistics
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- Part III Perspectives on Multifactorial Methods
- 6 Comparing Generalised Linear Mixed-Effects Models, Generalised Linear Mixed-Effects Model Trees and Random Forests
- 7 Comparing Logistic Regression, Multinomial Regression, Classification Trees and Random Forests Applied to Ternary Variables
- 8 Comparing Bayesian and Frequentist Models of Language Variation
- 9 Comparing Methods for the Evaluation of Cluster Structures in Multidimensional Analyses
- Part IV Applications of Classification-Based Approaches
- Index
- References
6 - Comparing Generalised Linear Mixed-Effects Models, Generalised Linear Mixed-Effects Model Trees and Random Forests
Filled and Unfilled Pauses in Varieties of English
from Part III - Perspectives on Multifactorial Methods
Published online by Cambridge University Press: 06 May 2022
Summary
In a comparison of generalised linear mixed-effects models, generalised linear mixed-effects model trees and random forests, the author applies the three methodologies to a binary variable from the field of interactional pragmatics: the choice between filled and unfilled pauses across varieties of English, as represented by components of the International Corpus of English. Using a large number of examples annotated for linguistic and extralinguistic factors, the chapter demonstrates the steps and decisions involved in each analysis. Though fundamentally different, the three resulting models share central trends. A more fine-grained evaluation of the results and their interpretations shows, however, that the approaches differ in how systematically they handle multiple observations from the same source: only the mixed-effects models explicitly account for, and systematically partial out, the relatedness of data points contributed by the same speaker. The approaches also differ substantially in how they balance researcher involvement against control over the outcome. The choice of model can thus lead to notably different perspectives on an identical set of data and variables.
- Type: Chapter
- Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 163–193. Publisher: Cambridge University Press. Print publication year: 2022