6 - Comparing Generalised Linear Mixed-Effects Models, Generalised Linear Mixed-Effects Model Trees and Random Forests

Filled and Unfilled Pauses in Varieties of English

from Part III - Perspectives on Multifactorial Methods

Published online by Cambridge University Press: 06 May 2022

Ole Schützler (Universität Leipzig)
Julia Schlüter (Universität Bamberg)

Summary

In a comparison of generalised linear mixed-effects models, generalised linear mixed-effects model trees and random forests, the author applies the three methodologies to a binary variable from the field of interactional pragmatics: the choice between filled and unfilled pauses across varieties of English represented by components of the International Corpus of English. The steps and decisions involved in the analyses are demonstrated on the basis of a large number of examples annotated for linguistic and extralinguistic factors. Though different in essence, the three resulting models share central trends. A more fine-grained evaluation of results and interpretations shows, however, that the three approaches differ in how systematically they handle multiple observations from the same source: only the mixed-effects models explicitly account for and systematically partial out the relatedness of data points contributed by the same speaker. The approaches also differ substantially in how they balance researcher involvement and control of the outcome. The choice of modelling approach can thus lead to notably different perspectives on an identical set of data and variables.

Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 163–193
Publisher: Cambridge University Press
Print publication year: 2022
