Book contents
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- 3 Comparing Approaches to (Sub-)Register Variation
- 4 Comparing Baselines for Corpus Analysis
- 5 Comparing Study Designs and Down-Sampling Strategies in Corpus Analysis
- Part III Perspectives on Multifactorial Methods
- Part IV Applications of Classification-Based Approaches
- Index
- References
4 - Comparing Baselines for Corpus Analysis
Research into the Get-Passive in Speech and Writing
from Part II - Selection, Calibration and Preparation of Corpus Data
Published online by Cambridge University Press: 06 May 2022
Summary
The authors review different baselines for the study of alternant choices, emphasizing that normalization to a standard number of words – while straightforward in its application – will in many cases not provide a meaningful measure of frequency. Instead, it is argued, we need a baseline indicating opportunities of use, such as phrase or sentence counts. Exemplifying their proposal with reference to get- and be-passives and the presence or absence of agentive by-phrases, the authors demonstrate a sequence of measures taken to make the quantities that are compared more meaningful and defensible, based on linguistically informed selections of baseline quantities (number of main verbs, passives or potentially alternating passives). Crucially, this process must involve a categorization of observations by the researcher to ensure that mutual substitution is plausible in each case. To calibrate this manual data verification exercise to a manageable level, the authors apply a method of uneven category sub-sampling to the data, and use it to adjust variance estimates and confidence intervals in their analysis.
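The contrast between word-based normalization and an opportunity-based baseline can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the counts are invented rather than taken from the chapter, and the Wilson score interval is one common choice of interval for a proportion, which the summary does not confirm as the authors' own. The adjustment for uneven sub-sampling mentioned above is not implemented here.

```python
import math


def per_million_words(hits: int, corpus_words: int) -> float:
    """Word-based normalization: hits per million words of running text."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits / corpus_words * 1_000_000


def wilson_interval(hits: int, opportunities: int, z: float = 1.959964):
    """Wilson score interval for the proportion hits / opportunities.

    `opportunities` is the chosen baseline, e.g. the number of passives
    that could plausibly alternate between get and be (an assumption of
    this sketch). z = 1.959964 corresponds to a 95% interval.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= hits <= opportunities:
        raise ValueError("hits must lie between 0 and opportunities")
    p = hits / opportunities
    z2 = z * z
    denom = 1 + z2 / opportunities
    centre = (p + z2 / (2 * opportunities)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / opportunities + z2 / (4 * opportunities**2))
        / denom
    )
    return centre - half_width, centre + half_width


# Invented counts, for illustration only (not data from the chapter).
get_passives = 50             # hypothetical number of get-passives found
alternating_passives = 400    # hypothetical passives where get/be could alternate
corpus_words = 600_000        # hypothetical corpus size in words

rate = per_million_words(get_passives, corpus_words)
share = get_passives / alternating_passives
low, high = wilson_interval(get_passives, alternating_passives)

print(f"per million words: {rate:.1f}")
print(f"share of alternating passives: {share:.3f} (95% interval {low:.3f} to {high:.3f})")
```

The first figure changes whenever the amount of unrelated text in the corpus changes, whereas the second is a proportion of cases in which the choice was actually available, which is the kind of quantity the summary argues for.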
- Type: Chapter
- Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 101-126
- Publisher: Cambridge University Press
- Print publication year: 2022