Book contents
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- 3 Comparing Approaches to (Sub-)Register Variation
- 4 Comparing Baselines for Corpus Analysis
- 5 Comparing Study Designs and Down-Sampling Strategies in Corpus Analysis
- Part III Perspectives on Multifactorial Methods
- Part IV Applications of Classification-Based Approaches
- Index
- References
3 - Comparing Approaches to (Sub-)Register Variation
The ‘Press Editorials’ Sections in the British, Canadian and Jamaican Components of ICE
from Part II - Selection, Calibration and Preparation of Corpus Data
Published online by Cambridge University Press: 06 May 2022
Summary
Two methods are applied to detect differences between corpus (sub-)registers, exemplified by the press editorials sections in the British, Canadian and Jamaican components of the International Corpus of English. By design, these methods are apt to target differences between varieties that are represented by putatively comparable corpus material, but it turns out that many of the observed differences can in fact be laid at the door of different sampling strategies applied by corpus compilers. In the example at hand, contrasts can be traced back to the division into institutional and personal editorials. This finding gives rise to a call for a higher granularity of sampling schemes, richer metadata (e.g. on the situational characteristics of the language samples included), and better documentation. As for the methods chosen, the author demonstrates that corpus-driven profiling based either on POS monograms or on higher-level multi-dimensional analysis performs reasonably well, with smaller differences in robustness and computational expense.
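To make the idea of POS-monogram profiling concrete, here is a minimal sketch in Python. It is not the chapter's implementation: the tag labels, the toy texts, the grouping into "institutional" and "personal" editorials, and the choice of Manhattan distance are all assumptions made for illustration. Each text is reduced to the relative frequencies of its part-of-speech tags, the per-text profiles of a section are averaged, and two sections are compared by the distance between their mean profiles.

```python
from collections import Counter


def pos_profile(tags):
    """Relative frequency of each POS tag in one text (a POS monogram profile)."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def mean_profile(profiles):
    """Average the per-text profiles of one (sub-)register section."""
    tagset = set().union(*profiles)
    return {
        tag: sum(p.get(tag, 0.0) for p in profiles) / len(profiles)
        for tag in tagset
    }


def profile_distance(a, b):
    """Manhattan distance between two profiles over the union of their tags."""
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))


# Hypothetical, already POS-tagged toy texts (assumed data, not from ICE).
institutional = [
    ["NOUN", "NOUN", "VERB", "ADJ"],
    ["NOUN", "VERB", "NOUN", "DET"],
]
personal = [
    ["PRON", "VERB", "PRON", "ADV"],
    ["PRON", "VERB", "NOUN", "ADV"],
]

inst_mean = mean_profile([pos_profile(t) for t in institutional])
pers_mean = mean_profile([pos_profile(t) for t in personal])
distance = profile_distance(inst_mean, pers_mean)
print(f"Distance between mean POS profiles: {distance:.3f}")
```

In a real study the tag sequences would come from a tagged corpus, and a larger distance between sections that are meant to be comparable could point either to variety differences or, as the summary cautions, to differences in how the sections were sampled.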
- Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 75-100
- Publisher: Cambridge University Press
- Print publication year: 2022