In Chapter 4, the author introduces the concept of validity. The chapter begins with an exploration of approaches to defining a construct. These approaches include using language theory, a language needs analysis, corpora, and curriculum objectives to help test developers determine which specific language ability they want to measure. The chapter emphasizes the importance of alignment, which concerns how well the test content and test-taker response processes match the content of the construct and the response processes that the test aims to measure. The author uses a detailed example of assessing children’s ability to communicate on a playground in a second language. The major point of the example is that the assessment should require children to use the same kinds of language they use when they communicate on the playground. This alignment helps ensure that the assessment measures the targeted language ability and promotes positive washback on teaching and learning.
Far from being cut-down versions of the adult form, children’s dictionaries constitute a distinct genre with their own history and methodology. The chapter charts the development of children’s dictionaries, from Renaissance bilingual dictionaries to the present day, showing how they have evolved to reflect changing perceptions of childhood. It discusses the bewildering range of dictionaries now available for children as they progress from ABCs and picture dictionaries to those for school use and creative writing, including innovative subgenres based on fictional worlds and dictionaries supporting language revitalisation. Drawing on historical and contemporary examples, the chapter explores content and page design adapted to engage young readers. It considers how lexicographers aim to reflect the world as experienced by children, from the selection of headwords to the framing of definitions, using dedicated corpora and reading programmes. The tension between descriptive and prescriptive approaches is often acute in children’s dictionaries, for example over the inclusion of slang and taboo words, and lexicographers aim to balance young dictionary users’ needs against adult perceptions of what a children’s dictionary is for.
This chapter starts by exploring basic questions related to the learning of second language vocabulary, such as What is a word? and What does it mean to know a word? It discusses form–meaning mapping as well as a word’s grammatical features and its collocations. The chapter focuses on different types of vocabulary knowledge, including receptive knowledge, productive knowledge, breadth and depth of knowledge, and knowledge related to multi-word units. The chapter refers to corpora as a way to understand how language functions in the real world. It also discusses different ways of learning new words, that is, incidental and intentional learning. The chapter then moves on to issues related to the teaching of vocabulary, starting with explicit instruction involving memorization and then moving on to more implicit activities (e.g., extensive reading). More specific techniques are also reviewed, such as glossing, corpus-based instruction, form-focused instruction, and strategy instruction.
The chapter gives a state-of-the-art overview of the themes and issues in corpus pragmatics and describes new directions in the field represented by empirical corpus studies in which synchronic pragmatic variation and change are analyzed from a broader social and cultural perspective. The interaction between corpora and pragmatics presents both challenges and possibilities. Corpora are ideal for studying the relationship between form and function. This is illustrated by studies using corpora to investigate the functions of pragmatic markers, interjections, address forms, and pauses. There is now also a great deal of interest in finding strategies that make it possible to study the linguistic realisations of functions such as speech acts, hedging, and politeness. Pragmatic annotation systems are expected to be valuable from this perspective. New developments in corpus pragmatics are characterized by alliances between corpus pragmatics and other fields, such as variational pragmatics and sociopragmatics, that share an interest in the influence of context on language. Pragmatic markers, for example, are now studied on the basis of corpora with respect to macro-sociolinguistic variables such as region, genre, and the age, gender, and social class of the speakers. Attention is also given to historical corpus pragmatics, a new discipline emerging at the intersection of historical linguistics, pragmatics, and corpus linguistics.
This chapter formulates some relatively new lines of enquiry for research in historical orthography, which stem from the concept of a community of practice. The authors propose that communities of practice represent a key bridge across material that inevitably stimulates divergent research interests in the field. They suggest that communities of book producers in England and the Low Countries were not self-standing entities but were engaged in more or less loose professional and social interactions, forming networks of practice. The respective histories of English and Dutch had some fundamental similarities with reference to early book production and local organization, and links existed even between those working on manuscripts and those working on printed material. This chapter provides useful background information on early book production and large-scale professional networks, with a view to inspiring future researchers to explore the intricate correlation between professional organization, culture, and society in the complex framework of early modern Europe.
This chapter begins with a general discussion of potential data types in variationist linguistics. Next, we present the two main data sources we use in the study: the International Corpus of English (ICE) and the Global Corpus of Web-Based English (GloWbE). The former comprises a set of parallel, balanced corpora representative of language usage across a wide range of standard national varieties. Each ICE corpus contains 500 texts of 2000 words each, sampled from twelve spoken and written genres/registers, totaling approx. 1 million words. GloWbE contains data collected from 1.8 million English language websites – both blogs and general web pages – from twenty different countries (approx. 1.8 billion words in all). Discussion of the corpora is followed by a detailed description of the data collection, identification, and annotation procedures for our three alternations. Here we carefully define the variable context for each alternation, and outline the methods for coding various linguistic constraints that are included in our analyses.
Variation studies is an increasingly popular area in linguistics, becoming embedded in curriculum design, conferences, and research. However, the field is at risk of fragmenting into different research communities with different foci. This pioneering book addresses this by establishing a canon of state-of-the-art quantitative methods to analyze grammatical variation from a comparative perspective. It explains how to use these methods to investigate large datasets in a responsible fashion, providing a blueprint for applying techniques from corpus linguistics, variationist, and dialectometric traditions in novel ways. It specifically explores the scope and limits of syntactic variability in a global language such as English, and investigates three grammatical alternations in nine varieties of English, exploring what we can learn about the grammatical choices that people make based on both observational and experimental data. Comprehensive yet accessible, it will be of interest to academic researchers and students of sociolinguistics, corpus linguistics, and World Englishes.
Bilinguals experience processing costs when comprehending code-switches, yet the magnitude of the cost fluctuates depending on numerous factors. We tested whether switch costs vary based on the frequency of different types of code-switches, as estimated from natural corpora of bilingual speech and text. Spanish–English bilinguals in the U.S. read single-language and code-switched sentences in a self-paced task. Sentence regions containing code-switches were read more slowly than single-language control regions, consistent with the idea that integrating a code-switch poses a processing challenge. Crucially, more frequent code-switches elicited significantly smaller costs both within and across most classes of switch types (e.g., within verb phrases and when comparing switches at verb-phrase and noun-phrase sites). The results suggest that, in addition to learning distributions of syntactic and semantic patterns, bilinguals develop finely tuned expectations about code-switching behavior – representing one reason why code-switching in naturalistic contexts may not be particularly costly.
This chapter presents some of the most significant studies in the history of intercultural pragmatics (IP) research that have applied the methodology of corpus pragmatics (CP). The use of corpora has made an essential contribution to IP in crucial areas such as formulaic language, context and common ground, and politeness research, among others, reflecting the conviction that CP has redefined the conceptualization of pragmatic competence in a globalized world. The chapter follows a topical structure that addresses critical areas of research from an intercultural and corpus pragmatic perspective, such as the role of the lingua franca; the use of academic, professional, and scientific language; cross-cultural studies; prosody, multimodality, and computer-mediated communication; and learner corpora. In all these areas, the chapter highlights the significant research concerns and achievements that have helped to shape IP as an essential discipline in current linguistic theory. A final section presents conclusions and ideas for further research.
No religious tradition or country seems to be unequivocally and inherently free from the threat of extremism. As a result of domestic and international acts of terrorism, much of the world seems preoccupied with the views and actions of Muslims, calling particular attention to the Salafi sect. Some groups belonging to this sect disseminate their views through online periodicals in order to solidify their ideological base and recruit new members. In particular, this chapter relates secular and non-secular characterizations of √KFR – the Arabic triliteral root referring to disbelievers and states of disbelief – to the characterization espoused in electronic periodicals from al-Qa’ida and Da’esh. Over one thousand tokens of lexemes derived from √KFR are extracted from thirty issues using AntConc, reduced to a taxonomy, and examined in terms of the discursive strategies used.
Recent developments in the experimental syntax program have challenged some of the standard practices for collecting and analyzing linguistic evidence. In doing so, they have begun to close the methodological and theoretical gap between syntax and other areas of language science. It is more common than ever for research in theoretical syntax to incorporate multiple methodologies in the same study. Online elicitation methods, adopted from psycholinguistics, have been the most visible new addition to the theoretical syntactician’s toolbox. Yet observational data, in the form of corpora, have also begun to play a larger role in contemporary syntactic investigation. The aim of this chapter is to contextualize the evolving role of corpus studies in syntactic investigation as a methodology that can be used both to externally validate results from other methods and to generate hypotheses. I highlight theoretical and practical advantages of employing corpora in tandem with other methods and point to future directions where gains can still be made.
Chapter 5 describes the fundamental research questions, empirical approaches, and findings of corpus linguistics. Corpus linguistics is an empirical approach that investigates language use in its natural context, with different types of corpora as its database. Methodological issues covered include corpus linguistic approaches, types of corpora and corpus criteria, steps of corpus analysis such as tokenisation and tagging, and finally types of analysis. The chapter ends with recommendations for further reading and a list of short exercises and ideas for small research projects.
In this chapter, I discuss the main experimental issues bearing on the comprehension of indirect speech acts (ISAs). A first question concerns differences in processing between direct and indirect speech acts, and a second question relates to the differences between the indirect uses and the literal/direct uses of ISA constructions. Another issue is whether understanding an utterance as an ISA necessarily implies deriving the direct meaning of the construction used, and what properties of the construction make such a direct meaning more or less likely to be inferred.
This chapter’s main focus is on fluency research ‘in the wild’, particularly the challenges of developing fluency during immersion in the target language setting, e.g. during Study Abroad. The chapter argues that research and practice need to move away from standard monolingual native speaker norms towards the use of L2 or multilingual raters as reference norms for evaluating fluency development. We refer to cross-linguistic work on fluency in languages other than English to show how learners’ and teachers’ expectations can be framed more realistically to fit social contexts and task demands. We include evidence from learner corpora across a variety of languages, which could help develop more robust cross-linguistic theories, methods, and evidence of fluency development from a wider multilingual interactional perspective. The final section explores these themes in the context of fluency development through residence abroad, even over short periods such as Study Abroad; evidence is presented from a recent case study of learners of Mandarin Chinese, taking a more nuanced view of specific task constraints, to highlight the varied nature of fluency development.
This chapter discusses the current role of natural language processing in lexicography, and considers how this might change in the future. It first considers the shared history of natural language processing and lexicography with respect to statistical methods. It then discusses how natural language processing is applied to pre-process corpora to support lexicographic analysis, identify collocations in corpora, automatically construct thesauri, and select good dictionary examples. It also discusses the natural language processing tasks of word sense disambiguation and induction and their relationship to lexicography, and very recent neural network-based methods for automatically generating definitions. It concludes by discussing specialised types of dictionaries that can currently be automatically constructed, and considers whether dictionary construction could ever be fully automated.
Every reputable new English dictionary, or new edition, published since 1987 has made use of a large collection of text, or ‘corpus’, for evidence of a word’s usage. In this chapter, one of the most eminent names in lexicography, Patrick Hanks, takes the reader on a journey to discover more about different kinds of dictionaries and corpora, and basic principles of corpus linguistics and lexicography. He outlines how dictionaries have made use of corpus evidence in the past, and proposes how they might make better use of it in the future.
The chapter critically examines corpus development, driven at the University of KwaZulu-Natal, as one of the key agents of language intellectualisation. The chapter critically evaluates the architecture of two types of corpora. The first is the isiZulu National Corpus (INC), an organic corpus of 30 million tokens. It is designed as a monitor corpus and is an important precursor to the development of isiZulu human language technologies. The chapter shows that the INC, which was used to train the isiZulu spellchecker, is crucial to the spellchecker’s development. The second type of corpus is the English–isiZulu Parallel Corpus (EIPC), which is modest in size, with fifty e-files in each language. A parallel corpus is a collection of the same texts in two natural languages, processed and stored in machine-readable format. The EIPC is crucial to the development of machine translation between English and isiZulu. Developing a machine translation tool computationally requires a parallel corpus such as the EIPC and follows the tenets of the Data-Driven Machine Translation (DDMT) approach. The chapter outlines the imperative to develop both the INC and the EIPC, and further shows that the two corpora are key components in the intellectualisation of isiZulu as a digital, scientific, natural language.
Chapter 4 discusses corpus linguistics and how electronic corpora have informed vocabulary studies. The main insights have been in the areas of frequency and phraseology. The chapter includes an extended discussion of formulaic language, which has been shown to be a major component of vocabulary knowledge.
Chapter 2 provides a historical perspective on vocabulary teaching. It covers how vocabulary was addressed in methodologies ranging from Grammar-Translation to communicative language teaching. It then looks at how vocabulary testing has developed over the last 100 years.
With the increasing recognition of the pedagogical applications of corpus linguistics, there has been a growing interest in developing teachers’ corpus literacy to popularize the use of corpora in language education. This longitudinal study investigated Arab Gulf EFL student teachers’ immediate and long-term responses to corpus literacy instruction. After teaching a corpus literacy component to two classes of student teachers in a graduate computer-assisted language learning course, the author collected focus group data about their views on this instruction and their expected future uses of corpora in language learning, teaching, and research. Two years later, a group of these student teachers (n = 19) responded to a follow-up questionnaire exploring their beliefs about corpus literacy integration and their multiple uses of corpora. The student teachers reported very positive immediate and long-term perceptions of corpus literacy instruction, but such instruction had not brought about all the desired changes in their long-term use of online corpora as a linguistic and pedagogical resource or in their attitudes towards doing corpus-based TESOL research. The author expects, however, that the popularization benefits gained from corpus literacy integration could lead to better future developments in using corpora for language education and research in the target context.