
Utilizing lexical data from a Web-derived corpus to expand productive collocation knowledge

Published online by Cambridge University Press: 01 January 2010

Shaoqun Wu and Ian H. Witten
Computer Science Department, University of Waikato, New Zealand (email: shaoqun@cs.waikato.ac.nz; ihw@cs.waikato.ac.nz)
Margaret Franken
School of Education, University of Waikato, New Zealand (email: franken@waikato.ac.nz)

Abstract

Collocations are of great importance for second language learners, and a learner’s knowledge of them plays a key role in producing language fluently (Nation, 2001: 323). In this article we describe and evaluate an innovative system that uses a Web-derived corpus and digital library software to produce a vast concordance and present it in a way that helps students use collocations more effectively in their writing. Instead of live search we use an off-line corpus of short sequences of words, along with their frequencies. They are preprocessed, filtered, and organized into a searchable digital library collection containing 380 million five-word sequences drawn from a vocabulary of 145,000 words. Although the phrases are short, learners can browse more extended contexts because the system automatically locates sample sentences that contain them, either on the Web or in the British National Corpus. Two evaluations were conducted: an expert user tested the system to see if it could generate suitable alternatives for given text fragments, and students used it for a particular exercise. Both suggest that, even within the constraints of a limited study, the system could and did help students improve their writing.
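To make the off-line approach concrete, the following is a minimal sketch of how word sequences with frequencies might be filtered against a vocabulary and indexed for collocation lookup. It is an illustration only, not the system described in the article: the function names `build_index` and `collocates`, the sample sequences, their counts, and the tiny vocabulary are all hypothetical, and a simple in-memory dictionary stands in for the digital library collection.

```python
from collections import Counter, defaultdict


def build_index(ngram_counts, vocabulary):
    """Map each word to a Counter of the words that immediately follow it.

    ngram_counts: iterable of (sequence, frequency) pairs (hypothetical data).
    vocabulary:   set of accepted words; sequences with other words are dropped.
    """
    index = defaultdict(Counter)
    for ngram, freq in ngram_counts:
        words = ngram.lower().split()
        # Filtering step: skip sequences containing out-of-vocabulary words.
        if not all(w in vocabulary for w in words):
            continue
        # Credit every adjacent word pair with the sequence's frequency.
        for left, right in zip(words, words[1:]):
            index[left][right] += freq
    return index


def collocates(index, word, top=3):
    """Return the most frequent words following `word`, with their counts."""
    return index.get(word.lower(), Counter()).most_common(top)


# Hypothetical five-word sequences and frequencies, for illustration only.
sample = [
    ("make a decision about the", 120),
    ("make a mistake in the", 80),
    ("make a qzx about the", 5),      # dropped: "qzx" is not in the vocabulary
    ("take a decision on the", 30),
]
vocab = {"make", "take", "a", "decision", "mistake", "about", "in", "on", "the"}

index = build_index(sample, vocab)
print(collocates(index, "a"))  # [('decision', 150), ('mistake', 80)]
```

Because real five-word sequences overlap heavily, summing over adjacent pairs as done here would count the same pair several times; a fuller treatment would need to correct for that.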

Type
Research Article
Copyright
Copyright © European Association for Computer Assisted Language Learning 2010


References

Benson, M., Benson, E. and Ilson, R. F. (1986) The BBI combinatory dictionary of English: A guide to word combinations. Amsterdam/Philadelphia: John Benjamins.
Bishop, H. (2004) The effect of typographic salience on the look up and comprehension of unknown formulaic sequences. In: Schmitt, N. (ed.), Formulaic sequences: Acquisition, processing, and use. Philadelphia, PA, USA: John Benjamins Publishing Company, 227–244.
Chambers, A. and O’Sullivan, Í. (2004) Corpus consultation and advanced learners’ writing skills in French. ReCALL, 16(1): 158–172.
Charles, M. (2007) Reconciling top-down and bottom-up approaches to graduate writing: Using a corpus to teach rhetorical functions. Journal of English for Academic Purposes, 6(4): 289–302.
Cobb, T. (n.d.) Compleat Lexical Tutor. http://www.lextutor.ca/
Cobuild Concordance and Collocations Sampler. http://www.collins.co.uk/Corpus/CorpusSearch.aspx
Fuentes, C. A. (2003) The use of corpora and IT in a comparative evaluation approach to oral business English. ReCALL, 15(2): 189–201.
Gabrielatos, C. (2005) Corpora and language teaching: Just a fling or wedding bells? Teaching English as a second or foreign language, 8(4). http://tesl-ej.org.ezproxy.waikato.ac.nz/ej32/a1.html
Greenstone Digital Library Software. http://www.greenstone.org
Guo, S. and Zhang, G. (2007) Building a customized Google-based collocation collection to enhance language learning. British Journal of Educational Technology, 38(4): 747–750.
Hulstijn, J. H. and Laufer, B. (2001) Some empirical evidence for the involvement load hypothesis in vocabulary learning. Language Learning, 51: 539–558.
International English Language Testing System (IELTS) (1997) Specimen materials handbook. http://www.scribd.com/doc/13570277/
Keating, G. D. (2008) Task effectiveness and word learning in a second language: The involvement hypothesis on trial. Language Teaching Research, 12(3): 365–386.
Kilgarriff, A. and Grefenstette, G. (2003) Introduction to the special issue on the web as corpus. Computational Linguistics, 29(3): 333–347.
Nagy, W. E. (1997) On the role of context in first- and second-language vocabulary learning. In: Schmitt, N. and McCarthy, M. (eds.), Vocabulary description, acquisition and pedagogy. Cambridge: Cambridge University Press, 64–83.
Nation, P. (2001) Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nesselhauf, N. (2003) The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24(2): 223–242.
O’Sullivan, Í. and Chambers, A. (2006) Learners’ writing skills in French: Corpus consultation and learner evaluation. Journal of Second Language Writing, 15: 49–68.
Peachey, N. (2005) Concordancers in ELT. In: British Council teaching English. http://www.teachingenglish.org.uk/think/articles/concordancers-elt
Renouf, A., Kehoe, A. and Banergee, W. (2007) WebCorp: An integrated system for web text search. In: Nesselhauf, C., Hundt, M. and Biewer, C. (eds.), Corpus linguistics and the web. Amsterdam: Rodopi, 47–68.
Rundell, M. (2000) The biggest corpus of all. Humanising Language Teaching, 2(3). http://www.hltmag.co.uk/may00/idea.htm
Shei, C. C. (2008) Discovering the hidden treasure on the Internet: using Google to uncover the veil of phraseology. Computer Assisted Language Learning, 21(1): 67–85.
Stubbs, M. and Barth, I. (2003) Using recurrent phrases as text-type discriminators: A quantitative method and some findings. Functions of Language, 10(1): 61–104.
Wei, Y. (1999) Teaching collocations for productive vocabulary development. (Report No. FL 026913). Developmental Skills Department, Borough of Manhattan Community College, City University of New York. (ERIC Document Reproduction Service No. ED457690).
Widdowson, H. G. (2000) On the limitations of linguistics applied. Applied Linguistics, 21(1): 3–25.
Wu, S., Franken, M. and Witten, I. H. (2009) Refining the use of the web (and web search) as a language teaching and learning resource. Computer Assisted Language Learning, 22(3): 249–268.