
Generating example contexts to help children learn word meaning

Published online by Cambridge University Press: 12 January 2012

LIU LIU
Affiliation:
Google Pittsburgh, 6425 Penn Ave., Suite 700, Pittsburgh, PA 15206, USA e-mail: liuliu@google.com
JACK MOSTOW
Affiliation:
Project LISTEN, School of Computer Science, Carnegie Mellon University, RI-NSH 4103, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA e-mail: mostow@cs.cmu.edu
GREGORY S. AIST
Affiliation:
Applied Linguistics and Communication Studies, Iowa State University, 206 Ross Hall, Ames, IA 50011, USA e-mail: gregory.aist@alumni.cmu.edu

Abstract

This article addresses the problem of generating good example contexts to help children learn vocabulary. We describe VEGEMATIC, a system that constructs such contexts by concatenating overlapping five-grams from Google's N-gram corpus. We propose and operationalize a set of constraints for identifying good contexts. VEGEMATIC uses these constraints to filter, cluster, score, and select example contexts. An evaluation experiment compared the resulting contexts against human-authored example contexts (e.g., from children's dictionaries and children's stories). In ratings by an expert who was blind to their source, the generated contexts were comparable in average quality to story sentences, though not as good as dictionary examples. A second experiment measured the percentage of generated contexts that lay judges rated as acceptable, and how long the judges took to rate them. The judges accepted only 28% of the examples, but took an average of only 27 seconds to find the first acceptable example for each target word. This result suggests that hand-vetting VEGEMATIC's output may supply example contexts faster than creating them manually.
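The abstract does not spell out how overlapping five-grams are chained into a longer context. The following is a minimal Python sketch under assumptions of our own, not VEGEMATIC's actual procedure: each step adds one word by finding a five-gram whose four-word overlap matches the current left or right edge of the context, ties are broken by the highest corpus count, and growth stops at a hypothetical word limit `max_len`. The function name `extend_context`, the toy counts, and the four-word overlap are all illustrative assumptions; the real system additionally applies its constraints to filter, cluster, score, and select candidates.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A five-gram is a tuple of five tokens; the counts stand in for N-gram corpus frequencies.
FiveGram = Tuple[str, str, str, str, str]


def extend_context(seed: FiveGram, counts: Dict[FiveGram, int],
                   max_len: int = 12) -> List[str]:
    """Grow a context around `seed` by chaining five-grams that overlap by four words.

    Hypothetical sketch (not VEGEMATIC's published algorithm): at each step, append
    the last word of the most frequent five-gram whose first four words match the
    context's last four words, then prepend the first word of the most frequent
    five-gram whose last four words match the context's first four words.
    Stops when neither side can grow or the context reaches `max_len` words.
    """
    by_prefix: Dict[Tuple[str, ...], List[FiveGram]] = defaultdict(list)
    by_suffix: Dict[Tuple[str, ...], List[FiveGram]] = defaultdict(list)
    for gram in counts:
        by_prefix[gram[:4]].append(gram)  # keyed on the first four words
        by_suffix[gram[1:]].append(gram)  # keyed on the last four words

    words = list(seed)
    grew = True
    while grew and len(words) < max_len:
        grew = False
        # Extend to the right using a five-gram that starts with the last four words.
        right = by_prefix.get(tuple(words[-4:]), [])
        if right:
            best = max(right, key=counts.__getitem__)
            words.append(best[4])
            grew = True
        if len(words) >= max_len:
            break
        # Extend to the left using a five-gram that ends with the first four words.
        left = by_suffix.get(tuple(words[:4]), [])
        if left:
            best = max(left, key=counts.__getitem__)
            words.insert(0, best[0])
            grew = True
    return words


# Toy counts, invented for illustration only.
counts = {
    ("the", "children", "learned", "a", "new"): 100,
    ("children", "learned", "a", "new", "word"): 80,
    ("learned", "a", "new", "word", "today"): 50,
    ("all", "the", "children", "learned", "a"): 30,
}
print(" ".join(extend_context(("children", "learned", "a", "new", "word"), counts)))
# all the children learned a new word today
```

The `max_len` cap also guarantees termination when the five-grams form a cycle; greedy selection by count is only one plausible design, and a fuller system would generate many alternative chains and rank them rather than commit to the single most frequent continuation.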

Copyright © Cambridge University Press 2012


Footnotes

This work, performed while the first author was a Master's student in the Language Technologies Institute at Carnegie Mellon University, was supported by the Institute of Education Sciences, US Department of Education, through Grant R305A080157 to Carnegie Mellon University. The opinions expressed are those of the authors and do not necessarily represent the views of the Institute or the US Department of Education. We thank Dr. Margaret McKeown for her expertise and assistance, and our lay judges for their participation.
