
Recognizing entailment in intelligent tutoring systems*

Published online by Cambridge University Press:  16 September 2009

RODNEY D. NIELSEN
Affiliation:
Boulder Language Technologies, 2960 Center Green Ct, Boulder, CO 80301, USA; Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA. E-mails: Rodney.Nielsen@Colorado.edu, Wayne.Ward@Colorado.edu, James.Martin@Colorado.edu
WAYNE WARD
Affiliation:
Boulder Language Technologies, 2960 Center Green Ct, Boulder, CO 80301, USA; Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA
JAMES H. MARTIN
Affiliation:
Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA

Abstract

This paper describes a new method for recognizing whether a student's response to an automated tutor's question entails that the student understands the concepts being taught. We demonstrate the need for a finer-grained analysis of answers than is supported by current tutoring systems or entailment databases, and we describe a new representation for reference answers that addresses this need, breaking answers into detailed facets and annotating the entailment relationship of each facet to the student's answer. Even at this level of detail, human annotation yields substantial interannotator agreement (86.2%), with a kappa statistic of 0.728. We also present our current efforts to assess student answers automatically, which involve training machine learning classifiers on features extracted from dependency parses of the reference answer and the student's response, together with features derived from domain-independent lexical statistics. Our system's performance, as high as 75.5% accuracy within domain and 68.8% out of domain, is very encouraging and confirms that the approach is feasible. A further contribution of this work is that it takes a significant step toward domain-independent semantic assessment of answers: no prior work in tutoring or educational assessment has attempted to build such domain-independent systems. Virtually all prior systems have required hundreds of example learner answers for each new question, either to train aspects of the system or to hand-craft information extraction templates.
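The kappa statistic reported above (Cohen's kappa) corrects raw interannotator agreement for the agreement expected by chance. The sketch below shows the standard computation; the facet labels and counts are purely illustrative, not the paper's actual annotation data.

```python
# Illustrative sketch of Cohen's kappa for two annotators' facet-level labels.
# Label names and data below are hypothetical, not from the paper's corpus.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen 1960)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability both pick the same label by chance,
    # given each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with three hypothetical facet-level entailment labels.
a = ["expressed", "expressed", "unaddressed", "contradicted", "expressed"]
b = ["expressed", "unaddressed", "unaddressed", "contradicted", "expressed"]
print(round(cohens_kappa(a, b), 3))  # → 0.688
```

Here observed agreement is 0.8 and chance agreement is 0.36, giving kappa = (0.8 − 0.36)/(1 − 0.36) ≈ 0.688; values above roughly 0.6 are conventionally read as substantial agreement.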

Type: Papers
Copyright © Cambridge University Press 2009
