
Recognizing entailment in intelligent tutoring systems*

Published online by Cambridge University Press:  16 September 2009

RODNEY D. NIELSEN
Affiliation:
Boulder Language Technologies, 2960 Center Green Ct, Boulder, CO 80301, USA; Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA. E-mails: Rodney.Nielsen@Colorado.edu, Wayne.Ward@Colorado.edu, James.Martin@Colorado.edu
WAYNE WARD
Affiliation:
Boulder Language Technologies, 2960 Center Green Ct, Boulder, CO 80301, USA; Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA
JAMES H. MARTIN
Affiliation:
Department of Computer Science, Institute of Cognitive Science and The Center for Computational Language and Education Research, University of Colorado, Campus Box 594, Boulder, CO 80309-0594, USA

Abstract

This paper describes a new method for recognizing whether a student's response to an automated tutor's question entails that the student understands the concepts being taught. We demonstrate the need for a finer-grained analysis of answers than is supported by current tutoring systems or entailment databases, and we describe a new representation for reference answers that addresses this need, breaking answers into detailed facets and annotating the entailment relationship of each facet to the student's answer. Even at this level of detail, human annotation yields substantial interannotator agreement (86.2%), with a kappa statistic of 0.728. We also present our current efforts to assess student answers automatically, which involve training machine learning classifiers on features extracted from dependency parses of the reference answer and the student's response, together with features derived from domain-independent lexical statistics. Our system's performance, as high as 75.5% accuracy within domain and 68.8% out of domain, is very encouraging and confirms that the approach is feasible. A further contribution of this work is that it takes a significant step toward domain-independent semantic assessment of answers: no prior work in tutoring or educational assessment has attempted to build such domain-independent systems. Virtually all prior systems have required hundreds of example learner answers for each new question, either to train aspects of the system or to hand-craft information extraction templates.
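The kappa statistic reported above (Cohen's kappa) corrects raw interannotator agreement for the agreement expected by chance. The sketch below shows the standard computation; the facet labels and counts are purely illustrative, not the paper's actual annotation data.

```python
# Illustrative sketch of Cohen's kappa for two annotators' facet-level labels.
# Label names and data below are hypothetical, not from the paper's corpus.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen 1960)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability both pick the same label by chance,
    # given each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with three hypothetical facet-level entailment labels.
a = ["expressed", "expressed", "unaddressed", "contradicted", "expressed"]
b = ["expressed", "unaddressed", "unaddressed", "contradicted", "expressed"]
print(round(cohens_kappa(a, b), 3))  # → 0.688
```

Here observed agreement is 0.8 and chance agreement is 0.36, giving kappa = (0.8 − 0.36)/(1 − 0.36) ≈ 0.688; values above roughly 0.6 are conventionally read as substantial agreement.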

Type: Papers
Copyright © Cambridge University Press 2009
