
Learning human multimodal dialogue strategies

Published online by Cambridge University Press: 22 April 2009

V. RIESER
Affiliation:
School of Informatics, University of Edinburgh, Edinburgh, EH9 8AB, GB e-mail: vrieser@inf.ed.ac.uk
O. LEMON
Affiliation:
School of Informatics, University of Edinburgh, Edinburgh, EH9 8AB, GB e-mail: olemon@inf.ed.ac.uk

Abstract

We investigate the use of different machine learning methods in combination with feature selection techniques to explore human multimodal dialogue strategies and the use of those strategies for automated dialogue systems. We learn policies from data collected in a Wizard-of-Oz study where different human ‘wizards’ decide whether to ask a clarification request in a multimodal manner or else to use speech alone. We first describe the data collection, the coding scheme and annotated corpus, and the validation of the multimodal annotations. We then show that there is a uniform multimodal dialogue strategy across wizards, which is based on multiple features in the dialogue context. These are generic features, available at runtime, which can be implemented in dialogue systems. Our prediction models (for human wizard behaviour) achieve a weighted f-score of 88.6 per cent (which is a 25.6 per cent improvement over the majority baseline). We interpret and discuss the learned strategy. We conclude that human wizard behaviour is not optimal for automatic dialogue systems, and argue for the use of automatic optimization methods, such as Reinforcement Learning. Throughout the investigation we also discuss the issues arising from using small initial Wizard-of-Oz data sets, and we show that feature engineering is an essential step when learning dialogue strategies from such limited data.
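The evaluation measure named in the abstract can be illustrated with a small worked example. The sketch below assumes the common definition of a weighted F-score (per-class F1 averaged with weights proportional to class frequency) and compares a classifier against a majority baseline that always predicts the most frequent class. The labels `speech` and `multimodal`, the toy data, and the function names are hypothetical and are not taken from the paper's corpus or tooling.

```python
from collections import Counter


def weighted_f_score(gold, predicted):
    """Per-class F1, averaged with weights proportional to class frequency in gold."""
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (count / total) * f1
    return score


def majority_baseline(gold):
    """Predict the most frequent gold label for every instance."""
    label = Counter(gold).most_common(1)[0][0]
    return [label] * len(gold)


# Hypothetical toy data: 6 speech-only and 4 multimodal clarification requests.
gold = ["speech"] * 6 + ["multimodal"] * 4
predicted = ["speech"] * 5 + ["multimodal"] + ["multimodal"] * 3 + ["speech"]

baseline_f = weighted_f_score(gold, majority_baseline(gold))
model_f = weighted_f_score(gold, predicted)
print(f"baseline={baseline_f:.3f} model={model_f:.3f} gain={model_f - baseline_f:.3f}")
```

On this toy data the majority baseline scores zero F1 on the minority class, which is why a weighted F-score is a stricter point of comparison than raw accuracy when the classes are imbalanced.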

Type
Papers
Copyright
Copyright © Cambridge University Press 2009
