A survey on metrics for the evaluation of user simulations

Published online by Cambridge University Press: 28 November 2012

Olivier Pietquin
Affiliation:
SUPELEC – IMS-MaLIS Research Group, UMI 2958 (GeorgiaTech – CNRS), 2 rue Edouard Belin, 57070 Metz, France; e-mail: olivier.pietquin@supelec.fr
Helen Hastie
Affiliation:
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK; e-mail: h.hastie@hw.ac.uk

Abstract

User simulation is an important research area in the field of spoken dialogue systems (SDSs) because collecting and annotating real human–machine interactions is often expensive and time-consuming. However, such data are generally required for designing, training and assessing dialogue systems. User simulations are especially needed when dialogue management strategies are optimized with machine learning methods such as Reinforcement Learning, where the amount of data necessary for training is larger than existing corpora. The quality of the user simulation is therefore of crucial importance because it dramatically influences both the SDS performance analysis and the learnt strategy. Assessing the quality of simulated dialogues and user simulation methods remains an open issue: although assessment metrics are required, no metric has been commonly adopted. In this paper, we survey user simulation metrics in the literature, propose some extensions and discuss these metrics in terms of a list of desired features.

Type
Articles
Copyright
Copyright © Cambridge University Press 2012
