We investigate the use of different machine learning methods, in combination with feature selection techniques, to explore human multimodal dialogue strategies and their use in automated dialogue systems. We learn policies from data collected in a Wizard-of-Oz study in which different human ‘wizards’ decide whether to pose a clarification request multimodally or through speech alone. We first describe the data collection, the coding scheme, the annotated corpus, and the validation of the multimodal annotations. We then show that there is a uniform multimodal dialogue strategy across wizards, based on multiple generic features of the dialogue context that are available at runtime and can therefore be implemented in dialogue systems. Our models for predicting human wizard behaviour achieve a weighted f-score of 88.6 per cent, a 25.6 per cent improvement over the majority baseline. We interpret and discuss the learned strategy, concluding that human wizard behaviour is not optimal for automated dialogue systems, and we argue for the use of automatic optimization methods such as Reinforcement Learning. Throughout the investigation we also discuss the issues that arise from using small initial Wizard-of-Oz data sets, and we show that feature engineering is an essential step when learning dialogue strategies from such limited data.