Finite-state models for speech-based search on mobile devices

Published online by Cambridge University Press: 21 March 2011

TANIYA MISHRA
Affiliation:
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07932, USA. Email: taniya@research.att.com
SRINIVAS BANGALORE
Affiliation:
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07932, USA. Email: srini@research.att.com

Abstract

In this paper, we present techniques that exploit finite-state models for voice search applications. In particular, we illustrate the use of finite-state models to encode the search index, which tightly integrates the speech recognition and search components of a voice search system. We show that this tight integration benefits both automatic speech recognition and search. In the second part of the paper, we discuss the use of finite-state techniques for spoken language understanding, specifically to segment an input query into its component semantic fields. This segmentation improves search and extends the functionality of the system so that it can execute the user's request against a backend database.
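To make the idea of segmenting a query into semantic fields concrete, the following is a minimal sketch of a deterministic finite-state transducer written in plain Python. It is not the authors' model: the state names, the field names `SearchTerm` and `LocationTerm`, and the connector word list are all assumptions chosen for illustration, and a real system would presumably use weighted transducers learned from data.

```python
# Toy deterministic finite-state transducer for query segmentation.
# States, field names, and the connector list are illustrative assumptions,
# not taken from the paper.
CONNECTORS = {"in", "near", "around"}

# Arcs: (state, input class) -> (next state, output field or None).
# A connector seen in the "search" state switches to "location" and emits nothing.
ARCS = {
    ("search", "word"): ("search", "SearchTerm"),
    ("search", "connector"): ("location", None),
    ("location", "word"): ("location", "LocationTerm"),
    ("location", "connector"): ("location", "LocationTerm"),
}


def segment(query):
    """Return a dict mapping each field name to the words assigned to it."""
    state = "search"
    fields = {"SearchTerm": [], "LocationTerm": []}
    for word in query.lower().split():
        input_class = "connector" if word in CONNECTORS else "word"
        state, field = ARCS[(state, input_class)]
        if field is not None:
            fields[field].append(word)
    return {name: " ".join(words) for name, words in fields.items()}


if __name__ == "__main__":
    print(segment("pizza near Florham Park"))
```

Once the fields are separated, each one could be matched against a different part of a backend database (for example, a business-name column and a city column), which is the kind of extended functionality the abstract describes.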

Type
Papers
Copyright
Copyright © Cambridge University Press 2011
