Syntactic error detection and correction in date expressions using finite-state transducers

Published online by Cambridge University Press: 21 March 2011

ARANTZA DÍAZ DE ILARRAZA
KOLDO GOJENOLA
MAITE ORONOZ
IÑAKI ALEGRIA
Affiliation (all authors):
Department of Computer Languages and Systems, University of the Basque Country, P.O. box 649, E-20080 Donostia, the Basque Country, Spain. Emails: a.diazdeilarraza@ehu.es, koldo.gojenola@ehu.es, maite.oronoz@ehu.es, i.alegria@ehu.es

Abstract

This paper presents a set of experiments on the detection and correction of syntactic errors in date expressions, exploring two alternative approaches. The first uses an error grammar that combines a robust morphosyntactic analyser with two groups of finite-state transducers: one describing syntactic error patterns and the other correcting the detected errors. The second uses a positive date grammar, in which deviations are detected by applying edit-distance techniques. The system has been tested on a corpus of real texts containing both correct and incorrect sentences. Although the experiments were limited to one language, the results show that attainable performance is not the only criterion for preferring one solution over another.
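
To illustrate the second approach, the following is a minimal sketch of detecting a deviation from a positive grammar by edit distance. It is not the authors' system, which is built on finite-state transducers for another language; everything here is an assumption made for illustration: the toy English date pattern "<day> <Month> <year>", the restricted year range, the token-level Levenshtein distance, and the hypothetical names `accepted_dates`, `edit_distance` and `closest`. The toy grammar also does not check calendar validity (it accepts "31 February").

```python
from itertools import product

# Hypothetical toy positive grammar: a well-formed date is the token
# sequence <day> <Month> <year>. This is an illustrative assumption,
# not the grammar used in the paper.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = [str(d) for d in range(1, 32)]
YEARS = [str(y) for y in range(1990, 2031)]


def accepted_dates():
    """Enumerate every token sequence accepted by the toy grammar."""
    for day, month, year in product(DAYS, MONTHS, YEARS):
        yield [day, month, year]


def edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def closest(tokens):
    """Return (distance, candidates): the minimal distance from `tokens`
    to the grammar and all accepted sequences at that distance."""
    best, candidates = None, []
    for cand in accepted_dates():
        d = edit_distance(tokens, cand)
        if best is None or d < best:
            best, candidates = d, [cand]
        elif d == best:
            candidates.append(cand)
    return best, candidates


if __name__ == "__main__":
    print(closest("21 March 2011".split()))     # distance 0: well-formed
    print(closest("21 of March 2011".split()))  # distance 1: one extra token
```

A distance of zero means the input is accepted by the grammar; a small non-zero distance flags a probable error, and the closest accepted sequences serve as correction candidates. Several candidates can tie at the same distance (for example, when a month name is misspelled), so a real system would need some way to rank them.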

Type
Papers
Copyright
Copyright © Cambridge University Press 2011
