
MaltOptimizer: Fast and effective parser optimization

Published online by Cambridge University Press:  24 February 2014

MIGUEL BALLESTEROS
Affiliation:
Natural Language Processing Group, Pompeu Fabra University, Tànger 122-140, 08018 Barcelona, Spain e-mail: miguel.ballesteros@upf.edu
JOAKIM NIVRE
Affiliation:
Department of Linguistics and Philology, Uppsala University, Box 635, 75126 Uppsala, Sweden e-mail: joakim.nivre@lingfil.uu.se

Abstract

Statistical parsers often require careful parameter tuning and feature selection. This is a nontrivial task for application developers who are not interested in parsing for its own sake, and it can be time-consuming even for experienced researchers. In this paper we present MaltOptimizer, a tool developed to automatically explore parameters and features for MaltParser, a transition-based dependency parsing system that can be used to train parsers given treebank data. MaltParser provides a wide range of parameters for optimization, including nine different parsing algorithms, an expressive feature specification language that can be used to define arbitrarily rich feature models, and two machine learning libraries, each with its own parameters. MaltOptimizer is an interactive system that performs parser optimization in three stages. First, it performs an analysis of the training set in order to select a suitable starting point for optimization. Second, it selects the best parsing algorithm and tunes the parameters of this algorithm. Finally, it performs feature selection and tunes machine learning parameters. Experiments on a wide range of data sets show that MaltOptimizer quickly produces models that consistently outperform default settings and often approach the accuracy achieved through careful manual optimization.
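
The three-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not MaltOptimizer's implementation: the function names, the non-projectivity heuristic in stage one, the greedy forward selection in stage three, and the toy scoring function are all assumptions made for illustration. In a real setting the score would come from training and evaluating a parser on held-out data.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical configuration: (algorithm name, set of active feature names).
Config = Tuple[str, FrozenSet[str]]
Scorer = Callable[[Config], float]


def pick_start(nonprojective_ratio: float) -> str:
    """Stage 1 (assumed heuristic): choose a starting point from a data statistic."""
    return "nonprojective" if nonprojective_ratio > 0.0 else "projective"


def pick_algorithm(start: str, candidates: Iterable[str],
                   features: FrozenSet[str], score: Scorer) -> str:
    """Stage 2: keep the starting algorithm unless a candidate scores strictly higher."""
    best, best_score = start, score((start, features))
    for alg in candidates:
        s = score((alg, features))
        if s > best_score:
            best, best_score = alg, s
    return best


def select_features(alg: str, features: FrozenSet[str], pool: Iterable[str],
                    score: Scorer) -> Tuple[FrozenSet[str], float]:
    """Stage 3: greedy forward selection; add a feature only if the score improves."""
    current, best_score = features, score((alg, features))
    for feat in pool:
        trial = current | {feat}
        s = score((alg, trial))
        if s > best_score:
            current, best_score = trial, s
    return current, best_score


def toy_score(config: Config) -> float:
    """Stand-in for parser accuracy; the numbers are invented for the example."""
    alg, feats = config
    base = {"projective": 0.70, "nonprojective": 0.75}[alg]
    gain = {"pos": 0.05, "form": 0.03, "noise": -0.01}
    return base + sum(gain.get(f, 0.0) for f in feats)


def optimize(nonprojective_ratio: float) -> Tuple[str, FrozenSet[str], float]:
    """Run the three stages in order and return the chosen configuration."""
    start = pick_start(nonprojective_ratio)
    base_features = frozenset({"pos"})
    alg = pick_algorithm(start, ["projective", "nonprojective"], base_features, toy_score)
    feats, final_score = select_features(alg, base_features, ["form", "noise"], toy_score)
    return alg, feats, final_score
```

Calling `optimize(0.1)` with the toy scorer keeps the non-projective starting algorithm, adds the `form` feature, and rejects `noise` because it lowers the score. Each stage fixes its decision before the next one runs, which keeps the search cheap at the cost of possibly missing interactions between choices.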

Type
Articles
Copyright
Copyright © Cambridge University Press 2014 

