Inductive probabilistic taxonomy learning using singular value decomposition

Published online by Cambridge University Press:  05 January 2011

FRANCESCA FALLUCCHI
Affiliation:
Department of Computer Science, Systems and Production, University of Rome “Tor Vergata”, Italy emails: fallucchi@info.uniroma2.it, zanzotto@info.uniroma2.it
FABIO MASSIMO ZANZOTTO
Affiliation:
Department of Computer Science, Systems and Production, University of Rome “Tor Vergata”, Italy emails: fallucchi@info.uniroma2.it, zanzotto@info.uniroma2.it

Abstract

Capturing word meaning is one of the challenges of natural language processing (NLP). Formal models of meaning, such as networks of words or concepts, are knowledge repositories used in a variety of applications. To be used effectively, these networks have to be large or, at least, adapted to specific domains. Learning word meaning from texts is therefore an active area of research. Lexico-syntactic pattern methods are one possible solution. Yet, these models do not use structural properties of target semantic relations, e.g. transitivity, during learning. In this paper, we propose a novel lexico-syntactic pattern probabilistic method for learning taxonomies that explicitly models transitivity and naturally exploits vector space model techniques for reducing space dimensions. We define two probabilistic models: the direct probabilistic model and the induced probabilistic model. The first is directly estimated on observations over text collections. The second uses transitivity on the direct probabilistic model to induce probabilities of derived events. Within our probabilistic model, we also propose a novel way of using singular value decomposition as an unsupervised method for feature selection in estimating direct probabilities. We empirically show that the induced probabilistic taxonomy learning model outperforms state-of-the-art probabilistic models and that our unsupervised feature selection method improves performance.
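The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the function names `reduce_features` and `induce_probabilities` are hypothetical, the toy numbers are invented, and the noisy-or combination of two-step paths (which treats paths as independent) is an assumption chosen only to show how transitivity can raise the probability of a derived pair. The feature reduction step uses a plain truncated SVD from NumPy.

```python
import numpy as np

def reduce_features(X, k):
    """Project the rows of X (pairs x pattern features) onto the
    top-k right singular vectors, giving a k-dimensional representation."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

def induce_probabilities(P):
    """Combine each direct probability P[i, j] with all two-step paths
    i -> k -> j via a noisy-or (assumes independent paths; illustrative only)."""
    n = P.shape[0]
    induced = np.zeros_like(P, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            fail = 1.0 - P[i, j]  # the direct evidence fails
            for k in range(n):
                if k != i and k != j:
                    fail *= 1.0 - P[i, k] * P[k, j]  # path through k fails
            induced[i, j] = 1.0 - fail
    return induced

# Toy direct probabilities for three words 0, 1, 2 (invented values).
P = np.zeros((3, 3))
P[0, 1] = 0.9   # word 0 is-a word 1
P[1, 2] = 0.8   # word 1 is-a word 2
P[0, 2] = 0.1   # weak direct evidence that word 0 is-a word 2
induced = induce_probabilities(P)

# Toy feature matrix: 4 word pairs described by 3 pattern features.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
X_reduced = reduce_features(X, 2)
```

In this toy setting, the weakly supported pair (0, 2) gains probability from the strong chain 0 → 1 → 2, while pairs with no supporting path keep their direct probability.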

Type
Papers
Copyright
Copyright © Cambridge University Press 2011

