
Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Published online by Cambridge University Press:  28 September 2021

Dongqiang Yang*
School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China

Yanqin Yin
School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China

*Corresponding author. E-mail: ydq@sdjzu.edu.cn

Abstract

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to consult handcrafted semantic networks, but similarity can also be predicted in a distributional vector space. Similarity calculation remains a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies for measuring taxonomic similarity, including edge-counting, which relies solely on the semantic relations in a taxonomy, as well as more complex methods that estimate concept specificity. We further identified three weighting factors used in modelling taxonomic similarity. To study the distinct mechanisms of taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure against human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that, without fine-tuning the uniform edge distance, taxonomic similarity measures can rely on the shortest path length as the prime factor in predicting semantic similarity; that, in contrast to distributional semantics, edge-counting is free from sense-distribution bias in usage and can measure word similarity both literally and metaphorically; and that the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend of leveraging knowledge bases in transfer learning. A large gap still appears to exist in computing semantic similarity across different ranges of word frequency, polysemy degree and similarity intensity.
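To make the edge-counting idea above concrete, the following is a minimal sketch, assuming NLTK's WordNet interface (which the abstract does not prescribe), of how a path-based taxonomic measure can be lifted from sense level to word level by taking the best score over all sense pairs; here `path_similarity` returns 1/(1 + shortest path length) between two synsets in the hypernym taxonomy.

```python
# A minimal sketch, assuming NLTK's WordNet interface (not prescribed by the
# paper): word-level taxonomic similarity by edge-counting, taking the best
# score over all noun-sense pairs of the two words.
from itertools import product

from nltk.corpus import wordnet as wn  # requires the 'wordnet' corpus to be downloaded


def word_path_similarity(word1, word2, pos=wn.NOUN):
    """Return the maximum path-based similarity over all sense pairs, or None."""
    scores = [
        s1.path_similarity(s2)  # 1 / (1 + shortest path length in the taxonomy)
        for s1, s2 in product(wn.synsets(word1, pos=pos), wn.synsets(word2, pos=pos))
    ]
    scores = [s for s in scores if s is not None]
    return max(scores, default=None)


if __name__ == "__main__":
    # Synonyms share a synset (path length 0), so the score is 1.0.
    print(word_path_similarity("car", "automobile"))
    # More distant concepts sit further apart in the taxonomy, so the score drops.
    print(word_path_similarity("car", "journey"))
```

Distributional measures would instead score the same word pairs by the cosine of their embedding vectors, which is where the sense-distribution bias discussed in the abstract enters.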

Type: Article
Copyright: © The Author(s), 2021. Published by Cambridge University Press

