Discovering multiword expressions

Published online by Cambridge University Press: 11 September 2019

Aline Villavicencio*
Affiliation:
Federal University of Rio Grande do Sul, Porto Alegre, Brazil; University of Sheffield, Sheffield, UK; University of Essex, Colchester, England, UK
Marco Idiart
Affiliation:
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
*Corresponding author. Email: a.villavicencio@sheffield.ac.uk

Abstract

In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, and discuss their use for MWE discovery and idiomaticity detection. We concentrate on the collocational and contextual preferences of MWEs, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.

Type
Article
Copyright
© Cambridge University Press 2019 
