6 - Developing Multilingual Automatic Semantic Annotation Systems

Published online by Cambridge University Press: 10 June 2019

Meng Ji (University of Sydney)
Michael Oakes (University of Wolverhampton)

Summary

We report on the development of a multilingual system for the semantic analysis of text. Research on the English Semantic Tagger began in 1990, and the system has since been ported first to Finnish and Russian and subsequently to Arabic, Chinese, Czech, Dutch, French, Italian, Malay, Portuguese, Spanish, Urdu, and Welsh. The semantic taggers for English, Finnish, and Russian were developed in broadly similar ways, each involving manual construction of the semantic lexicon. To speed up the work, the semantic lexicons for the other languages were later created with new bootstrapping methods, including computational approaches. We describe these manual and automatic processes and outline directions for future development. The resulting multilingual framework of semantic taggers, based on equivalent semantic lexicons and one common semantic taxonomy, offers a wealth of potential applications, which this chapter also illustrates. Beyond monolingual applications, these semantic taggers also make cross-lingual and multilingual applications possible. Furthermore, while the existing semantic analysis systems are designed for general language, they can also be tailored to deal more accurately with a particular domain or task.

Type: Chapter
Information: Advances in Empirical Translation Studies: Developing Translation Resources and Technologies, pp. 94–109
Publisher: Cambridge University Press
Print publication year: 2019
