Modeling the impact of orthographic coding on Czech–Polish and Bulgarian–Russian reading intercomprehension

Irina Stenger; Klára Jágrová; Andrea Fischer; Tania Avgustinova; Dietrich Klakow; Roland Marti

doi:10.1017/S0332586517000130

Published online by Cambridge University Press: 05 October 2017

Affiliation (all authors):
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV (Mutual Intelligibility and Surprisal in Slavic Intercomprehension), Saarland University, Postfach 151150, 66041 Saarbrücken, Germany. ira.stenger@mx.uni-saarland.de, kjagrova@coli.uni-saarland.de, andrea.fischer@lsv.uni-saarland.de, avgustinova@coli.uni-saarland.de, dietrich.klakow@lsv.uni-saarland.de, rwmslav@mx.uni-saarland.de

Abstract

Treating orthography as a primary linguistic interface in every reading activity, we address the central research question of how orthographic intelligibility between closely related languages can be measured and predicted. This paper presents methods and findings of modeling orthographic intelligibility in a reading intercomprehension scenario from an information-theoretic perspective. The study focuses on two Slavic language pairs: Czech–Polish (West Slavic, using the Latin script) and Bulgarian–Russian (South Slavic and East Slavic, respectively, using the Cyrillic script). We present computational methods for measuring orthographic distance and orthographic asymmetry by means of the Levenshtein algorithm, conditional entropy, and the adaptation surprisal method, which are expected to predict the influence of orthography on mutual intelligibility in reading.
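The three measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Normalizing the Levenshtein distance by the length of the longer word is an assumption made here for simplicity, and the character alignment in `toy_pairs` is a hypothetical toy set, not data from the study. The function names are likewise illustrative.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance divided by the longer word length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def conditional_entropy(pairs) -> float:
    """H(target | source) in bits over aligned (source, target) characters."""
    joint = Counter(pairs)
    source = Counter(x for x, _ in pairs)
    total = len(pairs)
    return -sum((n / total) * math.log2(n / source[x])
                for (x, _), n in joint.items())


def surprisal(pairs, x: str, y: str) -> float:
    """-log2 P(y | x) for one correspondence seen in the aligned pairs."""
    joint = Counter(pairs)
    source = Counter(s for s, _ in pairs)
    return -math.log2(joint[(x, y)] / source[x])


# Hypothetical toy alignment: source "a" maps to "a" or "o", "h" always to "g".
toy_pairs = [("a", "a"), ("a", "o"), ("h", "g"), ("h", "g")]
reversed_pairs = [(y, x) for x, y in toy_pairs]

distance = normalized_levenshtein("hlava", "głowa")  # Czech vs. Polish 'head'
forward = conditional_entropy(toy_pairs)
backward = conditional_entropy(reversed_pairs)
```

In this toy alignment the forward direction is uncertain (source "a" has two possible targets), while every target character maps back to exactly one source character. The two conditional entropies therefore differ, which is the kind of asymmetry the abstract refers to.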

Keywords

entropy; Levenshtein distance; mutual intelligibility; reading intercomprehension; Slavic orthographic code; surprisal

Type: Research Article
Information: Nordic Journal of Linguistics, Volume 40, Special Issue 2: Receptive Multilingualism, October 2017, pp. 175–199

DOI: https://doi.org/10.1017/S0332586517000130
Copyright: Copyright © Nordic Association of Linguistics 2017
