
Using the crowd for readability prediction

Published online by Cambridge University Press:  14 December 2012

ORPHÉE DE CLERCQ
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium e-mail: orphee.declercq@hogent.be
VÉRONIQUE HOSTE
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Linguistics, Ghent University, Ghent, Belgium e-mail: veronique.hoste@hogent.be
BART DESMET
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium e-mail: bart.desmet@hogent.be
PHILIP VAN OOSTEN
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium e-mail: philip.vanoosten@hogent.be
MARTINE DE COCK
Affiliation:
Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium e-mail: martine.decock@ugent.be
LIEVE MACKEN
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium e-mail: lieve.macken@hogent.be

Abstract

While human annotation is crucial for many natural language processing tasks, it is often very expensive and time-consuming. Inspired by previous work on crowdsourcing, we investigate the viability of using non-expert labels instead of gold standard annotations from experts for a machine learning approach to automatic readability prediction. In order to do so, we evaluate two different methodologies for assessing the readability of a wide variety of text material: a more traditional setup in which expert readers make readability judgments, and a crowdsourcing setup for users who are not necessarily experts. To this end, two assessment tools were implemented: a tool in which expert readers rank a batch of texts based on readability, and a lightweight crowdsourcing tool, which invites users to provide pairwise comparisons. To validate this approach, readability assessments were gathered for a corpus of written Dutch generic texts. By collecting multiple assessments per text, we explicitly aimed to level out the effect of readers' background knowledge and attitude. Our findings show that the assessments collected through both methodologies are highly consistent and that crowdsourcing is a viable alternative to expert labeling. This is good news, as crowdsourcing is more lightweight and can reach a much wider audience of potential annotators. By performing a set of basic machine learning experiments using a feature set that mainly encodes basic lexical and morpho-syntactic information, we further illustrate how the collected data can be used to perform text comparisons or to assign an absolute readability score to an individual text. We do not focus on optimising the algorithms to achieve the best possible results for the learning tasks, but rather carry them out to illustrate the various possibilities of our data sets. The results on different data sets, however, show that our system outperforms the readability formulas and a baseline language modelling approach. We conclude that readability assessment by comparing texts is a versatile methodology, which can be adapted to specific domains and target audiences if required.
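To make the pairwise-comparison idea concrete, the following is a minimal sketch, assuming a simple win-rate aggregation: each crowd judgment states that one text reads more easily than another, and a per-text score is the fraction of comparisons it "wins". The function name readability_scores and the tuple format are hypothetical illustrations; the paper's own aggregation and scoring method may differ.

from collections import defaultdict

def readability_scores(comparisons):
    # comparisons: list of (easier_id, harder_id) tuples, each one crowd
    # judgment that the first text reads more easily than the second.
    # Returns a dict mapping text id to the fraction of comparisons it won.
    wins = defaultdict(int)
    total = defaultdict(int)
    for easier, harder in comparisons:
        wins[easier] += 1
        total[easier] += 1
        total[harder] += 1
    return {text: wins[text] / total[text] for text in total}

# Example: three crowd judgments over three texts
judgments = [("t1", "t2"), ("t1", "t3"), ("t3", "t2")]
print(readability_scores(judgments))  # t1 wins both of its comparisons

Such scores could then serve as (noisy) targets for a regression or ranking model over lexical and morpho-syntactic features, in the spirit of the experiments described above.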

Type
Articles
Copyright
Copyright © Cambridge University Press 2012 

