
SCESS: a WFSA-based automated Simplified Chinese essay scoring system with incremental latent semantic analysis

Published online by Cambridge University Press: 30 October 2014

SHUDONG HAO
Affiliation:
School of Information Science and Technology, Beijing Forestry University, Beijing, China email: shudongh@acm.org
YANYAN XU*
Affiliation:
School of Information Science and Technology, Beijing Forestry University, Beijing, China email: shudongh@acm.org
DENGFENG KE
Affiliation:
Institute of Automation, Chinese Academy of Sciences, Beijing, China email: dengfeng.ke@ia.ac.cn
KAILE SU
Affiliation:
Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia email: k.su@griffith.edu.au
HENGLI PENG
Affiliation:
Institute of Educational Measurement, Beijing Language and Culture University, Beijing, China email: penghl6402@aliyun.com
*Corresponding author: xuyyxu@gmail.com

Abstract

Writing in language tests is regarded as an important indicator for assessing language skills of test takers. As Chinese language tests become popular, scoring a large number of essays becomes a heavy and expensive task for the organizers of these tests. In the past several years, some efforts have been made to develop automated simplified Chinese essay scoring systems, reducing both costs and evaluation time. In this paper, we introduce a system called SCESS (automated Simplified Chinese Essay Scoring System) based on Weighted Finite State Automata (WFSA) and using Incremental Latent Semantic Analysis (ILSA) to deal with a large number of essays. First, SCESS uses an n-gram language model to construct a WFSA to perform text pre-processing. At this stage, the system integrates a Confusing-Character Table, a Part-Of-Speech Table, beam search and heuristic search to perform automated word segmentation and correction of essays. Experimental results show that this pre-processing procedure is effective, with a Recall Rate of 88.50%, a Detection Precision of 92.31% and a Correction Precision of 88.46%. After text pre-processing, SCESS uses ILSA to perform automated essay scoring. We have carried out experiments to compare the ILSA method with the traditional LSA method on the corpora of essays from the MHK test (the Chinese proficiency test for minorities). Experimental results indicate that ILSA has a significant advantage over LSA, in terms of both running time and memory usage. Furthermore, experimental results also show that SCESS is quite effective with a scoring performance of 89.50%.
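The pre-processing idea summarized in the abstract (an n-gram language model whose probabilities act as path weights, a confusing-character table that proposes substitutions, and beam search to pick the best path) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the SCESS implementation: the toy corpus, the add-one smoothing, the `edit_penalty` parameter, and the function names `train_bigram`, `arc_weight` and `correct` are hypothetical, and the sketch leaves out word segmentation, the Part-Of-Speech Table and heuristic search.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (illustrative)

def train_bigram(corpus):
    """Count character unigrams and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {BOS, EOS}
    for sentence in corpus:
        chars = [BOS] + list(sentence) + [EOS]
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)

def arc_weight(model, prev, cur):
    """Negative log of the add-one-smoothed bigram probability (lower is better)."""
    unigrams, bigrams, vocab_size = model
    return -math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))

def correct(text, model, confusion, beam_width=3, edit_penalty=1.0):
    """Beam search over the original characters and their confusable alternatives."""
    beam = [(0.0, [BOS])]  # each hypothesis is (accumulated cost, path so far)
    for ch in text:
        candidates = [ch] + sorted(confusion.get(ch, ()))
        expanded = []
        for cost, path in beam:
            for cand in candidates:
                step = arc_weight(model, path[-1], cand)
                if cand != ch:
                    step += edit_penalty  # discourage needless substitutions
                expanded.append((cost + step, path + [cand]))
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:beam_width]  # prune to the best hypotheses
    # Add the cost of ending the sentence before choosing the winner.
    best = min(beam, key=lambda item: item[0] + arc_weight(model, item[1][-1], EOS))
    return "".join(best[1][1:])

# Toy data, invented for this example only.
MODEL = train_bigram(["我们去学校", "我们是学生", "他们去公园", "开门"])
CONFUSION = {"门": {"们"}, "们": {"门"}}

print(correct("我门去学校", MODEL, CONFUSION))  # 我们去学校
print(correct("开门", MODEL, CONFUSION))        # 开门 (left unchanged)
```

The substitution penalty is one simple way to keep the search from rewriting characters that are already plausible in context; a weighted automaton would encode the same trade-off in its arc weights.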

Type
Articles
Copyright
Copyright © Cambridge University Press 2014 

