SCESS: a WFSA-based automated simplified Chinese essay scoring system with incremental latent semantic analysis

SHUDONG HAO; YANYAN XU; DENGFENG KE; KAILE SU; HENGLI PENG

doi:10.1017/S1351324914000138

Published online by Cambridge University Press: 30 October 2014

SHUDONG HAO: Affiliation:
School of Information Science and Technology, Beijing Forestry University, Beijing, China email: shudongh@acm.org
YANYAN XU*: Affiliation:
School of Information Science and Technology, Beijing Forestry University, Beijing, China email: shudongh@acm.org
DENGFENG KE: Affiliation:
Institute of Automation, Chinese Academy of Sciences, Beijing, China email: dengfeng.ke@ia.ac.cn
KAILE SU: Affiliation:
Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia email: k.su@griffith.edu.au
HENGLI PENG: Affiliation:
Institute of Educational Measurement, Beijing Language and Culture University, Beijing, China email: penghl6402@aliyun.com
*Corresponding author: xuyyxu@gmail.com

Abstract

Writing in language tests is regarded as an important indicator for assessing language skills of test takers. As Chinese language tests become popular, scoring a large number of essays becomes a heavy and expensive task for the organizers of these tests. In the past several years, some efforts have been made to develop automated simplified Chinese essay scoring systems, reducing both costs and evaluation time. In this paper, we introduce a system called SCESS (automated Simplified Chinese Essay Scoring System) based on Weighted Finite State Automata (WFSA) and using Incremental Latent Semantic Analysis (ILSA) to deal with a large number of essays. First, SCESS uses an n-gram language model to construct a WFSA to perform text pre-processing. At this stage, the system integrates a Confusing-Character Table, a Part-Of-Speech Table, beam search and heuristic search to perform automated word segmentation and correction of essays. Experimental results show that this pre-processing procedure is effective, with a Recall Rate of 88.50%, a Detection Precision of 92.31% and a Correction Precision of 88.46%. After text pre-processing, SCESS uses ILSA to perform automated essay scoring. We have carried out experiments to compare the ILSA method with the traditional LSA method on the corpora of essays from the MHK test (the Chinese proficiency test for minorities). Experimental results indicate that ILSA has a significant advantage over LSA, in terms of both running time and memory usage. Furthermore, experimental results also show that SCESS is quite effective with a scoring performance of 89.50%.
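To make the LSA scoring step concrete, the following is a minimal sketch of conventional LSA with folding-in, not the authors' ILSA algorithm, whose details the abstract does not give. The toy term-by-essay matrix, the human scores, the function names and the similarity-weighted nearest-neighbour scoring rule are all illustrative assumptions.

```python
import numpy as np

def build_lsa(term_doc, k):
    """Truncated SVD of a term-by-essay count matrix (terms in rows)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the third return value are the k-dimensional essay vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(U_k, s_k, term_vec):
    """Project a new essay's term vector into the existing k-dim space."""
    return (term_vec @ U_k) / s_k

def predict_score(new_vec, doc_vecs, scores, top_n=2):
    """Similarity-weighted mean score of the top_n most similar scored essays.

    This scoring rule is an assumption made for illustration only.
    """
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(new_vec)
    sims = (doc_vecs @ new_vec) / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(sims)[::-1][:top_n]
    weights = np.clip(sims[nearest], 1e-9, None)  # keep weights positive
    return float(np.average(scores[nearest], weights=weights))

# Hypothetical toy data: 4 terms (rows) by 4 already-scored essays (columns).
# Essays 0-1 use only terms 0-1; essays 2-3 use only terms 2-3.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0, 3.0]])
scores = np.array([5.0, 4.0, 2.0, 1.0])  # hypothetical human scores

U_k, s_k, doc_vecs = build_lsa(A, k=2)

# A new, unscored essay that uses terms 0 and 1 once each.
new_essay = np.array([1.0, 1.0, 0.0, 0.0])
predicted = predict_score(fold_in(U_k, s_k, new_essay), doc_vecs, scores)
print(predicted)
```

Folding-in reuses the existing decomposition rather than recomputing the SVD for every new essay. An incremental method such as the ILSA named above would additionally update the decomposition itself as essays arrive, which this sketch does not attempt.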

Information

Type: Articles
Information: Natural Language Engineering, Volume 22, Issue 2, March 2016, pp. 291–319

DOI: https://doi.org/10.1017/S1351324914000138
Copyright: Copyright © Cambridge University Press 2014
