
Recognition of visual scene elements from a story text in Persian natural language

Published online by Cambridge University Press: 24 August 2022

Mojdeh Hashemi-Namin
Affiliation:
Iran University of Science and Technology, Tehran, Iran
Mohammad Reza Jahed-Motlagh*
Affiliation:
Iran University of Science and Technology, Tehran, Iran
Adel Torkaman Rahmani
Affiliation:
Iran University of Science and Technology, Tehran, Iran
*Corresponding author. E-mail: jahedmr@iust.ac.ir

Abstract

Text-to-scene conversion systems map natural language text to the formal representations required for visual scenes. The difficulty of this mapping is one of the most critical challenges in developing such systems. The current study is the first to map Persian natural language text to a conceptual scene model. This conceptual scene model is an intermediate semantic representation between natural language and the visual scene, and it contains descriptions of the scene's visual elements. In this ongoing study, it will be used to produce meaningful animation from an input story. The mapping task was modeled as a sequential labeling problem, and a conditional random field (CRF) model was trained and tested for sequential labeling of scene model elements. To the best of the authors’ knowledge, no dataset existed for this task, so the required dataset was collected. The lack of off-the-shelf natural language processing modules and a significant error rate in the available corpora were major challenges to dataset collection. Some features of the dataset were manually annotated. The results were evaluated using standard text classification metrics, and an average accuracy of 85.7% was obtained, which is satisfactory.
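To make the sequential labeling formulation concrete, the following is a minimal sketch of how a linear-chain model assigns one label per token by Viterbi decoding. It is not the authors' implementation: the label set (`O`, `OBJECT`, `ACTION`, `LOCATION`), the feature templates, the hand-set weights in `EMIT` and `TRANS`, and the English example sentence are all assumptions chosen for illustration. A real CRF would learn these weights from the annotated dataset rather than have them written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical scene-element labels; the article's actual tag set is not shown here.
LABELS = ["O", "OBJECT", "ACTION", "LOCATION"]

Weights = Dict[Tuple[str, str], float]


def token_features(tokens: List[str], i: int) -> List[str]:
    """Return simple string features for the token at position i."""
    feats = [f"word={tokens[i].lower()}"]
    feats.append(f"prev={tokens[i - 1].lower()}" if i > 0 else "BOS")
    feats.append(f"next={tokens[i + 1].lower()}" if i < len(tokens) - 1 else "EOS")
    return feats


def viterbi(tokens: List[str], emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model."""
    if not tokens:
        return []

    def emit(i: int, label: str) -> float:
        # Sum the weights of all (feature, label) pairs active at position i.
        return sum(emit_w.get((f, label), 0.0) for f in token_features(tokens, i))

    score = [{y: emit(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        row: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label for reaching label y at position i.
            best_prev = max(
                LABELS, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0)
            )
            row[y] = score[i - 1][best_prev] + trans_w.get((best_prev, y), 0.0) + emit(i, y)
            ptr[y] = best_prev
        score.append(row)
        back.append(ptr)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hand-set illustrative weights (assumptions, not learned parameters).
EMIT: Weights = {
    ("word=the", "O"): 1.0,
    ("word=in", "O"): 1.0,
    ("word=boy", "OBJECT"): 2.0,
    ("word=runs", "ACTION"): 2.0,
    ("word=park", "LOCATION"): 2.0,
    ("prev=the", "OBJECT"): 0.5,
}
TRANS: Weights = {("OBJECT", "ACTION"): 0.5, ("O", "LOCATION"): 0.5}

tokens = ["The", "boy", "runs", "in", "the", "park"]
labels = viterbi(tokens, EMIT, TRANS)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

In this toy example the word `park` receives a small `OBJECT` score from the `prev=the` feature, but the stronger `LOCATION` emission weight and the `O` to `LOCATION` transition weight win out. This kind of interaction between local features and neighboring labels is what a sequence model adds over classifying each token independently.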

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

