A study of the evaluation metrics for generative images containing combinational creativity

Boheng Wang; Yunhuai Zhu; Liuqing Chen; Jingcheng Liu; Lingyun Sun; Peter Childs

doi:10.1017/S0890060423000069

Published online by Cambridge University Press: 23 March 2023

Boheng Wang: Dyson School of Design Engineering, Imperial College London, London, UK
Yunhuai Zhu: Zhejiang–Singapore Innovation and AI Joint Research Lab, Zhejiang University, Hangzhou, China
Liuqing Chen*: International Design Institute, Zhejiang University, Hangzhou, China
Jingcheng Liu*: International Campus, Zhejiang University, Hangzhou, China
Lingyun Sun: International Design Institute, Zhejiang University, Hangzhou, China
Peter Childs: Dyson School of Design Engineering, Imperial College London, London, UK
*Author for correspondence: Liuqing Chen, E-mail: chenlq@zju.edu.cn

Abstract

In machine content generation, the state-of-the-art text-to-image model DALL⋅E shows advanced and diverse capabilities for generating combinational images from specific textual prompts. The images generated by DALL⋅E appear to exhibit an appreciable level of combinational creativity, close to that of humans, in visualizing a combinational idea. Several common metrics, such as IS, FID, GIQA, and CLIP, can be applied to assess the quality of images produced by generative models, but it is unclear whether they are equally applicable to images containing combinational creativity. In this study, we collected image data generated by a machine (DALL⋅E) and by human designers. The group rankings obtained from the Consensual Assessment Technique (CAT) and the Turing Test (TT) served as benchmarks for assessing combinational creativity. Because the metrics rest on different mathematical principles and evaluate image quality from different starting points, we introduced two measures, coincident rate (CR) and average rank variation (ARV), that place them in comparable spaces. We then conducted an experiment that calculated how consistent each metric's group ranking was with the benchmarks. By comparing the CR and ARV consistency results, we summarized the applicability of the existing evaluation metrics to generative images containing combinational creativity. Of the four metrics, GIQA was the most consistent with the CAT and TT. It therefore shows potential as an automated assessment for images containing combinational creativity in relevant design and engineering tasks, such as conceptual sketches, digital design images, and prototyping images.
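
The abstract names CR and ARV without defining them, so the following is only an illustrative sketch of how a metric's group ranking might be compared with a benchmark ranking. It assumes that CR is the share of groups placed at exactly the same rank as in the benchmark, and that ARV is the mean absolute difference in rank position (a per-item form of Spearman's footrule). Both definitions, the function names, and the group labels are assumptions made for illustration, not the paper's formulas.

```python
from typing import Dict, Hashable, Sequence


def rank_positions(ranking: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each item to its 1-based position in a ranking (best first)."""
    return {item: pos for pos, item in enumerate(ranking, start=1)}


def _paired_positions(benchmark, metric):
    """Validate that both rankings order the same non-empty set of items."""
    bench = rank_positions(benchmark)
    other = rank_positions(metric)
    if not bench or set(bench) != set(other):
        raise ValueError("rankings must cover the same non-empty set of items")
    return bench, other


def coincident_rate(benchmark: Sequence[Hashable],
                    metric: Sequence[Hashable]) -> float:
    """Assumed CR: fraction of items whose rank equals the benchmark rank."""
    bench, other = _paired_positions(benchmark, metric)
    matches = sum(1 for item in bench if bench[item] == other[item])
    return matches / len(bench)


def average_rank_variation(benchmark: Sequence[Hashable],
                           metric: Sequence[Hashable]) -> float:
    """Assumed ARV: mean absolute displacement in rank across all items."""
    bench, other = _paired_positions(benchmark, metric)
    total_shift = sum(abs(bench[item] - other[item]) for item in bench)
    return total_shift / len(bench)


# Hypothetical group labels; the orderings are invented for the example.
benchmark_ranking = ["G1", "G2", "G3", "G4", "G5"]  # e.g. from CAT or TT
metric_ranking = ["G1", "G3", "G2", "G5", "G4"]     # e.g. from one metric

cr = coincident_rate(benchmark_ranking, metric_ranking)
arv = average_rank_variation(benchmark_ranking, metric_ranking)
print(f"CR = {cr:.2f}, ARV = {arv:.2f}")  # CR = 0.20, ARV = 0.80
```

Under these assumed definitions, a higher CR and a lower ARV both indicate closer agreement with the benchmark. Because both operate on rank positions rather than raw scores, metrics with different scales and directions can be compared on the same footing.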

Keywords

Combinational creativity; creativity assessment; generative model; text-to-image; Turing test

Information

Type: Research Article
Information: AI EDAM, Volume 37, 2023, e11

DOI: https://doi.org/10.1017/S0890060423000069
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press


