and the whole earth was of one language, and of one speech … and they said … let us build us a city and a tower, whose top may reach unto heaven; and let us make a name, lest we be scattered abroad upon the face of the earth … And the Lord said … let us confound their language, that they may not understand one another's speech Footnote 1
For some it is a tempting perspective: to have Artificial Intelligence produce the subject metadata for textual and visual sources in the Humanities - and do away with expensive specialists. To some extent we have ourselves to blame for this. Standardization is not ‘sexy', and those who develop software for the heritage sector have learned to accept that in the Arts and Humanities we do not really need the Lord's intervention to create a Babylonian confusion. When describing the subject matter of the images in their collections, institutions and researchers ‘confound their language', often preferring their own vocabulary over a shared standard. The resulting metadata divergence went largely unnoticed in the pre-digital age, but as digitized collections come together in the internet's shared data pool, the differences stand out. And as they do ‘not understand one another's speech', institutions and researchers now have to rely on a wide variety of search strategies to retrieve information from their ‘cities and towers'.
No doubt institutions and researchers had good reasons to prefer their own vocabulary over compliance with a standard; and given the number and variety of the heritage institutions and humanities researchers involved, it will not be easy to find a common denominator for their decisions. Without a systematic investigation of their considerations, we can only speculate about why conforming to a shared standard for iconographic information is the exception rather than the rule.
Standardization and Artificial Intelligence
At the same time there is a fresh case to be made for the wider acceptance of standardization. The impetus comes from the spectacular developments in the field of Artificial Intelligence. AI is inconceivable without standardization. In a nutshell: data cannot be organized without some form of standardization, and without organization there is no information to feed AI's training algorithms. Without standardization there is just the chaos of raw data.
AI is a monster with many heads, serving a quickly expanding array of purposes. One such example is the automatic generation of descriptive metadata for images, which would be a Holy Grail for short-staffed heritage institutions with large collections of images.
For some time now IT-researchers in the field of Computer Vision have recognized the potential of IconclassFootnote 2 as a tool to generate standardized metadata. Simplified: the datasets used to train AI and Computer Vision applications combine digital images with the words humans have used to describe them. Those image-related ‘Bags of Words’ (BoW's) are then used to generate descriptive metadata for pictures that have not yet been described but are visually similar to the ones that have. The purpose is to speed up the process and cut costs.
Digital images and BoW's are harvested from different sources and represent a wide spectrum of descriptive metadata: from simple captions and controlled keywords to elaborate descriptions and short essays.
From an IT-perspective datasets that include Iconclass tags have advantages. Every Iconclass concept represents a BoWFootnote 3, and as all concepts are organized hierarchically, every BoW automatically includes the broader terms of its hierarchical branch. Datasets of images tagged with Iconclass concepts thus contain rich, well-structured and multilingual BoW's.
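The mechanism described above can be sketched in a few lines of code. The sketch below assumes a simplified, purely prefix-based hierarchy (real Iconclass notations also contain bracketed qualifiers and keys that complicate matters), and the toy vocabulary is invented for illustration; it is not actual Iconclass data.

```python
# Toy vocabulary: each Iconclass-style notation maps to its own keywords.
# All entries are illustrative, not real Iconclass content.
TOY_VOCABULARY = {
    "5": ["abstract ideas", "concepts"],
    "51": ["abstract ideas and concepts"],
    "51M": ["motion"],
    "51M1": ["quality of motion"],
    "51M11": ["swiftness", "speed", "velocità"],
}

def bag_of_words(notation: str, vocab: dict) -> list[str]:
    """Collect the keywords of a notation plus those of all its
    broader concepts, by walking every prefix of the notation.

    Assumes a purely prefix-based hierarchy; real Iconclass
    notations also include bracketed qualifiers.
    """
    words = []
    for i in range(1, len(notation) + 1):
        words.extend(vocab.get(notation[:i], []))
    return words

print(bag_of_words("51M11", TOY_VOCABULARY))
# → ['abstract ideas', 'concepts', 'abstract ideas and concepts',
#    'motion', 'quality of motion', 'swiftness', 'speed', 'velocità']
```

Because the broader terms come along automatically, a single tag like 51M11 yields a BoW that is richer and better structured than a flat keyword list of the same length.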
However, whether the descriptive metadata consist of a simple caption or an elaborate description and interpretation of an image, every BoW is the end result of a process of distillation. The specific ingredients may be different for every image that is analyzed, but in general iconographic cataloguing follows a set pattern: we study an image and its details, compare it with other images, test our observations and hypotheses against the accumulated scholarship of our peers and predecessors, and write down our conclusions. That, at least, is what human researchers do.
In 2024 there is no escaping the question of what AI could bring to the table. Since OpenAI's chatbot was launched, a stream of systems claiming to be able to produce descriptive metadata using AI has flooded the market. To test them systematically is far beyond the scope of this article. Fortunately Microsoft has embedded a state-of-the-art version in its Office products and its Edge browser. In Word, for example, it is used to generate alternative texts for pictures. That allows us to gather some anecdotal evidence. As an example I tested a picture that is both famous and simple, the device of the renowned Italian printer Aldus Manutius: a dolphin twisted around an anchor [illustration 1]. The suggested text in MS Word - ‘An anchor with a dragon on it' - may not be completely correct, but it still demonstrates the impressive advances made in pattern recognition technology.
I have also prompted ChatGPT to interpret the device of Manutius, and its answer was quite good. To quote the most relevant fragmentsFootnote 4:
The device … features a dolphin entwined around an anchor with a Latin inscription ‘festina lente’ which translates to ‘make haste slowly'. The elements of the device are interpreted in detail: ‘Dolphins symbolize swiftness and intelligence. In this context … the speed and efficiency of printing, as well as the idea of scholarship and learning. The anchor symbolizes the stability of the printed word. Festina lente encapsulates the motto of Aldus and reflects the balance between speed and carefulness … these symbols convey Manutius' commitment to producing quality printed works efficiently and reliably.
The explanation adequately summarizes current scholarly consensus, which makes it easy to overlook the fact that the device is not inscribed ‘Festina lente' but ‘Aldus'. In fact, Manutius never included the adage in his device.Footnote 5
Early modern devices and emblems often played with images and meanings. Various meanings could be assigned to an image and different images could convey the same meaning. The Aldus device is only one example of an image juxtaposing the concepts of speed and stability. Asked which other early modern imagery expressed the ‘Festina lente’ idea, the chatbot indeed cited some examples popular in the 16th and 17th century. It mentioned the tortoise, the hourglass and the snail,Footnote 6 but it also referred to the much rarer image of a Labyrinth.
‘Aldus Manutius’ and ‘Festina lente’ are quite specific search terms, which will retrieve a relatively small set of links. A researcher interested in the visualization of this oxymoron - possibly the most cited of Erasmus’ adages - would want to cast the net wider by using a somewhat broader search term. As ChatGPT's standardization method is a trade secretFootnote 7 we cannot really know how it connects concepts, so I borrowed a broader term from Iconclass and asked the chatbot how early modern emblems visualized the broader concept ‘Quality of motion'.
It is at that point that ChatGPT revealed itself as the echo of a million voices rather than the source of reliable information it seemed to be at first. After dishing up a few clichés about early modern emblematics, it suggested that ‘dynamic figures in motion such as athletes or dancers', ‘speed lines or blurring effects' or even ‘directional arrows' could express qualities of motion in early modern emblems. When asked for some specific examples it went off the rails and cited snippets from various websites in fantastical combinations. It ascribed an emblem with the motto ‘festina lente' to Paolo GiovioFootnote 8 but said it features a galloping horse.Footnote 9 Another emblem - of a flying bird with outstretched wings with the motto ‘volat irrevocabile tempus'Footnote 10 - was ascribed to Andrea Alciato, as was a second one - a winged hourglass with the motto ‘fugit irreparabile tempus.'Footnote 11 Finally, it suggested that the image of a flowing river with the motto ‘panta rhei' is an early modern emblem invented by Heraclitus.
What is fascinating is that the answer could easily be mistaken for fact, since the best lies are those that use elements of the truth, and that it is served up with aplomb, as if an ill-prepared student were bluffing his way through an exam.
It makes one wonder what would have happened if the heritage information that AI applications work with had not only been more standardized from the outset but had also referenced both digital and analogue sources. The latter is pertinent to the research situation in the Humanities, as its sources are very hybrid. We continuously combine information from digital and digitized sources with information we get from the autopsy of physical objects and from printed books and articles.Footnote 12
Even more relevant from a research perspective, however, is that researchers need to make clear which sources they are using to build their hypotheses and reach their conclusions. The interpretation of primary and secondary sources and the discourse about those interpretations is a core aspect of humanities scholarship.
New Iconclass Browser - a classification in action
A screenshot [illustration 2] of the new version of the Iconclass Browser website illustrates its potential as a standardization instrument for both primary sources - visual as well as textual - and scholarly literature.
On the left various Types of Motion are shown in the classification's tree of concepts.
The central column shows the selected concept 51M11 Swiftness, Speed; ‘Agilità', ‘Celerità', ‘Velocità’ (Ripa). Its broader, narrower, and related terms are listed, and a sample of images expressing the concept is included. The combined concepts under ‘Quality of Motion’ - Speed, Haste, Slowness, and Lingering - have been used to tag some 150 primary sources, all of which are relevant as they were tagged by humans. The tally and the relevance rate are pertinent as the Iconclass dataset is dwarfed by the billions of pictures and BoW's processed by BigTech.
On the right, scholarly literature about swiftness and about Manutius's device is referenced. Those references guide the researcher to secondary literature, both digital and printed, and also point the user to the full text of Erasmus's adages, a primary source of inspiration for many emblems.Footnote 13 By adding a visual search functionality and integrating an iconographic bibliography, a first step is taken to turn the classification into a laboratory for image research.
While the actual integration of primary and secondary sources with the online classification is a new step, the idea of standardizing access to their content is as old as Iconclass.Footnote 14 For Henri van de Waal they were two sides of the same coin, as I demonstrated at the IFLA Satellite conference in August 2023.Footnote 15
Given this potential, the question is justified as to why Iconclass, while acknowledged as an important standard in AI-projects,Footnote 16 is not applied more widely by heritage institutions and humanities researchers. I argued earlier that a broad, systematic survey would be needed to establish why the use of standards in general is as limited as it is. For the case of Iconclass, however, I may provide some more specific observations.
Applying a standard always adds some complexity to cataloguing. Iconclass is no exception, and it may indeed take some time to familiarize oneself with it. However, identifying the subject matter of historical images and naming their details is often the real challenge. Finding fitting descriptors in Iconclass is the lesser problem, but as the processes overlap and mingle it is not always easy to distinguish them.
A common misconception is that Iconclass deals with content and not with form, and that it is limited to biblical and mythological subject matter. This probably goes back to the first publication of primary sources indexed with Iconclass in the 1960's and 1970's.Footnote 17 For years the Decimal Index of Art in the Low Countries (D.I.A.L.) was published in annual batches of photographs in postcard format. An Iconclass code would be printed on each card. Cards were sorted by code, so for each code an additional photocard had to be printed: a costly affair. As a consequence the emphasis in the D.I.A.L. selection of Netherlandish art would indeed shift to biblical and mythological subject matter, even though the system contains thousands of descriptors for formal aspects of images.
A similar idea is that Iconclass only covers traditional visual arts. Although the system is indeed very suitable for the cataloguing of European and early modern imagery, it can easily be demonstrated that its concepts cover images beyond the domain of the arts and the early modern period. Over the years the schedules were successfully applied to material from the Middle Ages and classical antiquity and, at the other end, modern art and photojournalism. Additionally, the classification has proven to be both adaptable and expandable when new material demanded it.Footnote 18
An easily overlooked obstacle is the fact that users' experiences of retrieval with Iconclass diverge widely. In many cases the software used by heritage institutions does not fully exploit the system's potential. The words (BoW's) linked to every Iconclass concept are rarely used efficiently for retrieval, and the power of the hierarchical organization of the concept tree often remains unused. Seen from the perspective of the time spent on the iconographic cataloguing of a collection, this is a poor return on investment, while also obscuring the benefits of standardization.
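What exploiting the hierarchy would mean in practice can be illustrated with a small sketch: a search on a broad notation should also return records tagged with any narrower concept beneath it. The catalogue below is invented for illustration (only 51M11 corresponds to a real Iconclass concept; the other notations and record ids are hypothetical), and it again assumes a simplified prefix-based hierarchy.

```python
# Hypothetical catalogue: record ids mapped to Iconclass-style tags.
# Only "51M11" (Swiftness, Speed) is a real notation; the rest is invented.
records = {
    "print-001": ["51M11"],  # swiftness, speed
    "print-002": ["51M12"],  # a hypothetical narrower motion concept
    "print-003": ["48C75"],  # an unrelated concept
}

def search(query_notation: str, catalogue: dict) -> list[str]:
    """Return the ids of all records whose tags fall under the queried
    concept, i.e. whose notation starts with the query's notation."""
    return sorted(
        rec_id
        for rec_id, tags in catalogue.items()
        if any(tag.startswith(query_notation) for tag in tags)
    )

print(search("51M1", records))
# → ['print-001', 'print-002']: both motion-quality records match,
#   even though neither is tagged with "51M1" itself.
```

Software that matches only the exact code entered by the cataloguer misses the second record; that, in a nutshell, is the unused power of the concept tree.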
My final observation is more speculative. For well over half a century now, Erwin Panofsky's introductory chapter in Studies in Iconology Footnote 19 has been used as a practical guideline for the analysis of the form and content of images, and it is frequently quoted as a model for iconographic cataloguing in recent publications about Computer Vision and Artificial Intelligence. Panofsky, however, was silent about vocabulary control and information retrieval.Footnote 20 His colleague Henri van de Waal, on the contrary, realized that these are core aspects of the systematic iconographic cataloguing for which he was forging Iconclass. While Panofsky's method can already be applied to a single image, Van de Waal's approach only pays off when a critical mass of images has been processed. From the perspective of the history of Art History, it would be interesting to investigate whether the dominance of Panofsky's ‘system'Footnote 21 did not actually harm the case for standardization in iconography.
However that may be, after several decades of steady progress we have now reached a time of convergence. Hence this new version of Iconclass, which brings together images from websites at a variety of institutions and combines them with the literature that inspired many works of art, and with studies that interpret their iconography.
When one considers the absurd mass of digital information that we are confronted with, it may seem laughable to offer a system that does not even begin to aim for completeness and could never keep up with the speed with which new images, books, and articles are produced. If that is what you think, please know that one of the Festina lente paraphrases is ‘Less haste, more speed'. It may take longer, but in the end standardizing our sources will help Artificial Intelligence become a more reliable assistant for the student of the Humanities than it is now.