Archaeology has greatly evolved since its inception, with its development guided by one constant feature: technology. Indeed, digital or computational archaeology has emerged as a subdiscipline in its own right, one that influences others. The influence of digital technology is especially evident in data management practices, an area in which a variety of projects are advancing new pathways to archaeological data collection and analysis.
This study is spearheaded by the PERAIA Project: “Landscapes, Networks, and Society along the Ancient Libyan Sea,” which uses digital approaches to trace past interactions and understand the historical connectivity within the Eastern Mediterranean area (Figure 1). This research is also diachronic in scope and considers different scales of analysis; that is, connectivity between and within these regions from late prehistory to antiquity, around 1400 BC–AD 300. The PERAIA project traces settlement patterns and models the movement of people, goods, and ideas to represent possible flows generated by different communities’ economic and social exchanges.
At the same time, the project seeks to achieve ethical outcomes. Inspired by open science principles (Marwick et al. 2017), we have taken advantage of digital applications to attain open research, following the Linked Open Usable Data (LOUD) and Findable, Accessible, Interoperable, and Reusable (FAIR) principles of data management and development (Thiery 2019; Wilkinson et al. 2016). This goal of attaining open research is the focus of the article, which delves into the management, analysis, and dissemination of research data and results, underscoring our commitment to achieve open practices.
Background
Overview of Projects Implementing Digital Approaches to Data Management
Many projects have developed data management approaches with the aim of achieving more ethical research; this article mentions some of the most significant ones for the project's scope. This overview is biased toward Western and Mediterranean cases. We focus here on three categories of projects: digital archives, digital infrastructures or cyber-infrastructures and approaches, and funded research projects.
Many previous archiving projects achieved ethical data practices in pursuit of a more open ethos. Some examples are the Archaeology Data Service (ADS), the Designated Archaeology Collections Programme, the Online Access to the Index of Archaeological Investigations (OASIS), and the ARchives of European Archaeology (AREA) network. These digital archives, systems, and guidelines seek long-term preservation and (meta)data sharing and reuse (Barratt 2000; Evans and Moore 2014; Green 2014; Hardman and Richards 2003; Schlanger 2004). They influenced our project's efforts to develop digital archival practices for reusing and linking data.
Other projects seek to interlink either digital data with field publications or data with other data. These projects include ARENA, LEAP, DAPPER, NEARCHOS, ARIADNE, tDAR, DANS, Open Context, and Pelagios (Faniel et al. 2013; Kansa and Kansa 2013; Meghini et al. 2017; Moore and Richards 2015; Richards 2002; https://pelagios.org/; https://www.tdar.org/about/). Most of the examples in this group can be classified as cyber-infrastructures (e.g., ARIADNE) or as research networks implementing digital approaches to achieve ethical practices (e.g., NEARCHOS).
Funded research projects have also developed interesting approaches to data management. These projects, which are eminently digital, influenced our methodological approach. The ORBIS Project, the Stanford Geospatial Model for the Roman Empire, is a pioneering work in which digitization and network representation are applied to the entire Roman Empire beyond regional scales (https://orbis.stanford.edu/). Along the same lines, VIATOR-E aims to digitize and disseminate the western Roman Empire's road network (http://viatore.icac.cat/). Its twin, the MINERVA Project (https://projects.au.dk/minerva), seeks to understand the centuries-long functioning of the Roman economy by digitizing the eastern Roman Empire's ancient routes. Two other projects of interest that cover chronologies beyond the Roman period are EIDOS of a City and the MedAFRICA Project. The former simulates the collapse and resilience of ancient Eastern Mediterranean urban environments via agent-based modeling and provides the first open access database of settlements across a broad timespan for Cyprus (Crawford and Vella 2022); the latter provides an archaeological approach to the deep historical dynamics of Mediterranean Africa (around 9600–700 BC; https://www.medafrica-cam.org/).
Relevant projects outside the Western and Mediterranean areas are the ERC Desert Networks into the Eastern Desert of Egypt, EAMENA (Endangered Archaeology in the Middle East and North Africa), MarEA (Maritime Endangered Heritage), MAEASAM (Mapping Africa's Endangered Archaeological Sites and Monuments), MAHSA (Mapping Archaeological Heritage in South Asia), MAPHSA (Mapping the Archaeological Pre-Columbian Heritage of South America), and CyberSW (https://desertnetworks.huma-num.fr/; https://eamena.org/; https://marea.soton.ac.uk/blog/; https://maeasam.org/; https://www.mahsa.arch.cam.ac.uk/; https://www.upf.edu/web/maphsa; https://cybersw.org/). These projects are of interest for implementing digital approaches to data collection, management, and dissemination.
Linked Open Data Principles for Archaeology
An increasing number of studies emphasize Big Data and computational approaches to archaeology (e.g., Opitz et al. 2021; Turchin et al. 2020). In this context, data formats are changing rapidly from analog to digital. New approaches to storing and managing data are also emerging, but the question of how to better manage, store, and retrieve data remains open (Nicholson et al. 2023). Equally or more important is how to link disparate data from different traditional disciplines to create better narratives. This issue is even more pressing if we focus on so-called legacy data (nonstandard digitized or analog data). Some researchers have applied the FAIR principles and the LO(U)D framework, both explained later, to resolve these issues (examples compiled in Niccolucci and Richards 2019). Such examples show the potential of some computational tools to make archaeology an open science.
Before discussing the possibilities of doing open archaeological research, it is important to address the “whys” of making archaeology truly open. Answering this question will explain why we attempted to create data that are as FAIR and LO(U)D as possible by extracting and cross-checking information from different (historical, archaeological, and environmental) gray sources.
What Are Gray Literature and Legacy Data?
Gray literature is in a broad sense the literature “produced on all levels of governments, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers” (Schöpfel 2006:67). Amanda Lawrence (2012) narrows the definition by focusing on three factors: the nature of the documents, the type of producers, and the means of dissemination. Following this, gray literature is a category that includes the following:
(1) Technical and project reports or manuals, spreadsheets/statistical files, working and discussion papers, nonpublished conference proceedings, theses, blogs, social media content, et cetera (Childress and Jul 2003; Schöpfel 2006; Vaska 2010).
(2) Produced by governmental and nongovernmental institutions.
(3) Published without following the standard for each case (e.g., journal publications).
Legacy data can be defined as the data preserved in the pages of the different formats mentioned in that list (Allison 2008).
Some key points emerge from these definitions, alluding to multiple issues:
• Gray literature and legacy data are ubiquitous. Such ubiquity presents an important challenge for reasons specified in the following points.
• Gray literature and legacy data are not easily accessible (Schöpfel 2006). Indeed, non-Western documents and data are commonly less accessible and usually restricted to governments’ control (Corlett 2011).
• This literature and data are not controlled by academic quality standards: for example, they are not peer reviewed (Lawrence 2012; Vaska 2010). The lack of further and external revisions might challenge their reliability, which is problematic because many subsequent publications are built on them. Furthermore, accessing well-curated data and literature “can simplify difficult ideas for a non-specialized audience” (Pappas and Williams 2011:234), which highlights the ethical imperative embedded in this topic.
Gray or invisible literature has gained attention in recent decades, especially with the introduction of new technologies (Schöpfel 2006; Vaska 2010). The main methods to ameliorate the impacts from gray sources address sharing, retrieval, and long-term preservation (Evans and Moore 2014; Schöpfel 2010). This renewed attention and the advent of digital technologies have given rise to movements such as open access or open data (Coble et al. 2014; Crossick 2016; Moore and Richards 2015; Richards and Hardman 2008; Schöpfel 2006). Yet, as Schöpfel (2010:28) points out, the notion of openness mainly focuses on “selection, dissemination, [and] access, not [so much] on preservation and organization.” This leads to an uneven sphere where information is freely available but is not truly open (Faniel et al. 2013; Huggett 2018).
The debates addressing the nature, issues, and possibilities brought by gray literature and legacy data are also occurring within archaeology. Indeed, the concept of gray literature entered the archaeological domain soon after its coinage. Issues of management and quality concerning these documents and data were recognized before the 2000s, at least in the United States and the United Kingdom, first concerning analog formats (Cunliffe 1990; Thomas 1991) and later including the digital (Bauer-Clapp and Kirakosian 2017; Richards 2017). The turn of the new millennium reinvigorated the debate, which was then more preoccupied with how computer applications could affect the quality and availability of unpublished works (Evans 2015). In recent years, the debate has focused on better integrating unpublished field reports into “curatorial and research practice” (Evans 2015).
General issues facing gray literature/data that also pervade archaeology are a lack of meta- and paradata, standardization (data, bibliographic oversight, structure, written presentations), and indexes, as well as non-interoperable formats (Childress and Jul 2003; Corlett 2011; Huggett 2015, 2018; Schöpfel 2006). The paucity of meta- and paradata, which is a common complaint in archaeology, is an important issue considering that such information is seen as a sign of consistency that enables the data's trustworthiness and reuse (Faniel et al. 2013; Kansa and Kansa 2013). Importantly for the context of this article, this paucity also affects archiving and preservation of the data. Another common problem is the scarcity of linkages between disparate data (Kansa and Kansa 2013), which, in turn, negates interoperability.
Archaeologists dealing with legacy data sometimes overlook these problems, which are methodological in nature (Evans and Moore 2014). Nonetheless, issues related to or concerned with gray literature and legacy data have generated an extensive literature in archaeology (Evans and Moore 2014; Fellinger and Philpot 2014; Marchetti et al. 2018; Moore and Richards 2015; Opitz 2018). As explained later, this has motivated archaeologists to find new ways of preserving and disseminating their results. It all comes down to understanding the benefits of sharing data and finding common paths to enable open research.
Open Data versus Open Access
Atici and others (2013) highlight that some archaeologists remain suspicious about sharing their data despite the benefits of doing so. Doubtful authors do not see the benefits of sharing and cite possible peer criticism, loss of control of their “symbolic and economic capital,” and data misuse as possible disadvantages (as compiled by Moore and Richards 2015). These reasons are in our judgment not enough to negate the imperative of data sharing, although we recognize related financial sustainability problems and legal issues (Alexander 2013; Pratt 2013).
It is therefore necessary to reemphasize why making data and gray production open is essential. As a first argument, data are a public good; hence, they should be freely available (Porter 2013). Making our data open is thus ethical because, as others have argued (Marchetti et al. 2018; Wright and Richards 2018), it allows for breaking the dynamics of monopolistic information hoarding by some institutions or researchers. Moreover, Marchetti and others (2018) posit the importance of open data for epistemological and operational reasons. Perhaps their most compelling argument is that partial data accessibility biases the research we produce and the knowledge we generate (Marchetti et al. 2018). Indeed, some researchers (Atici et al. 2013; Kansa and Kansa 2013) have argued that data sharing allows for better confronting biases while reinforcing informed research. In addition, Moore and Richards (2015) argue that open data enables disciplinary interaction (use and reuse) and transparency (examination and reanalysis of data), which should occur between researchers and the public. Both elements, interaction and transparency, are fundamental, because interaction enhances research and transparency promotes scientific rigor (Kansa and Kansa 2013; Marchetti et al. 2018). In sum, open data strengthens knowledge creation and favors networking and communication across disciplines, researchers, and society in general (Pearce et al. 2010).
Archiving also has ethical considerations (Zwitter 2014). Developing suitable accessibility is an ethical challenge that should embrace the complexities of reaching out to the public to involve them in our research. However, some overspecialized modes of archiving, whether by museums/institutions or by individuals, neglect data access for the wider public (Merriman and Swain 1999). This affects digital archives too. An additional aspect of archiving is that the complexities related to outreach go hand in hand with legal issues that respond to specific contexts (Kansa and Kansa 2013; Richardson 2018).
At this point, it is useful to define open data in contrast with open access. Huggett (2015:7–8) divides data based on increasing openness, highlighting in this way the differences among levels of data accessibility:
[1] Open access data provides online access to view datasets, limited only by a presumption of Internet access and the requirement for a modern web browser. Use of the data beyond viewing and searching online is restricted. [2] Open access data which returns summary geographical information as a downloadable output of a search query or via Web Feature Services (WFS). This can then be further analyzed using GIS software as if the data were held locally. [3] Open access data consisting of entire datasets which can be downloaded but where restrictions apply to the use and reuse of data and hence is not truly open data in the technical sense. [4] Open data which has no exclusions or restrictions on use, and conforms to the open definition or the most permissive Creative Commons licenses. In general, these datasets relate to specific projects, sites, or collections.
Thus, free access, open access, and open data are not equivalent: being fully “open” is neither easy nor as common as would be desirable. New frameworks, such as the FAIR Data Principles (Wilkinson et al. 2016) explained later, need to be followed and implemented. Nevertheless, several projects have already sought to develop approaches toward open data. Digital archives and cyber-infrastructures are perhaps the most salient cases considering their success and applicability. The main advantage of digital archives, which have become almost fundamental, is their capability to link data and files within the semantic web. These archives moreover emphasize the importance of meta- and paradata and of developing interoperable standards (Richards 2004). Overall, online archives represent a step forward in handling archaeological reports, data, gray literature, and legacy data.
Cyber-infrastructures, such as ARIADNE, or research networks, such as NEARCHOS, are of paramount importance because of their positive impact and novelty. These projects follow principles based on open and FAIR data, long-term preservation, interoperability, and knowledge modeling. They are typically realized by using cyber-infrastructures and other digital tools (based on the semantic web) that enable linking, publishing, sharing, and connecting data and information online; that is, openly (Green 2014).
Nonetheless, some issues should be considered when working with digital archives and cyber-infrastructures. For instance, many data are still preserved in PDF format, which is problematic in terms of accessibility and reusability (Evans and Moore 2014) and therefore does not conform to some open principles (Thiery 2019). This means that not every effort directly equates to open data (Moore and Richards 2015). Such criticism, to be sure, is no reason to stop but rather a cautionary tale to consider.
All in all, these projects highlight that the way data are managed and made open is fundamental in our attempts to achieve more open, engaging, and ethical research. Such an ethos especially emphasizes digital technologies as the most productive source for achieving better, more open, and collaborative outcomes (Atici et al. 2013; Galeazzi and Richards-Rissetto 2018; Halperin 2017; Marchetti et al. 2018; Wright and Richards 2018). Aware of this, we have implemented digital approaches within the scope of digital archaeology to develop a project that invests in open outcomes. This effort does not consist merely of implementing digital approaches because, as we have been flagging throughout the text, it should be guided by standards or principles: the FAIR and LOUD guidelines.
FAIR and LOUD Principles
This discussion raises the question of data accessibility based on their properties. This question is fundamental because open science hinges on appropriate ways of data handling. However, as already noted, a lack of standardization of terminology, of what to record, and of how to record it impedes accessing and reusing data (Atici et al. 2013; Faniel et al. 2013; Huggett 2015, 2018; Kansa and Kansa 2013). This is both an epistemological and ethical issue that requires thorough consideration (Kansa and Kansa 2013; Milek 2018; Zwitter 2014). We need networked research communities and, most importantly, better practices that enable accessible and reusable data.
An interesting approach in this respect is the FAIR data framework (Wilkinson et al. 2016). This set of principles stresses the importance of achieving Findable, Accessible, Interoperable, and Reusable (meta)data. These four principles are fundamental for archaeological and historical studies that need to make use of diverse interlinked data. The FAIR framework is further unfolded into 15 points that constitute a set of principles that must be achieved for the data to be FAIR (Wilkinson et al. 2016).
Discussions concerning FAIR data would not be possible without earlier efforts on linked data. Tim Berners-Lee, inventor of the world-wide web (WWW), developed the idea of linking data using the potentials of the WWW (Thiery 2019). Shortly afterward, the ideal of open data emerged. Open data rely on three pillars: (1) achieving data accessibility, (2) enabling reuse and redistribution, and (3) reaching universal participation (Open Knowledge Foundation 2023; Thiery 2019). This notion is no longer an ideal but a common goal to achieve that has changed the way of publishing data on the web. Indeed, Berners-Lee (2023) proposed a framework of five stars (five echelons to climb and attain) that assure data openness (https://5stardata.info/en/). Accomplishing these steps results in linked open data (LOD).
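The cumulative logic of the five-star scheme can be illustrated with a short sketch. The criteria below paraphrase the scheme as published at 5stardata.info; the `Dataset` class, its field names, and the `star_rating` function are hypothetical names introduced only for illustration and are not part of the PERAIA project's tooling.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative only)."""
    open_license: bool      # 1 star: available on the web under an open license
    structured: bool        # 2 stars: machine-readable structured data
    open_format: bool       # 3 stars: non-proprietary format (e.g., CSV)
    uses_uris: bool         # 4 stars: URIs are used to identify things
    linked_to_others: bool  # 5 stars: data are linked to other data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels are met."""
    criteria = (
        ds.open_license,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.linked_to_others,
    )
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV file without URIs or external links reaches three stars.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

Because the levels are treated as cumulative in this sketch, a dataset that skips a lower step does not earn credit for a higher one.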
Only one letter differentiates LOD from LOUD: the U of usable. Proposed by Robert Sanderson, LOUD is an evolution of the LOD concept that emphasizes developing and linking open data in usable ways for developers (Thiery 2019). The LOUD framework is governed by five further principles (https://linked.art/loud/). In essence, following LOD's five stars and LOUD's five governing principles, along with the FAIR principles, enables and assures the “golden” standard for the creation and development of new data. A new framework has recently been proposed for classifying the different options for data accessibility: the “Sphere Seven Data Model or Model of Seven Data Spheres” (Thiery 2019). This framework merges all the possibilities into seven spheres, in which the seventh sphere is the most current (and ethical) one because it presupposes the data to be FAIR and LOUD. These principles shed light on possible ways to turn legacy data into more ethical formats. During the development of the PERAIA project, we decided to consider and follow them as much as possible.
These guidelines are designed to contribute to society's enhancement by achieving more transparency and involving communities in accessing and contributing to data (and its management) because it is a public asset or a “common good” of the digital era. The CARE principles (Gupta et al. 2023) have emerged recently as an approach to data management that seeks to broaden the discussion. By directly considering local communities as active actors with rights over the data obtained in their lands, this approach reframes the debate beyond computational or academic technicalities. As recently argued (Nicholson et al. 2023), FAIR and LOUD principles should not conflict with the CARE principles but rather should integrate the latter's social and ethical spirit.
Materials and Methods
The previous section describes the “whats” and “hows” of open practices in archaeological research. Influenced by such approaches, the PERAIA project implemented digital research methods that allow the development and management of large volumes of multivariate data to produce open outcomes. These methods and practices enabled several analyses through which we elaborated more articulated historical narratives while implementing our commitment to open research.
Sources and Data
Initially, the data were collected mainly from existing published surveys and archaeological reports (e.g., Hulin 2008; Hulin et al. 2009, 2010; Rieger 2017, 2019, 2023; Watrous et al. 2004, 2012). However, the current state of available data for the regions prompted the use of different sources to cross-check and obtain more information on archaeologically underrepresented areas. Old maps, ethnographic research, and historical aerial photographs, combined with open-source satellite imagery, were of great value to this end. The procedure consisted of remote aerial mapping through photointerpretation (Laguna-Palma et al. 2023, 2024). The aerial survey focused not only on the location of ancient settlements and burial sites but also on productive areas, historical resource catchment areas, and traditional routes: all these elements can be used as proxies for reconstructing settlement dynamics and mobility patterns within the study area.
More specifically, the sources mainly came from a series of maps produced by the US Army Corps of Engineers (US Army Map Service) during the span of time stretching from World War II to the Cold War and covering the entire study area. These maps were used in conjunction with historical aerial photographs (e.g., vertical aerial photographs from 1938 by the British Royal Air Force), analog imagery provided by the KH-9 Hexagon satellite, and current open-source satellite imagery (Laguna-Palma and Barruezo-Vaquero 2023; Laguna-Palma et al. 2023, 2024; Lindsay and Mkrtchyan 2023; Ray and Nikolaus 2022; Vetter et al. 2014). The localization and mapping of these sites and routes followed a twofold procedure: (1) cross-checking published data from surveys and excavations and (2) the systematic analysis of topographic maps, historical aerial photographs, and satellite imagery. This process increased the data on the locations of known archaeological heritage sites by adding hitherto unknown sites. This information is of relevance for two main reasons: first, it gives information about human and nonhuman factors, that is, topographical and hydrographical features and exact locations of ancient settlements, burial sites, productive areas, traditional routes, and so on (Figure 2); second, it provides information from before the expansion of large-scale agricultural, urban, and industrial developments, depicting landscapes that are totally or partially lost today.
It is important to note that, as mentioned earlier, the project also focused on digitizing all documented historical routes or paths within the study area. This was done using the same process of cross-checking information from all the documented topographic maps, survey data, and satellite imagery (Figure 3). This process enables the generation of models that can be compared and tested against the results obtained through spatial analysis and network research.
Database Record Structure
The project created a relational geodatabase that integrates two data models and collects all the information associated with each site and traditional route. This allowed us to comprehensively collect, evaluate, systematize, and map legacy data regarding the location of archaeological sites and routes, as well as associated field data and bibliographic information, together with environmental factors (Figure 4).
Each data model is based on a solid, yet modular and interrelated, structure for the spatial database that could store and manage all data. This allows for the standardization of information by applying it systematically to each site or documented route; each type is stored in one gazetteer. The main aim was to develop a model adapted to the project's research questions to extend its inferential capacity and, at the same time, account for ontologies and semantic languages shared with other projects working in open data (e.g., De Soto 2021; Manière et al. 2020).
Gazetteer of Sites. Each site recorded in the database was identified with an ID, name, place category, coordinates, and chronology. Some of these core elements are present in other gazetteers, and for interoperability reasons, we attempted to follow the same semantic structure. We also added specific fields relevant to our project: the zonification number, ecological zone, documented remains, keywords, description, and associated references. Furthermore, we integrated a validation scale based on geolocation, available site information, and an assessment of potential threat risks (Table 1).
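As an illustration of how such a record could be expressed in code and exported to a nonproprietary format, the following minimal sketch models one gazetteer entry and serializes it to CSV using only the Python standard library. All field names, the integer encoding of the validation scale, and the sample values are assumptions made for this example; they do not reproduce the project's actual schema (see Table 1 for the validation scale).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SiteRecord:
    """Hypothetical site entry; field names are illustrative assumptions."""
    site_id: str
    name: str
    place_category: str
    latitude: float
    longitude: float
    chronology: str
    zonification_number: int
    ecological_zone: str
    documented_remains: str
    keywords: str
    description: str
    references: str
    validation: int  # assumed numeric encoding of the validation scale


def sites_to_csv(records: list[SiteRecord]) -> str:
    """Serialize records to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SiteRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; this is not a real site from the gazetteer.
example = SiteRecord(
    site_id="S0001", name="Example site", place_category="settlement",
    latitude=35.0, longitude=24.0, chronology="placeholder period",
    zonification_number=1, ecological_zone="coastal",
    documented_remains="placeholder", keywords="placeholder",
    description="placeholder", references="placeholder", validation=1,
)
print(sites_to_csv([example]))
```

Keeping the core fields (ID, name, category, coordinates, chronology) first in the record mirrors the idea that these elements are shared with other gazetteers, while the project-specific fields follow.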
Gazetteer of Routes. Each route or traditional path was identified and recorded with an ID, zonification number, name, length (km), and the typology to which it belonged. As with the gazetteer of sites, this gazetteer also has some core elements that are present in other databases; we tried to follow the same semantic structure whenever possible. The typology of the routes was classified as main routes, secondary routes, or roads, together with an associated code based on the following characteristics: (1) main routes are considered to be those that connect two or more main nodes; (2) secondary routes are those that start from a main route and connect with other secondary nodes; (3) a track or rough road, normally worn out by use and not built, is considered to connect secondary nodes. This route classification is an ad hoc implementation. The model also integrates fields for a comparative chronology, description, associated references, and—following the hypothesis of the directionality of transport routes within the study area—a start and end point for each route. The project also integrates a validation scale with a value based on the level of knowledge about the routes (Table 2).
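The three typology criteria can be read as a simple decision rule. The sketch below is one possible interpretation, not the project's implementation: the function name, its parameters, and the returned labels are hypothetical, and the associated codes mentioned in the text are not reproduced because the text does not specify them.

```python
def classify_route(main_nodes_connected: int, starts_from_main_route: bool) -> str:
    """Assign a route typology following one reading of the three criteria.

    main_nodes_connected: number of main nodes the route links (assumed input).
    starts_from_main_route: whether the route branches off a main route (assumed input).
    """
    # (1) Main routes connect two or more main nodes.
    if main_nodes_connected >= 2:
        return "main"
    # (2) Secondary routes start from a main route and reach secondary nodes.
    if starts_from_main_route:
        return "secondary"
    # (3) Otherwise the path is treated as a track linking secondary nodes.
    return "track"


print(classify_route(2, False))  # main
print(classify_route(0, True))   # secondary
print(classify_route(0, False))  # track
```

The order of the checks matters in this reading: a route linking two main nodes is classed as main even if it also branches off another main route.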
Openness and Access
The data model aimed to achieve a balance between customization and standardization. Customization was essential for adapting the model to the specific research questions of the project, whereas standardization was important for aligning it with open data principles. To accomplish this balance, the data were organized into structured datasets in standardized formats, and a project website was then created to disseminate them (Figure 5). The former was a pivotal step to ensure interoperability; the latter ensured accessibility. We selected WordPress as a suitable web platform, based on characteristics such as a user-friendly interface, security, and scalability. To promote open research by facilitating access to the data and their reuse, we stored them in a nonproprietary format, specifically CSV files (Laguna-Palma Reference Laguna-Palma2024).
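Because CSV is a plain, nonproprietary format, a downloaded gazetteer can be read with standard tooling alone. The following sketch uses only the Python standard library; the column names and the two sample rows are invented placeholders and do not reflect the headers or contents of the published dataset.

```python
import csv
import io
from typing import Dict, List, TextIO

# Invented placeholder rows; real column names may differ
SAMPLE_CSV = """id,name,place_category,latitude,longitude
S0001,Placeholder A,settlement,35.10,24.50
S0002,Placeholder B,harbour,32.90,21.90
"""


def read_gazetteer(handle: TextIO) -> List[Dict[str, object]]:
    """Read gazetteer rows from an open CSV file, converting coordinates to floats."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(handle):
        row: Dict[str, object] = dict(raw)
        row["latitude"] = float(raw["latitude"])
        row["longitude"] = float(raw["longitude"])
        rows.append(row)
    return rows


def by_category(rows: List[Dict[str, object]], category: str) -> List[Dict[str, object]]:
    """Keep only the rows whose place category matches the given value."""
    return [row for row in rows if row["place_category"] == category]


# An in-memory buffer stands in for a downloaded file
sites = read_gazetteer(io.StringIO(SAMPLE_CSV))
settlements = by_category(sites, "settlement")
```

For a real file, the `io.StringIO` buffer would be replaced by `open(path, newline="", encoding="utf-8")`, with the column names adjusted to the actual headers.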
These steps were driven by our commitment to fostering a collaborative and open research environment in which data can be easily shared and reused. In addition, we incorporated an interactive map developed using Carto technology. This map is a dynamic tool for visualizing sites in their geographic context. Its intuitive design enhances the user experience and provides an informative way to engage with the data. Moreover, the data, accompanied by metadata, are also stored and published in Zenodo, an open repository developed within the framework of the European Union. This repository assigns a DOI, which allows the data to be linked and cited when used by others.
An Exercise in LOUD+FAIR Practices in Eastern Mediterranean Archaeology
As argued earlier, applying different approaches to data and research is valuable so that anyone, from any location, can reach and explore them. As Thiery (Reference Thiery2019:2) wrote, “To reach this, data must be published OPEN.” Hence, the goal is to reach not just LOD, not just LOUD, and not just FAIR, but LOUD and FAIR data. In this section, we undertake an introspective exercise in which we assess our data and the stage that PERAIA has reached in this regard.
To assess our data, we applied the model of seven levels or spheres proposed by Thiery (Reference Thiery2019:5). This allows us to consider to what extent our implementations conform to Thiery's framework. The idea is to locate our project within these seven spheres and determine how many of the levels leading to LOUD+FAIR data we meet:
(1) The complete database is embedded in the PERAIA website (https://peraia.es/), which is hosted by the University of Granada (Spain), under the Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited. The database is also uploaded to Zenodo, an international open repository that allows the dataset and associated metadata to be uploaded in a versionable format. This achieves the first level described in the model: making the data available in a web service under an open license.
(2) The data are structured in tabular format and can be easily processed using any spreadsheet computer application. This fulfills the second level: the data are available and structured in a readable format.
(3) The entire dataset can be downloaded from the web service as a CSV file. This reaches the third level in terms of the data being structured and available in a nonproprietary format.
(4) The whole dataset is publicly accessible at two URIs: one maintained by the project at https://peraia.es/gazetteer/ and one using the DOI protocol at https://doi.org/10.5281/zenodo.7678852. This addresses the fourth level, which requires our data to be easily findable (through an identifier) for other users.
(5) The database was designed considering ontologies and semantic languages shared with other projects working in open data. Thus, our database record structure was designed and standardized by integrating the core elements present in other open datasets. In this, our data adhere to the fifth level (availability and linkability).
(6) The DOI allows our data to be linked and cited when reused by others. In addition, the web service allows the entire database to be downloaded as a CSV file, an appropriate format that developers can work with. This fulfills the sixth level, making the data usable.
(7) The database already contains more than 5,000 entries with elements of archaeological and historical interest. We intend that the scientific community, local administrations, and any interested user can access, review, and use the data provided. Therefore, although the project is still publishing data, we are following the seventh level of data compliance with the LOUD+FAIR principles.
Discussion
In critical hindsight, although most of our implementations conform to the Seven Data Spheres, they have some limitations. Our project aligns with the implementation of other projects for the Mediterranean region and has adhered to some of the ideas laid out by ADS or tDAR, but it does not follow the same protocols. This may compromise the principle of interoperability to some extent. In this regard, as stated in the “Materials and Methods” section, our data were designed to achieve a balance between customization and standardization. Some elements indeed follow previous semantics and ontologies, but others are new for the geographical and chronological context of this project. The former type of data is thus designed to be interoperable, directly usable, and linkable, whereas the latter type has the potential to be so. These limitations do not necessarily contradict our ascription to the Seven Data Spheres, but they signal the nuances and technical complexities of achieving an overall open (and thus ethical) approach.
Further considerations regarding the use of open data remain unresolved, particularly in terms of practical utility, broader impacts, and ethical implications. The novelty of our project limits the ability to conduct an accurate citation analysis to determine its impact and influence. However, to date, our datasets, integrated across two platforms (the PERAIA web application and Zenodo), have collectively garnered more than a hundred downloads, reflecting interest in the research. Nonetheless, researchers designing similar projects must consider the time required to develop and implement such datasets. In our case, it took two years of work, a significant investment of both time and resources.
The overarching question is whether we are being ethical merely to be ethical or whether we are truly addressing a real need; complying with a hollow ethical stance is an empty task. To us, these approaches are ethical by nature because they signal the real needs of both the research community and society. Archaeological heritage is an asset or a “commons,” and so are the data on such matters. Regarding this, our data approaches and implementations have tried to reach out to every societal sector. Even though the project's outcomes will be of more interest to the archaeological and heritage sectors, they are also open to anyone interested. We therefore bet on open approaches that, on the one hand, can make legacy (and new) data LOUDer and FAIRer, and that, on the other, can transcend outreach limitations.
Conclusions
In this article, we explored one of the main outcomes of the PERAIA project. Our aim was not just to develop a project capable of resolving some historical questions, which is fundamental. Inspired by previous efforts, the project was designed from its inception to apply open approaches to archaeological and heritage data management. The FAIR and LOUD data principles have been our main guides in this endeavor. As this article shows, the project has developed a relational geodatabase divided into two gazetteers, one for sites and one for historical routes, in which to store data (much of which came from legacy sources). The development of this open database and the management of our data were influenced by historical questions as much as by our ethical commitment to open, social research (that is, open science).
Our effort to obtain open data is thus part of recent trends in archaeology that promote open science, a movement that seeks open access, data, and methods (Marwick et al. Reference Marwick, d'Alpoim, Michael Barton, Bates, Baxter, Bewan and Bollwerk2017). The premises of this movement touch on three pillars: accessibility + (FAIR and LO(U)D) data + open methodologies—that is, approaches that encourage open data and, at least for us, explain the methodologies followed. Moreover, for open science's advocates, openness normally happens at different levels: it involves not only academics but also the public (i.e., all societal sectors). In other words, open science seeks the involvement of different social agents in the production and reproduction of knowledge.
Open practices entail, among other things, accessible, findable, and interoperable data. Moreover, to us, there is no doubt that open methodologies enhance reproducibility and trust (among other researchers and interested audiences). Nonetheless, a fundamental principle of this approach is that open science transforms epistemology and transcends its limits by putting society at its center, thereby arguably reinforcing politically engaged cultural research. Such a consideration must also draw appropriate lessons from the CARE principles. All this explains the strategy implemented by the PERAIA project, which is developing pathways to achieve open science for different social communities.
Acknowledgments
We extend our gratitude to Maurizio Toscano from the Spanish Foundation for Science and Technology (FECYT) and Carlos Rodríguez Rellán from the University of Granada for their contributions to this project. A permit was not required for this work.
Funding Statement
This project was funded by the Spanish Ministry of Science, Innovation and Universities (FPU17/06503) and the Vice-Rectorate for Research and Knowledge Transfer of the University of Granada (Ref: PPJIB2020.18). Funding for open access charge: Universidad de Granada.
Data Availability Statement
All the data and associated information are available at https://peraia.es/gazetteer/ and https://zenodo.org/record/7678852.
Competing Interests
The authors have no competing interests to declare.