Machine Learning for Archaeological Applications in R

Published online by Cambridge University Press: 10 December 2024

Denisse L. Argote
Affiliation:
Instituto Nacional de Antropología e Historia
Pedro A. López-García
Affiliation:
Escuela Nacional de Antropología e Historia
Manuel A. Torres-García
Affiliation:
Instituto Nacional de Antropología e Historia
Michael C. Thrun
Affiliation:
Philipps-Universität Marburg, Germany

Summary

This Element highlights the use in archaeology of classification methods developed in the fields of chemometrics, artificial intelligence, and Bayesian statistics. These methods work in both high- and low-dimensional settings and often yield better results than traditional methods. Rather than taking a theoretical approach, the Element shows how to apply these methods to real data, using lithic and ceramic archaeological materials as case studies. It also provides a detailed explanation of how to process the data in R (The R Project for Statistical Computing), together with the corresponding code.
Type: Element
Online ISBN: 9781009506625
Publisher: Cambridge University Press
Print publication: 16 January 2025
