Published online by Cambridge University Press: 03 May 2019
X-ray diffraction-X-ray fluorescence (XRD-XRF) data sets obtained from surface scans of synthetic samples have been analysed by means of different data clustering algorithms, with the aim of proposing a methodology for the automatic crystallographic and chemical classification of surfaces. Three data clustering strategies have been evaluated, namely hierarchical, k-means, and density-based clustering; each has been applied to the distance matrices calculated from the individual XRD and XRF data sets as well as to the combined distance matrix. Classification performance is reported for each strategy both numerically, as the corrected Rand index, and visually, as a reconstruction of the surface maps. Hierarchical and k-means clustering offered comparable results, which depended on both sample complexity and data quality. When applied to XRF data collected on a two-phase test sample, both algorithms yielded Rand index values above 0.8, whereas XRD data collected on the same sample gave values around 0.5; application to the combined distance matrix raised the index to about 0.9. For a more complex multi-phase sample, classification performance was found to depend strongly on both data quality and signal contrast between different regions; again, adopting the combined distance matrix improved classification performance.
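The workflow summarised above (per-technique distance matrices, a combined matrix, clustering, and scoring against a known phase map) can be illustrated with a minimal sketch. It is not the implementation used in this work. Several points are assumptions made only for illustration: Euclidean distances between measurement vectors, combination by an equally weighted sum of matrices scaled to a maximum of 1, average-linkage agglomerative clustering as the hierarchical method, and the corrected Rand index taken in its usual chance-corrected form. The helper names (`distance_matrix`, `combine`, `average_linkage`, `adjusted_rand_index`) and the six two-channel "scan points" are hypothetical, hand-made toy values, not measured data.

```python
from collections import Counter
from itertools import combinations
from math import comb, dist


def distance_matrix(rows):
    """Pairwise Euclidean distances between measurement vectors (assumed metric)."""
    return [[dist(a, b) for b in rows] for a in rows]


def combine(d1, d2, w=0.5):
    """Weighted sum of two distance matrices, each scaled to a maximum of 1.

    Both the scaling and the weight w are assumptions of this sketch.
    """
    m1 = max(map(max, d1)) or 1.0
    m2 = max(map(max, d2)) or 1.0
    n = len(d1)
    return [[w * d1[i][j] / m1 + (1 - w) * d2[i][j] / m2 for j in range(n)]
            for i in range(n)]


def average_linkage(d, k):
    """Naive agglomerative clustering on a distance matrix, stopped at k clusters."""
    clusters = [[i] for i in range(len(d))]

    def gap(a, b):
        # Mean distance over all pairs drawn from the two clusters.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Rand index corrected for chance agreement between two labelings."""
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(len(truth), 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Toy two-phase "surface": six scan points with known phase labels.
truth = [0, 0, 0, 1, 1, 1]
# Hypothetical two-channel signals; point 2 is ambiguous in XRF, point 3 in XRD.
xrf = [(0, 10), (1, 9), (6, 4), (10, 0), (10, 0), (9, 1)]
xrd = [(0, 10), (1, 9), (0, 10), (4, 6), (10, 0), (9, 1)]

d_xrf = distance_matrix(xrf)
d_xrd = distance_matrix(xrd)
matrices = {"XRF": d_xrf, "XRD": d_xrd, "combined": combine(d_xrf, d_xrd)}

scores = {name: adjusted_rand_index(truth, average_linkage(d, 2))
          for name, d in matrices.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With these toy values, each single-technique clustering misplaces one ambiguous point, whereas the combined matrix recovers both phases; the numbers are purely illustrative and are not meant to reproduce the values reported here. In practice, library routines such as `scipy.cluster.hierarchy.linkage` and scikit-learn's `adjusted_rand_score` would normally replace the hand-written helpers.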