Diluvian Clustering: A Fast, Effective Algorithm for Clustering Compositional and Other Data

Nicholas W. M. Ritchie

doi:10.1017/S1431927615014701

Published online by Cambridge University Press: 24 August 2015

Nicholas W. M. Ritchie*
Affiliation: Materials Measurement Science Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899-8372, USA
*Corresponding author. nicholas.ritchie@nist.gov

Abstract

Diluvian Clustering is an unsupervised, grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify clusters that are either compact or diffuse, and clusters that have either many or few members. Diluvian Clustering differs fundamentally from most algorithms previously applied to cluster compositional data in that its implementation does not depend upon a metric. In two dimensions, the algorithm reduces to a case for which there is an intuitive, real-world parallel. Furthermore, the algorithm has few tunable parameters, and these parameters have intuitive interpretations. Eliminating the dependence on an explicit metric makes it possible to derive reasonable clusters with disparate variances, like those in real-world compositional data sets. The algorithm is computationally efficient: while the worst case scales as O(N²), most cases are closer to O(N), where N is the number of discrete data points. On a mid-range, 2014-vintage computer, a typical 20,000-particle, 30-element data set can be clustered in a fraction of a second.
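The abstract does not spell out the steps of the algorithm, so the following is only an illustrative sketch of a generic grid-based, flood-style clustering pass in two dimensions. It is not the published method. The function name `flood_cluster` and the parameters `cell_size` and `min_count` are hypothetical, chosen here for illustration. The sketch bins points into grid cells and then labels cells from most to least populated, so no pairwise distance between data points is ever computed.

```python
from collections import Counter
from itertools import product


def flood_cluster(points, cell_size=1.0, min_count=1):
    """Label 2D points with a generic grid-based, flood-style pass.

    Illustrative assumption only; not the algorithm from the paper.
    """
    # Bin each point into a square grid cell: a single O(N) pass.
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)

    # Visit occupied cells from most to least populated, as if a water level
    # were receding and exposing the highest peaks of the landscape first.
    order = sorted(
        (c for c in counts if counts[c] >= min_count),
        key=lambda c: (-counts[c], c),
    )

    labels = {}
    next_label = 0
    offsets = [d for d in product((-1, 0, 1), repeat=2) if d != (0, 0)]
    for cell in order:
        # Neighbouring cells (8-connected) that are already labelled.
        neighbours = [(cell[0] + dx, cell[1] + dy) for dx, dy in offsets]
        exposed = [n for n in neighbours if n in labels]
        if exposed:
            # Join the cluster of the most populated labelled neighbour.
            best = max(exposed, key=lambda n: (counts[n], n))
            labels[cell] = labels[best]
        else:
            # An isolated peak starts a new cluster.
            labels[cell] = next_label
            next_label += 1

    # Points in cells below min_count stay unassigned (label -1).
    return [labels.get(c, -1) for c in cells]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.2, 0.1),
           (10.1, 10.1), (10.3, 10.2)]
    print(flood_cluster(pts, cell_size=1.0))
```

In this sketch, `cell_size` sets the grid resolution and `min_count` sets how sparsely populated a cell may be before its points are treated as unassigned. Sorting the occupied cells adds a cost that depends on the number of occupied cells rather than on the number of points.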

Keywords

data mining, clustering, composition, EPMA, particle analysis

Type: Equipment and Software Development
Information: Microscopy and Microanalysis, Volume 21, Issue 5, October 2015, pp. 1173-1183

DOI: https://doi.org/10.1017/S1431927615014701
Copyright: © Microscopy Society of America 2015

Footnotes

Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.
