How the result of graph clustering methods depends on the construction of the graph

Markus Maier; Ulrike von Luxburg; Matthias Hein

doi:10.1051/ps/2012001

Published online by Cambridge University Press: 21 May 2013

Markus Maier: Affiliation:
Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany. mmaier@tuebingen.mpg.de
Ulrike von Luxburg: Affiliation:
Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany; Department of Computer Science, University of Hamburg, Vogt-Kölln-Str. 30, 22527 Hamburg, Germany; luxburg@informatik.uni-hamburg.de
Matthias Hein: Affiliation:
Faculty of Mathematics and Computer Science, Saarland University, Postfach 151150, 66041, Saarbrücken, Germany; hein@cs.uni-sb.de

Abstract

We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graphs. We provide examples to illustrate the implications for spectral clustering.
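The pipeline described in the abstract can be illustrated with a small finite-sample sketch. It is not the authors' code and does not reproduce their limit results. It builds two unweighted neighbourhood graphs (a symmetric kNN graph and an r-neighbourhood graph) on the same sample, then evaluates the standard normalized cut, cut(A, B)(1/vol(A) + 1/vol(B)), and Cheeger cut, cut(A, B)/min(vol(A), vol(B)), for one fixed partition. The function names, the two-Gaussian sample, the split at x = 1.5, and the parameters k = 10 and r = 0.6 are all assumptions chosen for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_graph(points, k):
    """Symmetric unweighted kNN graph (illustrative helper): i~j if j is among
    the k nearest neighbours of i, or i is among the k nearest of j."""
    dist = pairwise_distances(points)
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in order[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj


def r_graph(points, r):
    """Unweighted r-neighbourhood graph: i~j if their distance is at most r."""
    dist = pairwise_distances(points)
    n = len(points)
    return [[1 if i != j and dist[i][j] <= r else 0 for j in range(n)]
            for i in range(n)]


def cut_measures(adj, in_a):
    """Return (NCut, Cheeger cut) of the partition (A, complement of A).

    in_a[i] is True if vertex i belongs to A. vol(.) is the sum of degrees.
    """
    n = len(adj)
    cut = sum(adj[i][j] for i in range(n) for j in range(n)
              if in_a[i] and not in_a[j])
    vol_a = sum(sum(adj[i]) for i in range(n) if in_a[i])
    vol_b = sum(sum(adj[i]) for i in range(n) if not in_a[i])
    if min(vol_a, vol_b) == 0:
        return math.inf, math.inf  # degenerate partition
    ncut = cut * (1.0 / vol_a + 1.0 / vol_b)
    cheeger = cut / min(vol_a, vol_b)
    return ncut, cheeger


def sample_two_blobs(n, seed=0):
    """Assumed toy data: an equal mixture of two 2-D Gaussians."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        centre = 0.0 if rng.random() < 0.5 else 3.0
        points.append((rng.gauss(centre, 0.7), rng.gauss(0.0, 0.7)))
    return points


if __name__ == "__main__":
    pts = sample_two_blobs(200)
    in_a = [p[0] < 1.5 for p in pts]  # same partition evaluated on both graphs
    graphs = [("kNN (k=10)", knn_graph(pts, 10)),
              ("r-graph (r=0.6)", r_graph(pts, 0.6))]
    for name, adj in graphs:
        ncut, cheeger = cut_measures(adj, in_a)
        print(f"{name}: NCut={ncut:.4f}, Cheeger={cheeger:.4f}")
```

Evaluating the same partition on both graphs gives a first feel for how the value of a cut criterion depends on the graph type. The article's statements concern the limits of suitably scaled quantities as the sample size grows, which this finite-sample sketch does not attempt to capture.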

Keywords

Random geometric graph; clustering; graph cuts

Information

Type: Research Article
Information: ESAIM: Probability and Statistics, Volume 17, 2013, pp. 370–418

DOI: https://doi.org/10.1051/ps/2012001
Copyright: © EDP Sciences, SMAI, 2013
