Optimal Partitioning of a Data Set Based on the p-Median Model

Michael J. Brusco; Hans-Friedrich Köhn

doi:10.1007/s11336-007-9021-4


Published online by Cambridge University Press: 01 January 2025


Michael J. Brusco*: Florida State University
Hans-Friedrich Köhn: University of Missouri-Columbia
*: Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: mbrusco@cob.fsu.edu


Abstract

Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
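The first two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pairwise_euclidean`, `pmedian_cost`, `greedy_pmedian`, `lagrangian_lower_bound`) are hypothetical, the greedy rule shown is the common "add the best next center" variant, and the bound comes from the textbook Lagrangian relaxation of the assignment constraints, evaluated for one fixed multiplier vector `lam` with no subgradient updates. The branch-and-bound stage is omitted.

```python
import numpy as np


def pairwise_euclidean(X):
    """n x n matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pmedian_cost(dist, medians):
    """p-median objective: each object is charged its distance to the nearest center."""
    return float(dist[:, medians].min(axis=1).sum())


def greedy_pmedian(dist, p):
    """Greedy heuristic (assumed variant): repeatedly add the center that lowers the cost most."""
    n = dist.shape[0]
    medians = []
    nearest = np.full(n, np.inf)  # distance from each object to its nearest chosen center
    for _ in range(p):
        # trial[j] = objective value if object j were added as a center
        trial = np.minimum(nearest[:, None], dist).sum(axis=0)
        if medians:
            trial[medians] = np.inf  # never pick the same center twice
        j = int(np.argmin(trial))
        medians.append(j)
        nearest = np.minimum(nearest, dist[:, j])
    return medians, float(nearest.sum())


def lagrangian_lower_bound(dist, p, lam):
    """Lower bound on the optimal cost for multipliers lam (one per object).

    Relaxing "each object is assigned exactly once" leaves a problem that
    separates by candidate center j: rho[j] = sum_i min(0, d_ij - lam_i).
    Opening the p centers with the smallest rho[j] solves the relaxation.
    """
    rho = np.minimum(0.0, dist - lam[:, None]).sum(axis=0)
    return float(lam.sum() + np.sort(rho)[:p].sum())


# Tiny usage example: four points on a line, two clusters.
points = np.array([[0.0], [1.0], [10.0], [11.0]])
d = pairwise_euclidean(points)
centers, upper = greedy_pmedian(d, 2)
print(centers, upper)  # [1, 2] 2.0
```

The greedy cost is an upper bound on the optimum and the Lagrangian value is a lower bound, so when the two coincide the greedy solution is provably optimal. Otherwise the remaining gap is what an exact search such as branch-and-bound would have to close.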

Keywords

combinatorial data analysis; cluster analysis; p-median problem; Lagrangian relaxation; branch and bound; heuristics

Information

Type: Theory and Methods
Information: Psychometrika, Volume 73, Issue 1, March 2008, pp. 89–105

DOI: https://doi.org/10.1007/s11336-007-9021-4
Copyright: Copyright © 2007 The Psychometric Society


