On model selection for dense stochastic block models

Ilkka Norros; Hannu Reittu; Fülöp Bazsó

doi:10.1017/apr.2021.29

On model selection for dense stochastic block models

Part of: Sufficiency and information Probability theory on algebraic and topological structures

Published online by Cambridge University Press: 14 January 2022

and

Ilkka Norros*: Affiliation:
University of Helsinki
Hannu Reittu*: Affiliation:
VTT Technical Research Centre of Finland
Fülöp Bazsó*: Affiliation:
Wigner Research Centre for Physics
*: *Postal address: Department of Mathematics and Statistics, P.O. Box 64, FI-00014 University of Helsinki, Finland. Email address: ilkka.norros@elisanet.fi
**Postal address: VTT Technical Research Centre of Finland, Espoo, 02044 VTT, Finland. Email address: hannu.reittu@vtt.fi
***Postal address: Department of Computational Sciences, Institute for Particle and Nuclear Physics, Wigner Research Centre for Physics, P.O. Box 49, H-1525 Budapest, Hungary. Email address: bazso.fulop@wigner.hu

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

This paper studies estimation of stochastic block models with Rissanen’s minimum description length (MDL) principle in the dense graph asymptotics. We focus on the problem of model specification, i.e., identification of the number of blocks. Refinements of the true partition always decrease the code part corresponding to the edge placement, and thus a respective increase of the code part specifying the model should overweight that gain in order to yield a minimum at the true partition. The balance between these effects turns out to be delicate. We show that the MDL principle identifies the true partition among models whose relative block sizes are bounded away from zero. The results are extended to models with Poisson-distributed edge weights.

Keywords

Minimum description length principle random graph Szemerédi’s regularity lemma regular decomposition

MSC classification

Primary: 60B20: Random matrices (probabilistic aspects; for algebraic aspects see )

Secondary: 62B10: Information-theoretic topics

Information

Type: Original Article
Information: Advances in Applied Probability , Volume 54 , Issue 1 , March 2022 , pp. 202 - 226

DOI: https://doi.org/10.1017/apr.2021.29 [Opens in a new window]
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Alon, N., Fischer, E., Newman, I. and Shapira, A. (2006). A combinatorial characterization of the testable graph properties: it’s all about regularity. In Proc. 38th Annual ACM Symposium on Theory of Computing (STOC ’06), Association for Computing Machinery, New York, pp. 251–260.Google Scholar

Bolla, M. (2016). Relating multiway discrepancy and singular values of nonnegative rectangular matrices. Discrete Appl. Math. 203, 26–34.10.1016/j.dam.2015.09.013CrossRef Google Scholar

Bolla, M. and Elbanna, A. (2015). Estimating parameters of a probabilistic heterogeneous block model via the EM algorithm. J. Prob. Statist. 2015, 657965.10.1155/2015/657965CrossRef Google Scholar

Cover, T. and Thomas, J. (1991). Elements of Information Theory. John Wiley, New York.10.1002/0471200611CrossRef Google Scholar

Gao, C., Ma, Z., Zhang, A. and Zhou, H. (2017). Achieving optimal misclassification proportion in stochastic block model. J. Mach. Learning Res. 18, 1–45.Google Scholar

Grünwald, P. (2007). Minimum Description Length Principle. MIT Press.10.7551/mitpress/4643.001.0001CrossRef Google Scholar

Heimlicher, S., Lelarge, M. and Massoulié, L. (2012). Community detection in the labelled stochastic block model. In NIPS Workshop on Algorithmic and Statistical Approaches for Large Social Networks, Lake Tahoe, NV. Available at https://arxiv.org/abs/1209.2910.Google Scholar

Holland, P., Laskey, K. and Leinhardt, S. (1983). Stochastic blockmodels: first steps. Social Networks 5, 109–137.10.1016/0378-8733(83)90021-7CrossRef Google Scholar

Komlós, J. and Simonovits, M. (1996). Szemerédi’s regularity lemma and its applications in graph theory. In Combinatorics, Paul Erdös Is Eighty, eds Miklós, D., Sós, V. and Szonyi, T., János Bolyai Mathematical Society, Budapest, pp. 295–352.Google Scholar

Massoulié, L. (2014). Community detection thresholds and the weak Ramanujan property. In Proc. 46th Annual ACM Symposium on Theory of Computing (STOC ’14), Association for Computing Machinery, New York, pp. 694–703.10.1145/2591796.2591857CrossRef Google Scholar

Nepusz, T., Négyessy, L., Tusnády, G. and Bazsó, F. (2008). Reconstructing cortical networks: case of directed graphs with high level of reciprocity. In Handbook of Large-Scale Random Networks (Bolyai Society Mathematical Studies 18), eds Bollobás, B., Kozma, R. and Miklós, D., Springer, Berlin, Heidelberg, pp. 325–368.10.1007/978-3-540-69395-6_8CrossRef Google Scholar

Pehkonen, V. and Reittu, H. (2011). Szemerédi-type clustering of peer-to-peer streaming system. In Proc. 2011 International Workshop on Modeling, Analysis, and Control of Complex Networks (Cnet ’11), Association for Computing Machinery, New York, pp. 23–30.Google Scholar

Peixoto, T. P. (2012). Entropy of stochastic blockmodel ensembles. Phys. Rev. E 85, 056122.10.1103/PhysRevE.85.056122CrossRef Google Scholar PubMed

Peixoto, T. P. (2013). Parsimonious module inference in large networks. Phys. Rev. Lett. 110, 148701.10.1103/PhysRevLett.110.148701CrossRef Google Scholar PubMed

Reittu, H., Bazsó, F. and Weiss, R. (2014). Regular decomposition of multivariate time series and other matrices. In Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014), eds Fränti, P., Brown, G., Loog, M., Escolano, F. and Pelillo, M., Springer, Berlin, Heidelberg, pp. 424–433.10.1007/978-3-662-44415-3_43CrossRef Google Scholar

Reittu, H., Leskelä, L., Räty, T. and Fiorucci, M. (2018). Analysis of large sparse graphs using regular decomposition of graph distance matrices. In Proc. 2018 IEEE International Conference on Big Data, Institute of Electrical and Electronics Engineers, New York, pp. 3784–3792.10.1109/BigData.2018.8622118CrossRef Google Scholar

Reittu, H., Norros, I. and Bazsó, F. (2017). Regular decomposition of large graphs and other structures: scalability and robustness towards missing data. In Proc. 2017 IEEE International Conference on Big Data, Institute of Electrical and Electronics Engineers, New York, pp. 3352–3357.10.1109/BigData.2017.8258320CrossRef Google Scholar

Reittu, H. et al. (2019). Regular decomposition of large graphs: foundation of a sampling approach to stochastic block model fitting. Data Sci. Eng. 4, 44–60.10.1007/s41019-019-0084-xCrossRef Google Scholar

Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Ann. Statist. 11, 416–431.10.1214/aos/1176346150CrossRef Google Scholar

Rissanen, J. (1998). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.10.1142/0822CrossRef Google Scholar

Rosvall, M. and Bergstrom, C. T. (2007). An information-theoretic framework for resolving community structure in complex networks. Proc. Nat. Acad. Sci. USA 104, 7327–7331.10.1073/pnas.0611034104CrossRef Google Scholar

Szemerédi, E. (1976). Regular partitions of graphs. In Problèmes Combinatoires et Théorie des Graphes (Colloq. Int. CNRS 260), CNRS, Orsay, pp. 399–401.Google Scholar

Wang, Y. X. R. and Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. Ann. Statist. 45, 500–528.10.1214/16-AOS1457CrossRef Google Scholar

Yaglom, A. and Yaglom, I. (1983). Probability and Information. D. Reidel, Dordrecht.Google Scholar

Article contents

On model selection for dense stochastic block models

Abstract

Keywords

MSC classification

Information

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests