Hostname: page-component-78c5997874-v9fdk Total loading time: 0 Render date: 2024-11-10T13:56:20.245Z Has data issue: false hasContentIssue false

Artificial Benchmark for Community Detection (ABCD)—Fast random graph model with community structure

Published online by Cambridge University Press:  26 January 2021

Bogumił Kamiński
Affiliation:
Decision Analysis and Support Unit, SGH Warsaw School of Economics, Warsaw, Poland (e-mail: bkamins@sgh.waw.pl)
Paweł Prałat*
Affiliation:
Department of Mathematics, Ryerson University, Toronto, Ontario, Canada
François Théberge
Affiliation:
Tutte Institute for Mathematics and Computing, Ottawa, Ontario, Canada (e-mail: theberge@ieee.org)
*
*Corresponding author. Email: pralat@ryerson.ca
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Most of the current complex networks that are of interest to practitioners possess a certain community structure that plays an important role in understanding the properties of these networks. For instance, a closely connected social communities exhibit faster rate of transmission of information in comparison to loosely connected communities. Moreover, many machine learning algorithms and tools that are developed for complex networks try to take advantage of the existence of communities to improve their performance or speed. As a result, there are many competing algorithms for detecting communities in large networks. Unfortunately, these algorithms are often quite sensitive and so they cannot be fine-tuned for a given, but a constantly changing, real-world network at hand. It is therefore important to test these algorithms for various scenarios that can only be done using synthetic graphs that have built-in community structure, power law degree distribution, and other typical properties observed in complex networks. The standard and extensively used method for generating artificial networks is the LFR graph generator. Unfortunately, this model has some scalability limitations and it is challenging to analyze it theoretically. Finally, the mixing parameter μ, the main parameter of the model guiding the strength of the communities, has a non-obvious interpretation and so can lead to unnaturally defined networks. In this paper, we provide an alternative random graph model with community structure and power law distribution for both degrees and community sizes, the Artificial Benchmark for Community Detection (ABCD graph). The model generates graphs with similar properties as the LFR one, and its main parameter ξ can be tuned to mimic its counterpart in the LFR model, the mixing parameter μ. We show that the new model solves the three issues identified above and more. In particular, we test the speed of our algorithm and do a number of experiments comparing basic properties of both ABCD and LFR. The conclusion is that these models produce graphs with comparable properties but ABCD is fast, simple, and can be easily tuned to allow the user to make a smooth transition between the two extremes: pure (independent) communities and random graph with no community structure.

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Footnotes

Action Editor: Ulrik Brandes

References

Aiello, W., Bonato, A., Cooper, C., Janssen, J. C. M., & Prałat, P. (2008). A spatial web graph model with local influence regions. Internet Mathematics, 5(1), 175196.CrossRefGoogle Scholar
Albert-László Barabási, R. A. (1999). Emergence of scaling in random networks. Science, 286(5439), 509512.CrossRefGoogle Scholar
Bae, S.-H., & Howe, B. (2015). Gossipmap: A distributed community detection algorithm for billion-edge directed graphs. In SC’15, vol. 27 (pp. 1–12). ACM.CrossRefGoogle Scholar
Barabási, A.-L. (2016). Network science. Cambridge: Cambridge University Press.Google Scholar
Bender, E. A. & Canfield, E. R. (1978). The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3), 296307.CrossRefGoogle Scholar
Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 69, 6598.CrossRefGoogle Scholar
Bollobás, B. (1980). A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1, 311316.CrossRefGoogle Scholar
Buzun, N., Korshunov, A., Avanesov, V., Filonenko, I., Kozlov, I., Turdakov, D., & Kim, H. (2014). Egolp: Fast and distributed community detection in billion-node social networks. In IEEE ICDM mining workshop (pp. 533–540).CrossRefGoogle Scholar
Chakrabarti, D., Zhan, Y., & Faloutsos, C. (2004). R-mat: A recursive model for graph mining. In proceedings of the 2004 SIAM international conference on data mining (pp. 442–446). SIAM.CrossRefGoogle Scholar
Chung, F., & Lu, L. (2006). Complex graphs and networks. Providence, RI: American Mathematical Society.CrossRefGoogle Scholar
Dao, V. L., Bothorel, C., & Lenca, P. (2020). Community structure: A comparative evaluation of community detection methods. Network Science, 8(1) 141.CrossRefGoogle Scholar
Emmons, S., Kobourov, S. G., Gallant, M., & Börner, K. (2016). Analysis of network clustering algorithms and cluster quality metrics at scale. Plos One, 11, 118.CrossRefGoogle ScholarPubMed
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75174.CrossRefGoogle Scholar
Funke, D., Lamm, S., Sanders, P., Schultz, C., Strash, D., & von Looz, M. (2018). Communication-free massively distributed graph generation. In international parallel and distributed processing symposium (IPDPS).CrossRefGoogle Scholar
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99, 7821–7826.CrossRefGoogle Scholar
Gkantsidis, C., Mihail, M., & Zegura, E. W. (2003). The markov chain simulation method for generating connected power law random graphs. In ALENEX’03 (pp. 16–25). SIAM.Google Scholar
Greenhill, C., & Sfragara, M. (2018). The switch Markov chain for sampling irregular graphs and digraphs. Theoretical Computer Science, 719, 120.CrossRefGoogle Scholar
Hamann, M., Meyer, U., Penschuck, M., Tran, H., & Wagner, D. (2018). I/o-efficient generation of massive graphs following the lfr benchmark. Journal of Experimental Algorithmics, 23, 2.5:1–2.5:33.CrossRefGoogle Scholar
Janson, S. (2020). Random graphs with given vertex degrees and switchings. Random Structures Algorithms, 57, 331.CrossRefGoogle Scholar
Kaminski, B., Poulin, V., Prałat, P., Szufel, P., & Theberge, F. (2019). Clustering via hypergraph modularity. Plos One, 14, e0224307.CrossRefGoogle ScholarPubMed
Kaminski, B., Prałat, P., & Theberge, F. (2021). Community detection algorithm using hypergraph modularity. In Proceedings of the 9th international conference on complex networks and their applications, Studies in Computational Intelligence 943 (pp. 152–163). Springer.Google Scholar
Krioukov, D. V., Papadopoulos, F., Kitsak, M., Vahdat, A., & Boguñá, M. (2010). Hyperbolic geometry of complex networks. Physical Review E, 82(036106).CrossRefGoogle ScholarPubMed
Kolda, T. G., Pinar, A., Plantenga, T., & Seshadhri, C. (2014). A scalable generative graph model with community structure. SIAM Journal on Scientific Computing, 36, C424–C452.CrossRefGoogle Scholar
Lancichinetti, A., & Fortunato, S. (2009). Benchmark graphs for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 80, 016118.CrossRefGoogle ScholarPubMed
Lancichinetti, A., Fortunato, S., & Radicchi, F. (2008). Benchmark graphs for testing community detection algorithms. Physical Review E, 78.CrossRefGoogle ScholarPubMed
Milo, R., Kashtan, N., Itzkovitz, S., Newman, M. E. J., & Alon, U. (2003). On the uniform generation of random graphs with prescribed degree sequences. arxiv:cond-mat/0312028.Google Scholar
Newman, M. (2010). Networks: An introduction. Oxford: Oxford University Press.CrossRefGoogle Scholar
Newman, M. E. J. & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 26113.Google ScholarPubMed
Penschuck, M., Brandes, U., Hamann, M., Lamm, S., Meyer, U., … Schulz, C. (2020). Recent advances in scalable network generation. Tech. rept. https://arxiv.org/abs/2003.00736.Google Scholar
Prokhorenkova, L. O., Prałat, P., & Raigorodskii, A. (2017). Modularity of complex networks models. Internet Mathematics. doi: 10.24166/im.12.2017 Google Scholar
Ray, J., Pinar, A., & Seshadhri, C. (2012). Are we there yet? when to stop a markov chain while generating random graphs. In WAW’12. Lecture Notes in Computer Science. Springer (pp. 153–164).CrossRefGoogle Scholar
Seshadhri, C. K., Tamara, G., & Pinar, A. (2012). Community structure and scale-free collections of erdös-rényi graphs. Physical Review E, 85, 056109.CrossRefGoogle ScholarPubMed
Slota, G. M., Berry, J., Hammond, S. D., Olivier, S., Phillips, C., & Rajamanickam, S. (2019). Scalable generation of graphs for benchmarking HPC community-detection algorithms. In IEEE international conference for high performance computing, networking, storage and analysis (SC).CrossRefGoogle Scholar
Staudt, C. L., Hamann, M., Gutfraind, A., Safro, I., & Meyerhenke, H. (2017). Generating realistic scaled complex networks. Applied Network Science, 2(36), 129.CrossRefGoogle ScholarPubMed
Viger, F., & Latapy, M. (2005). Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In L. Wang (Eds.), Computing and combinatorics (pp. 440–449). Berlin, Heidelberg: Springer.Google Scholar
West, D. B. (2001). Introduction to graph theory (2nd ed.). Upper Saddle River: Prentice Hall.Google Scholar
Winlaw, M., DeSterck, H., & Sanders, G. (2015). An in-depth analysis of the chung-lu model. Tech. rept. LLNL-TR-678729. Lawrence Livermore Technical Report, doi: 10.2172/1239211.CrossRefGoogle Scholar
Wormald, N. C. (1984). Generating random regular graphs. Journal of Algorithms, 5(2), 247280.CrossRefGoogle Scholar
Zeng, J., & Yu, H. (2016). A study of graph partitioning schemes for parallel graph community detection. Parallel Computing, 58, 131139.CrossRefGoogle Scholar