Community detection in graphs, also known as graph partitioning, is a well-studied NP-hard problem. Various heuristic approaches have been adopted %for solving to tackle this problem in polynomial time. One such approach, as outlined in the IEEE HPEC Graph Challenge, is Bayesian statistics-based stochastic block partitioning. This method delivers high-quality partitions in sub-quadratic runtime, but it fails to scale to very large graphs. In this paper, we present sampling as an avenue for speeding up the algorithm on large graphs. We first show that existing sampling techniques can preserve a graph’s community structure. We then show that sampling for stochastic block partitioning can be used to produce a speedup of between 2.18X and 7.26X for graph sizes between 5,000 and 50,000 vertices without a significant loss in the accuracy of community detection.
Supplementary notes can be added here, including code and math.