Total: 1
A fundamental step in knowledge discovery is statistically assessing data mining results. In network analysis, such evaluation compares the outcome of a given procedure with the outcomes obtained from randomized versions of the observed network. Despite its importance, available graph null models only preserve simple characteristics of the observed graph, such as its degree sequence. In this paper we introduce the Colored Configuration Model (CCM), a new null model for vertex-colored multigraphs. Our main motivation is the study of online social networks, where the color of a user represents their side in a debate. The key novelty of CCM is preserving the Colored Degree Matrix (CDM), which encodes, for each vertex, the number of neighbors of any given color. Preserving the CDM allows fixing the color assortativity of all nodes, e.g., the propensity of each user to interact with other like-minded users. This allows testing whether a given phenomenon is explained by the observed CDM, or whether other characteristics of the network might play a key role. Available graph null models do not preserve the CDM, so they cannot assess its impact on real-world tasks, such as testing the significance of network polarization measures. To sample from the CCM, we develop Sirius-B, a simple baseline adapting the Metropolis-Hastings approach, and Sirius, a refined algorithm tailored to preserve the CDM, thus achieving provably faster mixing. In our experimental evaluation, we test Sirius on real-world networks, comparing it with related network null models. We observed that the evaluation of the statistical significance of polarization measures with Sirius may lead to different insights compared to available null models. Thus, Sirius is an effective tool for the statistically-sound analysis of social networks.