Processing math: 100%

2506.08618

Total: 1

#1 HSG-12M: A Large-Scale Spatial Multigraph Dataset [PDF] [Copy] [Kimi] [REL]

Authors: Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

Existing graph benchmarks assume non-spatial, simple edges, collapsing physically distinct paths into a single link. We introduce HSG-12M, the first large-scale dataset of spatial multigraphsgraphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. HSG-12M contains 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral potential data. Each graph encodes the full geometry of a 1-D crystal's energy spectrum on the complex plane, producing diverse, physics-grounded topologies that transcend conventional node-coordinate datasets. To enable future extensions, we release Poly2Graph: a high-performance, open-source pipeline that maps arbitrary 1-D crystal Hamiltonians to spectral graphs. Benchmarks with popular GNNs expose new challenges in learning from multi-edge geometry at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for geometry-aware graph learning and new opportunities of data-driven scientific discovery in condensed matter physics and beyond.

Subjects: Machine Learning , Mesoscale and Nanoscale Physics , Other Condensed Matter , Artificial Intelligence , Computer Vision and Pattern Recognition

Publish: 2025-06-10 09:25:19 UTC