Total: 1
Local search is a powerful clustering technique that provides high-quality solutions with theoretical guarantees. With distance-based sampling strategies, local search methods can achieve constant approximations for clustering with linear running time in data size. Despite their effectiveness, existing algorithms still face scalability issues as they require scanning the entire dataset for iterative center swaps. This typically leads to an O(ndk) running time, where n is the data size, d is the dimension, k is the number of clusters. To further improve the efficiency of local search algorithms, we propose new methods based on adaptive sampling and bandit strategies. Specifically, adaptive sampling can well approximate the distance-based sampling distribution without maintaining pairwise distances between data points and the centers, enabling fast and accurate sampling in sublinear time after an $\tilde{O}(nd)$ time preprocessing step. The bandit strategy models the best swap pair selection as a bandit problem, where a grouping strategy is proposed for fast identification of the optimal swap pair. With these techniques, our proposed algorithm can achieve constant approximation in expected running time $\tilde{O}(nd + k^4)$ under mild assumptions on optimal clusters and swap pair distributions. Our approach also extends naturally to the k-median objective, achieving constant approximation in expected running time $\tilde{O}(nd + \sqrt{n}k^3)$ without distributional assumptions. Empirical results demonstrate that our algorithm achieves up to 1000× speedup over existing local search methods on datasets with 100 million points, while delivering comparable clustering quality. Compared to coreset-based approaches, it also provides up to 80× speedup and consistently yields better clustering results.