Processing math: 100%

2507.06226

Total: 1

#1 Consistency and Inconsistency in K-Means Clustering [PDF1] [Copy] [Kimi] [REL]

Authors: Moïse Blanchard, Adam Quinn Jaffe, Nikita Zhivotovskiy

A celebrated result of Pollard proves asymptotic consistency for k-means clustering when the population distribution has finite variance. In this work, we point out that the population-level k-means clustering problem is, in fact, well-posed under the weaker assumption of a finite expectation, and we investigate whether some form of asymptotic consistency holds in this setting. As we illustrate in a variety of negative results, the complete story is quite subtle; for example, the empirical k-means cluster centers may fail to converge even if there exists a unique set of population k-means cluster centers. A detailed analysis of our negative results reveals that inconsistency arises because of an extreme form of cluster imbalance, whereby the presence of outlying samples leads to some empirical k-means clusters possessing very few points. We then give a collection of positive results which show that some forms of asymptotic consistency, under only the assumption of finite expectation, may be recovered by imposing some a priori degree of balance among the empirical k-means clusters.

Subjects: Statistics Theory , Probability , Machine Learning

Publish: 2025-07-08 17:59:02 UTC