Measuring the similarity between data points plays a vital role in many popular representation learning tasks, such as metric learning and contrastive learning. Most existing approaches use point-level distances to learn the point-to-point similarity between pairs of instances. However, because a finite set of training points cannot fully cover a sample space containing infinitely many points, the generalizability of the learned distance is usually limited by the sample size. In this paper, we therefore extend the conventional data point to a data ball with a predictable volume, which lets us naturally generalize the existing point-level distance to a new volume-aware distance (VAD) that measures field-to-field geometric similarity. The learned VAD not only accounts for the relationships between observed instances but also captures the similarity among the unsampled neighbors surrounding the training data. This significantly enriches the coverage of the sample space and thus improves model generalizability. Theoretically, we prove that VAD tightens the error bound of traditional similarity learning and preserves crucial topological properties. Experiments on multi-domain data demonstrate the superiority of VAD over existing approaches in both supervised and unsupervised tasks.