Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing


The paper studies the local geometry of embedding clouds induced by \emph{controlled local classes of semantically close sentences}. The central question is how controlled paraphrase-like semantic variation is organized in sentence embedding space and whether this local structure can be explicitly modeled by low-degree fitted carriers. We introduce a local geometric modeling scheme based on affine, quadratic, and cubic fitted models. We also use a surface-based latent probing procedure that constructs synthetic latent points in a reduced local PCA space with respect to the fitted carrier. The procedure is intended as an offline method for representation-space analysis, local manifold modeling, and geometry-aware latent probing. Generated latent points are evaluated using criteria that measure consistency with the fitted surface, preservation of neighborhood structure, agreement with the empirical distribution, stability of Hessian-based second-order shape descriptors, and stability of fitted-model coefficients. Experiments on controlled sets of semantically close sentences show that nonlinear local models describe embedding clouds more accurately than affine models. Surface-based generation provides strong fitted-geometry fidelity, including surface consistency, Hessian-based shape consistency, and coefficient consistency. Downstream experiments show that geometric validity of synthetic latent points does not automatically translate into improved classification performance. The results support explicit local geometric modeling of sentence embedding space and highlight the need to distinguish geometric validity from discriminative utility. As a resource contribution, we introduce \textbf{CoPaGE-300K}, a controlled template-based dataset of semantically close sentence variants with slot-level annotations and precomputed sentence embeddings.
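The local fitting and probing steps can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a local cloud is centered and projected onto its top-d principal axes. The first k PCA coordinates are taken as surface parameters, and the remaining coordinates are fitted as a polynomial graph over them, with degree 1, 2, or 3 for the affine, quadratic, and cubic carriers. The helper names (`local_pca`, `poly_features`, `fit_carrier`, `probe_surface`, `carrier_hessians`), the jitter-based sampling rule, and the toy saddle-shaped data are all hypothetical. The sketch reports the residual RMSE for each degree. It then checks two of the listed criteria: coefficient consistency, by refitting on the synthetic points, and Hessian-based shape consistency at a reference point.

```python
import itertools

import numpy as np


def local_pca(X, d):
    """Center a local cloud X (n x D) and project it onto its top-d principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def poly_features(U, degree):
    """Constant term plus all monomials of the columns of U up to `degree`."""
    n, k = U.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(k), deg):
            cols.append(np.prod(U[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_carrier(Z, k, degree):
    """Least-squares fit of Z[:, k:] as a polynomial graph over Z[:, :k].

    degree=1, 2, 3 gives an affine, quadratic, or cubic carrier (assumed form).
    """
    Phi = poly_features(Z[:, :k], degree)
    coef, *_ = np.linalg.lstsq(Phi, Z[:, k:], rcond=None)
    rmse = float(np.sqrt(np.mean((Phi @ coef - Z[:, k:]) ** 2)))
    return coef, rmse


def probe_surface(Z, k, degree, coef, m, rng, scale=0.1):
    """Place m synthetic latent points on the fitted carrier.

    Hypothetical sampling rule: jitter randomly chosen real parameter vectors,
    then lift them onto the surface through the fitted polynomial.
    """
    U = Z[:, :k]
    picks = rng.integers(0, len(U), size=m)
    U_new = U[picks] + scale * U.std(axis=0) * rng.standard_normal((m, k))
    return np.hstack([U_new, poly_features(U_new, degree) @ coef])


def carrier_hessians(u0, degree, coef, h=1e-3):
    """Central-difference Hessian (w.r.t. u) of every fitted output coordinate at u0."""
    k = len(u0)

    def f(u):
        return (poly_features(u[None, :], degree) @ coef)[0]

    E = np.eye(k) * h
    H = np.zeros((coef.shape[1], k, k))
    for a in range(k):
        for b in range(k):
            H[:, a, b] = (f(u0 + E[a] + E[b]) - f(u0 + E[a] - E[b])
                          - f(u0 - E[a] + E[b]) + f(u0 - E[a] - E[b])) / (4 * h * h)
    return H


# Toy cloud (not the paper's data): a noisy 2-D saddle-like sheet embedded in 16-D space.
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, size=(400, 2))
sheet = np.column_stack([u, u[:, 0] ** 2 - u[:, 1] ** 2, u[:, 0] * u[:, 1]])
basis, _ = np.linalg.qr(rng.standard_normal((16, 4)))  # orthonormal 16 x 4 basis
X = sheet @ basis.T + 0.01 * rng.standard_normal((400, 16))

Z = local_pca(X, d=4)  # reduced local PCA space: k=2 parameters + 2 fitted outputs
fits = {deg: fit_carrier(Z, k=2, degree=deg) for deg in (1, 2, 3)}
for deg, (_, rmse) in fits.items():
    print(f"degree {deg}: residual RMSE {rmse:.4f}")

coef2 = fits[2][0]
Z_syn = probe_surface(Z, k=2, degree=2, coef=coef2, m=200, rng=rng)
coef_syn, rmse_syn = fit_carrier(Z_syn, k=2, degree=2)  # coefficient consistency
u0 = Z[:, :2].mean(axis=0)
H_real = carrier_hessians(u0, 2, coef2)
H_syn = carrier_hessians(u0, 2, coef_syn)  # Hessian-based shape consistency
print("max coefficient diff:", np.abs(coef_syn - coef2).max())
print("max Hessian diff:", np.abs(H_syn - H_real).max())
```

Because the synthetic points are placed exactly on the fitted carrier, the surface, coefficient, and Hessian checks pass by construction in this sketch. Neighborhood preservation and agreement with the empirical distribution would need separate tests, for example nearest-neighbor overlap between real and synthetic points.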

Subject: Computation and Language

Published: 2026-05-01 20:12:06 UTC

arXiv ID: 2605.01073
