We introduce a framework for the data-driven discovery of stochastic differential equations (SDEs) that unifies, for the first time, the weak-form integration-by-parts approach of Weak SINDy with the stochastic system-identification goal of stochastic SINDy. The central novelty is the use of spatial Gaussian test functions $K_j(x)=\exp(-|x-x_j|^2/(2h^2))$ in place of temporal ones. Because the kernel weight $K_j(X_{t_n})$ is $\mathcal{F}_{t_n}$-measurable and the Brownian innovation $\xi_n$ is independent of $\mathcal{F}_{t_n}$, every noise term in the projected response has zero conditional mean given the current state. The noise therefore contributes no bias in expectation, which avoids the structural regression bias that afflicts temporal test functions in the stochastic setting. The same design choice converts SDE identification into two sparse linear systems, one for the drift $b(x)$ and one for the diffusion tensor $a(x)$, which share a single design matrix and are solved jointly via $\ell_1$-regularised regression with grouped cross-validation. A two-step bias-correction procedure handles state-dependent diffusion. Validated on the Ornstein--Uhlenbeck process, the double-well Langevin system, and a multiplicative diffusion process, the method recovers all active polynomial terms with coefficient errors below 4\%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce the true relaxation timescales across all three benchmarks.
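The construction can be made concrete with a minimal NumPy sketch. It is written under our own reading of the abstract, not as the authors' implementation. Writing $\Delta X_n = X_{t_{n+1}} - X_{t_n}$ and $\theta_k$ for the candidate library functions, we assume the projected systems take the form
$$G_{jk}=\frac{1}{N}\sum_{n}K_j(X_{t_n})\,\theta_k(X_{t_n}),\qquad y^{(b)}_j=\frac{1}{N}\sum_{n}K_j(X_{t_n})\,\frac{\Delta X_n}{\Delta t},\qquad y^{(a)}_j=\frac{1}{N}\sum_{n}K_j(X_{t_n})\,\frac{(\Delta X_n)^2}{\Delta t},$$
with $Gc^{(b)}\approx y^{(b)}$ for the drift and $Gc^{(a)}\approx y^{(a)}$ for the diffusion. The one-dimensional Ornstein--Uhlenbeck test path, the degree-3 polynomial library, the 20 kernel centres and their width, and the helper names `library`, `kernel_project`, and `stlsq` are all hypothetical choices. Two substitutions are made. First, sequentially thresholded least squares stands in for $\ell_1$-regularised regression with grouped cross-validation. Second, the diffusion response squares drift-corrected residual increments, a simple bias reduction that is not necessarily the paper's two-step procedure.

```python
import numpy as np


def library(v):
    """Candidate library [1, x, x^2, x^3]; the degree-3 cutoff is an assumption."""
    return np.stack([np.ones_like(v), v, v**2, v**3], axis=1)


def kernel_project(X, values, centers, h):
    """Row j is (1/N) * sum_n K_j(X_n) * values[n] with a Gaussian spatial kernel K_j."""
    out = np.empty((len(centers),) + values.shape[1:])
    for j, c in enumerate(centers):
        # K_j(X_{t_n}) depends only on the current state, so it is F_{t_n}-measurable
        w = np.exp(-((X - c) ** 2) / (2.0 * h**2))
        out[j] = w @ values / len(X)
    return out


def stlsq(G, y, threshold, n_iter=10):
    """Sequentially thresholded least squares (stand-in for l1 regression with grouped CV)."""
    coef = np.linalg.lstsq(G, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        if (~small).any():
            # Refit only the surviving library terms
            coef[~small] = np.linalg.lstsq(G[:, ~small], y, rcond=None)[0]
    return coef


# Hypothetical test case: Euler-Maruyama path of dX = -theta*X dt + sigma dW,
# so the true drift is b(x) = -x and the true diffusion is a(x) = sigma**2 = 1.
rng = np.random.default_rng(0)
theta, sigma, dt, n_steps = 1.0, 1.0, 0.01, 400_000
noise = rng.standard_normal(n_steps)
sqrt_dt = np.sqrt(dt)
x = np.empty(n_steps + 1)
x[0] = 0.0
for n in range(n_steps):
    x[n + 1] = x[n] - theta * x[n] * dt + sigma * sqrt_dt * noise[n]

X, dX = x[:-1], np.diff(x)  # current states X_{t_n} and increments Delta X_n

centers = np.linspace(np.quantile(X, 0.01), np.quantile(X, 0.99), 20)
h = 1.5 * (centers[1] - centers[0])  # kernel width: an illustrative choice
Theta = library(X)
G = kernel_project(X, Theta, centers, h)  # shared design matrix

# Drift system: projected increments per unit time.
y_drift = kernel_project(X, dX / dt, centers, h)
drift_coef = stlsq(G, y_drift, threshold=0.1)

# Diffusion system on the same G. Squaring residual increments (fitted drift removed)
# is an assumed bias reduction, not necessarily the paper's two-step correction.
resid = dX - (Theta @ drift_coef) * dt
y_diff = kernel_project(X, resid**2 / dt, centers, h)
diff_coef = stlsq(G, y_diff, threshold=0.1)

print("drift coefficients [1, x, x^2, x^3]:", np.round(drift_coef, 3))
print("diffusion coefficients [1, x, x^2, x^3]:", np.round(diff_coef, 3))
```

Under these assumptions the fitted drift should lie close to $-x$ and the fitted diffusion close to the constant $1$. The exact numbers depend on the seed, the path length, and the threshold. Because the data are generated by the same Euler scheme, the drift response equals $G$ times the true coefficients plus zero-conditional-mean noise, which is the unbiasedness property the abstract describes.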