Total: 1
This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, nBln(n0/p), explicitly incorporates the size of the initial library of candidate parameters (n0), the number of parameters in the considered model (nB), and a significance threshold (p). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals (Δt) or measurement noise (σ), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-Δt and PASTIS-σ, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.