Data Analysis, Statistics and Probability

2025-12-05 | Total: 8

#1 CNN on 'Top': In Search of Scalable & Lightweight Image-based Jet Taggers

Authors: Rajneil Baruah, Subhadeep Mondal, Sunando Kumar Patra, Satyajit Roy

While Transformer-based models and standard Graph Neural Networks (GNNs) have proven to be the best performers at classifying different types of jets, they require substantial computational power. We explore using a lightweight and scalable version of the EfficientNet architecture together with global features of the jet. The resulting network is computationally inexpensive yet achieves competitive performance. We showcase its efficacy for tagging top-quark jets against a background of light-quark and gluon jets.

Subjects: High Energy Physics - Phenomenology; Computational Physics; Data Analysis, Statistics and Probability

Published: 2025-12-04 17:48:08 UTC
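For entry #1, the sketch below shows the kind of inputs an image-based tagger with global features consumes. It bins jet constituents into a pT-weighted (eta, phi) image and computes two simple global features. The constituent format, the 16x16 grid, the +/-0.8 window, and the choice of jet mass and multiplicity as global features are all hypothetical. The abstract does not specify the authors' preprocessing or feature set, and no network is included.

```python
import math

# Hypothetical constituent format: (pt, eta, phi), with eta and phi measured from the jet axis.
Constituent = tuple[float, float, float]


def jet_image(constituents: list[Constituent], n_pixels: int = 16, half_width: float = 0.8):
    """pT-weighted (eta, phi) histogram; grid size and window are hypothetical choices."""
    image = [[0.0] * n_pixels for _ in range(n_pixels)]
    pixel = 2 * half_width / n_pixels
    for pt, eta, phi in constituents:
        i = math.floor((eta + half_width) / pixel)
        j = math.floor((phi + half_width) / pixel)
        if 0 <= i < n_pixels and 0 <= j < n_pixels:  # constituents outside the window are dropped
            image[i][j] += pt
    return image


def global_features(constituents: list[Constituent]):
    """Invariant mass (treating constituents as massless) and constituent multiplicity."""
    e = px = py = pz = 0.0
    for pt, eta, phi in constituents:
        e += pt * math.cosh(eta)
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
        pz += pt * math.sinh(eta)
    mass_sq = e * e - px * px - py * py - pz * pz
    return {"mass": math.sqrt(max(mass_sq, 0.0)), "multiplicity": len(constituents)}


jet = [(1.0, 0.0, -0.1), (1.0, 0.0, 0.1), (0.5, 0.3, 0.05)]
image = jet_image(jet)
features = global_features(jet)
```

A small image plus a handful of scalar features is one plausible way to keep a CNN cheap. The scalars would typically be concatenated with the convolutional embedding before the classifier head.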


#2 Amortized Inference of Multi-Modal Posteriors using Likelihood-Weighted Normalizing Flows

Author: Rajneil Baruah

We present a novel technique for amortized posterior estimation using Normalizing Flows trained with likelihood-weighted importance sampling. This approach allows efficient inference of theoretical parameters in high-dimensional inverse problems without the need for posterior training samples. We apply the method to multi-modal benchmark tasks in 2D and 3D to assess its efficacy. A key observation of our study is the impact of the topology of the base distribution on the modelled posteriors. We find that standard unimodal base distributions fail to capture disconnected support, resulting in spurious probability bridges between modes. We demonstrate that initializing the flow with a Gaussian Mixture Model whose number of components matches the number of target modes significantly improves reconstruction fidelity, as measured by distance and divergence metrics.

Subjects: Machine Learning; High Energy Physics - Experiment; High Energy Physics - Phenomenology; Computational Physics; Data Analysis, Statistics and Probability

Published: 2025-12-04 16:22:53 UTC
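For entry #2, here is a one-dimensional toy that makes the weighting and the base-topology argument concrete. It is our construction, not the paper's method or code. Proposal points on a flat prior are weighted by a hypothetical bimodal likelihood. The resulting likelihood-weighted negative log-density, which we assume is the kind of objective such a flow would minimize, is then evaluated for two stand-in densities. No normalizing flow is trained: a single Gaussian plays the unimodal model and a two-component Gaussian mixture plays the mode-matched one.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def likelihood(theta):
    """Hypothetical bimodal likelihood with modes at theta = -2 and theta = +2."""
    return 0.5 * normal_pdf(theta, -2.0, 0.3) + 0.5 * normal_pdf(theta, 2.0, 0.3)


# Evenly spaced points stand in for draws from a flat proposal (prior) on [-5, 5].
thetas = [-5.0 + 10.0 * i / 2000 for i in range(2001)]
raw = [likelihood(t) for t in thetas]
total = sum(raw)
weights = [r / total for r in raw]  # self-normalized likelihood weights


def weighted_nll(log_q):
    """Importance-sampling estimate of the cross-entropy between posterior and q."""
    return -sum(w * log_q(t) for w, t in zip(weights, thetas))


# Stand-in A: one Gaussian (unimodal), fitted by weighted moment matching.
mean = sum(w * t for w, t in zip(weights, thetas))
std = math.sqrt(sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas)))


def log_q_unimodal(t):
    return math.log(normal_pdf(t, mean, std))


# Stand-in B: a two-component mixture, one component per mode.
def log_q_mixture(t):
    return math.log(0.5 * normal_pdf(t, -2.0, 0.3) + 0.5 * normal_pdf(t, 2.0, 0.3))


nll_unimodal = weighted_nll(log_q_unimodal)
nll_mixture = weighted_nll(log_q_mixture)
# The unimodal fit places substantial density at theta = 0, between the modes,
# where the likelihood (and hence the posterior) is essentially zero.
bridge_density = normal_pdf(0.0, mean, std)
```

A Gaussian is not a flow, so this only mimics the symptom. The paper's argument concerns a flow whose unimodal base cannot be mapped continuously onto disconnected support.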


#3 Bayesian stepwise estimation of qubit rotations

Authors: Mylenne Manrique, Marco Barbieri, Assunta Di Vizio, Miranda Parisi, Gabriele Bizzarri, Ilaria Gianani, Matteo G. A. Paris

This work investigates Bayesian stepwise estimation (SE) for measuring the two parameters of a unitary qubit rotation. While asymptotic analysis predicts a precision advantage for SE over joint estimation (JE) in regimes where the quantum Fisher information matrix is near-singular ("sloppy" models), we demonstrate that this advantage is mitigated within a practical Bayesian framework with limited resources. We experimentally implement an SE protocol using polarisation qubits, achieving uncertainties close to the classical Van Trees bounds. However, comparing the total error with the ultimate quantum Van Trees bound for JE reveals that averaging over prior distributions erases the asymptotic SE advantage. Nevertheless, the stepwise strategy retains a significant practical benefit: it operates effectively with simple, fixed measurements, whereas saturating the JE bound typically requires complex, parameter-dependent operations.

Subjects: Quantum Physics; Data Analysis, Statistics and Probability

Published: 2025-12-04 15:26:06 UTC
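For entry #3, here is a minimal two-step sketch of the stepwise idea with fixed measurements. It uses a hypothetical parameterization chosen for illustration, since the abstract does not give the authors' model. The qubit state is cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>. A fixed Z measurement depends only on theta. A fixed X measurement then depends on both parameters, with theta replaced by its step-1 estimate. Posteriors are computed on a grid with flat priors. The counts are synthetic, chosen near the expectations for theta = 1.0 and phi = 0.8.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood (up to a constant), with p clipped away from 0 and 1."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def grid_posterior(grid, loglik):
    """Normalized posterior on a grid under a flat prior."""
    logs = [loglik(x) for x in grid]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    total = sum(w)
    return [v / total for v in w]


def mean_sd(grid, post):
    mu = sum(x * p for x, p in zip(grid, post))
    return mu, math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(grid, post)))


# Flat prior on (0, pi) for both parameters (cos is one-to-one there).
grid = [math.pi * (i + 0.5) / 2000 for i in range(2000)]

# Step 1: fixed Z measurement, P(0 | theta) = cos^2(theta / 2), independent of phi.
n1, k1 = 1000, 770  # synthetic counts
post_theta = grid_posterior(grid, lambda t: binom_loglik(k1, n1, math.cos(t / 2) ** 2))
theta_hat, theta_sd = mean_sd(grid, post_theta)

# Step 2: fixed X measurement, P(+ | theta, phi) = (1 + sin(theta) cos(phi)) / 2,
# with theta replaced by its step-1 posterior mean (plug-in).
n2, k2 = 1000, 793  # synthetic counts
post_phi = grid_posterior(
    grid, lambda f: binom_loglik(k2, n2, 0.5 * (1 + math.sin(theta_hat) * math.cos(f)))
)
phi_hat, phi_sd = mean_sd(grid, post_phi)
```

The plug-in second step ignores the uncertainty in theta_hat. Marginalizing over the step-1 posterior instead would propagate it into the phi estimate.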


#4 The dynamical memory of tidal stellar streams: Joint inference of the Galactic potential and the progenitor of GD-1 with flow matching

Authors: Giuseppe Viterbo, Tobias Buck

Stellar streams offer one of the most sensitive probes of the Milky Way's gravitational potential, as their phase-space morphology encodes both the tidal field of the host galaxy and the internal structure of their progenitors. In this work, we introduce a framework that leverages Flow Matching and Simulation-Based Inference (SBI) to jointly infer the parameters of the GD-1 progenitor and the global properties of the Milky Way potential. Our aim is to move beyond traditional techniques (e.g. orbit-fitting and action-angle methods) by constructing a fully Bayesian, likelihood-free posterior over both host-galaxy parameters and progenitor properties, thereby capturing the intrinsic coupling between tidal stripping dynamics and the underlying potential. To achieve this, we generate a large suite of mock GD-1-like streams using our differentiable N-body code Odisseo, sampling self-consistent initial conditions from a Plummer sphere and evolving them in a flexible Milky Way potential model. We then apply conditional Flow Matching to learn the vector field that transports a base Gaussian distribution into the posterior, enabling efficient, amortized inference directly from stream phase-space data. We demonstrate that our method successfully recovers the true parameters of a fiducial GD-1 simulation, producing well-calibrated posteriors and accurately reproducing parameter degeneracies arising from progenitor-host interactions. Flow Matching provides a powerful, flexible framework for Galactic Archaeology. Our approach enables joint inference on progenitor and Galactic parameters, capturing complex dependencies that are difficult to model with classical likelihood-based methods.

Subjects: Astrophysics of Galaxies; Classical Physics; Data Analysis, Statistics and Probability; Space Physics

Published: 2025-12-04 09:21:35 UTC
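For entry #4, a one-dimensional toy shows the core of conditional Flow Matching. It is our construction, not the authors' setup. Pairs (x0, x1) are drawn from a base N(0, 1) and a target N(m, s^2). Each pair defines a straight-line path whose velocity x1 - x0 is the regression target. The minimizer of that regression is the marginal velocity field E[x1 - x0 | x_t = x], which is available in closed form when base and target are Gaussian. Integrating this field transports base samples onto the target. In the paper's setting the target would be the posterior and the field would be learned by a network conditioned on stream data; neither appears here.

```python
def cfm_target(x0, x1, t):
    """Point on the straight-line path and the velocity a network would regress onto."""
    return (1 - t) * x0 + t * x1, x1 - x0


def marginal_velocity(x, t, m, s):
    """Closed-form E[x1 - x0 | x_t = x] for x0 ~ N(0, 1), x1 ~ N(m, s^2) drawn independently."""
    var_t = (1 - t) ** 2 + (t * s) ** 2
    return m + (t * s * s - (1 - t)) * (x - t * m) / var_t


def transport(x0, m, s, n_steps=1000):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += dt * marginal_velocity(x, k * dt, m, s)
    return x


m, s = 3.0, 0.5
base_points = [-2.0, -1.0, 0.0, 1.0, 2.0]
moved = [transport(z, m, s) for z in base_points]  # should land near m + s * z
```

For this Gaussian pair the induced map is affine, z -> m + s z, so the transported points can be checked directly. A learned field would replace `marginal_velocity`.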


#5 Reliable Statistical Guarantees for Conformal Predictors with Small Datasets

Authors: Miguel Sánchez-Domínguez, Lucas Lacasa, Javier de Vicente, Gonzalo Rubio, Eusebio Valero

Surrogate models (including deep neural networks and other supervised machine learning algorithms) can approximate arbitrarily complex, high-dimensional input-output problems in science and engineering, but they require a thorough data-agnostic uncertainty quantification analysis before they can be deployed in any safety-critical application. The standard approach to data-agnostic uncertainty quantification is conformal prediction (CP), a well-established framework for building uncertainty models with proven statistical guarantees that assume no particular shape for the error distribution of the surrogate model. However, the classic statistical guarantee offered by CP is a bound on the marginal coverage. For small calibration sets (which are frequent in realistic surrogate modelling that aims to quantify the error in different regions), the potentially strong dispersion of the coverage distribution around its average undermines the reliability of the uncertainty model: coverages below the expected value are common, which limits the framework's applicability. After a gentle introduction to uncertainty quantification for surrogate models aimed at machine learning practitioners, we bridge this gap by proposing a new statistical guarantee that provides probabilistic information about the coverage of a single conformal predictor. We show that the proposed framework converges to the standard CP solution for large calibration sets and, unlike the classic guarantee, still provides reliable information about the coverage of a conformal predictor for small data sizes. We illustrate and validate the methodology on a suite of examples, and provide an open-access software implementation that can be used alongside common conformal prediction libraries to obtain uncertainty models that fulfil the new guarantee.

Subjects: Machine Learning; Data Analysis, Statistics and Probability; Machine Learning

Published: 2025-12-04 08:29:17 UTC
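For entry #5, the small-calibration-set problem can be quantified with a classical result for split conformal prediction. This is the standard analysis, not the new guarantee proposed in the paper. With n calibration scores and threshold rank k = ceil((n+1)(1-alpha)), the coverage achieved by one particular conformal predictor follows a Beta(k, n+1-k) distribution. Its mean, k/(n+1), is the familiar marginal guarantee. The sketch assumes continuous, exchangeable nonconformity scores and uses only the standard library.

```python
import math
from fractions import Fraction


def conformal_rank(n, alpha):
    """Rank k of the calibration score used as the split-conformal threshold."""
    k = math.ceil((n + 1) * (1 - Fraction(str(alpha))))  # exact arithmetic avoids float edge cases
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    return k


def mean_coverage(n, alpha):
    """Marginal (average) coverage k / (n + 1), which is >= 1 - alpha."""
    return conformal_rank(n, alpha) / (n + 1)


def prob_coverage_below(n, alpha, level):
    """P(coverage < level) over random calibration sets.

    Coverage ~ Beta(k, n + 1 - k), and P(Beta(k, n + 1 - k) <= x) = P(Binomial(n, x) >= k).
    The binomial tail is summed in log space for numerical stability.
    """
    k = conformal_rank(n, alpha)
    log_x, log_1mx = math.log(level), math.log1p(-level)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_x + (n - j) * log_1mx)
        total += math.exp(log_pmf)
    return total


small = prob_coverage_below(19, 0.1, 0.85)
large = prob_coverage_below(999, 0.1, 0.85)
```

For n = 19 and alpha = 0.1 the average coverage is exactly 0.9. Even so, roughly one calibration set in five yields a predictor whose coverage is below 0.85. With n = 999 that probability is negligible.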


#6 Resummed Distribution Functions: Making Perturbation Theory Positive and Normalized

Authors: Rikab Gambhir, Radha Mastandrea

Fixed-order perturbative calculations of differential cross sections can suffer from unphysical artifacts: they can be non-positive, non-normalizable, and non-finite, none of which occurs in experimental measurements. We propose a framework, the Resummed Distribution Function (RDF), that, given a perturbative calculation of an observable to some finite order in $\alpha_s$, "resums" the expression in a way that is guaranteed to match the original expression order by order while being positive, normalized, and finite. Moreover, our ansatz parameterizes all possible finite, positive, and normalized completions consistent with the original fixed-order expression, which can include N$^n$LL resummed expressions. The RDF also enables a more direct notion of perturbative uncertainties, since higher-order parameters can be varied directly and treated as nuisance parameters. We demonstrate the power of the RDF ansatz by matching to the thrust distribution at $\mathcal{O}(\alpha_s^3)$ and extracting $\alpha_s$, with perturbative uncertainties, from a fit of the RDF to ALEPH data.

Subjects: High Energy Physics - Phenomenology; High Energy Physics - Experiment; Data Analysis, Statistics and Probability

Published: 2025-12-03 19:00:02 UTC
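For entry #6, a textbook toy illustrates the positivity problem and what "matching order by order" means. It is not the RDF ansatz. Consider a cumulative distribution whose fixed-order truncation is Sigma_FO(L) = 1 - a*alpha*L^2. Compare it with the exponentiated form Sigma_exp(L) = exp(-a*alpha*L^2). The exponentiated form agrees with the truncation at O(alpha), stays in (0, 1], and differs only at O(alpha^2). Here L stands for a large logarithm of the observable and the coefficient a is hypothetical.

```python
import math


def sigma_fixed_order(L, alpha, a=1.0):
    """O(alpha) truncation of a toy cumulative distribution; negative for large L."""
    return 1.0 - a * alpha * L * L


def sigma_exponentiated(L, alpha, a=1.0):
    """Exponentiated toy with the same O(alpha) term; always in (0, 1]."""
    return math.exp(-a * alpha * L * L)


def second_order_mismatch(L, alpha, a=1.0):
    """(Sigma_exp - Sigma_FO) / alpha^2, which tends to a^2 L^4 / 2 as alpha -> 0."""
    return (sigma_exponentiated(L, alpha, a) - sigma_fixed_order(L, alpha, a)) / alpha**2


fo_large_log = sigma_fixed_order(5.0, 0.1)      # unphysical: below zero
exp_large_log = sigma_exponentiated(5.0, 0.1)   # stays positive
mismatch = second_order_mismatch(2.0, 1e-3)     # close to L**4 / 2 = 8
```

Any completion that reproduces the O(alpha) term, such as this one, differs from the truncation only at higher orders. That higher-order freedom is what the RDF parameterizes and treats as nuisance parameters.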


#7 Enhancing next token prediction based pre-training for jet foundation models

Authors: Joschka Birk, Anna Hallin, Gregor Kasieczka, Nikol Madzharova, Ian Pang, David Shih

Next token prediction is an attractive pre-training task for jet foundation models, in that it is simulation free and enables excellent generative capabilities that can transfer across datasets. Here we study multiple improvements to next token prediction, building on the initial work of OmniJet-$\alpha$. First, instead of tokenizing particles and then using only the token ID as the model input for both the generative and the classification task, we adopt a hybrid setup that uses continuous feature vectors as model input while using token IDs only in the next token prediction target. Second, we explore a pre-training strategy that combines masked particle modeling and generative learning objectives. Taken together, these changes greatly improve performance on downstream classification tasks without any loss in generative performance.

Subjects: High Energy Physics - Phenomenology; Machine Learning; High Energy Physics - Experiment; Data Analysis, Statistics and Probability

Published: 2025-12-03 19:00:00 UTC
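For entry #7, the hybrid input/target split can be sketched as follows. The sketch assumes particles arrive as an ordered list of continuous feature vectors. A fixed nearest-neighbour codebook stands in for the particle tokenizer. The codebook, feature layout, and function names are hypothetical, and OmniJet-alpha's actual tokenizer and sequence conventions are not reproduced here. The model would see the continuous vectors, while the next-token loss would be a classification over token IDs.

```python
# Hypothetical codebook: one entry per token ID, in a made-up 3-feature space.
CODEBOOK = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def tokenize(feature):
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    return min(range(len(CODEBOOK)),
               key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, CODEBOOK[i])))


def next_token_pairs(particles):
    """Hybrid setup: continuous features as inputs, next token IDs as targets.

    The input at position i is the continuous feature vector of particle i;
    the target is the token ID of particle i + 1.
    """
    token_ids = [tokenize(p) for p in particles]
    return particles[:-1], token_ids[1:]


particles = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.2), (0.0, 0.2, 0.9)]
inputs, targets = next_token_pairs(particles)
```

Keeping continuous inputs avoids discarding information at the input layer. This is one plausible reason such a setup could help downstream classification, while the discrete targets keep the generative objective a simple classification.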


#8 Quasi-stationarity of the Dyson Brownian Motion With Collisions

Authors: Arnaud Guillin, Boris Nectoux, Liming Wu

In this work, we investigate the ergodic behavior of a system of particles subject to collisions, before it exits a fixed subdomain of its state space. This system is composed of several one-dimensional ordered Brownian particles interacting through electrostatic repulsion, and is usually referred to as the (generalized) Dyson Brownian motion. The starting points of our analysis are the work [E. Cépa and D. Lépingle, 1997 Probab. Theory Relat. Fields], which establishes existence and uniqueness of such a system subject to collisions via the theory of multivalued SDEs, and a Krein-Rutman type theorem derived in [A. Guillin, B. Nectoux, L. Wu, 2020 J. Eur. Math. Soc.].

Subjects: Probability; Mathematical Physics

Published: 2025-04-11 11:56:32 UTC