Data Analysis, Statistics and Probability

2024-10-22 | Total: 6

#1 Machine Learning-Powered Data Cleaning for LEGEND

Authors: E. León ; A. Li ; M. A. Bahena Schott ; B. Bos ; M. Busch ; J. R. Chapman ; G. L. Duran ; J. Gruszko ; R. Henning ; E. L. Martin ; J. F. Wilkerson

Neutrinoless double-beta decay ($0\nu\beta\beta$) is a rare nuclear process that, if observed, will provide insight into the nature of neutrinos and help explain the matter-antimatter asymmetry in the universe. The Large Enriched Germanium Experiment for Neutrinoless Double-Beta Decay (LEGEND) will operate in two phases to search for $0\nu\beta\beta$. The first (second) phase will employ 200 (1000) kg of High-Purity Germanium (HPGe) enriched in $^{76}$Ge to achieve a half-life sensitivity of $10^{27}$ ($10^{28}$) years. In this study, we present a semi-supervised, data-driven approach, powered by a novel artificial intelligence model, to remove non-physical events captured by HPGe detectors. We utilize Affinity Propagation to cluster waveform signals based on their shape and a Support Vector Machine to classify them into different categories. We train, optimize, and test our model on data taken from a natural-abundance HPGe detector installed in the Full Chain Test experimental stand at the University of North Carolina at Chapel Hill. We demonstrate that our model yields a maximum physics event sacrifice of $0.024 ^{+0.004}_{-0.003} \%$ when performing data cleaning cuts. Our model is being used to accelerate data cleaning development for LEGEND-200.

Subjects: Data Analysis, Statistics and Probability ; Nuclear Experiment ; Instrumentation and Detectors

Publish: 2024-10-05 16:40:34 UTC
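
To make the cluster-then-classify idea in the abstract of entry #1 concrete, here is a minimal sketch using scikit-learn's `AffinityPropagation` and `SVC`. It is not the authors' model: the waveform shapes, noise level, hyperparameters, and the helper names `make_waveforms` and `train_cleaner` are all assumptions made for illustration. An "expert" labels only the cluster exemplars, those labels are propagated to every waveform in the cluster, and an SVM is then trained on the propagated labels.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.svm import SVC


def make_waveforms(n_per_class, rng, n_samples=64, noise=0.02):
    """Toy stand-ins for detector waveforms (hypothetical shapes, not LEGEND data)."""
    t = np.arange(n_samples)
    physics = 1.0 / (1.0 + np.exp(-(t - n_samples / 2) / 2.0))  # smooth rising edge
    spike = np.exp(-0.5 * ((t - n_samples / 2) / 1.5) ** 2)     # narrow transient
    X = np.vstack([np.tile(physics, (n_per_class, 1)),
                   np.tile(spike, (n_per_class, 1))])
    X = X + rng.normal(0.0, noise, X.shape)
    y = np.repeat([1, 0], n_per_class)  # 1 = physics-like, 0 = non-physical
    return X, y


def train_cleaner(X, expert_label_for):
    """Cluster waveforms, label only the exemplars, then fit an SVM."""
    # Step 1: unsupervised clustering by waveform shape.
    ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=0).fit(X)
    exemplar_idx = ap.cluster_centers_indices_
    # Step 2: the "expert" inspects one exemplar per cluster (assumed workflow).
    exemplar_labels = np.array([expert_label_for(i) for i in exemplar_idx])
    # Step 3: every waveform inherits the label of its cluster's exemplar.
    propagated = exemplar_labels[ap.labels_]
    # Step 4: supervised classifier trained on the propagated labels.
    svm = SVC(kernel="rbf", gamma="scale").fit(X, propagated)
    return svm, len(exemplar_idx)


rng = np.random.default_rng(0)
X_train, y_train = make_waveforms(40, rng)
X_new, y_new = make_waveforms(40, rng)

# Here the true toy label plays the role of the expert's judgement.
svm, n_clusters = train_cleaner(X_train, lambda i: y_train[i])
accuracy = float(np.mean(svm.predict(X_new) == y_new))
print(f"clusters found: {n_clusters}, accuracy on new waveforms: {accuracy:.3f}")
```

The appeal of this design is that the labeling effort scales with the number of clusters rather than the number of waveforms. How the clusters are actually labeled in the paper is not stated in the abstract.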

#2 Deep Multimodal Representation Learning for Stellar Spectra

Authors: Tobias Buck ; Christian Schwarz

Recently, contrastive learning (CL), a technique most prominently used in natural language and computer vision, has been used to train informative representation spaces for galaxy spectra and images in a self-supervised manner. Following this idea, we implement CL for stars in the Milky Way, for which recent astronomical surveys have produced a huge amount of heterogeneous data. Specifically, we investigate Gaia XP coefficients and RVS spectra. Thus, the methods presented in this work lay the foundation for aggregating the knowledge implicitly contained in the multimodal data to enable downstream tasks like cross-modal generation or fused stellar parameter estimation. We find that CL results in a highly structured representation space that exhibits explicit physical meaning. Using this representation space to perform cross-modal generation and stellar label regression results in excellent performance with high-quality generated samples as well as accurate and precise label predictions.

Subjects: Solar and Stellar Astrophysics ; Astrophysics of Galaxies ; Instrumentation and Methods for Astrophysics ; Computational Physics ; Data Analysis, Statistics and Probability

Publish: 2024-10-21 15:00:32 UTC
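
As a rough illustration of the contrastive objective mentioned in entry #2, the sketch below computes a symmetric InfoNCE loss between two sets of embeddings, one per modality. The abstract does not say which loss the authors use, so InfoNCE, the temperature value, and the names `info_nce`, `z_xp`, and `z_rvs` are assumptions. The embeddings here are random toy vectors, not outputs of any trained encoder.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss; row i of z_a and row i of z_b form a positive pair."""
    # Normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature

    def diagonal_cross_entropy(l):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


rng = np.random.default_rng(0)
shared = rng.normal(size=(32, 8))                      # toy "star" latent vectors
z_xp = shared + 0.05 * rng.normal(size=shared.shape)   # stand-in for XP embeddings
z_rvs = shared + 0.05 * rng.normal(size=shared.shape)  # stand-in for RVS embeddings

aligned = info_nce(z_xp, z_rvs)
mismatched = info_nce(z_xp, np.roll(z_rvs, 1, axis=0))  # pairs deliberately broken
print(f"aligned pairs: {aligned:.3f}, mismatched pairs: {mismatched:.3f}")
```

The loss is low when the two views of the same star land close together and far from the other stars in the batch, which is the property that lets a shared representation space emerge without labels.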

#3 Latency correction in sparse neuronal spike trains with overlapping global events

Authors: Arturo Mariani ; Federico Senocrate ; Jason Mikiel-Hunter ; David McAlpine ; Barbara Beiderbeck ; Michael Pecka ; Kevin Lin ; Thomas Kreuz

Background: In Kreuz et al., J Neurosci Methods 381, 109703 (2022) two methods were proposed that perform latency correction, i.e., optimize the spike time alignment of sparse neuronal spike trains with well-defined global spiking events. The first one, based on direct shifts, is fast but uses only partial latency information, while the other one makes use of the full information but relies on the computationally costly simulated annealing. Both methods reach their limits and can become unreliable when successive global events are not sufficiently separated or even overlap. New Method: Here we propose an iterative scheme that combines the advantages of the two original methods by using in each step as much of the latency information as possible and by employing a very fast extrapolation direct shift method instead of the much slower simulated annealing. Results: We illustrate the effectiveness and the improved performance, measured in terms of the relative shift error, of the new iterative scheme not only on simulated data with known ground truths but also on single-unit recordings from two medial superior olive neurons of a gerbil. Comparison with Existing Method(s): The iterative scheme outperforms the existing approaches on both the simulated and the experimental data. Due to its low computational demands, and in contrast to simulated annealing, it can also be applied to very large datasets. Conclusions: The new method generalizes and improves on the original method both in terms of accuracy and speed. Importantly, it is the only method that allows overlapping global events to be disentangled.

Subjects: Neurons and Cognition ; Biological Physics ; Data Analysis, Statistics and Probability ; Medical Physics ; Applications

Publish: 2024-10-19 07:21:20 UTC

#4 Martingale drift of Langevin dynamics and classical canonical spin statistics -- II

Author: Ken Sekimoto

In the previous paper we showed analytically that, if the drift function of the d-dimensional Langevin equation is the Langevin function with a properly chosen scale factor, then the evolution of the drift function is a martingale associated with the histories generated by that very Langevin equation. Moreover, we numerically demonstrated that the histories generated from common initial data become asymptotically ballistic, with orientations that obey the classical canonical spin statistics under the external field corresponding to the initial data. In the present paper we provide an analytical explanation of the latter numerical finding by introducing a martingale in the spin functional space. In a specific context the present result elucidates a new physical aspect of martingale theory.

Subjects: Statistical Mechanics ; Data Analysis, Statistics and Probability

Publish: 2024-10-19 05:16:11 UTC
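
For readers unfamiliar with the two objects named in entry #4, the sketch below checks numerically that the Langevin function L(h) = coth(h) - 1/h is the mean alignment of a classical spin with an external field under canonical statistics. It assumes the three-dimensional case with a unit spin and weight proportional to exp(h cos θ); the abstract treats general d and an unspecified scale factor, neither of which is reproduced here, and no Langevin dynamics are simulated. The function names are illustrative.

```python
import math
import random


def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x, with its small-x limit x/3."""
    if abs(x) < 1e-4:
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x


def sample_cos_theta(h, rng):
    """Draw u = cos(theta) with density proportional to exp(h*u) on [-1, 1].

    In three dimensions the uniform measure on the sphere is uniform in u,
    so this is the canonical distribution of the spin's field component.
    Inverse-CDF sampling: F(u) = (e^{hu} - e^{-h}) / (e^{h} - e^{-h}).
    """
    r = rng.random()
    return math.log(math.exp(-h) + r * (math.exp(h) - math.exp(-h))) / h


rng = random.Random(0)
h = 2.0              # field strength in units of kT (assumed value)
n_samples = 200_000
mc_mean = sum(sample_cos_theta(h, rng) for _ in range(n_samples)) / n_samples

print(f"Monte Carlo <cos theta> = {mc_mean:.4f}")
print(f"Langevin function L(h)  = {langevin(h):.4f}")
```

The two printed values should agree to within the Monte Carlo error, which is of order 1/sqrt(n_samples).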

#5 Enhancing Precision of Signal Correction in PVES Experiments: The Impact of Bayesian Analysis on the Results of the QWeak and MOLLER Experiments

Authors: Elham Gorgannejad ; Wouter Deconinck ; David S. Armstrong

The precise measurement of parity-violating asymmetries in parity-violating electron scattering experiments is a powerful tool for probing new physics beyond the Standard Model. Achieving the expected precision requires both experimental and post-processing signal corrections. These include using auxiliary detectors to distinguish the main signal from background signals and implementing post-measurement corrections, such as the Bayesian statistics method, to address uncontrolled factors during the experiments. Asymmetry values in the scattering of electrons off proton targets in QWeak and P2 and off electron targets in MOLLER are influenced by detector array configurations, beam polarization angles, and beam spin variations. The Bayesian framework refines full probabilistic models to account for all necessary factors, thereby extracting asymmetry values and the underlying physics under specified conditions. For the QWeak experiment, a reanalysis of the inelastic asymmetry measurement using the Bayesian method has yielded a closer fit to measured asymmetries, with uncertainties reduced by 40% compared to the Monte Carlo minimization method. This approach was successfully applied to simulated data for the MOLLER experiment and is predicted to be similarly effective in P2.

Subjects: High Energy Physics - Phenomenology ; Nuclear Experiment ; Data Analysis, Statistics and Probability

Publish: 2024-10-18 16:59:06 UTC

#6 The Typicality of Regimes Associated with Northern Hemisphere Heatwaves

Authors: Christopher C. Chapman ; Didier P. Monselesan ; James S. Risbey ; Abdelwaheb Hannachi ; Valerio Lucarini ; Richard Matear

We study the hemispheric to continental scale regimes that lead to summertime heatwaves in the Northern Hemisphere. By using a powerful data mining methodology, archetype analysis, we identify characteristic spatial patterns consisting of blocking high-pressure systems embedded within a meandering upper atmosphere circulation that is longitudinally modulated by coherent Rossby Wave Packets. Periods when these atmospheric regimes are strongly expressed correspond to large increases in the likelihood of extreme surface temperature. Most strikingly, these regimes are shown to be typical of surface extremes and frequently reoccur. Three well-publicised heatwaves are studied in detail: the June-July 2003 western European heatwave, the August 2010 "Russian" heatwave, and the June 2021 "Heatdome" event across western North America. All three are shown to be driven by blocking high-pressure systems linked to stalled Rossby Wave Packets. We discuss the implications of our work for long-range prediction or early warning, climate model assessment and post-event diagnosis.

Subject: Atmospheric and Oceanic Physics

Publish: 2024-10-16 02:48:43 UTC