Machine Learning

2025-04-07 | Total: 19

#1 Operator Learning: A Statistical Perspective

Authors: Unique Subedi, Ambuj Tewari

Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 14:58:45 UTC


#2 Conditioning Diffusions Using Malliavin Calculus

Authors: Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, George Deligiannidis, Nikolas Nüsken

In stochastic optimal control and conditional generative modelling, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and path-space integration by parts, that enables the development of methods robust to such singular rewards. This allows our approach to handle a broad range of applications, including classification, diffusion bridges, and conditioning without the need for artificial observational noise. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques.

Subjects: Machine Learning , Machine Learning , Probability

Publish: 2025-04-04 14:10:21 UTC


#3 Block Toeplitz Sparse Precision Matrix Estimation for Large-Scale Interval-Valued Time Series Forecasting

Authors: Wan Tian, Zhongfeng Qin

Modeling and forecasting interval-valued time series (ITS) have attracted considerable attention due to their growing presence in various contexts. To the best of our knowledge, there have been no efforts to model large-scale ITS. In this paper, we propose a feature extraction procedure for large-scale ITS, which involves key steps such as auto-segmentation and clustering, and feature transfer learning. This procedure can be seamlessly integrated with any suitable prediction models for forecasting purposes. Specifically, we transform the automatic segmentation and clustering of ITS into the estimation of Toeplitz sparse precision matrices and assignment set. The majorization-minimization algorithm is employed to convert this highly non-convex optimization problem into two subproblems. We derive efficient dynamic programming and alternating direction method to solve these two subproblems alternately and establish their convergence properties. By employing the Joint Recurrence Plot (JRP) to image subsequence and assigning a class label to each cluster, an image dataset is constructed. Then, an appropriate neural network is chosen to train on this image dataset and used to extract features for the next step of forecasting. Real data applications demonstrate that the proposed method can effectively obtain invariant representations of the raw data and enhance forecasting performance.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 09:57:05 UTC


#4 Adaptive Classification of Interval-Valued Time Series

Authors: Wan Tian, Zhongfeng Qin

In recent years, the modeling and analysis of interval-valued time series have garnered significant attention in the fields of econometrics and statistics. However, the existing literature primarily focuses on regression tasks while neglecting classification aspects. In this paper, we propose an adaptive approach for interval-valued time series classification. Specifically, we represent interval-valued time series using convex combinations of upper and lower bounds of intervals and transform these representations into images based on point-valued time series imaging methods. We utilize a fine-grained image classification neural network to classify these images, to achieve the goal of classifying the original interval-valued time series. This proposed method is applicable to both univariate and multivariate interval-valued time series. On the optimization front, we treat the convex combination coefficients as learnable parameters similar to the parameters of the neural network and provide an efficient estimation method based on the alternating direction method of multipliers (ADMM). On the theoretical front, under specific conditions, we establish a margin-based multiclass generalization bound for generic CNNs composed of basic blocks involving convolution, pooling, and fully connected layers. Through simulation studies and real data applications, we validate the effectiveness of the proposed method and compare its performance against a wide range of point-valued time series classification methods.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 09:52:40 UTC


#5 Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty

Author: Yu Inatsu

Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called robustness measures. However, GP-UCB-based methods for robustness measures include a trade-off parameter $\beta$, which must be excessively large to achieve theoretical validity, just like the original GP-UCB. In this study, we propose a new method called randomized robustness measure GP-UCB (RRGP-UCB), which samples the trade-off parameter $\beta$ from a probability distribution based on a chi-squared distribution and avoids explicitly specifying $\beta$. The expected value of $\beta$ is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected value of regret based on the optimal solution and estimated solutions. Finally, we demonstrate the usefulness of the proposed method through numerical experiments.
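
As a rough illustration of the idea of drawing the trade-off parameter at random rather than fixing it, the following minimal sketch picks a candidate by maximising an upper confidence bound whose trade-off parameter is a fresh chi-squared draw. It is not the RRGP-UCB algorithm from the paper: it ignores robustness measures and input uncertainty, and the function name `randomized_ucb_pick`, the degrees of freedom `df=2.0`, and the posterior values are all assumptions made for illustration.

```python
import numpy as np


def randomized_ucb_pick(mu, sigma, rng, df=2.0):
    """Return the index maximising mu + sqrt(beta) * sigma, and the beta used.

    beta is sampled from a chi-squared distribution on every call instead of
    being set by hand. df=2.0 is an illustrative assumption, not a value
    taken from the paper.
    """
    beta = rng.chisquare(df)
    ucb = mu + np.sqrt(beta) * sigma
    return int(np.argmax(ucb)), float(beta)


rng = np.random.default_rng(0)
mu = np.array([0.1, 0.5, 0.3])       # hypothetical GP posterior means
sigma = np.array([0.4, 0.05, 0.3])   # hypothetical GP posterior std deviations
idx, beta = randomized_ucb_pick(mu, sigma, rng)
```

Because the draw changes between iterations, the rule sometimes explores (large draw) and sometimes exploits (small draw) without any hand-tuned schedule.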

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 05:01:54 UTC


#6 Accelerating Particle-based Energetic Variational Inference

Authors: Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang

In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics for minimizing the KL divergence, derived using a "discretize-then-variational" approach, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 04:31:19 UTC


#7 A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials

Author: Zhangsong Li

In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n \times d$ standard Gaussian design matrix, $Z$ is an $n \times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $Q_*$ is an unknown $d \times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb I_m$. Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n \times d$ and $n \times m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.
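
To make the data model concrete, here is a minimal sketch that draws one sample from the planted model stated in the abstract. The abstract only says that the permutation and the orthonormal matrix are unknown, so drawing the permutation uniformly at random and obtaining the orthonormal matrix from a QR factorisation of a Gaussian matrix are assumptions, as is the helper name `sample_shuffled_regression`.

```python
import numpy as np


def sample_shuffled_regression(n, d, m, sigma, rng):
    """Draw (X, Y, Pi, Q) with Y = (Pi X Q + sigma Z) / sqrt(1 + sigma^2)."""
    X = rng.standard_normal((n, d))          # n x d standard Gaussian design
    Z = rng.standard_normal((n, m))          # n x m Gaussian noise
    Pi = np.eye(n)[rng.permutation(n)]       # n x n permutation matrix (assumed uniform)
    # d x m matrix with orthonormal columns, so Q.T @ Q is the m x m identity.
    Q, _ = np.linalg.qr(rng.standard_normal((d, m)))
    Y = (Pi @ X @ Q + sigma * Z) / np.sqrt(1.0 + sigma**2)
    return X, Y, Pi, Q


rng = np.random.default_rng(0)
X, Y, Pi, Q = sample_shuffled_regression(n=50, d=8, m=3, sigma=0.5, rng=rng)
```

The null hypothesis in the testing problem would instead draw `X` and `Y` as independent Gaussian matrices of the same shapes.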

Subjects: Machine Learning , Machine Learning , Probability , Statistics Theory

Publish: 2025-04-04 00:32:38 UTC


#8 High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

Authors: Issa-Mbenard Dabo, Jérémie Bigot

The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature. This model is generally considered for feature vectors $x_i = \Sigma^{1/2} x_i'$, where $x_i'$ is a random vector made of independent and identically distributed (iid) entries, and $\Sigma$ is a positive definite matrix representing the covariance of the features. In this paper, we move beyond this standard assumption by studying the performance of the random features model in the setting of non-iid feature vectors. Our approach is related to the analysis of the spectrum of large random matrices through random matrix theory (RMT) and free probability results. We turn to the analysis of non-iid data by using the notion of variance profile, which is well studied in RMT. Our main contribution is then the study of the limits of the training and prediction risks associated with the ridge estimator in the random features model when its dimensions grow. We provide asymptotic equivalents of these risks that capture the behavior of ridge regression with random features in a high-dimensional framework. These asymptotic equivalents, which prove to be sharp in numerical experiments, are retrieved by adapting, to our setting, established results from operator-valued free probability theory. Moreover, for various classes of random feature vectors that have not been considered so far in the literature, our approach allows us to show the appearance of the double descent phenomenon when the ridge regularization parameter is small enough.
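
The setting can be sketched numerically as follows: data whose entries have non-identical variances (a variance profile), passed through fixed random features, followed by the closed-form ridge estimator. This is only an illustration under stated assumptions. The ReLU activation, the uniform variance profile `S`, the target function, and the scaling of the penalty by `n` are all choices made here, not details taken from the paper.

```python
import numpy as np


def random_features_ridge(X, y, n_features, lam, rng):
    """Fit ridge regression on ReLU random features; return (W, theta)."""
    n, d = X.shape
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)  # fixed random weights
    F = np.maximum(X @ W, 0.0)                             # ReLU random features
    # Closed-form ridge solution: (F^T F + lam * n * I)^{-1} F^T y
    A = F.T @ F + lam * n * np.eye(n_features)
    theta = np.linalg.solve(A, F.T @ y)
    return W, theta


def predict(X, W, theta):
    return np.maximum(X @ W, 0.0) @ theta


rng = np.random.default_rng(0)
n, d = 200, 10
S = rng.uniform(0.5, 1.5, size=(n, d))   # hypothetical standard-deviation profile
X = S * rng.standard_normal((n, d))      # independent but non-identically distributed entries
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
W, theta = random_features_ridge(X, y, n_features=50, lam=1e-2, rng=rng)
train_mse = float(np.mean((predict(X, W, theta) - y) ** 2))
```

Sweeping `n_features` past `n` with a small `lam` and plotting the test error is the usual way to look for double descent empirically.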

Subjects: Machine Learning , Machine Learning , Probability , Statistics Theory , Methodology

Publish: 2025-04-03 21:20:08 UTC


#9 ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification

Authors: Yuhan Duan, Xin Zhao, Neng Shi, Han-Wei Shen

Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become unreliable and hence hinder analysis. While methods like Monte Carlo dropout and ensemble models exist, they are often costly, fail to isolate uncertainty types, and lack guaranteed coverage in prediction intervals. To address this, we introduce ConfEviSurrogate, a novel Conformalized Evidential Surrogate Model that can efficiently learn high-order evidential distributions, directly predict simulation outcomes, separate uncertainty sources, and provide prediction intervals. A conformal prediction-based calibration step further enhances interval reliability to ensure coverage and improve efficiency. Our ConfEviSurrogate demonstrates accurate predictions and robust uncertainty estimates in diverse simulations, including cosmology, ocean dynamics, and fluid dynamics.

Subjects: Machine Learning , Graphics , Machine Learning

Publish: 2025-04-03 15:44:14 UTC


#10 Stochastic Optimization with Optimal Importance Sampling

Authors: Liviu Aolaritei, Bart P. G. Van Parys, Henry Lam, Michael I. Jordan

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
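
For readers unfamiliar with importance sampling in rare-event simulation, the following minimal sketch estimates a Gaussian tail probability using a mean-shifted proposal. It illustrates plain IS estimation only, not the joint decision/proposal update proposed in the paper. The choice of a unit-variance Gaussian proposal centred at the threshold and the helper name `rare_event_is` are assumptions.

```python
import math

import numpy as np


def rare_event_is(threshold, n, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1) using proposal N(threshold, 1)."""
    x = rng.normal(loc=threshold, scale=1.0, size=n)
    # Log likelihood ratio log p(x) - log q(x) for two unit-variance Gaussians.
    log_w = -threshold * x + 0.5 * threshold**2
    return float(np.mean((x > threshold) * np.exp(log_w)))


rng = np.random.default_rng(0)
estimate = rare_event_is(threshold=4.0, n=100_000, rng=rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))   # closed-form Gaussian tail
```

Naive Monte Carlo with the same budget would see only a handful of exceedances, which is why the choice of proposal matters so much in this regime.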

Subjects: Optimization and Control , Machine Learning , Statistics Theory , Machine Learning

Publish: 2025-04-04 16:10:18 UTC


#11 Adaptive sparse variational approximations for Gaussian process regression

Authors: Dennis Nieman, Botond Szabó

Accurate tuning of hyperparameters is crucial to ensure that models can generalise effectively across different settings. In this paper, we present theoretical guarantees for hyperparameter selection using variational Bayes in the nonparametric regression model. We construct a variational approximation to a hierarchical Bayes procedure, and derive upper bounds for the contraction rate of the variational posterior in an abstract setting. The theory is applied to various Gaussian process priors and variational classes, resulting in minimax optimal rates. Our theoretical results are accompanied by numerical analysis on both synthetic and real-world data sets.

Subjects: Statistics Theory , Machine Learning

Publish: 2025-04-04 09:57:00 UTC


#12 Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator

Author: Monika Avila Marquez

A triangular structural panel data model with additive separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of which are time-invariant. In this setup, a linear reduced-form equation might be problematic when the conditional mean of the endogenous covariate and the instrumental variables is nonlinear. The reason is that ignoring the nonlinearity could lead to weak instruments. As a solution, we propose a triangular simultaneous equation model for panel data with additive separable individual-specific fixed effects composed of a linear structural equation with a nonlinear reduced-form equation. The parameter of interest is the structural parameter of the endogenous variable. The identification of this parameter is obtained under the assumption of available exclusion restrictions and using a control function approach. The parameter of interest is estimated using an estimator that we call the Super Learner Control Function estimator (SLCFE). The estimation procedure is composed of two main steps and sample splitting. In the first step, we estimate the control function using a super learner with sample splitting. In the following step, we use the estimated control function to control for endogeneity in the structural equation. Sample splitting is done across the individual dimension. We perform a Monte Carlo simulation to test the performance of the proposed estimators. We conclude that the Super Learner Control Function Estimators significantly outperform Within 2SLS estimators.

Subjects: Econometrics , Machine Learning

Publish: 2025-04-04 07:22:18 UTC


#13 The Ground Cost for Optimal Transport of Angular Velocity

Authors: Karthik Elamvazhuthi, Abhishek Halder

We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of spin states of a rigid body (e.g., spacecraft) under a hard deadline constraint by transferring given initial state statistics to desired terminal state statistics. This is an instance of generalized optimal transport over a nonlinear dynamical system. While prior work has reported existence-uniqueness and numerical solution of this dynamical optimal transport problem, here we present structural results about the equivalent Kantorovich a.k.a. optimal coupling formulation. Specifically, we focus on deriving the ground cost for the associated Kantorovich optimal coupling formulation. The ground cost equals the cost of transporting a unit amount of mass from a specific realization of the initial or source joint probability measure to a realization of the terminal or target joint probability measure, and determines the Kantorovich formulation. Finding the ground cost leads to solving a structured deterministic nonlinear optimal control problem, which is shown to be amenable to an analysis technique pioneered by Athans et al. We show that such techniques have broader applicability in determining the ground cost (thus Kantorovich formulation) for a class of generalized optimal mass transport problems involving nonlinear dynamics with translated norm-invariant drift.

Subjects: Optimization and Control , Machine Learning , Systems and Control , Machine Learning

Publish: 2025-04-04 05:38:00 UTC


#14 Safe Screening Rules for Group OWL Models

Authors: Runxue Bao, Quanchao Lu, Yanfu Zhang

Group Ordered Weighted L_{1}-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are used as standard approaches to solving Group OWL models. However, Group OWL models usually suffer huge computational costs and memory usage when the feature size is large in the high-dimensional scenario. To address this challenge, in this paper, we are the first to propose the safe screening rule for Group OWL models by effectively tackling the structured non-separable penalty, which can quickly identify the inactive features that have zero coefficients across all the tasks. Thus, by removing the inactive features during the training process, we may achieve substantial computational gain and memory savings. More importantly, the proposed screening rule can be directly integrated with the existing solvers both in the batch and stochastic settings. Theoretically, we prove our screening rule is safe and also can be safely applied to the existing iterative optimization algorithms. Our experimental results demonstrate that our screening rule can effectively identify the inactive features and leads to a significant computational speedup without any loss of accuracy.
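
The penalty at the heart of these models can be sketched in a few lines. The version below assumes the commonly used form of the Group OWL regularizer: sort the row-wise Euclidean norms of the coefficient matrix (one row per feature, one column per task) in decreasing order and take their weighted sum with non-increasing, non-negative weights. The paper's exact notation may differ, and the helper name `group_owl_penalty` is hypothetical. The screening rule itself is not reproduced here.

```python
import numpy as np


def group_owl_penalty(B, weights):
    """Weighted sum of row-wise l2 norms of B, largest norm paired with largest weight."""
    row_norms = np.linalg.norm(B, axis=1)
    sorted_norms = np.sort(row_norms)[::-1]      # decreasing order
    return float(np.dot(weights, sorted_norms))


B = np.array([[3.0, 4.0],     # feature 0: row norm 5
              [0.0, 0.0],     # feature 1: inactive across both tasks
              [1.0, 0.0]])    # feature 2: row norm 1
weights = np.array([3.0, 2.0, 1.0])   # assumed non-increasing and non-negative
penalty = group_owl_penalty(B, weights)
```

Because the weight a row receives depends on its rank among all rows, the penalty is not separable across features, which is the difficulty the abstract refers to.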

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 04:07:37 UTC


#15 From Observation to Orientation: an Adaptive Integer Programming Approach to Intervention Design

Authors: Abdelmonem Elrefaey, Rong Pan

Using both observational and experimental data, a causal discovery process can identify the causal relationships between variables. A unique adaptive intervention design paradigm is presented in this work, where causal directed acyclic graphs (DAGs) are effectively recovered with practical budgetary considerations. In order to choose treatments that optimize information gain under these considerations, an iterative integer programming (IP) approach is proposed, which drastically reduces the number of experiments required. Simulations over a broad range of graph sizes and edge densities are used to assess the effectiveness of the suggested approach. Results show that the proposed adaptive IP approach achieves full causal graph recovery with fewer intervention iterations and variable manipulations than random intervention baselines, and it is also flexible enough to accommodate a variety of practical constraints.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-04 02:35:35 UTC


#16 Graph Network Modeling Techniques for Visualizing Human Mobility Patterns

Authors: Sinjini Mitra, Anuj Srivastava, Avipsa Roy, Pavan Turaga

Human mobility analysis at urban scale requires models to represent the complex nature of human movements, which in turn are affected by accessibility to nearby points of interest, underlying socioeconomic factors of a place, and local transport choices for people living in a geographic region. In this work, we represent human mobility and the associated flow of movements as a graph. Graph-based approaches for mobility analysis are still in their early stages of adoption and are actively being researched. The challenges of graph-based mobility analysis are multifaceted: the lack of sufficiently high-quality data to represent flows at high spatial and temporal resolution, limited computational resources to translate large volumes of mobility data into a network structure, and scaling issues inherent in graph models. The current study develops a methodology by embedding graphs into a continuous space, which alleviates issues related to fast graph matching, graph time-series modeling, and visualization of mobility dynamics. Through experiments, we demonstrate how mobility data collected from taxicab trajectories can be transformed into network structures and patterns of mobility flow changes, and can be used for downstream tasks, reporting an approximately 40% decrease in error on average in matched graphs versus unmatched ones.

Subjects: Social and Information Networks , Artificial Intelligence , Machine Learning

Publish: 2025-04-04 02:21:44 UTC


#17 Graph Attention for Heterogeneous Graphs with Positional Encoding

Author: Nikhil Shivakumar Nayak

Graph Neural Networks (GNNs) have emerged as the de facto standard for modeling graph data, with attention mechanisms and transformers significantly enhancing their performance on graph-based tasks. Despite these advancements, the performance of GNNs on heterogeneous graphs often remains complex, with networks generally underperforming compared to their homogeneous counterparts. This work benchmarks various GNN architectures to identify the most effective methods for heterogeneous graphs, with a particular focus on node classification and link prediction. Our findings reveal that graph attention networks excel in these tasks. As a main contribution, we explore enhancements to these attention networks by integrating positional encodings for node embeddings. This involves utilizing the full Laplacian spectrum to accurately capture both the relative and absolute positions of each node within the graph, further enhancing performance on downstream tasks such as node classification and link prediction.
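
A minimal sketch of spectral positional encodings follows: it computes the full eigendecomposition of a graph Laplacian and uses the eigenvector entries of each node as that node's positional features. The abstract does not say which Laplacian is used or how the encodings are fed into the attention layers, so the combinatorial Laplacian, the treatment of the graph as a plain undirected adjacency matrix (ignoring node and edge types), and the helper name `laplacian_positional_encoding` are assumptions.

```python
import numpy as np


def laplacian_positional_encoding(adj):
    """Return (eigenvalues, eigenvectors) of L = D - A; row i of the eigenvectors encodes node i."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                              # combinatorial Laplacian (assumed)
    eigvals, eigvecs = np.linalg.eigh(lap)       # symmetric input, ascending eigenvalues
    return eigvals, eigvecs


# Path graph on four nodes: 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
eigvals, pe = laplacian_positional_encoding(adj)
```

Eigenvectors are only defined up to sign (and up to rotation within repeated eigenvalues), so practical pipelines typically randomise or canonicalise signs before concatenating `pe` to the node features.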

Subjects: Machine Learning , Artificial Intelligence , Discrete Mathematics , Differential Geometry , Machine Learning

Publish: 2025-04-03 18:00:02 UTC


#18 Comparative Analysis of Deepfake Detection Models: New Approaches and Perspectives

Author: Matheus Martins Batista

The growing threat posed by deepfake videos, capable of manipulating realities and disseminating misinformation, drives the urgent need for effective detection methods. This work investigates and compares different approaches for identifying deepfakes, focusing on the GenConViT model and its performance relative to other architectures present in the DeepfakeBenchmark. To contextualize the research, the social and legal impacts of deepfakes are addressed, as well as the technical fundamentals of their creation and detection, including digital image processing, machine learning, and artificial neural networks, with emphasis on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Transformers. The performance evaluation of the models was conducted using relevant metrics and new datasets established in the literature, such as WildDeepfake and DeepSpeak, aiming to identify the most effective tools in the battle against misinformation and media manipulation. The results indicate that GenConViT, after fine-tuning, exhibited superior performance in terms of accuracy (93.82%) and generalization capacity, surpassing other architectures in the DeepfakeBenchmark on the DeepSpeak dataset. This study contributes to the advancement of deepfake detection techniques and to the development of more robust and effective solutions against the dissemination of false information.

Subjects: Computer Vision and Pattern Recognition , Machine Learning , Computation , Machine Learning

Publish: 2025-04-03 02:10:27 UTC


#19 ModelRadar: Aspect-based Forecast Evaluation

Authors: Vitor Cerqueira, Luis Roque, Carlos Soares

Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. While convenient, averaging performance over all samples dilutes relevant information about model behavior under varying conditions. This limitation is especially problematic for time series forecasting, where multiple layers of averaging--across time steps, horizons, and multiple time series in a dataset--can mask relevant performance variations. We address this limitation by proposing ModelRadar, a framework for evaluating univariate time series forecasting models across multiple aspects, such as stationarity, presence of anomalies, or forecasting horizons. We demonstrate the advantages of this framework by comparing 24 forecasting methods, including classical approaches and different machine learning algorithms. NHITS, a state-of-the-art neural network architecture, performs best overall but its superiority varies with forecasting conditions. For instance, concerning the forecasting horizon, we found that NHITS (and also other neural networks) only outperforms classical approaches for multi-step ahead forecasting. Another relevant insight is that classical approaches such as ETS or Theta are notably more robust in the presence of anomalies. These and other findings highlight the importance of aspect-based model evaluation for both practitioners and researchers. ModelRadar is available as a Python package.
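
The point about averaging hiding conditional behaviour can be shown with a toy computation. The sketch below computes SMAPE overall and then separately per forecasting horizon. It does not use the ModelRadar package or its API. The SMAPE variant (in percent, with the sum of absolute values in the denominator), the data, and the `horizon` labels are assumptions made for illustration.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE in percent; assumes |y_true| + |y_pred| > 0 for every sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = 2.0 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))
    return float(100.0 * np.mean(ratio))


y_true = np.array([100.0, 100.0, 100.0, 100.0])
y_pred = np.array([100.0, 100.0, 50.0, 150.0])
horizon = np.array([1, 1, 2, 2])   # hypothetical aspect: forecasting horizon

overall = smape(y_true, y_pred)
by_horizon = {int(h): smape(y_true[horizon == h], y_pred[horizon == h])
              for h in np.unique(horizon)}
```

Here the single overall score sits between a perfect one-step score and a much worse two-step score, which is the kind of variation a per-aspect breakdown is meant to expose.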

Subject: Machine Learning

Publish: 2025-03-31 11:50:45 UTC