Processing math: 100%

Machine Learning

2025-04-03 | | Total: 12

#1 KD2M: An unifying framework for feature knowledge distillation [PDF] [Copy] [Kimi] [REL]

Author: Eduardo Fernandes Montesuma

Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher, towards a student neural net. This process is often done by matching the networks' predictions (i.e., their output), but, recently several works have proposed to match the distributions of neural nets' activations (i.e., their features), a process known as \emph{distribution matching}. In this paper, we propose an unifying framework, Knowledge Distillation through Distribution Matching (KD2M), which formalizes this strategy. Our contributions are threefold. We i) provide an overview of distribution metrics used in distribution matching, ii) benchmark on computer vision datasets, and iii) derive new theoretical results for KD.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-02 14:14:46 UTC


#2 Sparse Gaussian Neural Processes [PDF] [Copy] [Kimi] [REL]

Authors: Tommy Rochussen, Vincent Fortuin

Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models such as Gaussian processes with interpretable priors, and conduct the tedious procedure of training their model from scratch for each task they encounter. While this is justifiable for tasks with a limited number of data points, the cubic computational cost of exact Gaussian process inference renders this prohibitive when each task has many observations. To remedy this, we introduce a family of models that meta-learn sparse Gaussian process inference. Not only does this enable rapid prediction on new tasks with sparse Gaussian processes, but since our models have clear interpretations as members of the neural process family, it also allows manual elicitation of priors in a neural process for the first time. In meta-learning regimes for which the number of observed tasks is small or for which expert domain knowledge is available, this offers a crucial advantage.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-02 12:00:09 UTC


#3 Density estimation via mixture discrepancy and moments [PDF] [Copy] [Kimi] [REL]

Authors: Zhengyang Lei, Sihong Shao

With the aim of generalizing histogram statistics to higher dimensional cases, density estimation via discrepancy based sequential partition (DSP) has been proposed [D. Li, K. Yang, W. Wong, Advances in Neural Information Processing Systems (2016) 1099-1107] to learn an adaptive piecewise constant approximation defined on a binary sequential partition of the underlying domain, where the star discrepancy is adopted to measure the uniformity of particle distribution. However, the calculation of the star discrepancy is NP-hard and it does not satisfy the reflection invariance and rotation invariance either. To this end, we use the mixture discrepancy and the comparison of moments as a replacement of the star discrepancy, leading to the density estimation via mixture discrepancy based sequential partition (DSP-mix) and density estimation via moments based sequential partition (MSP), respectively. Both DSP-mix and MSP are computationally tractable and exhibit the reflection and rotation invariance. Numerical experiments in reconstructing the d-D mixture of Gaussians and Betas with d=2,3,,6 demonstrate that DSP-mix and MSP both run approximately ten times faster than DSP while maintaining the same accuracy.

Subjects: Machine Learning , Machine Learning , Computational Physics , Methodology

Publish: 2025-04-02 10:15:03 UTC


#4 Denoising guarantees for optimized sampling schemes in compressed sensing [PDF] [Copy] [Kimi] [REL]

Authors: Yaniv Plan, Matthew S. Scott, Xia Sheng, Ozgur Yilmaz

Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused by the measurement noise vanishes with an increasing number of measurements for optimized sampling schemes, assuming that the noise is Gaussian. We moreover provide similar guarantees for measurements sampled with-replacement with arbitrary probability weights. All our results hold on prior sets contained in a union of low-dimensional subspaces. Finally, we demonstrate that this denoising behavior appears in empirical experiments with a rate that closely matches our theoretical guarantees when the prior set is the range of a generative ReLU neural network and when it is the set of sparse vectors.

Subjects: Machine Learning , Information Theory , Machine Learning , Signal Processing , Probability

Publish: 2025-04-01 02:04:03 UTC


#5 Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift [PDF] [Copy] [Kimi] [REL]

Authors: Shuntuo Xu, Zhou Yu, Jian Huang

The density ratio is an important metric for evaluating the relative likelihood of two probability distributions, with extensive applications in statistics and machine learning. However, existing estimation theories for density ratios often depend on stringent regularity conditions, mainly focusing on density ratio functions with bounded domains and ranges. In this paper, we study density ratio estimators using loss functions based on least squares and logistic regression. We establish upper bounds on estimation errors with standard minimax optimal rates, up to logarithmic factors. Our results accommodate density ratio functions with unbounded domains and ranges. We apply our results to nonparametric regression and conditional flow models under covariate shift and identify the tail properties of the density ratio as crucial for error control across domains affected by covariate shift. We provide sufficient conditions under which loss correction is unnecessary and demonstrate effective generalization capabilities of a source estimator to any suitable target domain. Our simulation experiments support these theoretical findings, indicating that the source estimator can outperform those derived from loss correction methods, even when the true density ratio is known.

Subjects: Machine Learning , Machine Learning

Publish: 2025-03-29 11:35:39 UTC


#6 Fair Sufficient Representation Learning [PDF] [Copy] [Kimi] [REL]

Authors: Xueyu Zhou, Chun Yin IP, Jian Huang

The main objective of fair statistical modeling and machine learning is to minimize or eliminate biases that may arise from the data or the model itself, ensuring that predictions and decisions are not unjustly influenced by sensitive attributes such as race, gender, age, or other protected characteristics. In this paper, we introduce a Fair Sufficient Representation Learning (FSRL) method that balances sufficiency and fairness. Sufficiency ensures that the representation should capture all necessary information about the target variables, while fairness requires that the learned representation remains independent of sensitive attributes. FSRL is based on a convex combination of an objective function for learning a sufficient representation and an objective function that ensures fairness. Our approach manages fairness and sufficiency at the representation level, offering a novel perspective on fair representation learning. We implement this method using distance covariance, which is effective for characterizing independence between random variables. We establish the convergence properties of the learned representations. Experiments conducted on healthcase and text datasets with diverse structures demonstrate that FSRL achieves a superior trade-off between fairness and accuracy compared to existing approaches.

Subjects: Machine Learning , Machine Learning

Publish: 2025-03-29 10:37:49 UTC


#7 A Unified Approach to Analysis and Design of Denoising Markov Models [PDF] [Copy] [Kimi] [REL]

Authors: Yinuo Ren, Grant M. Rotskoff, Lexing Ying

Probabilistic generative models based on measure transport, such as diffusion and flow-based models, are often formulated in the language of Markovian stochastic dynamics, where the choice of the underlying process impacts both algorithmic design choices and theoretical analysis. In this paper, we aim to establish a rigorous mathematical foundation for denoising Markov models, a broad class of generative models that postulate a forward process transitioning from the target distribution to a simple, easy-to-sample distribution, alongside a backward process particularly constructed to enable efficient sampling in the reverse direction. Leveraging deep connections with nonequilibrium statistical mechanics and generalized Doob's h-transform, we propose a minimal set of assumptions that ensure: (1) explicit construction of the backward generator, (2) a unified variational objective directly minimizing the measure transport discrepancy, and (3) adaptations of the classical score-matching approach across diverse dynamics. Our framework unifies existing formulations of continuous and discrete diffusion models, identifies the most general form of denoising Markov models under certain regularity assumptions on forward generators, and provides a systematic recipe for designing denoising Markov models driven by arbitrary Lévy-type processes. We illustrate the versatility and practical effectiveness of our approach through novel denoising Markov models employing geometric Brownian motion and jump processes as forward dynamics, highlighting the framework's potential flexibility and capability in modeling complex distributions.

Subjects: Machine Learning , Numerical Analysis , Machine Learning

Publish: 2025-04-02 17:46:43 UTC


#8 Client Selection in Federated Learning with Data Heterogeneity and Network Latencies [PDF] [Copy] [Kimi] [REL]

Authors: Harsh Vardhan, Xiaofan Yu, Tajana Rosing, Arya Mazumdar

Federated learning (FL) is a distributed machine learning paradigm where multiple clients conduct local training based on their private data, then the updated models are sent to a central server for global aggregation. The practical convergence of FL is challenged by multiple factors, with the primary hurdle being the heterogeneity among clients. This heterogeneity manifests as data heterogeneity concerning local data distribution and latency heterogeneity during model transmission to the server. While prior research has introduced various efficient client selection methods to alleviate the negative impacts of either of these heterogeneities individually, efficient methods to handle real-world settings where both these heterogeneities exist simultaneously do not exist. In this paper, we propose two novel theoretically optimal client selection schemes that can handle both these heterogeneities. Our methods involve solving simple optimization problems every round obtained by minimizing the theoretical runtime to convergence. Empirical evaluations on 9 datasets with non-iid data distributions, 2 practical delay distributions, and non-convex neural network models demonstrate that our algorithms are at least competitive to and at most 20 times better than best existing baselines.

Subjects: Machine Learning , Machine Learning

Publish: 2025-04-02 17:31:15 UTC


#9 Barrier Certificates for Unknown Systems with Latent States and Polynomial Dynamics using Bayesian Inference [PDF] [Copy] [Kimi] [REL]

Authors: Robert Lefringhausen, Sami Leon Noel Aziz Hanna, Elias August, Sandra Hirche

Certifying safety in dynamical systems is crucial, but barrier certificates - widely used to verify that system trajectories remain within a safe region - typically require explicit system models. When dynamics are unknown, data-driven methods can be used instead, yet obtaining a valid certificate requires rigorous uncertainty quantification. For this purpose, existing methods usually rely on full-state measurements, limiting their applicability. This paper proposes a novel approach for synthesizing barrier certificates for unknown systems with latent states and polynomial dynamics. A Bayesian framework is employed, where a prior in state-space representation is updated using input-output data via a targeted marginal Metropolis-Hastings sampler. The resulting samples are used to construct a candidate barrier certificate through a sum-of-squares program. It is shown that if the candidate satisfies the required conditions on a test set of additional samples, it is also valid for the true, unknown system with high probability. The approach and its probabilistic guarantees are illustrated through a numerical simulation.

Subjects: Systems and Control , Machine Learning , Machine Learning

Publish: 2025-04-02 15:12:34 UTC


#10 Proper scoring rules for estimation and forecast evaluation [PDF] [Copy] [Kimi] [REL]

Authors: Kartik Waghmare, Johanna Ziegel

Proper scoring rules have been a subject of growing interest in recent years, not only as tools for evaluation of probabilistic forecasts but also as methods for estimating probability distributions. In this article, we review the mathematical foundations of proper scoring rules including general characterization results and important families of scoring rules. We discuss their role in statistics and machine learning for estimation and forecast evaluation. Furthermore, we comment on interesting developments of their usage in applications.

Subjects: Statistics Theory , Machine Learning

Publish: 2025-04-02 14:46:14 UTC


#11 Time-to-event prediction for grouped variables using Exclusive Lasso [PDF] [Copy] [Kimi] [REL]

Authors: Dayasri Ravi, Andreas Groll

The integration of high-dimensional genomic data and clinical data into time-to-event prediction models has gained significant attention due to the growing availability of these datasets. Traditionally, a Cox regression model is employed, concatenating various covariate types linearly. Given that much of the data may be redundant or irrelevant, feature selection through penalization is often desirable. A notable characteristic of these datasets is their organization into blocks of distinct data types, such as methylation and clinical predictors, which requires selecting a subset of covariates from each group due to high intra-group correlations. For this reason, we propose utilizing Exclusive Lasso regularization in place of standard Lasso penalization. We apply our methodology to a real-life cancer dataset, demonstrating enhanced survival prediction performance compared to the conventional Cox regression model.

Subjects: Methodology , Computation , Machine Learning

Publish: 2025-04-02 09:07:05 UTC


#12 Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code [PDF] [Copy] [Kimi] [REL]

Authors: Roxane Cohen, Robin David, Florian Yger, Fabrice Rossi

Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.

Subjects: Cryptography and Security , Machine Learning , Machine Learning

Publish: 2025-04-02 08:36:27 UTC