https://papers.cool/arxiv/statStatistics2024-09-18T00:00:00+00:00python-feedgenCool Papers - Immersive Paper Discoveryhttps://papers.cool/arxiv/2311.03851An AFM-based approach for quantification of guest particle deformation during mechano-fusion2024-09-18T00:00:00+00:00Phillip GräfensteinerJudith FriebelLisa DitscherleinOrkun FuratUrs A. PeukerVolker SchmidtDuring the mechano-fusion process for dry particle coating, (hetero)-aggregates are formed consisting of host particles which are coated by smaller guest particles. During this process, the latter are exposed to intense particle-particle interactions and particle-wall impacts, which lead to deformation of the guest particles' original shape. These deformations on the nano- and microscale can heavily influence the effective macroscopic properties of the resulting coated particles. We present a method to quantify the shape deformation of guest particles during mechano-fusion based on measurements acquired by atomic force microscopy before and after mechano-fusion. To this end, we reconstruct the 3D shape of guest particles by means of an ellipsoidal fit, constrained by the known volume of the guest particles. Using these reconstructed shapes, we can quantify the degree of deformation by comparing the aspect ratios of the ellipsoidal fits before and after mechano-fusion. Such a quantification enhances the understanding of how process-related parameters influence the geometric descriptors of the involved particles, which in turn impact the overall macroscopic properties of the material.https://papers.cool/arxiv/2409.10559Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers2024-09-18T00:00:00+00:00Siyu ChenHeejune SheenTianhao WangZhuoran YangIn-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its theoretical foundations remain elusive due to the complexity of transformer architectures. 
In particular, most existing work only theoretically explains how the attention mechanism facilitates ICL under certain data models. It remains unclear how the other building blocks of the transformer contribute to ICL. To address this question, we study how a two-attention-layer transformer is trained to perform ICL on $n$-gram Markov chain data, where each token in the Markov chain statistically depends on the previous $n$ tokens. We analyze a sophisticated transformer model featuring relative positional embedding, multi-head softmax attention, and a feed-forward layer with normalization. We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model that performs a generalized version of the induction head mechanism with a learned feature, resulting from the congruous contribution of all the building blocks. In the limiting model, the first attention layer acts as a $\mathit{copier}$, copying past tokens within a given window to each position, and the feed-forward network with normalization acts as a $\mathit{selector}$ that generates a feature vector by only looking at informationally relevant parents from the window. Finally, the second attention layer is a $\mathit{classifier}$ that compares these features with the feature at the output position, and uses the resulting similarity scores to generate the desired output. Our theory is further validated by experiments.https://papers.cool/arxiv/2409.10580Veridical Data Science for Medical Foundation Models2024-09-18T00:00:00+00:00Ahmed AlaaBin YuThe advent of foundation models (FMs) such as large language models (LLMs) has led to a cultural shift in data science, both in medicine and beyond. This shift involves moving away from specialized predictive models trained for specific, well-defined domain questions to generalist FMs pre-trained on vast amounts of unstructured data, which can then be adapted to various clinical tasks and questions. 
As a result, the standard data science workflow in medicine has been fundamentally altered; the foundation model lifecycle (FMLC) now includes distinct upstream and downstream processes, in which computational resources, model and data access, and decision-making power are distributed among multiple stakeholders. At their core, FMs are fundamentally statistical models, and this new workflow challenges the principles of Veridical Data Science (VDS), hindering the rigorous statistical analysis expected in transparent and scientifically reproducible data science practices. We critically examine the medical FMLC in light of the core principles of VDS: predictability, computability, and stability (PCS), and explain how it deviates from the standard data science workflow. Finally, we propose recommendations for a reimagined medical FMLC that expands and refines the PCS principles for VDS including considering the computational and accessibility constraints inherent to FMs.https://papers.cool/arxiv/2409.10584Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design2024-09-18T00:00:00+00:00Shengchao LiuDivin YanWeitao DuWeiyang LiuZhuoxinran LiHongyu GuoChristian BorgsJennifer ChayesAnima AnandkumarArtificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. 
We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities.https://papers.cool/arxiv/2409.10673A Bayesian Interpretation of Adaptive Low-Rank Adaptation2024-09-18T00:00:00+00:00Haolin ChenPhilip N. GarnerMotivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.https://papers.cool/arxiv/2409.10773Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity2024-09-18T00:00:00+00:00Site BaiBrian BullinsIn this paper, we provide tight lower bounds for the oracle complexity of minimizing high-order Hölder smooth and uniformly convex functions. Specifically, for a function whose $p^{th}$-order derivatives are Hölder continuous with degree $\nu$ and parameter $H$, and that is uniformly convex with degree $q$ and parameter $\sigma$, we focus on two asymmetric cases: (1) $q > p + \nu$, and (2) $q < p+\nu$. 
Given up to $p^{th}$-order oracle access, we establish worst-case oracle complexities of $\Omega\left( \left( \frac{H}{\sigma}\right)^\frac{2}{3(p+\nu)-2}\left( \frac{\sigma}{\epsilon}\right)^\frac{2(q-p-\nu)}{q(3(p+\nu)-2)}\right)$ with a truncated-Gaussian smoothed hard function in the first case and $\Omega\left(\left(\frac{H}{\sigma}\right)^\frac{2}{3(p+\nu)-2}+ \log^2\left(\frac{\sigma^{p+\nu}}{H^q}\right)^\frac{1}{p+\nu-q}\right)$ in the second case, for reaching an $\epsilon$-approximate solution in terms of the optimality gap. Our analysis generalizes previous lower bounds for functions under first- and second-order smoothness as well as those for uniformly convex functions, and furthermore our results match the corresponding upper bounds in the general setting.https://papers.cool/arxiv/2409.11008Latent mixed-effect models for high-dimensional longitudinal data2024-09-18T00:00:00+00:00Priscilla OngManuel HaußmannOtto LönnrothHarri LähdesmäkiModelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional, contain non-linear effects and time-varying covariates. Gaussian process (GP) prior-based variational autoencoders (VAEs) have emerged as a promising approach due to their ability to model time-series data. However, they are costly to train and struggle to fully exploit the rich covariates characteristic of longitudinal data, making them difficult for practitioners to use effectively. In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a scalable, interpretable and identifiable model. We highlight theoretical connections between it and GP-based techniques, providing a unified framework for this class of methods. 
Our proposal performs competitively compared to existing approaches across simulated and real-world datasets.https://papers.cool/arxiv/2409.11080Data-driven stochastic 3D modeling of the nanoporous binder-conductive additive phase in battery cathodes2024-09-18T00:00:00+00:00Phillip GräfensteinerMarkus OsenbergAndré HilgerNicole BohnJoachim R. BinderIngo MankeVolker SchmidtMatthias NeumannA stochastic 3D modeling approach for the nanoporous binder-conductive additive phase in hierarchically structured cathodes of lithium-ion batteries is presented. The binder-conductive additive phase of these electrodes consists of carbon black, polyvinylidene difluoride binder and graphite particles. For its stochastic 3D modeling, a three-step procedure based on methods from stochastic geometry is used. First, the graphite particles are described by a Boolean model with ellipsoidal grains. Second, the mixture of carbon black and binder is modeled by an excursion set of a Gaussian random field in the complement of the graphite particles. Third, large pore regions within the mixture of carbon black and binder are described by a Boolean model with spherical grains. The model parameters are calibrated to 3D image data of cathodes in lithium-ion batteries acquired by focused ion beam scanning electron microscopy. Subsequently, model validation is performed by comparing model realizations with measured image data in terms of various morphological descriptors that are not used for model fitting. Finally, we use the stochastic 3D model for predictive simulations, where we generate virtual, yet realistic, image data of nanoporous binder-conductive additives with varying amounts of graphite particles. Based on these virtual nanostructures, we can investigate structure-property relationships. 
In particular, we quantitatively study the influence of graphite particles on effective transport properties in the nanoporous binder-conductive additive phase, which have a crucial impact on electrochemical processes in the cathode and thus on the performance of battery cells.https://papers.cool/arxiv/2409.11100Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier2024-09-18T00:00:00+00:00Carine HueMarc BoulléWe study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to averaging-based classifiers used up until now, our main goal is to obtain parsimonious robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. 
The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers that are evaluated on benchmark datasets and positioned w.r.t. a reference averaging-based classifier.https://papers.cool/arxiv/2409.11141Sample Complexity Bounds for Linear System Identification from a Finite Set2024-09-18T00:00:00+00:00Nicolas ChatzikiriakosAndrea IannelliThis paper considers a finite sample perspective on the problem of identifying an LTI system from a finite set of possible systems using trajectory data. To this end, we use the maximum likelihood estimator to identify the true system and provide an upper bound for its sample complexity. Crucially, the derived bound does not rely on a potentially restrictive stability assumption. Additionally, we leverage tools from information theory to provide a lower bound to the sample complexity that holds independently of the estimator used. The derived sample complexity bounds are analyzed analytically and numerically.https://papers.cool/arxiv/2409.11381Edge spectra of Gaussian random symmetric matrices with correlated entries2024-09-18T00:00:00+00:00Debapratim BanerjeeSoumendu Sundar MukherjeeDipranjan PalWe study the largest eigenvalue of a Gaussian random symmetric matrix $X_n$, with zero-mean, unit variance entries satisfying the condition $\sup_{(i, j) \ne (i', j')}|\mathbb{E}[X_{ij} X_{i'j'}]| = O(n^{-(1 + \varepsilon)})$, where $\varepsilon > 0$. It follows from Catalano et al. (2024) that the empirical spectral distribution of $n^{-1/2} X_n$ converges weakly almost surely to the standard semi-circle law. Using a Füredi-Komlós-type high moment analysis, we show that the largest eigenvalue $\lambda_1(n^{-1/2} X_n)$ of $n^{-1/2} X_n$ converges almost surely to $2$. This result is essentially optimal in the sense that one cannot take $\varepsilon = 0$ and still obtain an almost sure limit of $2$. 
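The almost-sure limit of $2$ stated above can be checked numerically in the simplest special case. The following is a minimal sketch, not the authors' analysis: it assumes fully independent $N(0,1)$ entries (so every cross-correlation is exactly zero, which trivially satisfies the decay condition) and estimates $\lambda_1(n^{-1/2} X_n)$ with a shifted power iteration. The function names, the shift of 3, and the matrix size are illustrative assumptions.

```python
import math
import random


def wigner_matrix(n, rng):
    """Symmetric n x n matrix with independent N(0, 1) entries on and above the diagonal."""
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            x[i][j] = x[j][i] = rng.gauss(0.0, 1.0)
    return x


def largest_eigenvalue(a, shift=3.0, iters=200):
    """Estimate the top eigenvalue of a symmetric matrix by shifted power iteration.

    Iterating with a + shift * I makes the top eigenvalue also the largest in
    absolute value, provided the whole spectrum lies above -shift (assumed here).
    """
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    rayleigh = 0.0
    for _ in range(iters):
        # w = (a + shift * I) v
        w = [sum(row[j] * v[j] for j in range(n)) + shift * v[i]
             for i, row in enumerate(a)]
        # v has unit norm, so v.w - shift is the Rayleigh quotient v^T a v
        rayleigh = sum(wi * vi for wi, vi in zip(w, v)) - shift
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return rayleigh


def top_eigenvalue_of_scaled_wigner(n=120, seed=0):
    """Largest eigenvalue estimate for n^{-1/2} X_n with independent entries."""
    rng = random.Random(seed)
    x = wigner_matrix(n, rng)
    scale = 1.0 / math.sqrt(n)
    a = [[scale * xij for xij in row] for row in x]
    return largest_eigenvalue(a)


if __name__ == "__main__":
    print(top_eigenvalue_of_scaled_wigner())
```

For moderate $n$ the estimate should sit slightly below $2$, both because of finite-size effects and because the Rayleigh quotient from power iteration never exceeds the true top eigenvalue. Probing the correlated regime discussed in the abstract would require a sampler for the correlated entries, which this sketch does not attempt.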
We also derive Gaussian fluctuation results for the largest eigenvalue in the case where the entries have a common non-zero mean. Let $Y_n = X_n + \frac{\lambda}{\sqrt{n}}\mathbf{1} \mathbf{1}^\top$. When $\varepsilon \ge 1$ and $\lambda \gg n^{1/4}$, we show that \[ n^{1/2}\bigg(\lambda_1(n^{-1/2} Y_n) - \lambda - \frac{1}{\lambda}\bigg) \xrightarrow{d} \sqrt{2} Z, \] where $Z$ is a standard Gaussian. On the other hand, when $0 < \varepsilon < 1$, we have $\mathrm{Var}(\frac{1}{n}\sum_{i, j}X_{ij}) = O(n^{1 - \varepsilon})$. Assuming that $\mathrm{Var}(\frac{1}{n}\sum_{i, j} X_{ij}) = \sigma^2 n^{1 - \varepsilon} (1 + o(1))$, if $\lambda \gg n^{\varepsilon/4}$, then we have \[ n^{\varepsilon/2}\bigg(\lambda_1(n^{-1/2} Y_n) - \lambda - \frac{1}{\lambda}\bigg) \xrightarrow{d} \sigma Z. \] While the ranges of $\lambda$ in these fluctuation results are certainly not optimal, a striking aspect is that different scalings are required in the two regimes $0 < \varepsilon < 1$ and $\varepsilon \ge 1$.https://papers.cool/arxiv/2409.11384Large Deviations Principle for Bures-Wasserstein Barycenters2024-09-18T00:00:00+00:00Adam Quinn JaffeLeonardo V. SantoroWe prove the large deviations principle for empirical Bures-Wasserstein barycenters of independent, identically-distributed samples of covariance matrices and covariance operators. As an application, we explore some consequences of our results for the phenomenon of dimension-free concentration of measure for Bures-Wasserstein barycenters. Our theory reveals a novel notion of exponential tilting in the Bures-Wasserstein space, which, in analogy with Cramér's theorem in the Euclidean case, solves the relative entropy projection problem under a constraint on the barycenter. 
Notably, this method of proof is easy to adapt to other geometric settings of interest; with the same method, we obtain large deviations principles for empirical barycenters in Riemannian manifolds and the univariate Wasserstein space, and we obtain large deviations upper bounds for empirical barycenters in the general multivariate Wasserstein space. In fact, our results are the first known large deviations principles for Fréchet means in any non-linear metric space.https://papers.cool/arxiv/2409.10538Fairness in Survival Analysis with Distributionally Robust Optimization2024-09-18T00:00:00+00:00Shu HuGeorge H. ChenWe propose a general approach for encouraging fairness in survival analysis models based on minimizing a worst-case error across all subpopulations that occur with at least a user-specified probability. This approach can be used to convert many existing survival analysis models into ones that simultaneously encourage fairness, without requiring the user to specify which attributes or features to treat as sensitive in the training loss function. From a technical standpoint, our approach applies recent developments of distributionally robust optimization (DRO) to survival analysis. The complication is that existing DRO theory uses a training loss function that decomposes across contributions of individual data points, i.e., any term that shows up in the loss function depends only on a single training point. This decomposition does not hold for commonly used survival loss functions, including for the Cox proportional hazards model, its deep neural network variants, and many other recently developed models that use loss functions involving ranking or similarity score calculations. We address this technical hurdle using a sample splitting strategy. 
We demonstrate our sample splitting DRO approach by using it to create fair versions of a diverse set of existing survival analysis models including the Cox model (and its deep variant DeepSurv), the discrete-time model DeepHit, and the neural ODE model SODEN. We also establish a finite-sample theoretical guarantee to show what our sample splitting DRO loss converges to. For the Cox model, we further derive an exact DRO approach that does not use sample splitting. For all the models that we convert into DRO variants, we show that the DRO variants often score better on recently established fairness metrics (without incurring a significant drop in accuracy) compared to existing survival analysis fairness regularization techniques.https://papers.cool/arxiv/2409.10572A clustering adaptive Gaussian process regression method: response patterns based real-time prediction for nonlinear solid mechanics problems2024-09-18T00:00:00+00:00Ming-Jian LiYanping LianZhanshan ChengLehui LiZhidong WangRuxin GaoDaining FangNumerical simulation is powerful to study nonlinear solid mechanics problems. However, mesh-based or particle-based numerical methods suffer from the common shortcoming of being time-consuming, particularly for complex problems with real-time analysis requirements. This study presents a clustering adaptive Gaussian process regression (CAG) method aiming for real-time prediction for nonlinear structural responses in solid mechanics. It is a data-driven machine learning method featuring a small sample size, high accuracy, and high efficiency, leveraging nonlinear structural response patterns. Similar to the traditional Gaussian process regression (GPR) method, it operates in offline and online stages. In the offline stage, an adaptive sample generation technique is introduced to cluster datasets into distinct patterns for demand-driven sample allocation. This ensures comprehensive coverage of the critical samples for the solution space of interest. 
In the online stage, following the divide-and-conquer strategy, a pre-prediction classification categorizes problems into predefined patterns sequentially predicted by the trained multi-pattern Gaussian process regressor. In addition, dimension reduction and restoration techniques are employed in the proposed method to enhance its efficiency. A set of problems involving material, geometric, and boundary condition nonlinearities is presented to demonstrate the CAG method's abilities. The proposed method can offer predictions within a second and attain high precision with only about 20 samples within the context of this study, outperforming the traditional GPR using uniformly distributed samples for error reductions ranging from 1 to 3 orders of magnitude. The CAG method is expected to offer a powerful tool for real-time prediction of nonlinear solid mechanical problems and shed light on the complex nonlinear structural response pattern.https://papers.cool/arxiv/2409.10591Variance Residual Life Ageing Intensity Function2024-09-18T00:00:00+00:00Ashutosh SinghQuantitative measurement of ageing across systems and components is crucial for accurately assessing reliability and predicting failure probabilities. This measurement supports effective maintenance scheduling, performance optimisation, and cost management. Examining the ageing characteristics of a system that operates beyond a specified time $t > 0$ yields valuable insights. This paper introduces a novel metric for ageing, termed the Variance Residual Life Ageing Intensity (VRLAI) function, and explores its properties across various probability distributions. Additionally, we characterise the closure properties of the two ageing classes defined by the VRLAI function. We propose a new ordering, called the Variance Residual Life Ageing Intensity (VRLAI) ordering, and discuss its various properties. 
Furthermore, we examine the closure of the VRLAI order under coherent systems.https://papers.cool/arxiv/2409.10678Learning with Sparsely Permuted Data: A Robust Bayesian Approach2024-09-18T00:00:00+00:00Abhisek ChakrabortySaptati DattaData dispersed across multiple files are commonly integrated through probabilistic linkage methods, where even minimal error rates in record matching can significantly contaminate subsequent statistical analyses. In regression problems, we examine scenarios where the identifiers of predictors or responses are subject to an unknown permutation, challenging the assumption of correspondence. Many emerging approaches in the literature focus on sparsely permuted data, where only a small subset of pairs ($k \ll n$) are affected by the permutation, treating these permuted entries as outliers to restore original correspondence and obtain consistent estimates of regression parameters. In this article, we complement the existing literature by introducing a novel generalized robust Bayesian formulation of the problem. We develop an efficient posterior sampling scheme by adapting the fractional posterior framework and addressing key computational bottlenecks via careful use of discrete optimal transport and sampling in the space of binary matrices with fixed margins. Further, we establish new posterior contraction results within this framework, providing theoretical guarantees for our approach. The utility of the proposed framework is demonstrated via extensive numerical experiments.https://papers.cool/arxiv/2409.10771Flexible survival regression with variable selection for heterogeneous population2024-09-18T00:00:00+00:00Abhishek MandalAbhisek ChakrabortySurvival regression is widely used to model time-to-event data, to explore how covariates may influence the occurrence of events. Modern datasets often encompass a vast number of covariates across many subjects, with only a subset of the covariates significantly affecting survival. 
Additionally, subjects often belong to an unknown number of latent groups, where covariate effects on survival differ significantly across groups. The proposed methodology addresses both challenges by simultaneously identifying the latent sub-groups in the heterogeneous population and evaluating covariate significance within each sub-group. This approach is shown to enhance the predictive accuracy for time-to-event outcomes, via uncovering varying risk profiles within the underlying heterogeneous population and is thereby helpful to devise targeted disease management strategies.https://papers.cool/arxiv/2409.10812Statistical Inference for Chi-square Statistics or F-Statistics Based on Multiple Imputation2024-09-18T00:00:00+00:00Binhuan WangYixin FangMan JinMissing data is a common issue in medical, psychiatric, and social studies. In the literature, Multiple Imputation (MI) was proposed to multiply impute datasets and combine analysis results from imputed datasets for statistical inference using Rubin's rule. However, Rubin's rule only works for combined inference on statistical tests with point and variance estimates and is not applicable to combine general F-statistics or Chi-square statistics. In this manuscript, we provide a solution to combine F-test statistics from multiply imputed datasets, when the F-statistic has an explicit fractional form (that is, both the numerator and denominator of the F-statistic are reported). Then we extend the method to combine Chi-square statistics from multiply imputed datasets. Furthermore, we develop methods for two commonly applied F-tests, Welch's ANOVA and Type-III tests of fixed effects in mixed effects models, which do not have the explicit fractional form. 
SAS macros are also developed to facilitate applications.https://papers.cool/arxiv/2409.10835BMRMM: An R Package for Bayesian Markov (Renewal) Mixed Models2024-09-18T00:00:00+00:00Yutong WuAbhra SarkarWe introduce the BMRMM package implementing Bayesian inference for a class of Markov renewal mixed models which can characterize the stochastic dynamics of a collection of sequences, each comprising alternative instances of categorical states and associated continuous duration times, while being influenced by a set of exogenous factors as well as a 'random' individual. The default setting flexibly models the state transition probabilities using mixtures of Dirichlet distributions and the duration times using mixtures of gamma kernels while also allowing variable selection for both. Modeling such data using simpler Markov mixed models also remains an option, either by ignoring the duration times altogether or by replacing them with instances of an additional category obtained by discretizing them by a user-specified unit. The option is also useful when data on duration times may not be available in the first place. We demonstrate the package's utility using two data sets.https://papers.cool/arxiv/2409.10855Calibrated Multivariate Regression with Localized PIT Mappings2024-09-18T00:00:00+00:00Lucas KockG. S. RodriguesScott A. SissonNadja KleinDavid J. NottCalibration ensures that predicted uncertainties align with observed uncertainties. While there is an extensive literature on recalibration methods for univariate probabilistic forecasts, work on calibration for multivariate forecasts is much more limited. This paper introduces a novel post-hoc recalibration approach that addresses multivariate calibration for potentially misspecified models. 
Our method involves constructing local mappings between vectors of marginal probability integral transform values and the space of observations, providing a flexible and model free solution applicable to continuous, discrete, and mixed responses. We present two versions of our approach: one uses K-nearest neighbors, and the other uses normalizing flows. Each method has its own strengths in different situations. We demonstrate the effectiveness of our approach on two real data applications: recalibrating a deep neural network's currency exchange rate forecast and improving a regression model for childhood malnutrition in India for which the multivariate response has both discrete and continuous components.https://papers.cool/arxiv/2409.10860Cointegrated Matrix Autoregression Models2024-09-18T00:00:00+00:00Zebang LiHan XiaoWe propose a novel cointegrated autoregressive model for matrix-valued time series, with bi-linear cointegrating vectors corresponding to the rows and columns of the matrix data. Compared to the traditional cointegration analysis, our proposed matrix cointegration model better preserves the inherent structure of the data and enables corresponding interpretations. To estimate the cointegrating vectors as well as other coefficients, we introduce two types of estimators based on least squares and maximum likelihood. We investigate the asymptotic properties of the cointegrated matrix autoregressive model under the existence of trend and establish the asymptotic distributions for the cointegrating vectors, as well as other model parameters. We conduct extensive simulations to demonstrate its superior performance over traditional methods. 
In addition, we apply our proposed model to Fama-French portfolios and develop an effective pairs trading strategy.https://papers.cool/arxiv/2409.10882Spatio-Temporal-Network Point Processes for Modeling Crime Events with Landmarks2024-09-18T00:00:00+00:00Zheng DongJorge MateuYao XieSelf-exciting point processes are widely used to model the contagious effects of crime events living within continuous geographic space, using their occurrence time and locations. However, in urban environments, most events are naturally constrained within the city's street network structure, and the contagious effects of crime are governed by such a network geography. Meanwhile, the complex distribution of urban infrastructures also plays an important role in shaping crime patterns across space. We introduce a novel spatio-temporal-network point process framework for crime modeling that integrates these urban environmental characteristics by incorporating self-attention graph neural networks. Our framework incorporates the street network structure as the underlying event space, where crime events can occur at random locations on the network edges. To realistically capture criminal movement patterns, distances between events are measured using street network distances. We then propose a new mark for a crime event by concatenating the event's crime category with the type of its nearby landmark, aiming to capture how the urban design influences the mixing structures of various crime types. A graph attention network architecture is adopted to learn the existence of mark-to-mark interactions. 
Extensive experiments on crime data from Valencia, Spain, demonstrate the effectiveness of our framework in understanding the crime landscape and forecasting crime risks across regions.https://papers.cool/arxiv/2409.10943Comparison of g-estimation approaches for handling symptomatic medication at multiple timepoints in Alzheimer's Disease with a hypothetical strategy2024-09-18T00:00:00+00:00Florian LaschLorenzo GuizzaroWen Wei LohFor handling intercurrent events in clinical trials, one of the strategies outlined in the ICH E9(R1) addendum targets the hypothetical scenario of non-occurrence of the intercurrent event. While this strategy is often implemented by setting data after the intercurrent event to missing even if they have been collected, g-estimation allows for a more efficient estimation by using the information contained in post-IE data. As the g-estimation methods have largely developed outside of randomised clinical trials, optimisations for the application in clinical trials are possible. In this work, we describe and investigate the performance of modifications to the established g-estimation methods, leveraging the assumption that some intercurrent events are expected to have the same impact on the outcome regardless of the timing of their occurrence. In a simulation study in Alzheimer disease, the modifications show a substantial efficiency advantage for the estimation of an estimand that applies the hypothetical strategy to the use of symptomatic treatment while retaining unbiasedness and adequate type I error control.https://papers.cool/arxiv/2409.10947Valid Credible Ellipsoids for Linear Functionals by a Renormalized Bernstein-von Mises Theorem2024-09-18T00:00:00+00:00Gustav RømerWe consider a semi-parametric Gaussian regression model, equipped with a high-dimensional Gaussian prior. We address the frequentist validity of posterior credible sets for a vector of linear functionals. 
We specify conditions for a 'renormalized' Bernstein-von Mises theorem (BvM), where the posterior, centered at its mean, and the posterior mean, centered at the ground truth, have the same normal approximation. This requires neither a solution to the information equation nor a $\sqrt{N}$-consistent estimator. We show that our renormalized BvM implies that a credible ellipsoid, specified by the mean and variance of the posterior, is an asymptotic confidence set. For a single linear functional, we identify such a credible ellipsoid with a symmetric credible interval around the posterior mean. We bound the diameter. We check the conditions for Darcy's problem, where the information equation has no solution in natural settings. For the Schrödinger problem, we recover an efficient semi-parametric BvM from our renormalized BvM.https://papers.cool/arxiv/2409.10972Towards Gaussian Process for operator learning: an uncertainty aware resolution independent operator learning algorithm for computational mechanics2024-09-18T00:00:00+00:00Sawan KumarRajdip NayekSouvik ChakrabortyThe growing demand for accurate, efficient, and scalable solutions in computational mechanics highlights the need for advanced operator learning algorithms that can efficiently handle large datasets while providing reliable uncertainty quantification. This paper introduces a novel Gaussian Process (GP) based neural operator for solving parametric differential equations. The approach proposed leverages the expressive capability of deterministic neural operators and the uncertainty awareness of conventional GP. In particular, we propose a ``neural operator-embedded kernel'' wherein the GP kernel is formulated in the latent space learned using a neural operator. Further, we exploit a stochastic dual descent (SDD) algorithm for simultaneously training the neural operator parameters and the GP hyperparameters. 
Our approach addresses the (a) resolution dependence and (b) cubic complexity of traditional GP models, allowing for input-resolution independence and scalability in high-dimensional and non-linear parametric systems, such as those encountered in computational mechanics. We apply our method to a range of non-linear parametric partial differential equations (PDEs) and demonstrate its superiority in both computational efficiency and accuracy compared to standard GP models and wavelet neural operators. Our experimental results highlight the efficacy of this framework in solving complex PDEs while maintaining robustness in uncertainty estimation, positioning it as a scalable and reliable operator-learning algorithm for computational mechanics.https://papers.cool/arxiv/2409.11040Estimation and imputation of missing data in longitudinal models with Zero-Inflated Poisson response variable2024-09-18T00:00:00+00:00D. S. Martinez-LoboO. O. MeloN. A. CruzThis research deals with the estimation and imputation of missing data in longitudinal models with a Poisson response variable inflated with zeros. A methodology is proposed that is based on the use of maximum likelihood, assuming that data is missing at random and that there is a correlation between the response variables. In each of the times, the expectation maximization (EM) algorithm is used: in step E, a weighted regression is carried out, conditioned on the previous times that are taken as covariates. In step M, the estimation and imputation of the missing data are performed. The good performance of the methodology in different loss scenarios is demonstrated in a simulation study comparing the model only with complete data, and estimating missing data using the mode of the data of each individual. 
Furthermore, in a study related to the growth of corn, it is tested on real data to develop the algorithm in a practical scenario.https://papers.cool/arxiv/2409.11053Functional Adaptive Huber Linear Regression2024-09-18T00:00:00+00:00Ling PengXiaohui LiuHeng LianRobust estimation has played an important role in statistical and machine learning. However, its applications to functional linear regression are still under-developed. In this paper, we focus on Huber's loss with a diverging robustness parameter which was previously used in parametric models. Compared to other robust methods such as median regression, the distinction is that the proposed method aims to estimate the conditional mean robustly, instead of estimating the conditional median. We only require $(1+\kappa)$-th moment assumption ($\kappa>0$) on the noise distribution, and the established error bounds match the optimal rate in the least-squares case as soon as $\kappa\ge 1$. We establish convergence rate in probability when the functional predictor has a finite 4-th moment, and finite-sample bound with exponential tail when the functional predictor is Gaussian, in terms of both prediction error and $L^2$ error. The results also extend to the case of functional estimation in a reproducing kernel Hilbert space (RKHS).https://papers.cool/arxiv/2409.11134E-Values for Exponential Families: the General Case2024-09-18T00:00:00+00:00Yunda HaoPeter GrünwaldWe analyze common types of e-variables and e-processes for composite exponential family nulls: the optimal e-variable based on the reverse information projection (RIPr), the conditional (COND) e-variable, and the universal inference (UI) and sequentialized RIPr e-processes. We characterize the RIPr prior for simple and Bayes-mixture based alternatives, either precisely (for Gaussian nulls and alternatives) or in an approximate sense (general exponential families). We provide conditions under which the RIPr e-variable is (again exactly vs. 
approximately) equal to the COND e-variable. Based on these and other interrelations which we establish, we determine the e-power of the four e-statistics as a function of sample size, exactly for Gaussian and up to $o(1)$ in general. For $d$-dimensional null and alternative, the e-power of UI tends to be smaller by a term of $(d/2) \log n + O(1)$ than that of the COND e-variable, which is the clear winner.https://papers.cool/arxiv/2409.11162Chasing Shadows: How Implausible Assumptions Skew Our Understanding of Causal Estimands2024-09-18T00:00:00+00:00Stijn VansteelandtKelly Van LanckerThe ICH E9 (R1) addendum on estimands, coupled with recent advancements in causal inference, has prompted a shift towards using model-free treatment effect estimands that are more closely aligned with the underlying scientific question. This represents a departure from traditional, model-dependent approaches where the statistical model often overshadows the inquiry itself. While this shift is a positive development, it has unintentionally led to the prioritization of an estimand's theoretical appeal over its practical learnability from data under plausible assumptions. We illustrate this by scrutinizing assumptions in the recent clinical trials literature on principal stratum estimands, demonstrating that some popular assumptions are not only implausible but often inevitably violated. We advocate for a more balanced approach to estimand formulation, one that carefully considers both the scientific relevance and the practical feasibility of estimation under realistic conditions.https://papers.cool/arxiv/2409.11167Poisson and Gamma Model Marginalisation and Marginal Likelihood calculation using Moment-generating Functions2024-09-18T00:00:00+00:00Siyang LiDavid van DykMaximilian AutenriethWe present a new analytical method to derive the likelihood function that has the population of parameters marginalised out in Bayesian hierarchical models. 
This method is also useful to find the marginal likelihoods in Bayesian models or in random-effect linear mixed models. The key to this method is to take high-order (sometimes fractional) derivatives of the prior moment-generating function if particular existence and differentiability conditions hold. In particular, this analytical method assumes that the likelihood is either Poisson or gamma. Under Poisson likelihoods, the observed Poisson count determines the order of the derivative. Under gamma likelihoods, the shape parameter, which is assumed to be known, determines the order of the fractional derivative. We also present some examples validating this new analytical method.https://papers.cool/arxiv/2409.11265Performance of Cross-Validated Targeted Maximum Likelihood Estimation2024-09-18T00:00:00+00:00Matthew J. SmithRachael V. PhillipsCamille MaringeMiguel Angel Luque FernandezBackground: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for statistical inference. However, in situations where there is not differentiability due to data sparsity or near-positivity violations, the Donsker class condition is violated. In such situations, TMLE variance can suffer from inflation of the type I error and poor coverage, leading to conservative confidence intervals. Cross-validation of the TMLE algorithm (CVTMLE) has been suggested to improve on performance compared to TMLE in settings of positivity or Donsker class violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings. Methods: We utilised the data-generating mechanism as described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. Then, we evaluated the respective statistical performances of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods. 
Results: We found that CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, incorporating regression trees using standard TMLE with ensemble super learner-based initial estimates increases bias and variance leading to invalid statistical inference. Conclusions: It has been shown that when using CVTMLE the Donsker class condition is no longer necessary to obtain valid statistical inference when using regression trees and under either data sparsity or near-positivity violations. We show through simulations that CVTMLE is much less sensitive to the choice of the super learner library and thereby provides better estimation and inference in cases where the super learner library uses more flexible candidates and is prone to overfitting.https://papers.cool/arxiv/2409.11269Testing for racial bias using inconsistent perceptions of race2024-09-18T00:00:00+00:00Nora GeraEmma PiersonTests for racial bias commonly assess whether two people of different races are treated differently. A fundamental challenge is that, because two people may differ in many ways, factors besides race might explain differences in treatment. Here, we propose a test for bias which circumvents the difficulty of comparing two people by instead assessing whether the $\textit{same person}$ is treated differently when their race is perceived differently. We apply our method to test for bias in police traffic stops, finding that the same driver is likelier to be searched or arrested by police when they are perceived as Hispanic than when they are perceived as white. 
Our test is broadly applicable to other datasets where race, gender, or other identity data are perceived rather than self-reported, and the same person is observed multiple times.https://papers.cool/arxiv/2409.11327Learning Unstable Continuous-Time Stochastic Linear Control Systems2024-09-18T00:00:00+00:00Reza Sadeghi HafshejaniMohamad Kazem Shirani FradonbehWe study the problem of system identification for stochastic continuous-time dynamics, based on a single finite-length state trajectory. We present a method for estimating the possibly unstable open-loop matrix by employing properly randomized control inputs. Then, we establish theoretical performance guarantees showing that the estimation error decays with trajectory length, a measure of excitability, and the signal-to-noise ratio, while it grows with dimension. Numerical illustrations that showcase the rates of learning the dynamics are provided as well. To perform the theoretical analysis, we develop new technical tools that are of independent interest. These include non-asymptotic stochastic bounds for highly non-stationary martingales and generalized laws of iterated logarithms, among others.https://papers.cool/arxiv/2409.11341Leveraging Connected Vehicle Data for Near-Crash Detection and Analysis in Urban Environments2024-09-18T00:00:00+00:00Xinyu LiDayong WuXinyue YeQuan SunUrban traffic safety is a pressing concern in modern transportation systems, especially in rapidly growing metropolitan areas where increased traffic congestion, complex road networks, and diverse driving behaviors exacerbate the risk of traffic incidents. Traditional traffic crash data analysis offers valuable insights but often overlooks a broader range of road safety risks. Near-crash events, which occur more frequently and signal potential collisions, provide a more comprehensive perspective on traffic safety. 
However, city-scale analysis of near-crash events remains limited due to the significant challenges in large-scale real-world data collection, processing, and analysis. This study utilizes one month of connected vehicle data, comprising billions of records, to detect and analyze near-crash events across the road network in the City of San Antonio, Texas. We propose an efficient framework integrating spatial-temporal buffering and heading algorithms to accurately identify and map near-crash events. A binary logistic regression model is employed to assess the influence of road geometry, traffic volume, and vehicle types on near-crash risks. Additionally, we examine spatial and temporal patterns, including variations by time of day, day of the week, and road category. The findings of this study show that the vehicles on more than half of road segments will be involved in at least one near-crash event. In addition, more than 50% of near-crash events involved vehicles traveling at speeds over 57.98 mph, and many occurred at short distances between vehicles. The analysis also found that wider roadbeds and multiple lanes reduced near-crash risks, while single-unit trucks slightly increased the likelihood of near-crash events. Finally, the spatial-temporal analysis revealed that near-crash risks were most prominent during weekday peak hours, especially in downtown areas.https://papers.cool/arxiv/2409.11385Probability-scale residuals for event-time data2024-09-18T00:00:00+00:00Eric S. KawaguchiBryan E. ShepherdChun LiThe probability-scale residual (PSR) is defined as $E\{sign(y, Y^*)\}$, where $y$ is the observed outcome and $Y^*$ is a random variable from the fitted distribution. The PSR is particularly useful for ordinal and censored outcomes for which fitted values are not available without additional assumptions. 
Previous work has defined the PSR for continuous, binary, ordinal, right-censored, and current status outcomes; however, development of the PSR has not yet been considered for data subject to general interval censoring. We develop extensions of the PSR, first to mixed-case interval-censored data, and then to data subject to several types of common censoring schemes. We derive the statistical properties of the PSR and show that our more general PSR encompasses several previously defined PSRs for continuous and censored outcomes as special cases. The performance of the residual is illustrated in real data from the Caribbean, Central, and South American Network for HIV Epidemiology.
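
The PSR definition $E\{sign(y, Y^*)\}$ quoted in the abstract above can be illustrated with a minimal sketch for the simplest case: a fully observed outcome and a discrete fitted distribution. The sketch assumes that $sign(a, b)$ equals $-1$, $0$, or $1$ when $a < b$, $a = b$, or $a > b$, which makes the residual equal to $P(Y^* < y) - P(Y^* > y)$. The function names and the example distribution are hypothetical and are not taken from the paper; the interval-censored extensions the abstract describes are not covered.

```python
from typing import Mapping


def sign(a: float, b: float) -> int:
    """Assumed convention: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)


def probability_scale_residual(y: float, fitted_pmf: Mapping[float, float]) -> float:
    """E{sign(y, Y*)} for a discrete fitted distribution.

    fitted_pmf maps each support point of Y* to its fitted probability.
    The result equals P(Y* < y) - P(Y* > y) and lies in [-1, 1].
    """
    return sum(prob * sign(y, value) for value, prob in fitted_pmf.items())


# Hypothetical fitted distribution over three ordinal categories.
example_pmf = {1: 0.2, 2: 0.5, 3: 0.3}
residuals = {y: probability_scale_residual(y, example_pmf) for y in example_pmf}
```

Under this convention, an observation in the lowest category gets a negative residual and one in the highest category a positive one, without assigning numeric scores to the categories beyond their order.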