Computer Science

2026-01-19 | | Total: 404

#1 UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation [PDF20] [Copy] [Kimi10] [REL]

Authors: Ruiheng Zhang, Jingfeng Yao, Huangxuan Zhao, Hao Yan, Xiao He, Lei Chen, Zhou Wei, Yong Luo, Zengmao Wang, Lefei Zhang, Dacheng Tao, Bo Du

Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently lead to compromised performance in one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Codes and models are available at https://github.com/ZrH42/UniX.

Subject: Computer Vision and Pattern Recognition

Publish: 2026-01-16 18:59:58 UTC


#2 Empirical Coordination over Markov Channel with Independent Source [PDF] [Copy] [Kimi] [REL]

Authors: Mengyuan Zhao, Maël Le Treust, Tobias J. Oechtering

We study joint source-channel coding over Markov channels through the empirical coordination framework. More specifically, we aim at determining the empirical distributions of source and channel symbols that can be induced by a coding scheme. We consider strictly causal encoders that generate channel inputs, without access to the past channel states, henceforth driving the current Markov state evolution. Our main result is the single-letter inner and outer bounds of the set of achievable joint distributions, coordinating all the symbols in the network. To establish the inner bound, we introduce a new notion of typicality, the input-driven Markov typicality, and develop its fundamental properties. Contrary to the classical block-Markov coding schemes that rely on blockwise independence for discrete memoryless channels, our analysis directly exploits the Markov channel structure and improves beyond the independence-based arguments.

Subject: Information Theory

Publish: 2026-01-16 18:59:06 UTC


#3 How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers [PDF5] [Copy] [Kimi10] [REL]

Authors: Jonathan Roberts, Kai Han, Samuel Albanie

Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the token. In general, tokens are used as a stable currency, assumed to be broadly consistent across tokenizers and contexts, enabling direct comparisons. However, tokenization varies significantly across models and domains of text, making naive interpretation of token counts problematic. We quantify this variation by providing a comprehensive empirical analysis of tokenization, exploring the compression of sequences to tokens across different distributions of textual data. Our analysis challenges commonly held heuristics about token lengths, finding them to be overly simplistic. We hope the insights of our study add clarity and intuition toward tokenization in contemporary LLMs.

Subject: Computation and Language

Publish: 2026-01-16 18:58:29 UTC


#4 Do explanations generalize across large reasoning models? [PDF6] [Copy] [Kimi18] [REL]

Authors: Koyena Pal, David Bau, Chandan Singh

Large reasoning models (LRMs) produce a textual chain of thought (CoT) in the process of solving a problem, which serves as a potentially powerful tool to understand the problem by surfacing a human-readable, natural-language explanation. However, it is unclear whether these explanations generalize, i.e. whether they capture general patterns about the underlying problem rather than patterns which are esoteric to the LRM. This is a crucial question in understanding or discovering new concepts, e.g. in AI for science. We study this generalization question by evaluating a specific notion of generalizability: whether explanations produced by one LRM induce the same behavior when given to other LRMs. We find that CoT explanations often exhibit this form of generalization (i.e. they increase consistency between LRMs) and that this increased generalization is correlated with human preference rankings and post-training with reinforcement learning. We further analyze the conditions under which explanations yield consistent answers and propose a straightforward, sentence-level ensembling strategy that improves consistency. Taken together, these results prescribe caution when using LRM explanations to yield new insights and outline a framework for characterizing LRM explanation generalization.

Subjects: Computation and Language , Artificial Intelligence

Publish: 2026-01-16 18:55:29 UTC


#5 Building Production-Ready Probes For Gemini [PDF7] [Copy] [Kimi13] [REL]

Authors: János Kramár, Joshua Engels, Zheng Wang, Bilal Chughtai, Rohin Shah, Neel Nanda, Arthur Conmy

Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful systems. Prior work has shown that activation probes may be a promising misuse mitigation technique, but we identify a key remaining challenge: probes fail to generalize under important production distribution shifts. In particular, we find that the shift from short-context to long-context inputs is difficult for existing probe architectures. We propose several new probe architecture that handle this long-context distribution shift. We evaluate these probes in the cyber-offensive domain, testing their robustness against various production-relevant shifts, including multi-turn conversations, static jailbreaks, and adaptive red teaming. Our results demonstrate that while multimax addresses context length, a combination of architecture choice and training on diverse distributions is required for broad generalization. Additionally, we show that pairing probes with prompted classifiers achieves optimal accuracy at a low cost due to the computational efficiency of probes. These findings have informed the successful deployment of misuse mitigation probes in user-facing instances of Gemini, Google's frontier language model. Finally, we find early positive results using AlphaEvolve to automate improvements in both probe architecture search and adaptive red teaming, showing that automating some AI safety research is already possible.

Subjects: Machine Learning , Artificial Intelligence , Computation and Language

Publish: 2026-01-16 18:54:29 UTC


#6 ShapeR: Robust Conditional 3D Shape Generation from Casual Captures [PDF11] [Copy] [Kimi6] [REL]

Authors: Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, Jakob Engel

Recent advances in 3D shape generation have achieved impressive results, but most existing methods rely on clean, unoccluded, and well-segmented inputs. Such conditions are rarely met in real-world scenarios. We present ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Given an image sequence, we leverage off-the-shelf visual-inertial SLAM, 3D detection algorithms, and vision-language models to extract, for each object, a set of sparse SLAM points, posed multi-view images, and machine-generated captions. A rectified flow transformer trained to effectively condition on these modalities then generates high-fidelity metric 3D shapes. To ensure robustness to the challenges of casually captured data, we employ a range of techniques including on-the-fly compositional augmentations, a curriculum training scheme spanning object- and scene-level datasets, and strategies to handle background clutter. Additionally, we introduce a new evaluation benchmark comprising 178 in-the-wild objects across 7 real-world scenes with geometry annotations. Experiments show that ShapeR significantly outperforms existing approaches in this challenging setting, achieving an improvement of 2.7x in Chamfer distance compared to state of the art.

Subjects: Computer Vision and Pattern Recognition , Machine Learning

Publish: 2026-01-16 18:51:24 UTC


#7 Capacity Constraints Make Admissions Processes Less Predictable [PDF] [Copy] [Kimi] [REL]

Authors: Evan Dong, Nikhil Garg, Sarah Dean

Machine learning models are often used to make predictions about admissions process outcomes, such as for colleges or jobs. However, such decision processes differ substantially from the conventional machine learning paradigm. Because admissions decisions are capacity-constrained, whether a student is admitted depends on the other applicants who apply. We show how this dependence affects predictive performance even in otherwise ideal settings. Theoretically, we introduce two concepts that characterize the relationship between admission function properties, machine learning representation, and generalization to applicant pool distribution shifts: instability, which measures how many existing decisions can change when a single new applicant is introduced; and variability, which measures the number of unique students whose decisions can change. Empirically, we illustrate our theory on individual-level admissions data from the New York City high school matching system, showing that machine learning performance degrades as the applicant pool increasingly differs from the training data. Furthermore, there are larger performance drops for schools using decision rules that are more unstable and variable. Our work raises questions about the reliability of predicting individual admissions probabilities.

Subject: Computers and Society

Publish: 2026-01-16 18:48:46 UTC


#8 Applying Formal Methods Tools to an Electronic Warfare Codebase (Experience report) [PDF] [Copy] [Kimi] [REL]

Authors: Letitia W. Li, Denley Lam, Vu Le, Daniel Mitchell, Mark J. Gerken, Robert B. Ross

While using formal methods offers advantages over unit testing, their steep learning curve can be daunting to developers and can be a major impediment to widespread adoption. To support integration into an industrial software engineering workflow, a tool must provide useful information and must be usable with relatively minimal user effort. In this paper, we discuss our experiences associated with identifying and applying formal methods tools on an electronic warfare (EW) system with stringent safety requirements and present perspectives on formal methods tools from EW software engineers who are proficient in development yet lack formal methods training. In addition to a difference in mindset between formal methods and unit testing approaches, some formal methods tools use terminology or annotations that differ from their target programming language, creating another barrier to adoption. Input/output contracts, objects in memory affected by a function, and loop invariants can be difficult to grasp and use. In addition to usability, our findings include a comparison of vulnerabilities detected by different tools. Finally, we present suggestions for improving formal methods usability including better documentation of capabilities, decreased manual effort, and improved handling of library code.

Subjects: Logic in Computer Science , Software Engineering

Publish: 2026-01-16 18:46:19 UTC


#9 ReScene4D: Temporally Consistent Semantic Instance Segmentation of Evolving Indoor 3D Scenes [PDF9] [Copy] [Kimi4] [REL]

Authors: Emily Steiner, Jianhao Zheng, Henry Howard-Jenkins, Chris Xie, Iro Armeni

Indoor environments evolve as objects move, appear, or disappear. Capturing these dynamics requires maintaining temporally consistent instance identities across intermittently captured 3D scans, even when changes are unobserved. We introduce and formalize the task of temporally sparse 4D indoor semantic instance segmentation (SIS), which jointly segments, identifies, and temporally associates object instances. This setting poses a challenge for existing 3DSIS methods, which require a discrete matching step due to their lack of temporal reasoning, and for 4D LiDAR approaches, which perform poorly due to their reliance on high-frequency temporal measurements that are uncommon in the longer-horizon evolution of indoor environments. We propose ReScene4D, a novel method that adapts 3DSIS architectures for 4DSIS without needing dense observations. It explores strategies to share information across observations, demonstrating that this shared context not only enables consistent instance tracking but also improves standard 3DSIS quality. To evaluate this task, we define a new metric, t-mAP, that extends mAP to reward temporal identity consistency. ReScene4D achieves state-of-the-art performance on the 3RScan dataset, establishing a new benchmark for understanding evolving indoor scenes.

Subject: Computer Vision and Pattern Recognition

Publish: 2026-01-16 18:45:19 UTC


#10 Industry Influence in High-Profile Social Media Research [PDF1] [Copy] [Kimi] [REL]

Authors: Joseph Bak-Coleman, Jevin West, Cailin O'Connor, Carl T. Bergstrom

To what extent is social media research independent from industry influence? Leveraging openly available data, we show that half of the research published in top journals has disclosable ties to industry in the form of prior funding, collaboration, or employment. However, the majority of these ties go undisclosed in the published research. These trends do not arise from broad scientific engagement with industry, but rather from a select group of scientists who maintain long-lasting relationships with industry. Undisclosed ties to industry are common not just among authors, but among reviewers and academic editors during manuscript evaluation. Further, industry-tied research garners more attention within the academy, among policymakers, on social media, and in the news. Finally, we find evidence that industry ties are associated with a topical focus away from impacts of platform-scale features. Together, these findings suggest industry influence in social media research is extensive, impactful, and often opaque. Going forward there is a need to strengthen disclosure norms and implement policies to ensure the visibility of independent research, and the integrity of industry supported research.

Subject: Social and Information Networks

Publish: 2026-01-16 18:43:21 UTC


#11 MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management [PDF] [Copy] [Kimi2] [REL]

Authors: Miriam K. Wolff, Peter Calhoun, Eleonora Maria Aiello, Yao Qin, Sam F. Royston

Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work aims to establish a unified and accessible data resource for T1D algorithm development. Multiple publicly available T1D datasets were consolidated into a unified resource, termed the MetaboNet dataset. Inclusion required the availability of both continuous glucose monitoring (CGM) data and corresponding insulin pump dosing records. Additionally, auxiliary information such as reported carbohydrate intake and physical activity was retained when present. The MetaboNet dataset comprises 3135 subjects and 1228 patient-years of overlapping CGM and insulin data, making it substantially larger than existing standalone benchmark datasets. The resource is distributed as a fully public subset available for immediate download at https://metabo-net.org/ , and with a Data Use Agreement (DUA)-restricted subset accessible through their respective application processes. For the datasets in the latter subset, processing pipelines are provided to automatically convert the data into the standardized MetaboNet format. A consolidated public dataset for T1D research is presented, and the access pathways for both its unrestricted and DUA-governed components are described. The resulting dataset covers a broad range of glycemic profiles and demographics and thus can yield more generalizable algorithmic performance than individual datasets.

Subjects: Machine Learning , Artificial Intelligence , Systems and Control , Quantitative Methods

Publish: 2026-01-16 18:38:33 UTC


#12 Coding Schemes for the Noisy Torn Paper Channel [PDF] [Copy] [Kimi] [REL]

Authors: Frederik Walter, Maria Abu-Sini, Nils Weinhardt, Antonia Wachter-Zeh

To make DNA a suitable medium for archival data storage, it is essential to consider the decay process of the strands observed in DNA storage systems. This paper studies the decay process as a probabilistic noisy torn paper channel (TPC), which first corrupts the bits of the transmitted sequence in a probabilistic manner by substitutions, then breaks the sequence into a set of noisy unordered substrings. The present work devises coding schemes for the noisy TPC by embedding markers in the transmitted sequence. We investigate the use of static markers and markers connected to the data in the form of hash functions. These two tools have also been recently exploited to tackle the noiseless TPC. Simulations show that static markers excel at higher substitution probabilities, while data-dependent markers are superior at lower noise levels. Both approaches achieve reconstruction rates exceeding $99\%$ with no false decodings observed, primarily limited by computational resources.

Subject: Information Theory

Publish: 2026-01-16 18:32:48 UTC


#13 QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid [PDF] [Copy] [Kimi3] [REL]

Authors: Hoang M. Ngo, Tre' R. Jeter, Jung Taek Seo, My T. Thai

Smart grid infrastructures have revolutionized energy distribution, but their day-to-day operations require robust anomaly detection methods to counter risks associated with cyber-physical threats and system faults potentially caused by natural disasters, equipment malfunctions, and cyber attacks. Conventional machine learning (ML) models are effective in several domains, yet they struggle to represent the complexities observed in smart grid systems. Furthermore, traditional ML models are highly susceptible to adversarial manipulations, making them increasingly unreliable for real-world deployment. Quantum ML (QML) provides a unique advantage, utilizing quantum-enhanced feature representations to model the intricacies of the high-dimensional nature of smart grid systems while demonstrating greater resilience to adversarial manipulation. In this work, we propose QUPID, a partitioned quantum neural network (PQNN) that outperforms traditional state-of-the-art ML models in anomaly detection. We extend our model to R-QUPID that even maintains its performance when including differential privacy (DP) for enhanced robustness. Moreover, our partitioning framework addresses a significant scalability problem in QML by efficiently distributing computational workloads, making quantum-enhanced anomaly detection practical in large-scale smart grid environments. Our experimental results across various scenarios exemplifies the efficacy of QUPID and R-QUPID to significantly improve anomaly detection capabilities and robustness compared to traditional ML approaches.

Subject: Machine Learning

Publish: 2026-01-16 18:30:24 UTC


#14 On the Probability of First Success in Differential Evolution: Hazard Identities and Tail Bounds [PDF] [Copy] [Kimi] [REL]

Authors: Dimitar Nedanovski, Svetoslav Nenov, Dimitar Pilev

We study first-hitting times in Differential Evolution (DE) through a conditional hazard frame work. Instead of analyzing convergence via Markov-chain transition kernels or drift arguments, we ex press the survival probability of a measurable target set $A$ as a product of conditional first-hit probabilities (hazards) $p_t=\Prob(E_t\mid\mathcal F_{t-1})$. This yields distribution-free identities for survival and explicit tail bounds whenever deterministic lower bounds on the hazard hold on the survival event. For the L-SHADE algorithm with current-to-$p$best/1 mutation, we construct a checkable algorithmic witness event $\mathcal L_t$ under which the conditional hazard admits an explicit lower bound depending only on sampling rules, population size, and crossover statistics. This separates theoretical constants from empirical event frequencies and explains why worst-case constant-hazard bounds are typically conservative. We complement the theory with a Kaplan--Meier survival analysis on the CEC2017 benchmark suite . Across functions and budgets, we identify three distinct empirical regimes: (i) strongly clustered success, where hitting times concentrate in short bursts; (ii) approximately geometric tails, where a constant-hazard model is accurate; and (iii) intractable cases with no observed hits within the evaluation horizon. The results show that while constant-hazard bounds provide valid tail envelopes, the practical behavior of L-SHADE is governed by burst-like transitions rather than homogeneous per-generati on success probabilities.

Subjects: Neural and Evolutionary Computing , Machine Learning

Publish: 2026-01-16 18:24:24 UTC


#15 Convergence Properties of Good Quantum Codes for Classical Communication [PDF] [Copy] [Kimi] [REL]

Authors: Alptug Aytekin, Mohamed Nomeir, Lei Hu, Sennur Ulukus

An important part of the information theory folklore had been about the output statistics of codes that achieve the capacity and how the empirical distributions compare to the output distributions induced by the optimal input in the channel capacity problem. Results for a variety of such empirical output distributions of good codes have been known in the literature, such as the comparison of the output distribution of the code to the optimal output distribution in vanishing and non-vanishing error probability cases. Motivated by these, we aim to achieve similar results for the quantum codes that are used for classical communication, that is the setting in which the classical messages are communicated through quantum codewords that pass through a noisy quantum channel. We first show the uniqueness of the optimal output distribution, to be able to talk more concretely about the optimal output distribution. Then, we extend the vanishing error probability results to the quantum case, by using techniques that are close in spirit to the classical case. We also extend non-vanishing error probability results to the quantum case on block codes, by using the second-order converses for such codes based on hypercontractivity results for the quantum generalized depolarizing semi-groups.

Subjects: Information Theory , Networking and Internet Architecture , Signal Processing , Quantum Physics

Publish: 2026-01-16 18:22:05 UTC


#16 The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents [PDF] [Copy] [Kimi5] [REL]

Authors: Eilam Shapira, Roi Reichart, Moshe Tennenholtz

The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available technologies in three canonical game-theoretic settings: bargaining (resource division), negotiation (asymmetric information trade), and persuasion (strategic information transmission). We find that simply increasing the choice of AI delegates can drastically shift equilibrium payoffs and regulatory outcomes, often creating incentives for regulators to proactively develop and release technologies. Conversely, we identify a strategic phenomenon termed the "Poisoned Apple" effect: an agent may release a new technology, which neither they nor their opponent ultimately uses, solely to manipulate the regulator's choice of market design in their favor. This strategic release improves the releaser's welfare at the expense of their opponent and the regulator's fairness objectives. Our findings demonstrate that static regulatory frameworks are vulnerable to manipulation via technology expansion, necessitating dynamic market designs that adapt to the evolving landscape of AI capabilities.

Subjects: Computer Science and Game Theory , Artificial Intelligence , Computation and Language , Multiagent Systems

Publish: 2026-01-16 18:18:03 UTC


#17 Efficient error estimators for Generalized Nyström [PDF] [Copy] [Kimi] [REL]

Authors: Lorenzo Lazzarino, Katherine J. Pearce, Nathaniel Pritchard

Randomized algorithms in numerical linear algebra have proven to be effective in ameliorating issues of scalability when working with large matrices, efficiently producing accurate low-rank approximations. A key remaining challenge, however, is to efficiently assess the approximation accuracy of randomized methods without additional expensive matrix accesses. Recent work has addressed this issue by deriving fast leave-one-out error estimators for the randomized SVD and Nyström decomposition, enabling accurate error estimation with no additional matrix accesses. In this work, we extend the leave-one-out framework to the generalized Nyström decomposition, an approach that can be applied to general rectangular matrices. We do this by deriving three new leave-one-out error estimators and validating their effectiveness through numerical experiments.

Subject: Numerical Analysis

Publish: 2026-01-16 18:16:40 UTC


#18 BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics [PDF4] [Copy] [Kimi13] [REL]

Authors: Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu

Competitive sports require sophisticated tactical analysis, yet combat disciplines like boxing remain underdeveloped in AI-driven analytics due to the complexity of action dynamics and the lack of structured tactical representations. To address this, we present BoxMind, a closed-loop AI expert system validated in elite boxing competition. By defining atomic punch events with precise temporal boundaries and spatial and technical attributes, we parse match footage into 18 hierarchical technical-tactical indicators. We then propose a graph-based predictive model that fuses these explicit technical-tactical profiles with learnable, time-variant latent embeddings to capture the dynamics of boxer matchups. Modeling match outcome as a differentiable function of technical-tactical indicators, we turn winning probability gradients into executable tactical adjustments. Experiments show that the outcome prediction model achieves state-of-the-art performance, with 69.8% accuracy on BoxerGraph test set and 87.5% on Olympic matches. Using this predictive model as a foundation, the system generates strategic recommendations that demonstrate proficiency comparable to human experts. BoxMind is validated through a closed-loop deployment during the 2024 Paris Olympics, directly contributing to the Chinese National Team's historic achievement of three gold and two silver medals. BoxMind establishes a replicable paradigm for transforming unstructured video data into strategic intelligence, bridging the gap between computer vision and decision support in competitive sports.

Subject: Artificial Intelligence

Publish: 2026-01-16 18:14:46 UTC


#19 Extractive summarization on a CMOS Ising machine [PDF] [Copy] [Kimi1] [REL]

Authors: Ziqing Zeng, Abhimanyu Kumar, Chris H. Kim, Ulya R. Karpuzcu, Sachin S. Sapatnekar

Extractive summarization (ES) aims to generate a concise summary by selecting a subset of sentences from a document while maximizing relevance and minimizing redundancy. Although modern ES systems achieve high accuracy using powerful neural models, their deployment typically relies on CPU or GPU infrastructures that are energy-intensive and poorly suited for real-time inference in resource-constrained environments. In this work, we explore the feasibility of implementing McDonald-style extractive summarization on a low-power CMOS coupled oscillator-based Ising machine (COBI) that supports integer-valued, all-to-all spin couplings. We first propose a hardware-aware Ising formulation that reduces the scale imbalance between local fields and coupling terms, thereby improving robustness to coefficient quantization: this method can be applied to any problem formulation that requires k of n variables to be chosen. We then develop a complete ES pipeline including (i) stochastic rounding and iterative refinement to compensate for precision loss, and (ii) a decomposition strategy that partitions a large ES problem into smaller Ising subproblems that can be efficiently solved on COBI and later combined. Experimental results on the CNN/DailyMail dataset show that our pipeline can produce high-quality summaries using only integer-coupled Ising hardware with limited precision. COBI achieves 3-4.5x runtime speedups compared to a brute-force method, which is comparable to software Tabu search, and two to three orders of magnitude reductions in energy, while maintaining competitive summary quality. These results highlight the potential of deploying CMOS Ising solvers for real-time, low-energy text summarization on edge devices.

Subjects: Machine Learning , Emerging Technologies

Publish: 2026-01-16 18:14:02 UTC


#20 CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation [PDF4] [Copy] [Kimi2] [REL]

Authors: Vanshali Sharma, Andrea Mia Bejar, Gorkem Durak, Ulas Bagci

In the generative AI era, where even critical medical tasks are increasingly automated, radiology report generation (RRG) continues to rely on suboptimal metrics for quality assessment. Developing domain-specific metrics has therefore been an active area of research, yet it remains challenging due to the lack of a unified, well-defined framework to assess their robustness and applicability in clinical contexts. To address this, we present CTest-Metric, a first unified metric assessment framework with three modules determining the clinical feasibility of metrics for CT RRG. The modules test: (i) Writing Style Generalizability (WSG) via LLM-based rephrasing; (ii) Synthetic Error Injection (SEI) at graded severities; and (iii) Metrics-vs-Expert correlation (MvE) using clinician ratings on 175 "disagreement" cases. Eight widely used metrics (BLEU, ROUGE, METEOR, BERTScore-F1, F1-RadGraph, RaTEScore, GREEN Score, CRG) are studied across seven LLMs built on a CT-CLIP encoder. Using our novel framework, we found that lexical NLG metrics are highly sensitive to stylistic variations; GREEN Score aligns best with expert judgments (Spearman~0.70), while CRG shows negative correlation; and BERTScore-F1 is least sensitive to factual error injection. We will release the framework, code, and allowable portion of the anonymized evaluation data (rephrased/error-injected CT reports), to facilitate reproducible benchmarking and future metric development.

Subjects: Computation and Language , Computer Vision and Pattern Recognition

Publish: 2026-01-16 18:09:19 UTC


#21 Space-Optimal, Computation-Optimal, Topology-Agnostic, Throughput-Scalable Causal Delivery through Hybrid Buffering [PDF] [Copy] [Kimi] [REL]

Author: Paulo Sérgio Almeida

Message delivery respecting causal ordering (causal delivery) is one of the most classic and widely useful abstraction for inter-process communication in a distributed system. Most approaches tag messages with causality information and buffer them at the receiver until they can be safely delivered. Except for specific approaches that exploit communication topology, therefore not generally applicable, they incur a metadata overhead which is prohibitive for a large number of processes. Much less used are the approaches that enforce causal order by buffering messages at the sender, until it is safe to release them to the network, as the classic algorithm has too many drawbacks. In this paper, first we discuss the limitations of sender-only buffering approaches and introduce the Sender Permission to Send (SPS) enforcement strategy, showing that SPS + FIFO implies Causal. We analyze a recent sender-buffering algorithm, Cykas, which follows SPS + FIFO, albeit very conservatively, pointing out throughput scalability and liveness issues. Then, we introduce a novel SPS + FIFO based algorithm, which adopts a new hybrid approach: enforcing causality by combining sender-buffering to enforce SPS and receiver-buffering to enforce FIFO. The algorithm overcomes limitations of sender-only buffering, and achieves effectively constant metadata size per message. By a careful choice of data-structures, the algorithm is also computationally-optimal, with amortized effectively constant processing overhead. As far as we know, there is no other topology-agnostic causal delivery algorithm with these properties.

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2026-01-16 18:08:09 UTC


#22 Tensor field tomography with attenuation and refraction: adjoint operators for the dynamic case and numerical experiments [PDF] [Copy] [Kimi] [REL]

Authors: Lukas Vierus, Thomas Schuster, Bernadette Hahn

This article is concerned with tensor field tomography in a fairly general setting, that takes refraction, attenuation and time-dependence of tensor fields into account. The mathematical model is given by attenuated ray transforms of the fields along geodesic curves corresponding to a Riemannian metric that is defined by the index of refraction. The data are given at the boundary tangent bundle of the domain and it is well-known that they can be characterized as boundary data of a transport equation turning tensor field tomography into an inverse source problem. This way the adjoint of the forward mapping can be computed using the integral representation or, equivalently, associated to a dual transport equation. The article offers and proves two different representations for the adjoint mappings both in the dynamic and static case. The numerical implementation is demonstrated and evaluated for static fields using the damped Landweber method with Nesterov acceleration applied to both, the integral and PDE-based formulations. The transport equations are solved using a viscosity approximation. The error analysis reveals that the integral representation significantly outperforms PDE-based methods in terms of computational efficiency while achieving comparable reconstruction accuracy. The impact of noise and deviations from straight-line trajectories are investigated confirming improved accuracy if refraction is taken into account. We conclude that the inclusion of refraction to the forward model pays in spite of increased numerical cost.

Subject: Numerical Analysis

Publish: 2026-01-16 18:04:29 UTC


#23 Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning [PDF1] [Copy] [Kimi1] [REL]

Authors: Yohai Trabelsi, Guojun Xiong, Fentabil Getnet, Stéphane Verguet, Milind Tambe

Ethiopia's Ministry of Health is upgrading health posts to improve access to essential services, particularly in rural areas. Limited resources, however, require careful prioritization of which facilities to upgrade to maximize population coverage while accounting for diverse expert and stakeholder preferences. In collaboration with the Ethiopian Public Health Institute and Ministry of Health, we propose a hybrid framework that systematically integrates expert knowledge with optimization techniques. Classical optimization methods provide theoretical guarantees but require explicit, quantitative objectives, whereas stakeholder criteria are often articulated in natural language and difficult to formalize. To bridge these domains, we develop the Large language model and Extended Greedy (LEG) framework. Our framework combines a provable approximation algorithm for population coverage optimization with LLM-driven iterative refinement that incorporates human-AI alignment to ensure solutions reflect expert qualitative guidance while preserving coverage guarantees. Experiments on real-world data from three Ethiopian regions demonstrate the framework's effectiveness and its potential to inform equitable, data-driven health system planning.

Subject: Artificial Intelligence

Publish: 2026-01-16 18:02:09 UTC


#24 Generative Scenario Rollouts for End-to-End Autonomous Driving [PDF9] [Copy] [Kimi1] [REL]

Authors: Rajeev Yasarla, Deepti Hegde, Shizhong Han, Hsin-Pai Cheng, Yunxiao Shi, Meysam Sadeghigooghari, Shweta Mahajan, Apratim Bhattacharyya, Litian Liu, Risheek Garrepalli, Thomas Svantesson, Fatih Porikli, Hong Cai

Vision-Language-Action (VLA) models are emerging as highly effective planning models for end-to-end autonomous driving systems. However, current works mostly rely on imitation learning from sparse trajectory annotations and under-utilize their potential as generative models. We propose Generative Scenario Rollouts (GeRo), a plug-and-play framework for VLA models that jointly performs planning and generation of language-grounded future traffic scenes through an autoregressive rollout strategy. First, a VLA model is trained to encode ego vehicle and agent dynamics into latent tokens under supervision from planning, motion, and language tasks, facilitating text-aligned generation. Next, GeRo performs language-conditioned autoregressive generation. Given multi-view images, a scenario description, and ego-action questions, it generates future latent tokens and textual responses to guide long-horizon rollouts. A rollout-consistency loss stabilizes predictions using ground truth or pseudo-labels, mitigating drift and preserving text-action alignment. This design enables GeRo to perform temporally consistent, language-grounded rollouts that support long-horizon reasoning and multi-agent planning. On Bench2Drive, GeRo improves driving score and success rate by +15.7 and +26.2, respectively. By integrating reinforcement learning with generative rollouts, GeRo achieves state-of-the-art closed-loop and open-loop performance, demonstrating strong zero-shot robustness. These results highlight the promise of generative, language-conditioned reasoning as a foundation for safer and more interpretable end-to-end autonomous driving.

Subject: Computer Vision and Pattern Recognition

Publish: 2026-01-16 17:59:28 UTC


#25 Low-Rank Key Value Attention [PDF7] [Copy] [Kimi5] [REL]

Authors: James O'Neill, Robert Clancy, Mariia Matskevichus, Fergal Reid

Transformer pretraining is increasingly constrained by memory and compute requirements, with the key-value (KV) cache emerging as a dominant bottleneck during training and autoregressive decoding. We propose \textit{low-rank KV adaptation} (LRKV), a simple modification of multi-head attention that reduces KV cache memory by exploiting redundancy across attention heads while preserving full token-level resolution. Each layer uses a shared full-rank KV projection augmented with low-rank, head-specific residuals, yielding a continuous trade-off between complete sharing and fully independent attention. LRKV is a drop-in replacement for standard multi-head attention and directly subsumes query-sharing approaches such as multi-query and grouped-query attention, while remaining distinct from latent-compression methods such as multi-latent attention (MLA). Across large-scale pretraining experiments, LRKV consistently achieves faster loss reduction, lower validation perplexity, and stronger downstream task performance than standard attention, MQA/GQA, and MLA. At the 2.5B scale, LRKV outperforms standard attention while using roughly half the KV cache, and reaches equivalent model quality with up to \textbf{20-25\% less training compute} when measured in cumulative FLOPs. To explain these gains, we analyze attention head structure in operator space and show that LRKV preserves nearly all functional head diversity relative to standard attention, whereas more aggressive KV-sharing mechanisms rely on compensatory query specialization. Together, these results establish LRKV as a practical and effective attention mechanism for scaling Transformer pretraining under memory- and compute-constrained regimes.

Subject: Machine Learning

Publish: 2026-01-16 17:56:40 UTC