Computer Science

2025-12-30 | | Total: 826

#1 Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion [PDF6] [Copy] [Kimi3] [REL]

Authors: Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu

Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP, it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130x. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds, thereby making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/

Subject: Computer Vision and Pattern Recognition

Publish: 2025-12-29 18:59:57 UTC


#2 Training AI Co-Scientists Using Rubric Rewards [PDF4] [Copy] [Kimi1] [REL]

Authors: Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse

AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan given a set of aims and constraints. The plan may be used by researchers for brainstorming, or may even be implemented after further refinement. However, language models currently struggle to generate research plans that follow all constraints and implicit requirements. In this work, we study how to leverage the vast corpus of existing research papers to train language models that generate better research plans. We build a scalable, diverse training corpus by automatically extracting research goals and goal-specific grading rubrics from papers across several domains. We then train models for research plan generation via reinforcement learning with self-grading. A frozen copy of the initial policy acts as the grader during training, with the rubrics creating a generator-verifier gap that enables improvements without external human supervision. To validate this approach, we conduct a study with human experts for machine learning research goals, spanning 225 hours. The experts prefer plans generated by our finetuned Qwen3-30B-A3B model over the initial model for 70% of research goals, and approve 84% of the automatically extracted goal-specific grading rubrics. To assess generality, we also extend our approach to research goals from medical papers, and new arXiv preprints, evaluating with a jury of frontier models. Our finetuning yields 12-22% relative improvements and significant cross-domain generalization, proving effective even in problem settings like medical research where execution feedback is infeasible. Together, these findings demonstrate the potential of a scalable, automated training recipe as a step towards improving general AI co-scientists.

Subjects: Machine Learning , Computation and Language , Human-Computer Interaction

Publish: 2025-12-29 18:59:33 UTC


#3 Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation [PDF4] [Copy] [Kimi2] [REL]

Authors: Shaocong Xu, Songlin Wei, Qizhe Wei, Zheng Geng, Hong Li, Licheng Shen, Qianpu Sun, Shu Han, Bin Ma, Bohan Li, Chongjie Ye, Yuhang Zheng, Nan Wang, Saining Zhang, Hao Zhao

Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely discriminative monocular depth, causing holes and temporally unstable estimates. Our key observation is that modern video diffusion models already synthesize convincing transparent phenomena, suggesting they have internalized the optical rules. We build TransPhy3D, a synthetic video corpus of transparent/reflective scenes: 11k sequences rendered with Blender/Cycles. Scenes are assembled from a curated bank of category-rich static assets and shape-rich procedural assets paired with glass/plastic/metal materials. We render RGB + depth + normals with physically based ray tracing and OptiX denoising. Starting from a large video diffusion model, we learn a video-to-video translator for depth (and normals) via lightweight LoRA adapters. During training we concatenate RGB and (noisy) depth latents in the DiT backbone and co-train on TransPhy3D and existing frame-wise synthetic datasets, yielding temporally consistent predictions for arbitrary-length input videos. The resulting model, DKT, achieves zero-shot SOTA on real and synthetic video benchmarks involving transparency: ClearPose, DREDS (CatKnown/CatNovel), and TransPhy3D-Test. It improves accuracy and temporal consistency over strong image/video baselines, and a normal variant sets the best video normal estimation results on ClearPose. A compact 1.3B version runs at ~0.17 s/frame. Integrated into a grasping stack, DKT's depth boosts success rates across translucent, reflective and diffuse surfaces, outperforming prior estimators. Together, these results support a broader claim: "Diffusion knows transparency." Generative video priors can be repurposed, efficiently and label-free, into robust, temporally coherent perception for challenging real-world manipulation.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-12-29 18:59:24 UTC


#4 Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation [PDF2] [Copy] [Kimi1] [REL]

Authors: Huajie Tan, Sixiang Chen, Yijie Xu, Zixiao Wang, Yuheng Ji, Cheng Chi, Yaoxu Lyu, Zhongxia Zhao, Xiansheng Chen, Peterson Co, Shaoxuan Xie, Guocai Yao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

The primary obstacle for applying reinforcement learning (RL) to real-world robotics is the design of effective reward functions. While recently learning-based Process Reward Models (PRMs) are a promising direction, they are often hindered by two fundamental limitations: their reward models lack step-aware understanding and rely on single-view perception, leading to unreliable assessments of fine-grained manipulation progress; and their reward shaping procedures are theoretically unsound, often inducing a semantic trap that misguides policy optimization. To address these, we introduce Dopamine-Reward, a novel reward modeling method for learning a general-purpose, step-aware process reward model from multi-view inputs. At its core is our General Reward Model (GRM), trained on a vast 3,400+ hour dataset, which leverages Step-wise Reward Discretization for structural understanding and Multi-Perspective Reward Fusion to overcome perceptual limitations. Building upon Dopamine-Reward, we propose Dopamine-RL, a robust policy learning framework that employs a theoretically-sound Policy-Invariant Reward Shaping method, which enables the agent to leverage dense rewards for efficient self-improvement without altering the optimal policy, thereby fundamentally avoiding the semantic trap. Extensive experiments across diverse simulated and real-world tasks validate our approach. GRM achieves state-of-the-art accuracy in reward assessment, and Dopamine-RL built on GRM significantly improves policy learning efficiency. For instance, after GRM is adapted to a new task in a one-shot manner from a single expert trajectory, the resulting reward model enables Dopamine-RL to improve the policy from near-zero to 95% success with only 150 online rollouts (approximately 1 hour of real robot interaction), while retaining strong generalization across tasks. Project website: https://robo-dopamine.github.io

Subject: Robotics

Publish: 2025-12-29 18:57:44 UTC


#5 Eliciting Behaviors in Multi-Turn Conversations [PDF5] [Copy] [Kimi3] [REL]

Authors: Jing Huang, Shujian Zhang, Lun Wang, Andrew Hard, Rajiv Mathews, John Lambert

Identifying specific and often complex behaviors from large language models (LLMs) in conversational settings is crucial for their evaluation. Recent work proposes novel techniques to find natural language prompts that induce specific behaviors from a target model, yet they are mainly studied in single-turn settings. In this work, we study behavior elicitation in the context of multi-turn conversations. We first offer an analytical framework that categorizes existing methods into three families based on their interactions with the target model: those that use only prior knowledge, those that use offline interactions, and those that learn from online interactions. We then introduce a generalized multi-turn formulation of the online method, unifying single-turn and multi-turn elicitation. We evaluate all three families of methods on automatically generating multi-turn test cases. We investigate the efficiency of these approaches by analyzing the trade-off between the query budget, i.e., the number of interactions with the target model, and the success rate, i.e., the discovery rate of behavior-eliciting inputs. We find that online methods can achieve an average success rate of 45/19/77% with just a few thousand queries over three tasks where static methods from existing multi-turn conversation benchmarks find few or even no failure cases. Our work highlights a novel application of behavior elicitation methods in multi-turn conversation evaluation and the need for the community to move towards dynamic benchmarks.

Subjects: Computation and Language , Machine Learning

Publish: 2025-12-29 18:57:10 UTC


#6 OpenPBR: Novel Features and Implementation Details [PDF] [Copy] [Kimi] [REL]

Authors: Jamie Portsmouth, Peter Kutz, Stephen Hill

OpenPBR is a physically based, standardized uber-shader developed for interoperable material authoring and rendering across VFX, animation, and design visualization workflows. This document serves as a companion to the official specification, offering deeper insight into the model's development and more detailed implementation guidance, including code examples and mathematical derivations. We begin with a description of the model's formal structure and theoretical foundations - covering slab-based layering, statistical mixing, and microfacet theory - before turning to its physical components. These include metallic, dielectric, subsurface, and glossy-diffuse base substrates, followed by thin-film iridescence, coat, and fuzz layers. A special-case mode for rendering thin-walled objects is also described. Additional sections explore technical topics in greater depth, such as the decoupling of specular reflectivity from transmission, the choice of parameterization for subsurface scattering, and the detailed physics of coat darkening and thin-film interference. We also discuss planned extensions, including hazy specular reflection and retroreflection.

Subject: Graphics

Publish: 2025-12-29 18:53:00 UTC


#7 Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans [PDF1] [Copy] [Kimi2] [REL]

Authors: Sky CH-Wang, Justin Svegliato, Helen Appel, Jason Eisner

We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grained feedback by marking ``liked'' and ``disliked'' spans and specifying what they liked or disliked about them. The base model then rewrites the disliked spans accordingly, proceeding from left to right, forming a sequence of incremental improvements. We construct preference pairs for direct alignment from each adjacent step in the chain, enabling the model to learn from localized, targeted edits. We find that our approach outperforms direct alignment methods based on standard A/B preference ranking or full contrastive rewrites, demonstrating that structured, revision-based supervision leads to more efficient and effective preference tuning.

Subject: Computation and Language

Publish: 2025-12-29 18:51:56 UTC


#8 Unlocking WebRTC for End User Driven Innovation [PDF] [Copy] [Kimi] [REL]

Author: Kundan Singh

We present a software architecture to enable end user driven innovation of web multimedia communication applications. RTC Helper is a simple and easy-to-use software that can intercept WebRTC (web real-time communication) and related APIs in the browser, and change the behavior of web apps in real-time. Such customization can even be driven by the end user on third-party web apps using our flexible and general purpose browser extension. It also facilitates rapid prototyping of ideas by web developers in their existing web apps without having to rebuild or redeploy after every change. It has more than ten customization categories, and over a hundred built-in examples covering a wide range of novel use cases in web-based audio/video communication.

Subjects: Multimedia , Networking and Internet Architecture

Publish: 2025-12-29 18:44:59 UTC


#9 The Minimum Subgraph Complementation Problem [PDF] [Copy] [Kimi] [REL]

Authors: Juan Gutiérrez, Sagartanu Pal

Subgraph complementation is an operation that toggles all adjacencies inside a selected vertex set. Given a graph \(G\) and a target class \(\mathcal{C}\), the Minimum Subgraph Complementation problem asks for a minimum-size vertex set \(S\) such that complementing the subgraph induced by \(S\) transforms \(G\) into a graph belonging to \(\mathcal{C}\). While the decision version of Subgraph Complementation has been extensively studied and is NP-complete for many graph classes, the algorithmic complexity of its optimization variant has remained largely unexplored. In this paper, we study MSC from an algorithmic perspective. We present polynomial-time algorithms for MSC in several nontrivial settings. Our results include polynomial-time solvability for transforming graphs between bipartite, co-bipartite, and split graphs, as well as for complementing bipartite regular graphs into chordal graphs. We also show that MSC to the class of graphs of fixed degeneracy can be solved in polynomial time when the input graph is a forest. Moreover, we investigate MSC with respect to connectivity and prove that MSC to the class of disconnected graphs and to the class of 2-connected graphs can be solved in polynomial time for arbitrary inputs.

Subjects: Data Structures and Algorithms , Combinatorics

Publish: 2025-12-29 18:44:21 UTC


#10 PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech [PDF] [Copy] [Kimi] [REL]

Author: Deepak Babu Piskala

Automatic Speech Recognition (ASR) in professional settings faces challenges that existing benchmarks underplay: dense domain terminology, formal register variation, and near-zero tolerance for critical entity errors. We present ProfASR-Bench, a professional-talk evaluation suite for high-stakes applications across finance, medicine, legal, and technology. Each example pairs a natural-language prompt (domain cue and/or speaker profile) with an entity-rich target utterance, enabling controlled measurement of context-conditioned recognition. The corpus supports conventional ASR metrics alongside entity-aware scores and slice-wise reporting by accent and gender. Using representative families Whisper (encoder-decoder ASR) and Qwen-Omni (audio language models) under matched no-context, profile, domain+profile, oracle, and adversarial conditions, we find a consistent pattern: lightweight textual context produces little to no change in average word error rate (WER), even with oracle prompts, and adversarial prompts do not reliably degrade performance. We term this the context-utilization gap (CUG): current systems are nominally promptable yet underuse readily available side information. ProfASR-Bench provides a standardized context ladder, entity- and slice-aware reporting with confidence intervals, and a reproducible testbed for comparing fusion strategies across model families. Dataset: https://huggingface.co/datasets/prdeepakbabu/ProfASR-Bench Code: https://github.com/prdeepakbabu/ProfASR-Bench

Subjects: Computation and Language , Sound

Publish: 2025-12-29 18:43:23 UTC


#11 Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing [PDF] [Copy] [Kimi1] [REL]

Authors: Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai. -Doss

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.

Subjects: Computation and Language , Artificial Intelligence

Publish: 2025-12-29 18:43:05 UTC


#12 Coloring Hardness on Low Twin-Width Graphs [PDF] [Copy] [Kimi] [REL]

Author: Édouard Bonnet

As the class $\mathcal T_4$ of graphs of twin-width at most 4 contains every finite subgraph of the infinite grid and every graph obtained by subdividing each edge of an $n$-vertex graph at least $2 \log n$ times, most NP-hard graph problems, like Max Independent Set, Dominating Set, Hamiltonian Cycle, remain so on $\mathcal T_4$. However, Min Coloring and k-Coloring are easy on both families because they are 2-colorable and 3-colorable, respectively. We show that Min Coloring is NP-hard on the class $\mathcal T_3$ of graphs of twin-width at most 3. This is the first hardness result on $\mathcal T_3$ for a problem that is easy on cographs (twin-width 0), on trees (whose twin-width is at most 2), and on unit circular-arc graphs (whose twin-width is at most 3). We also show that for every $k \geqslant 3$, k-Coloring is NP-hard on $\mathcal T_4$. We finally make two observations: (1) there are currently very few problems known to be in P on $\mathcal T_d$ (graphs of twin-width at most $d$) and NP-hard on $\mathcal T_{d+1}$ for some nonnegative integer $d$, and (2) unlike $\mathcal T_4$, which contains every graph as an induced minor, the class $\mathcal T_3$ excludes a fixed planar graph as an induced minor; thus it may be viewed as a special case (or potential counterexample) for conjectures about classes excluding a (planar) induced minor. These observations are accompanied by several open questions.

Subjects: Computational Complexity , Discrete Mathematics , Data Structures and Algorithms , Combinatorics

Publish: 2025-12-29 18:36:19 UTC


#13 Web World Models [PDF11] [Copy] [Kimi11] [REL]

Authors: Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang

Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web frameworks provide reliable but fixed contexts backed by databases, while fully generative world models aim for unlimited environments at the expense of controllability and practical engineering. In this work, we introduce the Web World Model (WWM), a middle ground where world state and ``physics'' are implemented in ordinary web code to ensure logical consistency, while large language models generate context, narratives, and high-level decisions on top of this structured latent state. We build a suite of WWMs on a realistic web stack, including an infinite travel atlas grounded in real geography, fictional galaxy explorers, web-scale encyclopedic and narrative worlds, and simulation- and game-like environments. Across these systems, we identify practical design principles for WWMs: separating code-defined rules from model-driven imagination, representing latent state as typed web interfaces, and utilizing deterministic generation to achieve unlimited but structured exploration. Our results suggest that web stacks themselves can serve as a scalable substrate for world models, enabling controllable yet open-ended environments. Project Page: https://github.com/Princeton-AI2-Lab/Web-World-Models.

Subjects: Artificial Intelligence , Computation and Language , Computer Vision and Pattern Recognition

Publish: 2025-12-29 18:31:45 UTC


#14 End-to-End Test-Time Training for Long Context [PDF1] [Copy] [Kimi1] [REL]

Authors: Arnuv Tandon, Karan Dalal, Xinhao Li, Daniel Koceja, Marcel Rød, Sam Buchanan, Xiaolong Wang, Jure Leskovec, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin, Jed McCaleb, Yejin Choi, Yu Sun

We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on the given context, compressing the context it reads into its weights. In addition, we improve the model's initialization for learning at test time via meta-learning at training time. Overall, our method, a form of Test-Time Training (TTT), is End-to-End (E2E) both at test time (via next-token prediction) and training time (via meta-learning), in contrast to previous forms. We conduct extensive experiments with a focus on scaling properties. In particular, for 3B models trained with 164B tokens, our method (TTT-E2E) scales with context length in the same way as Transformer with full attention, while others, such as Mamba 2 and Gated DeltaNet, do not. However, similar to RNNs, TTT-E2E has constant inference latency regardless of context length, making it 2.7 times faster than full attention for 128K context. Our code is publicly available.

Subject: Machine Learning

Publish: 2025-12-29 18:30:14 UTC


#15 The Bulldozer Technique: Efficient Elimination of Local Minima Traps for APF-Based Robot Navigation [PDF] [Copy] [Kimi] [REL]

Authors: Mohammed Baziyad, Manal Al Shohna, Tamer Rabie

Path planning is a fundamental component in autonomous mobile robotics, enabling a robot to navigate from its current location to a desired goal while avoiding obstacles. Among the various techniques, Artificial Potential Field (APF) methods have gained popularity due to their simplicity, real-time responsiveness, and low computational requirements. However, a major limitation of conventional APF approaches is the local minima trap problem, where the robot becomes stuck in a position with no clear direction toward the goal. This paper proposes a novel path planning technique, termed the Bulldozer, which addresses the local minima issue while preserving the inherent advantages of APF. The Bulldozer technique introduces a backfilling mechanism that systematically identifies and eliminates local minima regions by increasing their potential values, analogous to a bulldozer filling potholes in a road. Additionally, a ramp-based enhancement is incorporated to assist the robot in escaping trap areas when starting within a local minimum. The proposed technique is experimentally validated using a physical mobile robot across various maps with increasing complexity. Comparative analyses are conducted against standard APF, adaptive APF, and well-established planning algorithms such as A*, PRM, and RRT. Results demonstrate that the Bulldozer technique effectively resolves the local minima problem while achieving superior execution speed and competitive path quality. Furthermore, a kinematic tracking controller is employed to assess the smoothness and traceability of the planned paths, confirming their suitability for real-world execution.

Subject: Robotics

Publish: 2025-12-29 18:28:51 UTC


#16 Random Controlled Differential Equations [PDF] [Copy] [Kimi] [REL]

Authors: Francesco Piatti, Thomas Cass, William F. Turner

We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). In this approach, large randomly parameterized CDEs act as continuous-time reservoirs, mapping input paths to rich representations. Only a linear readout layer is trained, resulting in fast, scalable models with strong inductive bias. Building on this foundation, we propose two variants: (i) Random Fourier CDEs (RF-CDEs): these lift the input signal using random Fourier features prior to the dynamics, providing a kernel-free approximation of RBF-enhanced sequence models; (ii) Random Rough DEs (R-RDEs): these operate directly on rough-path inputs via a log-ODE discretization, using log-signatures to capture higher-order temporal interactions while remaining stable and efficient. We prove that in the infinite-width limit, these model induces the RBF-lifted signature kernel and the rough signature kernel, respectively, offering a unified perspective on random-feature reservoirs, continuous-time deep architectures, and path-signature theory. We evaluate both models across a range of time-series benchmarks, demonstrating competitive or state-of-the-art performance. These methods provide a practical alternative to explicit signature computations, retaining their inductive bias while benefiting from the efficiency of random features.

Subjects: Machine Learning , Machine Learning

Publish: 2025-12-29 18:25:10 UTC


#17 IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition [PDF6] [Copy] [Kimi] [REL]

Authors: Kang Du, Yirui Guan, Zeyu Wang

Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods have achieved strong results for single-view intrinsic decomposition; however, extending these approaches to multi-view settings remains challenging, often leading to severe view inconsistency. We propose \textbf{Intrinsic Decomposition Transformer (IDT)}, a feed-forward framework for multi-view intrinsic image decomposition. By leveraging transformer-based attention to jointly reason over multiple input images, IDT produces view-consistent intrinsic factors in a single forward pass, without iterative generative sampling. IDT adopts a physically grounded image formation model that explicitly decomposes images into diffuse reflectance, diffuse shading, and specular shading. This structured factorization separates Lambertian and non-Lambertian light transport, enabling interpretable and controllable decomposition of material and illumination effects across views. Experiments on both synthetic and real-world datasets demonstrate that IDT achieves cleaner diffuse reflectance, more coherent diffuse shading, and better-isolated specular components, while substantially improving multi-view consistency compared to prior intrinsic decomposition methods.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-12-29 18:24:46 UTC


#18 Automating the Analysis of Parsing Algorithms (and other Dynamic Programs) [PDF] [Copy] [Kimi] [REL]

Authors: Tim Vieira, Ryan Cotterell, Jason Eisner

Much algorithmic research in NLP aims to efficiently manipulate rich formal structures. An algorithm designer typically seeks to provide guarantees about their proposed algorithm -- for example, that its running time or space complexity is upper-bounded as a certain function of its input size. They may also wish to determine the necessary properties of the quantities derived by the algorithm to synthesize efficient data structures and verify type errors. In this paper, we develop a system for helping programmers to perform these types of analyses. We apply our system to a number of NLP algorithms and find that it successfully infers types, dead and redundant code, and parametric runtime and space complexity bounds.

Subject: Programming Languages

Publish: 2025-12-29 18:19:43 UTC


#19 Less is more: Probabilistic reduction is best explained by small-scale predictability measures [PDF2] [Copy] [Kimi1] [REL]

Authors: Cassandra L. Jacobs, Andrés Buxó-Lugo, Anna K. Taylor, Marie Leopold-Hooke

The primary research questions of this paper center on defining the amount of context that is necessary and/or appropriate when investigating the relationship between language model probabilities and cognitive phenomena. We investigate whether whole utterances are necessary to observe probabilistic reduction and demonstrate that n-gram representations suffice as cognitive units of planning.

Subject: Computation and Language

Publish: 2025-12-29 18:12:37 UTC


#20 A Review of Community-Centric Power Systems Resilience Assessment and Enhancement Strategies [PDF] [Copy] [Kimi] [REL]

Authors: Masoud H. Nazaria, Hamid Varmazyari, Antar Kumar Biswas, Umit Cali, Hollis Belnap, Masood Parvania

This paper presents a comprehensive review of resilience metrics, covering both engineering-based measures, such as fragility-curve modeling, and data-driven approaches, including triangular and trapezoidal representations. Next, the paper examines the interdependencies between power systems resilience and community resilience, addressing socioeconomic and behavioral dimensions, infrastructure interconnections, and the emerging role of resilience hubs. The review then synthesizes state-of-the-art strategies for enhancing power system resilience, including network hardening, resource allocation, optimal scheduling, and reconfiguration techniques. Special emphasis is placed on the integration of Artificial Intelligence (AI) methods and the techno-legal dimensions of resilient power systems and communities. In particular, the paper contrasts the regulatory landscapes of the European Union and the United States, highlighting key similarities and distinctions. By analyzing methodologies for mitigating the impacts of high-impact, low-probability (HILP) events, the review identifies critical research gaps and outlines promising directions for future investigation.

Subject: Systems and Control

Publish: 2025-12-29 18:11:09 UTC


#21 A note on the depth of optimal fanout-bounded prefix circuits [PDF] [Copy] [Kimi] [REL]

Author: Igor S. Sergeev

It is shown that the minimal depth of an optimal prefix circuit (i.e., a zero-deficiency circuit) on $N$ inputs with fanout bounded by $k$ is ${\log_{α_k} N \pm O(1)}$, where $α_k$ is the unique positive root of the polynomial ${2+x+ x^2+\ldots + x^{k-2}-x^k}$. This bound was previously known in the cases $k=2$ and $k=\infty$.

Subject: Data Structures and Algorithms

Publish: 2025-12-29 18:11:09 UTC


#22 Distributed Accountability in Democracy: Using MANETs and DTNs in the Face of Acts of Questionable Legality [PDF] [Copy] [Kimi] [REL]

Authors: Mathew Schmidheiser, Milena Radenkovic

In this paper, we explore the behavior of the Epidemic and Wave DTN routing protocols in a realistic setting where individuals may wish to communicate with others for support regarding an act of questionable legality. We identify situations where using the Epidemic routing protocol may be more advantageous in such a scenario, and situations where using the Wave routing protocol may be more advantageous instead. We discuss other aspects of our findings in detail and suggest multiple approaches to future works.

Subject: Networking and Internet Architecture

Publish: 2025-12-29 18:06:03 UTC


#23 Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control [PDF1] [Copy] [Kimi] [REL]

Authors: Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Tao Huang, Zhenguo Sun, Yibo Peng, Pengwei Wang, Zhongyuan Wang, Fangzhou Liu, Chang Xu, Shanghang Zhang

Humans intuitively move to sound, but current humanoid robots lack expressive improvisational capabilities, confined to predefined motions or sparse commands. Generating motion from audio and then retargeting it to robots relies on explicit motion reconstruction, leading to cascaded errors, high latency, and disjointed acoustic-actuation mapping. We propose RoboPerform, the first unified audio-to-locomotion framework that can directly generate music-driven dance and speech-driven co-speech gestures from audio. Guided by the core principle of "motion = content + style", the framework treats audio as implicit style signals and eliminates the need for explicit motion reconstruction. RoboPerform integrates a ResMoE teacher policy for adapting to diverse motion patterns and a diffusion-based student policy for audio style injection. This retargeting-free design ensures low latency and high fidelity. Experimental validation shows that RoboPerform achieves promising results in physical plausibility and audio alignment, successfully transforming robots into responsive performers capable of reacting to audio.

Subject: Robotics

Publish: 2025-12-29 17:59:24 UTC


#24 RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion [PDF3] [Copy] [Kimi1] [REL]

Authors: Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Tao Huang, Zhenguo Sun, Yibo Peng, Pengwei Wang, Zhongyuan Wang, Fangzhou Liu, Chang Xu, Shanghang Zhang

Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion systems rely on either curated motion capture trajectories or sparse text commands, leaving a critical gap between visual understanding and control. Text-to-motion methods suffer from semantic sparsity and staged pipeline errors, while video-based approaches only perform mechanical pose mimicry without genuine visual understanding. We propose RoboMirror, the first retargeting-free video-to-locomotion framework embodying "understand before you imitate". Leveraging VLMs, it distills raw egocentric/third-person videos into visual motion intents, which directly condition a diffusion-based policy to generate physically plausible, semantically aligned locomotion without explicit pose reconstruction or retargeting. Extensive experiments validate the effectiveness of RoboMirror, it enables telepresence via egocentric videos, drastically reduces third-person control latency by 80%, and achieves a 3.7% higher task success rate than baselines. By reframing humanoid control around video understanding, we bridge the visual understanding and action gap.

Subjects: Robotics , Computer Vision and Pattern Recognition

Publish: 2025-12-29 17:59:19 UTC


#25 A High-Order Spectral Element Solver for Steady-State Free Surface Flows [PDF] [Copy] [Kimi] [REL]

Authors: Simone Minniti, Jens Visbech, Claes Eskilsson, Nicola Parolini, Allan Peter Engsig-Karup

We present a spectral element solver for the steady incompressible Navier-Stokes equations subject to a free surface. Utilizing the kinematic behaviour of the free surface boundary, an iterative pseudo-time procedure is proposed to determine the a priori unknown free surface profile. The numerical model is implemented in the open-source finite element framework Firedrake, which enables the use of a high-order polynomial basis on unstructured meshes through weak formulations. Additionally, the curvature of the free surface and submerged bodies is incorporated through curvilinear elements obtained via transfinite linear blending, which conserves the high-order convergent properties of the overall scheme. The model is applied to several benchmark cases in two spatial dimensions. Initially, it addresses fixed-domain problems, including the lid-driven cavity flow and flows around bodies such as a cylinder and a NACA airfoil. Subsequently, with the presence of a free surface, it is extended to determine the flow around a bathymetry bump and a submerged NACA airfoil. The results confirm the high-order accuracy of the model through convergence studies and demonstrate a substantial speed-up over low-order numerical schemes.

Subject: Numerical Analysis

Publish: 2025-12-29 17:59:14 UTC