2025-12-08 | | Total: 59
Current speech-language models (SLMs) typically use a cascade of a speech encoder and a large language model, treating speech understanding as a single black box. They analyze the content of speech well but reason weakly about other aspects, especially under sparse supervision. Thus, we argue for explicit reasoning over speech states and actions with modular and transparent decisions. Inspired by cognitive science, we adopt a modular perspective and a world-model view in which the system learns forward dynamics over latent states. We factorize speech understanding into four modules that communicate through a causal graph, establishing a cognitive state search space. Guided by posterior traces from this space, an instruction-tuned language model produces a concise causal analysis and a user-facing response, enabling counterfactual interventions and interpretability under partial supervision. We present the first graph-based modular speech model for explicit reasoning, and we will open-source the model and data to promote the development of advanced speech understanding.
Accurate real-time waypoint estimation for UAV-based online terrain following during wildfire patrol missions is critical to ensuring flight safety and enabling wildfire detection. However, existing real-time filtering algorithms struggle to maintain accurate waypoints under measurement noise in nonlinear, time-varying systems, posing risks of flight instability and missed wildfire detections during UAV-based terrain following. To address this issue, a Residual Variance Matching Recursive Least Squares (RVM-RLS) filter, guided by a Residual Variance Matching Estimation (RVME) criterion, is proposed to adaptively estimate the real-time waypoints of nonlinear, time-varying UAV-based terrain following systems. The proposed method is validated using a UAV-based online terrain following system within a simulated terrain environment. Experimental results show that the RVM-RLS filter improves waypoint estimation accuracy by approximately 88$\%$ compared with benchmark algorithms across multiple evaluation metrics. These findings demonstrate both the methodological advances in real-time filtering and the practical potential of the RVM-RLS filter for UAV-based online wildfire patrol.
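The RVM-RLS filter builds on recursive least squares. As a point of reference, the following is a minimal sketch of a standard exponentially weighted RLS estimator with a fixed forgetting factor. It does not implement the RVME criterion, which is not specified here; the function name `rls_filter`, the parameters `lam` and `delta`, and the linear-trend data in the usage lines are illustrative assumptions.

```python
import numpy as np

def rls_filter(phi, y, lam=0.98, delta=100.0):
    """Exponentially weighted recursive least squares (plain RLS, not RVM-RLS).

    phi   : (T, n) array of regressor rows
    y     : (T,) array of noisy measurements
    lam   : forgetting factor in (0, 1], held fixed in this sketch
    delta : initial covariance scale (large value = weak prior)
    Returns the (T, n) history of parameter estimates.
    """
    T, n = phi.shape
    theta = np.zeros(n)
    P = delta * np.eye(n)
    history = np.empty((T, n))
    for k in range(T):
        x = phi[k]
        Px = P @ x
        K = Px / (lam + x @ Px)           # gain vector
        e = y[k] - x @ theta              # a-priori residual
        theta = theta + K * e             # parameter update
        P = (P - np.outer(K, Px)) / lam   # covariance update with forgetting
        history[k] = theta
    return history

# Usage: track a noisy linear profile y = 2 + 0.5 t (hypothetical data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
phi = np.column_stack([np.ones_like(t), t])
y = 2.0 + 0.5 * t + 0.05 * rng.standard_normal(t.size)
est = rls_filter(phi, y)
print(est[-1])
```

A smaller `lam` tracks time-varying parameters faster but is more sensitive to measurement noise; this is the trade-off any adaptive RLS variant has to manage.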
We show that the dynamics of a thrusting spacecraft can be embedded in the Lie group SE2(3) in a form that is group-affine once a feed-forward control law is applied. This structure implies that the configuration-tracking error evolves exactly linearly in the associated Lie algebra coordinates (log-linear dynamics), rather than only approximately through a local linearization of the nonlinear system. As a result, a broad class of linear analysis and synthesis tools becomes directly applicable to powered spacecraft motion on SE2(3). A simple numerical example confirms that the error predicted by the linear Lie-algebra dynamics matches the error computed from the full nonlinear system, illustrating the exact log-linear behavior. This foundational property opens a path toward rigorous tools for satellite docking, autonomous rendezvous and proximity operations, robust controller design, and convex safety certification, capabilities that are difficult to achieve with classical local linearizations such as Tschauner-Hempel/Yamanaka-Ankersen (TH/YA).
The transition toward power grids with high renewable penetration demands context-aware decision-making frameworks. Traditional operational paradigms, which rely on static optimization of history-based load forecasts, often fail to capture the complex nature of real-time operational conditions, such as operator-issued maintenance mandates, emergency topology changes, or event-driven load surges. To address this challenge, we introduce InstructMPC, a closed-loop framework that integrates Large Language Models (LLMs) to generate context-aware predictions, enabling the controller to optimize power system operation. Our method employs a Contextual Disturbances Predictor (CDP) module to translate contextual information into predictive disturbance trajectories, which are then incorporated into the Model Predictive Control (MPC) optimization. Unlike conventional open-loop forecasting frameworks, InstructMPC features an online tuning mechanism in which the predictor's parameters are continuously updated based on the realized control cost, with a theoretical guarantee: when optimized via a tailored loss function, it achieves a regret bound of $O(\sqrt{T \log T})$ for linear dynamics, ensuring task-aware learning and adaptation to non-stationary grid conditions.
We propose a simple (12-parameter) hybrid dynamic model that simultaneously captures the continuous-valued dynamics of three human cognitive states (trust, perceived risk, and mental workload) as well as discrete transitions in reliance on the automation. The discrete-time evolution of each cognitive state is modeled using a first-order affine difference equation. Reliance is defined as a single discrete-valued state whose evolution at each time step depends on the cognitive states satisfying certain threshold conditions. Using data collected from 16 participants, we estimate participant-specific model parameters based on their reliance on the automation and intermittently self-reported cognitive states during a continuous drive in a vehicle simulator. The model can be estimated from a single user's trajectory data (e.g., 8 minutes of driving), making it suitable for online parameter adaptation methods. Our results show that the model fits the observed trajectories well for several participants, whose reliance behavior is primarily influenced by trust, perceived risk, or both. Importantly, the model is interpretable: variations in model parameters across participants provide insights into differences in the time scales over which cognitive states evolve and in how these states are influenced by task complexity. Implications for the design of human-centric vehicle automation are discussed.
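The model structure can be sketched as a simulation: each cognitive state follows a first-order affine difference equation driven by an exogenous input, and a discrete reliance state switches on threshold conditions. The coefficients, the input `u` (standing in for task complexity), the hysteresis-style threshold rule, and all names below are hypothetical illustrations, not the paper's fitted 12-parameter model.

```python
from dataclasses import dataclass

@dataclass
class StateParams:
    a: float  # persistence of the state (pole of the difference equation)
    b: float  # sensitivity to the exogenous input u
    c: float  # constant offset

def step(x, u, p):
    # First-order affine difference equation: x[k+1] = a*x[k] + b*u[k] + c
    return p.a * x + p.b * u + p.c

def simulate(u_seq, x0, params, trust_on=0.6, trust_off=0.5, risk_max=0.5):
    """Simulate trust/risk/workload and a binary reliance state.

    Hypothetical rule: start relying when trust >= trust_on and risk <= risk_max;
    stop relying when trust < trust_off or risk > risk_max.
    """
    x = dict(x0)
    reliance = 0
    trace = []
    for u in u_seq:
        x = {name: step(x[name], u, params[name]) for name in x}
        if reliance == 0 and x["trust"] >= trust_on and x["risk"] <= risk_max:
            reliance = 1
        elif reliance == 1 and (x["trust"] < trust_off or x["risk"] > risk_max):
            reliance = 0
        trace.append((x["trust"], x["risk"], x["workload"], reliance))
    return trace

params = {
    "trust": StateParams(a=0.9, b=-0.1, c=0.08),
    "risk": StateParams(a=0.8, b=0.2, c=0.04),
    "workload": StateParams(a=0.85, b=0.3, c=0.03),
}
x0 = {"trust": 0.3, "risk": 0.5, "workload": 0.2}
u_seq = [0.0] * 30 + [1.0] * 30   # low, then high task complexity
trace = simulate(u_seq, x0, params)
```

In this form, the time scale of each state is set by its `a` coefficient (roughly 1/(1 - a) steps for `a` close to 1), which is the kind of participant-specific quantity the fitted parameters are said to reveal.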
Background and objective: Hybrid automated insulin delivery (hAID) systems represent the most advanced therapy for type 1 diabetes (T1D). Current systems rely on linear or linearized models of glucose homeostasis, which may compromise prediction accuracy and, in turn, timely decision-making by the controller. Physiological variability further complicates insulin requirements, underscoring the need for controllers that adapt dynamically and reduce user burden. Methods: We introduce the University of Bern (UniBE) hAID system, a framework based on successive linearization model predictive control (MPC). The controller integrates basal insulin infusion with the insulin bolus delivery module for meal-related and corrective bolus dosing, adapting its bounds in real time to glucose dynamics while accounting for both automated and user-initiated inputs. In-silico evaluation was conducted using the commercial version of the FDA-accepted UVa/Padova metabolic simulator across nine scenarios involving persistent and time-varying errors in meal timing, carbohydrate estimation, and basal insulin profiles. Results: In the baseline scenario, UniBE achieved a mean time in range of 92.0 ± 13.2%, with time below range at 0.1 ± 0.2% and time above range at 7.9 ± 13.2%. Across perturbation scenarios, time in range remained between 75.1% and 92.8%, with low hypoglycemia incidence, demonstrating resilience to clinically relevant disturbances.
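The reported outcomes partition glucose readings into three ranges whose percentages sum to 100% (as in the baseline figures 92.0 + 0.1 + 7.9). A minimal sketch of that computation, assuming the widely used consensus thresholds of 70 mg/dL and 180 mg/dL (the abstract does not state the bounds used) and equally spaced readings; the function name and sample readings are hypothetical.

```python
import numpy as np

def glycemic_ranges(glucose_mg_dl, low=70.0, high=180.0):
    """Percentage of readings below, within, and above [low, high] mg/dL."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    n = g.size
    tbr = 100.0 * np.count_nonzero(g < low) / n    # time below range
    tar = 100.0 * np.count_nonzero(g > high) / n   # time above range
    tir = 100.0 - tbr - tar                        # time in range, bounds inclusive
    return {"TBR": tbr, "TIR": tir, "TAR": tar}

# Hypothetical CGM readings (mg/dL), assumed equally spaced in time
readings = [65, 90, 120, 150, 175, 180, 190, 210, 110, 100]
ranges = glycemic_ranges(readings)
print(ranges)
```

With unequally spaced samples, each reading would need to be weighted by the time interval it represents.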
The increasing power densities and intricate heat dissipation paths in advanced 2.5D/3D chiplet systems necessitate thermal modeling frameworks that deliver detailed thermal maps with high computational efficiency. Traditional compact thermal models (CTMs) often struggle to scale with the complexity and heterogeneity of modern architectures. This work introduces 3D-ICE 4.0, designed for heterogeneous chip-based systems. Key innovations include: (i) preservation of material heterogeneity and anisotropy directly from industrial layouts, integrated with OpenMP- and SuperLU_MT-based parallel solvers for scalable performance; (ii) adaptive vertical layer partitioning to accurately model vertical heat conduction; and (iii) temperature-aware non-uniform grid generation. Results on different benchmarks demonstrate that 3D-ICE 4.0 achieves speedups of 3.61x to 6.46x over state-of-the-art tools while reducing grid complexity by more than 23.3% without compromising accuracy. Comparisons with the commercial software COMSOL show that 3D-ICE 4.0 effectively captures both lateral and vertical heat flows, validating its precision and robustness. These advances demonstrate that 3D-ICE 4.0 is an efficient solution for thermal modeling in emerging heterogeneous 2.5D/3D integrated systems.
This paper presents a safe output regulation control strategy for a class of systems modeled by a coupled $2\times 2$ hyperbolic PDE-ODE structure, subject to fully distributed disturbances throughout the system. A state-feedback controller is developed by the nonovershooting backstepping method to simultaneously achieve exponential output regulation and enforce safety constraints on the system output, which is the state furthest from the control input. To handle unmeasured PDE states and external disturbances, a state observer and a disturbance estimator are designed. Explicit bounds on the estimation errors are derived and used to construct a robust safe regulator that accounts for the uncertainties. The proposed control scheme guarantees that: 1) if the system output is initially within the safe region, it remains there; otherwise, it is driven back into the safe region within a prescribed time; 2) the output tracking error converges to zero exponentially; 3) the observer accurately estimates both the distributed states and external disturbances, with estimation errors converging to zero exponentially; and 4) all signals in the closed-loop system remain bounded. The effectiveness of the proposed method is demonstrated through a UAV delivery scenario with a cable-suspended payload, in which the payload is regulated to track a desired reference while avoiding collisions with barriers.
The success of collaborative task completion among networked devices hinges on the effective selection of trustworthy collaborators. However, accurate task-specific trust evaluation of multi-hop collaborators can be extremely complex, because it is determined by a combination of diverse trust-related perspectives with different characteristics, including historical collaboration reliability, volatile and sensitive conditions of available resources for collaboration, and continuously evolving network topologies. To address this challenge, this paper presents a graph neural network (GNN)-aided distributed agentic AI (GADAI) framework, in which different aspects of devices' task-specific trustworthiness are separately evaluated and jointly integrated to facilitate multi-hop collaborator selection. GADAI first utilizes a GNN-assisted model to infer device trust from historical collaboration data. Specifically, it employs a GNN to propagate and aggregate trust information among multi-hop neighbours, resulting in more accurate device reliability evaluation. Considering the dynamic and privacy-sensitive nature of device resources, a privacy-preserving resource evaluation mechanism is implemented using agentic AI. Each device hosts an agent driven by a large AI model that autonomously determines whether its local resources meet the requirements of a given task, ensuring both task-specific and privacy-preserving trust evaluation. By combining the outcomes of these assessments, only trusted devices can coordinate a task-oriented multi-hop cooperation path through their agents in a distributed manner. Experimental results show that the proposed GADAI outperforms the comparison algorithms in planning multi-hop paths that maximize the value of task completion.
The impedance criterion has emerged as an alternative approach to the stability assessment of grid-connected power electronic converters. However, the lack of physical meaning of impedance and admittance matrices hinders understanding of the root causes of instabilities. To address this issue, this paper proposes applying Pauli decomposition to the impedance matrices and the minor loop of grid-connected power electronic converters. This methodology simplifies establishing the link between impedance matrix terms and closed-loop stability properties. Moreover, Pauli decomposition transforms impedance matrices into a quaternion-like form that helps assess the root causes of instabilities. The theoretical contributions are validated using a case study of a power electronic converter connected to a weak grid, which has been previously analysed in the literature using existing techniques.
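For a 2x2 (e.g., dq-frame) impedance matrix Z evaluated at one frequency, the Pauli decomposition writes Z = z0 I + z1 σx + z2 σy + z3 σz, and because tr(σi σj) = 2δij the coefficients follow as zi = tr(σi Z)/2. The sketch below computes them numerically; the example matrix is made up, the function names are hypothetical, and how the paper maps the coefficients to stability properties is not reproduced here.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
BASIS = (I2, SX, SY, SZ)

def pauli_coefficients(Z):
    """Return (z0, z1, z2, z3) with Z = z0*I + z1*SX + z2*SY + z3*SZ."""
    Z = np.asarray(Z, dtype=complex)
    # tr(s_i s_j) = 2 delta_ij, so each coefficient is a normalized trace
    return tuple(np.trace(s @ Z) / 2 for s in BASIS)

def pauli_reconstruct(coeffs):
    return sum(c * s for c, s in zip(coeffs, BASIS))

# Made-up 2x2 impedance at a single frequency:
# equal diagonal terms and off-diagonal terms of opposite sign
Zdq = np.array([[1.0 + 2.0j, -0.5 + 0.1j],
                [0.5 - 0.1j, 1.0 + 2.0j]])
z = pauli_coefficients(Zdq)
print(np.round(z, 3))
```

For this particular structure only z0 and z2 are non-zero; a non-zero z3 would indicate unequal diagonal terms, and a non-zero z1 a symmetric off-diagonal coupling.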
This paper addresses the synthesis of slow-time coded waveforms for single-target tracking in a radar network operating under colored Gaussian interference. Based on the Posterior Cramér-Rao Lower Bound (PCRLB), which characterizes the theoretically optimal accuracy of target state estimation, the problem at each tracking frame is formulated as minimizing the trace of the PCRLB subject to power budget requirements and a similarity constraint, which account for transmitter limitations and appropriate waveform features. To tackle this challenging optimization problem, an approximate solution technique is proposed, aimed at achieving better tracking accuracy than the reference code. The resulting approximated problems, whose objective functions are made more tractable through Taylor-series expansion, are solved using a customized block Majorization-Minimization (block-MM) algorithm. The convergence properties of the developed procedure are thoroughly analyzed. Numerical results illustrate the improved accuracy of target state estimation and the robust tracking performance under uncertain target state conditions achieved by the proposed technique.
Fluid antenna multiple access (FAMA) has recently emerged as a simple, promising scheme for large-scale multiuser connectivity, offering strong scalability with low implementation complexity. Nevertheless, most existing FAMA studies focus on downlink transmission under perfect channel state information (CSI) at the receiver side, while the uplink counterpart remains largely unexplored. This paper proposes a novel codebook-based port selection and combining (CPSC) FAMA framework for uplink communications without CSI at the base station (BS). In the proposed scheme, a predefined codebook is designed and broadcast by the BS. Each user equipment (UE) employs a fluid antenna, acquires its local CSI, independently chooses the most suitable codeword, activates the corresponding fluid antenna ports, and determines the combining weights to achieve a two-way match between the selected codeword and the instantaneous effective channel. The BS then separates the superimposed user signals through codebook-guided projection operations without requiring global CSI or multiuser joint optimization. To handle potential codeword collisions, three lightweight scheduling strategies are introduced, offering flexible trade-offs between signaling overhead and collision avoidance. Simulation results demonstrate that the proposed CPSC-FAMA approach achieves substantially higher rates than fixed-antenna systems while maintaining low complexity. Moreover, the results confirm that amortizing the optimization cost over the UEs effectively reduces the BS processing burden and enhances scalability, making the proposed scheme a strong candidate for future sixth-generation (6G) networks.
An antenna coding approach for exploiting the spatial multiplexing capability of pixel antennas is proposed. This approach can leverage additional degrees of freedom in the beamspace domain to transmit more information streams. Pixel antennas are a general reconfigurable antenna design in which a radiating structure of arbitrary shape and size is discretized into sub-wavelength elements, called pixels, that are connected by radio frequency switches. By controlling the switch states, the pixel antenna topology can be flexibly adjusted so that the resulting radiation pattern can be reconfigured for beamspace spatial multiplexing. In this work, we introduce the antenna coder and pattern coder for pixel antennas, provide a multiple-input multiple-output (MIMO) communication system model with antenna coding in the beamspace domain, and derive the spectral efficiency. Using the antenna coder, we analyze the radiation pattern of the pixel antenna and provide efficient optimization algorithms for antenna coding design. Numerical simulation results show that, compared with conventional MIMO, the proposed technique using pixel antennas can enhance the spectral efficiency of 4-by-4 MIMO by up to 12 bits/s/Hz or, equivalently, reduce the required transmit power by up to 90%, demonstrating the effectiveness of antenna coding for spectral efficiency enhancement and its promise for future sixth-generation (6G) wireless communication.
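For context on the spectral-efficiency figures, a common conventional baseline is the equal-power MIMO spectral efficiency log2 det(I + (SNR/Nt) H H^H). The sketch below evaluates that textbook formula for a random 4x4 channel; it is not the paper's beamspace antenna-coding model, and the Rayleigh channel draw, the 10 dB SNR, and the function name are assumptions.

```python
import numpy as np

def mimo_spectral_efficiency(H, snr_linear):
    """Equal-power MIMO spectral efficiency log2 det(I + (SNR/Nt) H H^H), bits/s/Hz."""
    nr, nt = H.shape
    G = np.eye(nr) + (snr_linear / nt) * (H @ H.conj().T)
    # slogdet avoids overflow/underflow; G is Hermitian positive definite
    _, logdet = np.linalg.slogdet(G)
    return logdet / np.log(2)

# Hypothetical i.i.d. Rayleigh 4x4 channel at 10 dB SNR
rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
se = mimo_spectral_efficiency(H, snr_linear=10 ** (10 / 10))
print(f"{se:.2f} bits/s/Hz")
```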
We demonstrate and experimentally validate an end-to-end hybrid CMOS-memristor auditory encoder that realises adaptive-threshold, asynchronous delta-modulation (ADM)-based spike encoding by exploiting the inherent volatility of HfTiOx devices. A spike-triggered programming pulse rapidly raises the ADM threshold Δ (desensitisation); the device's volatility then passively lowers Δ when activity subsides (resensitisation), emphasising onsets while restoring sensitivity without static control energy. Our prototype couples an 8-channel 130 nm encoder IC to off-chip HfTiOx devices via a switch interface and an off-chip controller that monitors spike activity and issues programming events. An on-chip current-mirror transimpedance amplifier (TIA) converts device current into symmetric thresholds, enabling both sensitive and conservative encoding regimes. Evaluated with gammatone-filtered speech, the adaptive loop, at a matched spike budget, sharpens onsets and preserves fine temporal detail that a fixed-Δ baseline misses; multi-channel spike cochleagrams show the same trend. Together, these results establish a practical hybrid CMOS-memristor pathway to onset-salient, spike-efficient neuromorphic audio front-ends and motivate low-power single-chip integration.
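The desensitisation/resensitisation loop can be illustrated with a software behavioural model of adaptive-threshold asynchronous delta modulation. In this sketch the HfTiOx volatility is replaced by an assumed exponential relaxation of the threshold Δ (`delta` in the code) toward its resting value, and each spike multiplies Δ by an assumed fixed factor; all parameter names and values are hypothetical, and the 130 nm circuit is not modelled.

```python
import numpy as np

def adaptive_adm(x, delta0=0.05, jump=1.5, tau=50.0, delta_max=0.5):
    """Asynchronous delta modulation with a spike-driven adaptive threshold.

    Emits +1 (ON) or -1 (OFF) when x moves by at least delta from the
    reference level. Each spike multiplies delta by `jump` (desensitisation,
    capped at delta_max); between spikes delta relaxes exponentially back to
    delta0 with time constant `tau` samples (resensitisation, standing in for
    device volatility). jump=1.0 gives a fixed-threshold baseline.
    """
    ref = float(x[0])
    delta = delta0
    decay = np.exp(-1.0 / tau)
    spikes = np.zeros(len(x), dtype=int)
    deltas = np.empty(len(x))
    for k, v in enumerate(x):
        delta = delta0 + (delta - delta0) * decay   # passive relaxation
        if v - ref >= delta:                        # ON event
            spikes[k] = 1
            ref += delta
            delta = min(delta * jump, delta_max)
        elif ref - v >= delta:                      # OFF event
            spikes[k] = -1
            ref -= delta
            delta = min(delta * jump, delta_max)
        deltas[k] = delta
    return spikes, deltas

# Usage: a step "onset" compared against a fixed-threshold baseline
x = np.concatenate([np.zeros(100), np.ones(100)])
adaptive, deltas = adaptive_adm(x)
fixed, _ = adaptive_adm(x, jump=1.0)
print(np.count_nonzero(adaptive), np.count_nonzero(fixed))
```

On a step, the adaptive encoder fires at the onset and then climbs with growing increments, so it spends fewer spikes than the fixed-threshold baseline on the same edge.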
Model predictive control (MPC) is a powerful control method that allows state and input constraints to be included directly in the controller design. However, model errors, e.g., caused by unknown disturbances, can lead to constraint violation and loss of feasibility and can deteriorate closed-loop performance. In this paper, we propose a new MPC scheme based on the internal model principle. This enables the MPC to reject unknown disturbances, provided that the dynamics of the linear signal generator are known. We reformulate the output regulation problem as a stability problem to ensure feasibility, constraint satisfaction, and convergence to the optimal reachable setpoint. The controller is validated on a real four-tank system.
Autonomous highway driving demands a critical balance between proactive, efficiency-seeking behavior and robust safety guarantees. This paper proposes Language Action-guided Reinforcement Learning (LA-RL) with Safety Guarantees, a novel framework that integrates the semantic reasoning of large language models (LLMs) into the actor-critic architecture with an improved safety layer. Within this framework, task-specific reward shaping harmonizes the dual objectives of maximizing driving efficiency and ensuring safety, guiding decision-making based on both environmental insights and clearly defined goals. To enhance safety, LA-RL incorporates a safety-critical planner that combines model predictive control (MPC) with discrete control barrier functions (DCBFs). This layer formally constrains the LLM-informed policy to a safe action set and employs a slack mechanism that enhances solution feasibility, prevents overly conservative behavior, and allows greater policy exploration without compromising safety. Extensive experiments demonstrate that LA-RL significantly outperforms several current state-of-the-art methods, offering a more adaptive, reliable, and robust solution for autonomous highway driving. It achieves an approximately 20$\%$ higher success rate than the knowledge graph (KG)-based baseline and about 30$\%$ higher than the retrieval-augmented generation (RAG)-based baseline. In low-density environments, LA-RL achieves a 100$\%$ success rate. These results confirm its enhanced exploration of the state-action space and its ability to autonomously adopt more efficient, proactive strategies in complex, mixed-traffic highway environments.
Semantic communication systems aim to transmit task-relevant information between AI-capable devices, but their performance can degrade when heterogeneous transmitter-receiver models produce misaligned latent representations. Existing semantic alignment methods typically rely on additional digital processing at the transmitter or receiver, increasing overall device complexity. In this work, we introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIM), which enables latent-space alignment directly in the wave domain, substantially reducing the computational burden at the device level. We model SIMs as trainable linear operators capable of emulating both supervised linear aligners and zero-shot Parseval-frame-based equalizers. To realize these operators physically, we develop a gradient-based optimization procedure that tailors the metasurface transfer function to a desired semantic mapping. Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in regimes with high signal-to-noise ratio (SNR), while maintaining strong robustness even at low SNR values.
In this note, we present a novel synchronization framework for heterogeneous multi-agent systems enabled by neuro-spike communication, which induces emergence. Unlike conventional synchronization strategies that require continuous transmission of full-state data packets, our approach utilizes a bio-inspired neuromorphic amplifier to achieve practical synchronization via intermittent, 1-bit Dirac delta pulses. The proposed method drastically improves communication efficiency in terms of bandwidth and energy by minimizing the information payload to a single bit, with intermittent and asynchronous communication. We provide a rigorous convergence analysis of the proposed method and validate the proposed scheme through numerical examples.
This paper proposes an active learning method for designing experiments to identify quasi-Linear Parameter-Varying (qLPV) models. Since informative experiments are costly, input signals must be selected to maximize information content based on the currently available model. To improve the extrapolation properties of the identified model, we introduce a manifold-regularization strategy that enforces smooth variations in the qLPV dynamics, promoting Linear Time-Varying (LTV) behavior. Using this regularized structure, we propose a new active learning criterion based on path integrals of an inverse-distance variance measure and derive an efficient approximation exploiting the LTV smoothness. Numerical examples show that the proposed regularization enhances qLPV extrapolation and that the resulting active learning scheme accelerates the identification process.
In this work, we analyze the internal and boundary stabilization of the Cahn-Hilliard and Kuramoto-Sivashinsky equations under saturated feedback control. We conduct our study through the spectral analysis of the associated linear operator. We identify a finite number of eigenvalues related to the unstable part of the system and then design a stabilization strategy based on modal decomposition, linear matrix inequalities (LMIs), and geometric conditions on the saturation function. Local exponential stabilization in $H^{2}$ is established.
Reliable state estimation depends on accurately modeled noise covariances, which are difficult to determine in practice. This paper formulates noise covariance estimation as a bilevel optimization problem that factorizes the joint likelihood of primary and supervisory measurements to reconcile information exploitation with computational tractability. The factorization converts the nested Bayesian dependency into a Markov-chain structure, allowing efficient computation. At the lower level, a Kalman filter with state augmentation performs this computation. Meanwhile, closed-form forward and reverse differentiation provide efficient gradients for the upper-level updates, and we compare the space and time complexities of the two differentiation modes to inform their practical selection. The upper level then refines the noise covariances to guide the lower-level estimation. Taken together, the proposed algorithms offer a systematic and computationally efficient approach to noise covariance estimation in linear Gaussian systems.
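To illustrate the kind of lower-level computation involved, the following sketch evaluates the Gaussian negative log-likelihood of a measurement sequence for candidate noise covariances Q and R via the innovations of a standard Kalman filter (prediction-error decomposition), then picks R by grid search. It omits the paper's state augmentation, supervisory measurements, and closed-form forward/reverse gradients; the grid search is a deliberately simple stand-in for the upper level, and the scalar random-walk model and all names are assumptions.

```python
import numpy as np

def kalman_nll(ys, A, C, Q, R, x0, P0):
    """Negative log-likelihood of ys under x[k+1] = A x[k] + w, y[k] = C x[k] + v."""
    x, P = x0.copy(), P0.copy()
    nll = 0.0
    for y in ys:
        S = C @ P @ C.T + R                 # innovation covariance
        e = y - C @ x                       # innovation
        _, logdet = np.linalg.slogdet(S)
        nll += 0.5 * (logdet + e @ np.linalg.solve(S, e) + len(e) * np.log(2 * np.pi))
        K = P @ C.T @ np.linalg.inv(S)      # measurement update
        x = x + K @ e
        P = (np.eye(len(x)) - K @ C) @ P
        x = A @ x                           # time update
        P = A @ P @ A.T + Q
    return nll

# Hypothetical scalar random walk observed in noise
rng = np.random.default_rng(0)
A = np.array([[1.0]])
C = np.array([[1.0]])
q_true, r_true = 0.01, 0.25
x = np.zeros(1)
ys = []
for _ in range(2000):
    ys.append(C @ x + np.sqrt(r_true) * rng.standard_normal(1))
    x = A @ x + np.sqrt(q_true) * rng.standard_normal(1)
ys = np.array(ys)

# Upper level (stand-in): choose R on a coarse grid with Q fixed
r_grid = [0.05, 0.25, 1.0]
nlls = [kalman_nll(ys, A, C, np.array([[q_true]]), np.array([[r]]),
                   np.zeros(1), np.eye(1)) for r in r_grid]
best_r = r_grid[int(np.argmin(nlls))]
print(best_r)
```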
Monitoring physiological and behavioral parameters of laboratory rodents is fundamental for biomedical research, yet conventional techniques often rely on invasive sensors or frequent handling that can induce stress and compromise data fidelity. To address these limitations, this paper presents a contactless and non-invasive in-vivo monitoring system based on a low-power 60 GHz frequency-modulated continuous wave (FMCW) radar. The proposed system enables simultaneous detection of rodent activity and vital signs directly within home-cage environments, eliminating the need for implants, electrodes, or human intervention. The hardware platform leverages a compact Infineon BGT60-series radar sensor, optimized for low power consumption and continuous operation. We investigate sensor placement strategies and design a complete signal processing pipeline, including range bin selection, phase extraction, and frequency-domain estimation tailored to rodent vital signs. The system achieves 3 cm and 0.1 m/s sensitivity for motion and activity detection, while allowing discrimination of micro-movements associated with cardiopulmonary activity with a 2 µm distance resolution. Experimental validation with two rodents in realistic in-vivo cages demonstrates that the radar can track animal position and extract respiration rates with an accuracy of 2 bpm. By minimizing stress and disturbance, this work improves both animal welfare and the reliability of physiological measurements, offering a refined alternative to traditional monitoring methods. This work represents the first demonstration of continuous radar-based vital sign monitoring in freely moving rodents within group-housed cages. The proposed approach lays the foundation for scalable, automated, and ethical monitoring solutions in preclinical and translational research.
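The phase-extraction step rests on the FMCW relation Δd = λΔφ/(4π) between the unwrapped phase of the selected range bin and radial displacement (about 5 mm wavelength at 60 GHz). The sketch below assumes the range bin has already been selected, synthesizes its slow-time complex samples for a hypothetical breathing motion, and estimates the rate from the spectral peak in an assumed 0.5 to 4 Hz respiration band; the frame rate, amplitude, band limits, and function names are illustrative, not the paper's pipeline parameters.

```python
import numpy as np

C0 = 299_792_458.0               # speed of light, m/s
F_CARRIER = 60e9                 # assumed carrier frequency, Hz
WAVELENGTH = C0 / F_CARRIER      # about 5 mm

def phase_to_displacement(iq):
    """Slow-time complex samples of one range bin -> radial displacement (m)."""
    phase = np.unwrap(np.angle(iq))
    # Round-trip path: a displacement d changes the phase by 4*pi*d/lambda
    return WAVELENGTH * (phase - phase[0]) / (4 * np.pi)

def dominant_rate_bpm(disp, frame_rate_hz, f_lo=0.5, f_hi=4.0):
    """Spectral peak of the displacement inside [f_lo, f_hi] Hz, in cycles/min."""
    disp = disp - disp.mean()
    spec = np.abs(np.fft.rfft(disp * np.hanning(disp.size)))
    freqs = np.fft.rfftfreq(disp.size, d=1.0 / frame_rate_hz)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return 60.0 * freqs[band][np.argmax(spec[band])]

# Synthetic test signal (all values hypothetical): 20 Hz frame rate, 60 s,
# 1.5 Hz breathing with 0.2 mm amplitude, plus a little complex noise.
rng = np.random.default_rng(0)
fs = 20.0
t = np.arange(1200) / fs
d = 0.2e-3 * np.sin(2 * np.pi * 1.5 * t)
iq = np.exp(1j * (0.7 + 4 * np.pi * d / WAVELENGTH))
iq = iq + 0.01 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
disp = phase_to_displacement(iq)
rate = dominant_rate_bpm(disp, fs)
print(f"estimated rate: {rate:.1f} per minute")
```

Unwrapping only works if the phase changes by less than π between frames, which bounds how fast a target may move for a given frame rate.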
The rapid advancement of generative models, particularly diffusion-based methods, has significantly improved the realism of synthetic images. As new generative models continuously emerge, detecting generated images remains a critical challenge. While fully supervised and few-shot methods have been proposed, maintaining an up-to-date dataset is time-consuming and challenging. Consequently, zero-shot methods have gained increasing attention in recent years. We find that existing zero-shot methods often struggle to adapt to specific image domains, such as artistic images, limiting their real-world applicability. In this work, we introduce CLIDE, a novel zero-shot detection method based on conditional likelihood approximation. Our approach computes likelihoods conditioned on real images, enabling adaptation across diverse image domains. We extensively evaluate CLIDE, demonstrating state-of-the-art performance on a large-scale general dataset and significantly outperforming existing methods in domain-specific cases. These results demonstrate the robustness of our method and underscore the need for broad, domain-aware generalization in AI-generated image detection. Code is available at https://github.com/FujitsuResearch/domain_adaptive_image_detection.
This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover the cost function parameters of all players, as well as the noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers the parameters, yielding trajectories that closely match the observed ones.
This paper investigates the problem of safety certification for black-box discrete-time stochastic systems, where both the system dynamics and disturbance distributions are unknown and only sampled data are available. Under such limited information, ensuring robust or classical quantitative safety over finite or infinite horizons is generally infeasible. To address this challenge, we propose a data-driven framework that provides theoretical one-step safety guarantees in the Probably Approximately Correct (PAC) sense. This one-step guarantee can be applied recursively at each time step, thereby yielding step-by-step safety assurances over extended horizons. Our approach formulates barrier certificate conditions based solely on sampled data and establishes PAC safety guarantees by leveraging the VC dimension, scenario approaches, Markov's inequality, and Hoeffding's inequality. Two sampling procedures and three methods for deriving PAC safety guarantees are proposed, and the properties and comparative advantages of the three methods are thoroughly discussed. Finally, the effectiveness of the proposed methods is demonstrated through several numerical examples.
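One ingredient named above, Hoeffding's inequality, turns an empirical violation frequency into a PAC statement. Suppose, as an assumption for illustration, that a data-driven barrier condition is checked on n i.i.d. samples and violated on a fraction p̂ of them; then with confidence at least 1 - β the true violation probability is at most p̂ + sqrt(ln(1/β)/(2n)). The sketch computes this one-sided bound and the sample size needed for a target margin ε; it does not reproduce the paper's sampling procedures or its three methods, and the function names are hypothetical.

```python
import math

def hoeffding_upper_bound(violations, n_samples, beta):
    """One-sided Hoeffding bound on the true violation probability.

    With probability at least 1 - beta over n i.i.d. samples,
    p_true <= p_hat + sqrt(ln(1/beta) / (2 n)).
    """
    p_hat = violations / n_samples
    eps = math.sqrt(math.log(1.0 / beta) / (2.0 * n_samples))
    return min(1.0, p_hat + eps)

def samples_needed(eps, beta):
    """Smallest n such that exp(-2 n eps^2) <= beta."""
    return math.ceil(math.log(1.0 / beta) / (2.0 * eps ** 2))

n = samples_needed(eps=0.01, beta=1e-3)
print(n, hoeffding_upper_bound(violations=0, n_samples=n, beta=1e-3))
```

The required sample size grows only logarithmically in 1/β but quadratically in 1/ε, which is why tight margins are the expensive part of such guarantees.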