Electrical Engineering and Systems Science

2026-06-17 | Total: 90

#1 Receiver-Aware Analysis and Verification of the Spectral Separation Coefficient Under Interference-Induced Degradation

Authors: Lucas Heublein, Fabian Benschuh, Alexander Rügamer, Felix Ott

Interference poses a significant challenge to satellite-based positioning systems, making it essential to accurately quantify the effects of specific interference types on receiver performance and the resulting reliability of position computation. In current practice, interference effects are often quantified using receiver-independent metrics, with receiver-specific front-end characteristics either idealized or only implicitly considered. In this paper, we address this limitation by explicitly incorporating receiver-specific front-end characteristics into the computation of interference effects and validating the resulting receiver-dependent analysis experimentally. To this end, we record a real-world open-field dataset comprising 210 distinct interference scenarios and compute the receiver-dependent spectral separation coefficient (SSC) and interference impact for a specific receiver module. Furthermore, we verify the computation using a controlled dataset generated with a radio frequency constellation simulator (RFCS), employing the same receiver module and replaying similar interference classes. The comparison of results obtained in both environments demonstrates the robustness of the interference impact computation.

Subject: Signal Processing

Publish: 2026-06-16 17:26:15 UTC


#2 Channel Charting for Position and Orientation

Authors: Daniel Richner, Reinhard Wiesmayr, Frederik Zumegen, Christoph Studer

Channel charting (CC) in real-world coordinates is a recently proposed self-supervised machine learning method that maps high-dimensional channel state information (CSI) to user equipment (UE) position. In this paper, we extend CC to also estimate UE orientation, which can further assist tasks such as beamfinding, precoding, and beam- and cell-assignment. To this end, we propose a novel orientation triplet loss that accounts for angle periodicity and an alignment loss that embeds estimated orientations in real-world coordinates in a self-supervised fashion. Using real-world CSI measurements from a standard-compliant 5G NR system, we demonstrate that the proposed method achieves position and orientation estimation accuracy close to that of supervised approaches trained with ground-truth labels.

Subjects: Signal Processing , Information Theory

Publish: 2026-06-16 16:48:31 UTC
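
The abstract of entry #2 mentions a triplet loss that accounts for angle periodicity. The listing does not give the loss itself, so the following is only a minimal sketch of the general idea, assuming a wrapped angular distance plugged into a standard hinge-style triplet loss. The function names and the margin value are hypothetical and are not taken from the paper.

```python
import math


def wrapped_angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in radians, in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)  # map the raw difference into [0, 2*pi)
    return min(d, 2.0 * math.pi - d)


def orientation_triplet_loss(anchor: float, positive: float,
                             negative: float, margin: float = 0.1) -> float:
    """Hinge-style triplet loss on wrapped angular distances (illustrative only)."""
    d_pos = wrapped_angle_diff(anchor, positive)
    d_neg = wrapped_angle_diff(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# 350 degrees and 10 degrees are only 20 degrees apart once periodicity is respected.
close_pair = wrapped_angle_diff(math.radians(350.0), math.radians(10.0))
# The positive sample is 10 degrees from the anchor, the negative sample 180 degrees.
loss = orientation_triplet_loss(0.0, math.radians(350.0), math.radians(180.0))
```

A plain absolute difference would treat 350 degrees and 10 degrees as 340 degrees apart, which is the failure mode a periodicity-aware distance avoids.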


#3 Spatial and Temporal Generalization of CSI-based Neural Positioning

Authors: Till-Yannic Müller, Frederik Zumegen, Reinhard Wiesmayr, Christoph Studer

Channel state information (CSI)-based neural positioning learns a mapping from CSI measurements to user equipment (UE) positions using neural networks. However, most existing performance evaluations utilize randomly partitioned train/test CSI-dataset splits, which fail to reflect the generalization requirements of practical deployments and present optimistic results. In this paper, we study the spatial and temporal generalization of neural positioning with standard-compliant Wi-Fi and 5G NR systems for three real-world CSI datasets acquired in indoor and outdoor environments. We assess generalization with two different architectures, a conventional multilayer perceptron (MLP) and a novel transformer architecture, to unseen spatial regions, unseen UE trajectories, and CSI measurement campaigns separated by one week. Our experiments show that both architectures generalize well in space and time, and the proposed transformer consistently outperforms the MLP in positioning accuracy while requiring fewer model parameters.

Subjects: Signal Processing , Information Theory

Publish: 2026-06-16 16:48:06 UTC


#4 Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

Authors: Alexander Polok, Samuele Cornell, Sathvik Udupa, Jan Černocký, Shinji Watanabe, Lukáš Burget

We propose diarization-conditioned spoken language models (SLMs), a strategy for extending SLMs to far-field multi-talker audio. Rather than adapting the decoder via Serialized Output Training, which risks catastrophic forgetting, we condition the acoustic encoder on diarization masks to extract target-speaker representations, keeping the decoder frozen. We instantiate this as Dixtral, integrating a Diarization Conditioned Whisper (DiCoW) encoder into the Voxtral SLM. On AMI, NOTSOFAR-1, LibriSpeechMix, and Mixer6, Dixtral outperforms Gemini 3.0 Flash, VibeVoice, and Voxtral Mini Transcribe V2 on speaker-attributed transcription by 29.0%, 19.8%, and 16.0% absolute cpWER respectively. On a novel long-form multi-speaker QA benchmark, zero-shot Dixtral matches Gemini on far-field content understanding, and when fine-tuned surpasses both Gemini and Voxtral operating on close-talk across all tasks.

Subject: Audio and Speech Processing

Publish: 2026-06-16 16:34:35 UTC


#5 Decentralized Decision-Making for Finite-State Systems over Finite Alphabets is Undecidable

Author: Xiang Yin

This paper investigates decentralized decision-making for finite-state transition systems, i.e., discrete-event systems, under finite communication alphabet constraints. We consider a general decentralized observation framework in which a plant is observed by multiple local agents that transmit symbolic messages over a finite alphabet to a memoryless fusion center. The fusion center then produces a binary decision according to a prescribed fusion rule. We study the fundamental question of whether there exist local decision maps that enable exact reconstruction of a given regular specification language from decentralized observations. Contrary to classical results that rely on specific monotone fusion rules such as conjunction and disjunction, we show that the problem becomes undecidable even under a severely restricted information architecture: binary local decision alphabets and a fixed exclusive-or (XOR) fusion rule. The proof is based on a reduction from the Thue word problem, a classical undecidable problem in rewriting systems. We further show that decentralized supervisory control, decentralized fault diagnosis, and decentralized fault prognosis are also undecidable under finite communication alphabets. Our results reveal that existing decidability results fundamentally rely on structural properties of fusion rules, in particular their monotone order-preserving nature. In contrast, non-monotone fusion rules such as XOR break this structure, leading to undecidability even in highly restricted settings.

Subject: Systems and Control

Publish: 2026-06-16 16:25:31 UTC
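
Entry #5 contrasts monotone fusion rules (conjunction, disjunction) with the non-monotone XOR rule. As a small illustration of that distinction only, and not of the undecidability proof, the sketch below checks by brute force whether a two-input Boolean fusion rule is order-preserving. The helper name `is_monotone` is hypothetical.

```python
from itertools import product


def is_monotone(f, n: int = 2) -> bool:
    """True if f: {0,1}^n -> {0,1} never decreases when inputs increase componentwise."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            # x <= y componentwise must imply f(x) <= f(y)
            if all(a <= b for a, b in zip(x, y)) and f(*x) > f(*y):
                return False
    return True


def conjunction(a: int, b: int) -> int:
    return a & b


def disjunction(a: int, b: int) -> int:
    return a | b


def xor(a: int, b: int) -> int:
    return a ^ b


results = {rule.__name__: is_monotone(rule) for rule in (conjunction, disjunction, xor)}
print(results)
```

XOR fails the check because raising one local decision from 0 to 1 can flip the fused output from 1 back to 0, for example going from inputs (0, 1) to (1, 1).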


#6 Verifiable computations for dynamic encrypted control

Authors: Sebastian Schlor, Frank Allgöwer

Encrypted control can preserve the privacy of data and parameters while the necessary computations are outsourced to a cloud server. However, ensuring the integrity of the values received from the cloud, i.e., that they have not been changed, requires strong assumptions or verification algorithms. Previous methods require computationally expensive cryptographic protocols or are only applicable to static computations. In this paper, we present a novel type of verification algorithm for linear dynamic encrypted control. We utilize system-theoretic input-output properties of the controller for artificial challenge signals, which are processed in the cloud in parallel with the requested control input, to check the correctness of the results at the plant. This adds almost no computational load, reveals wrong computations with high probability, and rules out replay attacks.

Subjects: Systems and Control , Cryptography and Security

Publish: 2026-06-16 16:15:12 UTC


#7 A Generic Multi-dimensional Symbol Construction for Digital Over-the-Air Computation and Practical Aspects

Author: Alphan Sahin

In this paper, we propose a general-purpose multi-dimensional symbol construction for computing an arbitrary symmetric function with digital over-the-air computation (OAC) and discuss the practical aspects of coherent aggregation. For our first contribution, we discuss the categorical representation of a symmetric function. By using this representation and leveraging the fact that the histogram suffices to evaluate a symmetric function, an idea inspired by type-based multiple access (TBMA), we introduce a general approach to design a single set of OAC symbols to compute any digital function. For our second contribution, we use a comprehensive platform based on low-cost nodes that maintain synchronization in time, frequency, phase, and amplitude via a trigger mechanism, enabling coherent OAC experiments without Global Positioning System (GPS) or cable-based synchronization. Using measurements from the platform, we characterize the phase and amplitude statistics of the composite channel to derive a realistic impairment model for coherent OAC. Through a comprehensive analysis, we demonstrate the effectiveness of the proposed scheme under impairments captured by the proposed model.

Subjects: Signal Processing , Information Theory

Publish: 2026-06-16 15:47:02 UTC
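
Entry #7 relies on the histogram being sufficient to evaluate a symmetric function. The sketch below illustrates only that property, not the paper's symbol construction: two permutations of the same readings share one histogram, and functions such as the maximum or the most frequent value can be read off the histogram alone. The alphabet, the readings, and all function names are hypothetical.

```python
from collections import Counter


def histogram(values, alphabet):
    """Count how many nodes reported each symbol of the alphabet."""
    counts = Counter(values)
    return tuple(counts[s] for s in alphabet)


def max_from_histogram(hist, alphabet):
    """Largest symbol that at least one node reported."""
    return max(s for s, c in zip(alphabet, hist) if c > 0)


def mode_from_histogram(hist, alphabet):
    """Most frequently reported symbol (first one wins on ties)."""
    best = max(range(len(alphabet)), key=lambda i: hist[i])
    return alphabet[best]


alphabet = (0, 1, 2, 3)
readings = [2, 0, 3, 2, 1]
permuted = [1, 2, 2, 3, 0]  # same values, different node order

hist = histogram(readings, alphabet)
same_hist = histogram(permuted, alphabet) == hist
print(hist, max_from_histogram(hist, alphabet), mode_from_histogram(hist, alphabet))
```

Because the receiver only needs the counts, the nodes' identities and ordering do not have to be resolved, which is the property that aggregation over the air can exploit.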


#8 One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

Authors: Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer superior quality but suffer from high inference latency due to iterative sampling, creating a severe quality-speed trade-off. In this paper, we propose a novel Token2Wav architecture that overcomes this limitation by applying MeanFlow in a highly compressed latent space. By modeling the average velocity rather than the instantaneous velocity field, MeanFlow enables true one-step generation. Operating in the latent domain mitigates the memory and stability issues of waveform-level flows, yielding up to a 17$\times$ improvement in Real-Time Factor (RTF) compared to multi-step baselines with negligible quality degradation. Furthermore, we introduce refinement strategies that mitigate latent mismatch, including decoder-only fine-tuning with the MeanFlow generator frozen and end-to-end joint fine-tuning, improving fidelity without increasing inference-time cost. Code and demo are publicly available.

Subject: Audio and Speech Processing

Publish: 2026-06-16 15:40:37 UTC
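
Entry #8 states that MeanFlow models the average velocity rather than the instantaneous velocity field. The abstract gives no formulas, so the block below is a sketch that assumes the notation of the original MeanFlow formulation, with $v$ the instantaneous velocity, $u$ the average velocity over $[r, t]$, $z_1$ a prior sample, and $z_0$ the generated latent. The paper in this entry may use different conventions.

```latex
% Average velocity over the interval [r, t] (assumed notation)
u(z_t, r, t) \;=\; \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\, \mathrm{d}\tau

% Displacement along the flow expressed through the average velocity
z_r \;=\; z_t - (t - r)\, u(z_t, r, t)

% One-step generation: a single network evaluation with r = 0, t = 1
z_0 \;=\; z_1 - u(z_1, 0, 1)
```

Under these assumptions, a network that predicts $u$ directly replaces the iterative integration of $v$ that multi-step flow-matching decoders need.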


#9 Multiscale reconstruction of protein conformations from cryo-EM images

Authors: David Y. W. Thong, Ozan Öktem, Joakim Andén

We present a novel multiscale algorithm for directly recovering the atomic model structure of a protein from single-particle cryo-EM data. Our algorithm is able to estimate protein structures to state-of-the-art accuracy for high-noise and low-contrast data. It is also robust to misspecifications in the TEM image formation model. These desirable properties are primarily due to the use of an explicit representation of the protein backbone in terms of bonds, torsion angles and bond angles, which supplies rich prior information to the structure recovery process. We apply our method on three protein cryo-EM datasets, generated using an electron microscope digital twin, and show that using a multiscale approach yields an improvement of the root-mean-square deviation (RMSD) and template modelling (TM) scores with respect to the ground truth. Furthermore, there is evidence that larger-scale structures are being prioritised with the multiscale algorithm, which reduces the possibility of convergence to bad local minima.

Subjects: Image and Video Processing , Quantitative Methods

Publish: 2026-06-16 15:35:31 UTC


#10 AI-based Cognitive-linguistic Features for Dementia Assessment in Picture Description

Authors: Lingfeng Xu, Prad Kadambi, Samuel Goldinger, Visar Berisha, Kimberly D. Mueller, Julie Liss

Picture descriptions provide valuable insights into several clinical constructs related to cognitive-linguistic abilities. However, operationalizing these constructs into quantitative measures remains challenging, limiting interpretability and clinical utility. We introduced seven constructs tailored to the Cookie Theft picture description task and prompted large language models (LLMs) to evaluate them, generating severity scores and example-based explanations. Among the examined LLMs, Claude 3.5 Sonnet performed the best, producing severity scores that significantly distinguish cognitively impaired individuals from healthy controls. The model achieves a high accuracy of 85% on the ADReSS dataset. Expert evaluation of Claude's scores and explanations yields a 3.99/5 average agreement. The findings demonstrate the potential of LLMs to operationalize clinical constructs and generate interpretable evaluations, offering a promising approach for accessible cognitive screening tools.

Subject: Audio and Speech Processing

Publish: 2026-06-16 15:32:03 UTC


#11 Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Authors: Franziska Braun, Alea Rüggeberg, Thomas Ranzenberger, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

Subjects: Audio and Speech Processing , Computation and Language , Sound

Publish: 2026-06-16 15:01:30 UTC


#12 On the Optimum Energy-per-bit Launch Power in Coherent Hollow-core Fibre Transmission Systems

Authors: Ronit Sohanpal, Eric Sillekens, Mindaugas Jarmolovicius, Robert I. Killey, Polina Bayvel

We investigate the optimum energy per bit in hollow-core-fibre transmission systems. We show that a 1000 km C-band link can achieve a 41.5% reduction in total power consumption when operating at the minimum energy-per-bit launch power with only 2.2% throughput penalty.

Subject: Signal Processing

Publish: 2026-06-16 13:50:46 UTC


#13 Three-phase model of unbalanced distribution networks with DERs

Authors: S. Perna, C. Lillo, A. R. Di Fazio, M. Russo, G. M. Casolino, P. Varilone, P. Verde

Classical DistFlow equations for steady-state distribution network analysis fail to capture the inherent imbalances of three-phase systems arising from asymmetrical lines, loads, and distributed energy resources (DERs). This paper extends the classical power flow (PF) equations into a rigorous, non-approximated three-phase formulation, termed Dist3Flow. The proposed branch flow model (BFM) utilizes the real and imaginary components of nodal voltages and the active and reactive power flows as state variables. Lines are modelled by nonlinear forward and backward equations, while loads and DERs are represented via ZIP models and P-Q control, respectively. By incorporating specific boundary conditions at the terminal nodes, the formulation generalizes PF analysis to both radial and closed-ring topologies. The solution is obtained by using a backward/forward sweep (BFS) algorithm. The approach is validated against OpenDSS across various configurations, considering open-ring and closed-ring topologies with and without DERs.

Subject: Systems and Control

Publish: 2026-06-16 13:34:19 UTC
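
Entry #13 represents loads via ZIP models. As a reminder of what that means, here is a minimal sketch of the standard single-phase ZIP (constant impedance, constant current, constant power) load expression. The function name, the per-unit values, and the share coefficients are hypothetical, and the three-phase formulation in the paper may differ in detail.

```python
def zip_load_power(v: float, v_nominal: float, p_nominal: float,
                   z_share: float, i_share: float, p_share: float) -> float:
    """Active power drawn by a ZIP load at voltage magnitude v.

    P(V) = P0 * (aZ * (V/V0)**2 + aI * (V/V0) + aP), with aZ + aI + aP = 1.
    """
    if abs(z_share + i_share + p_share - 1.0) > 1e-9:
        raise ValueError("ZIP shares must sum to 1")
    ratio = v / v_nominal
    return p_nominal * (z_share * ratio ** 2 + i_share * ratio + p_share)


# At nominal voltage every ZIP mix draws the nominal power.
at_nominal = zip_load_power(1.0, 1.0, 10.0, 0.3, 0.3, 0.4)
# A pure constant-impedance load scales with the square of the voltage.
pure_impedance = zip_load_power(0.95, 1.0, 10.0, 1.0, 0.0, 0.0)
# A pure constant-power load ignores the voltage.
pure_power = zip_load_power(0.95, 1.0, 10.0, 0.0, 0.0, 1.0)
```

The same expression is normally applied to reactive power with its own set of coefficients.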


#14 Reducing Building Heat Demand Through Intelligent Control: A Comparative Simulation Study

Authors: Ueli Schilt, Curtis Meister, Philipp Schuetz

Space heating remains the dominant energy consumer in buildings. While structural retrofitting can substantially reduce demand, it is often costly and time-intensive. As an alternative, this study investigates the potential of intelligent heating control strategies to reduce heat consumption with lower investment and faster implementation. Previous studies have shown that replacing conventional heating-curve-based controllers with model predictive controllers (MPCs) can reduce heating energy demand. Whereas most studies compare MPC to conventional control, this work evaluates two MPC strategies with different control objectives and quantifies their impact on indoor temperature tracking and heating demand. A virtual residential building model was developed in Python based on ISO 52016-1 to generate synthetic measurement data. A simplified resistance-capacitance (RC) model was parametrised using this dataset and used as the internal model for two MPC strategies implemented in MATLAB. The strategies differ only in their optimisation objective: one minimises quadratic heating power, while the other prioritises indoor temperature tracking for thermal comfort. Simulations over six days show that both strategies satisfy comfort and system constraints, but differ in energy use and temperature variation. The comfort-oriented controller achieves lower total heat consumption than the controller minimising heating power, which is attributed to the penalisation of high heating rates in the quadratic objective function. The results demonstrate the importance of objective function formulation in MPC design and show that high comfort levels can be maintained while achieving lower heating demand without structural modifications to the building envelope.

Subject: Systems and Control

Publish: 2026-06-16 13:34:19 UTC


#15 Constellation Design for Nonlinear Unified SWIPT Receiver Channels with Memory

Authors: Triantafyllos Mavrovoltsos, Elio Faddoul, Zulqarnain Bin Ashraf, Constantinos Psomas, Besma Smida, Ioannis Krikidis

Unified receivers (URs) have emerged as a promising architecture for simultaneous wireless information and power transfer (SWIPT), since a common rectifying front-end enables information decoding (ID) and energy harvesting (EH) from the same rectified output. However, rectification is nonlinear due to the diode, while the capacitor introduces memory across symbols, making constellation design over the channel challenging. In this paper, we study constellation design for nonlinear UR-SWIPT channels in both memoryless and memory regimes. First, we propose a tractable unified rectification model that captures both (i) the nonlinear steady-state mapping and (ii) the asymmetric capacitor charging/discharging dynamics under transient operation. To isolate the impact of rectification with memory on ID, we study the information-based design. In this setting, we develop a state-adaptive policy with an algorithmic constellation design that accounts for the rectifier state and shapes the constellation in the observation domain. By approximating the rectifier state distribution, we derive a closed-form average symbol error rate (SER) expression and characterize the rate-reliability (R-R) tradeoff. We then seek constellations that minimize the SER under average transmit power and EH constraints. We address the resulting energy-constrained setting in the memoryless regime using an autoencoder-based framework that embeds the nonlinear rectification model as a differentiable channel block. Numerical results validate the proposed models, demonstrate the impact of memory on the R-R tradeoff, and show how learned constellations adapt to EH requirements in the rate-energy tradeoff.

Subject: Signal Processing

Publish: 2026-06-16 13:28:23 UTC


#16 Time-Slotted Multi-Cluster UAV AirComp with Energy-Awareness: A Pointer Network-Assisted Soft Actor-Critic Learning Framework

Authors: Xunqiang Lan, Xiao Tang, Ruonan Zhang, Qinghe Du, Tony Q. S. Quek

Over-the-air computation (AirComp) has emerged as a promising approach for massive data aggregation, but it is challenged by channel variations, task distributions, and the inherent energy limitations of the computation nodes. In this paper, we propose an unmanned aerial vehicle (UAV)-assisted AirComp system to serve multi-cluster computation tasks over time, where the spatial and time diversity facilitated by UAV mobility is exploited for efficient and accurate data computation. Specifically, we aim to minimize the AirComp aggregation error and the energy consumption by jointly optimizing the transceiver beamforming, normalizing factors, sensor scheduling, and UAV trajectory. To solve the formulated problem, we decompose it into two layers, where the inner layer addresses the optimization-based AirComp transceiver design and the outer layer focuses on the deep reinforcement learning (DRL)-based scheduling and trajectory design. In particular, a pointer network actor-critic learning scheme is developed to tackle the binary scheduling problem, and a soft actor-critic DRL algorithm is employed to determine the UAV trajectory. Simulation results validate the convergence of the proposed hierarchical learning framework and demonstrate its significant performance gains in terms of AirComp aggregation error and energy consumption compared with baseline schemes.

Subject: Signal Processing

Publish: 2026-06-16 13:26:51 UTC


#17 Condition-Wise Sinkhorn Drifting for One-Shot Learned Channel Simulation

Authors: Rick Fritschek, Rafael F. Schaefer

Learned communication systems may evaluate stochastic channel surrogates millions of times inside differentiable training loops, making diffusion-style reverse sampling expensive. This paper proposes condition-wise Sinkhorn drifting, a one-shot channel surrogate that preserves the transmitted symbol and transports only the conditional output laws \(p(y\mid x)\). We formulate a conditional Sinkhorn objective over repeated outputs at the same transmitted symbol and train the generator with finite-sample barycentric velocities followed by detached particle regression. Experiments on additive white Gaussian noise (AWGN), Rayleigh fading, solid-state power amplifier (SSPA) nonlinearity, and a compact tapped-delay-line (TDL) channel compare direct drifting, joint Sinkhorn drifting, condition-wise Sinkhorn drifting, conditional denoising diffusion probabilistic modeling (DDPM), denoising diffusion implicit modeling (DDIM), and Wasserstein generative adversarial network (WGAN) references. Within the evaluated one-shot drifting-family variants, condition-wise Sinkhorn is strongest under conditional diagnostics and symbolic-coding checks, while diffusion remains strongest on the hardest downstream symbol-error-rate (SER) curves. The resulting operating point is a condition-preserving one-shot simulator for settings where repeated channel calls make diffusion-style sampling too costly.

Subject: Signal Processing

Publish: 2026-06-16 13:13:49 UTC


#18 A 399uW 114.3 dB DR Companding Readout ASIC for MEMS Microphones Employing a Multirate Time-Domain ADC

Authors: Javier Granizo, Ruben Garvi, Ricardo Carrero, Jorge de la Torre, Javier Fernandez, Dietmar Straeussnigg, Andreas Wiesbauer, Luis Hernandez

Improvements in the dynamic range and sensitivity of digital MEMS microphones are essential in applications like advanced noise canceling and voice recognition. A cost-effective solution to achieve these goals is the companding ADC architecture. Companding ADCs split the dynamic range into several segments with different quantization noise levels, relaxing power constraints. A common problem of companding microphones is the audible artifacts generated when the input signal crosses the boundaries between different amplitude segments. In this paper, we show a companding ADC architecture that mitigates the boundary artifacts by leveraging the instantaneous and high-resolution time-domain representation of the input signal in a VCO-based ADC. The use of a multi-rate frequency-to-digital converter decouples quantization noise from the VCO frequency while keeping standard audio sampling rates. Co-optimization of the driver and oscillator circuits enables our VCO-ADC to reach >112 dBc of peak SFDR without a feedback DAC, keeping a giga-ohm input impedance compatible with a capacitive MEMS. We show measurements of a 0.13 μm ASIC implementing a complete readout circuit for a digital MEMS microphone. This includes two analog channels and the digital signal processing and calibration blocks required to deliver a standard single-bit PDM output. This ADC reaches a dynamic range of 114.3 dB with a power budget under 400 μW, a Schreier $\mathrm{FoM}_{\mathrm{SNDR}}$ of 171.0 dB and a $\mathrm{FoM}_{\mathrm{DR}}$ of 191.3 dB.

Subject: Audio and Speech Processing

Publish: 2026-06-16 12:52:45 UTC
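
The figures of merit quoted in entry #18 can be cross-checked with the usual Schreier definition, FoM = metric (dB) + 10 log10(bandwidth / power). The sketch below assumes a 20 kHz audio bandwidth, which the abstract does not state, and takes the 399 uW power from the title. The implied SNDR is derived from the quoted FoM under the same assumptions and is not a number reported in the listing.

```python
import math


def schreier_fom_db(metric_db: float, bandwidth_hz: float, power_w: float) -> float:
    """Schreier figure of merit: metric (DR or SNDR, in dB) + 10*log10(BW / P)."""
    return metric_db + 10.0 * math.log10(bandwidth_hz / power_w)


BANDWIDTH_HZ = 20e3   # assumption: standard audio bandwidth, not given in the abstract
POWER_W = 399e-6      # from the entry title
DYNAMIC_RANGE_DB = 114.3

fom_dr = schreier_fom_db(DYNAMIC_RANGE_DB, BANDWIDTH_HZ, POWER_W)

# Invert the same formula to see which SNDR would yield the quoted FoM_SNDR of 171.0 dB.
implied_sndr_db = 171.0 - 10.0 * math.log10(BANDWIDTH_HZ / POWER_W)

print(f"FoM_DR = {fom_dr:.1f} dB, implied SNDR = {implied_sndr_db:.1f} dB")
```

Under the 20 kHz assumption the computed FoM_DR agrees with the 191.3 dB quoted in the abstract, which suggests the assumption is at least consistent with the reported numbers.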


#19 Feedforward and Iterative Phase Noise Compensation for Channels with Chromatic Dispersion

Authors: Alex Jäger, Gerhard Kramer

Equalization-enhanced phase noise is avoided by applying phase noise compensation (PNC) before chromatic dispersion compensation. Feedforward and iterative PNC algorithms based on expectation propagation are proposed. Both achieve information rates close to channels without phase noise for 100 GBaud 64-QAM and 10,000 km of fiber.

Subjects: Signal Processing , Information Theory

Publish: 2026-06-16 12:47:51 UTC


#20 Model-Free Control for Multi-Time Scale Dynamics of Grid-Connected Power Converters

Authors: Dewan Mahnaaz Mahmud, Vinu Thomas, Bogdan Marinescu

Controller synthesis in power electronics-based systems depends predominantly on the mathematical model of the system, which is a limitation when the actual system is complex and the mathematical model cannot capture all its dynamics. Model-free control addresses this limitation by using an ad-hoc simple model which is compensated by high-rate evaluation of dynamics in terms of their derivatives. However, application of the model-free control strategy to power electronics-based multi-time scale dynamical systems is challenging because of the derivative action needed to implement such control. Grid-connected power converters are examples of such systems, yet experimental validation has not been adequately addressed in the literature. This letter presents the validation of such control down to the hardware implementation level. An intelligent proportional-integral (iPI) controller is synthesized and validated on a 16 kW experimental test bench. This demonstrates the benefits of the approach for the control of grid-connected power converters, including their participation in secondary voltage control.

Subject: Systems and Control

Publish: 2026-06-16 12:45:37 UTC


#21 Perceptually-Weighted Video Quality Metric for Asymmetric Encoded Sports Videos

Authors: Anna Meyer, Jonas Janzen, Diwakara Reddy, Alexander Kopte, Simon Deniffel, Paul Wawerek-López, Marc Windsheimer, André Kaup

Objective video quality metrics commonly assume uniform spatial attention, an assumption that conflicts with the selective nature of human visual perception, particularly in sports videos. Here, allocating more bits for salient regions through semantic encoding can lead to significant bitrate savings. We present a Perceptually-Weighted Video Quality Metric (PW-VQM), a full-reference metric that accounts for the unequal perceptual importance of spatial regions and therefore targets quality evaluation for asymmetrically encoded content. SSIM maps computed in a multiscale wavelet domain are weighted by differentiating between foreground and background regions. Perceptually salient foreground regions are identified by combining open-vocabulary object detection with optical flow analysis, and are assigned higher weight during quality aggregation. Evaluated on sports video content, PW-VQM achieves a Spearman Rank Order Correlation Coefficient of 0.9511, outperforming established metrics including SSIM, VMAF, FUNQUE, and LPIPS. An ablation study confirms the individual contributions of the components of the perceptual weighting.

Subject: Image and Video Processing

Publish: 2026-06-16 12:41:03 UTC


#22 PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching in SSL Representation Domain for Speech Enhancement

Authors: Jun Gao, Xiaobin Rong, Yu Sun, Dahan Wang, Jing Lu

Flow matching (FM) enables high-fidelity generation, while self-supervised learning (SSL) speech models provide hierarchical representations spanning acoustic and phonetic levels. However, existing FM-based speech enhancement (SE) methods operate primarily in the spectral domain, treating SSL features only as external conditions rather than modeling directly in the SSL latent space. To fully exploit the structural richness of SSL representations, we propose PhASE-Flow, an FM-based SE framework that operates entirely in the SSL space. It models the conditional distribution of clean acoustic representations given phonetic ones, reconstructing the waveform via a neural vocoder. Experiments show that PhASE-Flow outperforms state-of-the-art baselines in perceptual quality and intelligibility. Notably, it achieves competitive performance with only four sampling steps, enabling highly efficient inference. Audio demos are available at https://anonymous.4open.science/w/phase-flow_demo-E6E1/.

Subject: Audio and Speech Processing

Publish: 2026-06-16 11:28:29 UTC


#23 Joint Direction-of-Arrival and Range Estimation for Millimeter-Wave Uniform Linear Array Radar

Authors: Necati Kagan Erkek, Zeynep Gul Pehlivanli

An FFT-based direction-of-arrival (DOA) and range-estimation framework for a monostatic uniform linear array (ULA) operating at 77 GHz is presented. A narrowband sinusoidal waveform is used to derive the spatial phase model, determine an aliasing-free inter-element spacing, and select the aperture required to obtain a boresight angular resolution of 2 degrees. The resulting design uses an element spacing of 0.97 mm and 58 antenna elements, corresponding to an aperture length of 56.42 mm. Numerical results show accurate angular estimation for a single target at 30 degrees and for multiple simultaneous targets. The analysis is further extended to two-dimensional localization by replacing the narrowband waveform with a 1 GHz sinc-modulated signal, which provides an approximate range resolution of 0.15 m. Additional simulations quantify the effects of additive complex Gaussian noise, increased antenna spacing, and target decorrelation on the DOA response.

Subject: Signal Processing

Publish: 2026-06-16 11:26:24 UTC
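
Two of the numbers in entry #23 can be sanity-checked with textbook radar relations. The sketch below assumes the standard range-resolution formula c / (2B) and interprets the 0.97 mm spacing as a quarter wavelength at 77 GHz, which would be the aliasing-free spacing if the phase accumulates over the two-way path of a monostatic array. That interpretation is an assumption and is not stated in the abstract.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

CARRIER_HZ = 77e9     # from the abstract
BANDWIDTH_HZ = 1e9    # from the abstract

# Textbook range resolution for a waveform of bandwidth B: c / (2 * B).
range_resolution_m = SPEED_OF_LIGHT_M_S / (2.0 * BANDWIDTH_HZ)

# Carrier wavelength and its quarter, in millimetres.
wavelength_mm = SPEED_OF_LIGHT_M_S / CARRIER_HZ * 1e3
quarter_wavelength_mm = wavelength_mm / 4.0  # assumed two-way aliasing-free spacing

print(f"range resolution: {range_resolution_m:.3f} m")
print(f"quarter wavelength: {quarter_wavelength_mm:.3f} mm")
```

Both values land close to the 0.15 m and 0.97 mm figures quoted in the abstract.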


#24 A Wearable Multimodal Ultrasound+Inertial System for Real-Time Virtual Reality Interaction

Authors: Giusy Spacone, Sebastian Frey, Enzo Baraldi, Mattia Orlandi, Luca Benini, Andrea Cossettini

A-mode ultrasound (US) is a promising sensing modality for Virtual Reality (VR) interaction, as it enables the mapping of muscular activity into control commands while retaining the benefits of wearable sensing. However, existing approaches still face limitations in terms of wearability and interaction complexity, often relying on external hardware such as cameras. In this work, we propose a fully wearable multimodal interface for real-time VR-interaction, based on concurrent US and inertial (accelerometry) sensing from the forearm and upper arm. The system is built on the WULPUS platform and integrates an end-to-end software framework for real-time acquisition, visualization, and communication with a Unity-based VR environment. A multimodal learning pipeline is introduced for concurrent hand pose and forearm position estimation in 2D space. The interface is evaluated through offline and online experiments with five subjects, during the execution of three functional tasks: cylinder grasping (gross motor) and relocation, marble pinching (fine motor) and relocation, and liquid pouring. For offline experiments, we collect 5 acquisition sessions across multiple days, achieving an average inter-session accuracy across subjects of 80±6% for hand pose estimation and 77±7% for forearm position estimation. Online validation with minimal fine-tuning (5 min) demonstrates success rates of 92.0±16.0%, 88.0±9.8%, and 96.0±8.0% for the three tasks, respectively. With a power consumption of only 19.9 mW, our system enables more than 2.5 days of continuous use on a small 350 mAh LiPo battery without the need for recharge, enabling truly wearable, multimodal, and functionally meaningful VR interaction.

Subjects: Systems and Control , Human-Computer Interaction

Publish: 2026-06-16 10:03:19 UTC
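
The battery-life claim in entry #24 follows from simple arithmetic. The sketch below assumes a nominal LiPo cell voltage of 3.7 V and ignores conversion losses and usable-capacity derating. Neither assumption comes from the abstract, so the result is an idealized upper estimate.

```python
CAPACITY_MAH = 350.0        # from the abstract
POWER_MW = 19.9             # from the abstract
NOMINAL_VOLTAGE_V = 3.7     # assumption: typical nominal LiPo cell voltage

# Stored energy in milliwatt-hours, then idealized runtime at constant power draw.
energy_mwh = CAPACITY_MAH * NOMINAL_VOLTAGE_V
runtime_hours = energy_mwh / POWER_MW
runtime_days = runtime_hours / 24.0

print(f"{runtime_hours:.1f} h, about {runtime_days:.2f} days")
```

The idealized estimate of roughly 2.7 days is consistent with the "more than 2.5 days" stated in the abstract.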


#25 Deep CSI Feedback for FDD Massive MIMO Systems: A Curvelet Learning Approach

Authors: Mengli Tao, Jiancun Fan, Huiqiang Xie, Kai Xie

Downlink channel state information (CSI) feedback plays a key role in frequency division duplex (FDD) massive multiple-input multiple-output (mMIMO) systems. The growth of antennas in ultra-massive MIMO increases the difficulty and overhead of CSI feedback, which poses significant challenges for conventional downlink CSI feedback mechanisms. To address the limitations of existing CSI feedback approaches, this paper proposes a novel curvelet learning based framework termed SwinCANet, comprising a frequency-domain information processing module and a denoising module. The frequency-domain information processing module employs curvelet transform to decompose CSI into low-frequency and high-frequency components. Subsequently, Swin Transformer and channel-wise attention block are utilized for extracting the low-frequency and high-frequency representations, respectively, thereby enhancing reconstruction quality. Notably, an additional Swin Transformer facilitates the fusion of multi-scale frequency components, enhancing capabilities across different angular resolutions and spatial directions. Furthermore, we develop a variant (De-SwinCANet), which employs a Sigmoid threshold function to effectively suppress noise coefficients, thereby mitigating various channel impairments and nonlinear distortions. Numerical simulation results demonstrate that the proposed methodology achieves superior performance compared to existing benchmarks while maintaining robust performance under challenging propagation conditions.

Subject: Signal Processing

Publish: 2026-06-16 09:56:11 UTC