Electrical Engineering and Systems Science

2025-07-03 | Total: 83

#1 Perceptual Ratings Predict Speech Inversion Articulatory Kinematics in Childhood Speech Sound Disorders

Authors: Nina R. Benway, Saba Tabatabaee, Dongliang Wang, Benjamin Munson, Jonathan L. Preston, Carol Espy-Wilson

Purpose: This study evaluated whether articulatory kinematics, inferred by Articulatory Phonology speech inversion neural networks, aligned with perceptual ratings of /r/ and /s/ in the speech of children with speech sound disorders. Methods: Articulatory Phonology vocal tract variables were inferred for 5,961 utterances from 118 children and 3 adults, aged 2.25-45 years. Perceptual ratings were standardized using the novel 5-point PERCEPT Rating Scale and training protocol. Two research questions examined if the articulatory patterns of inferred vocal tract variables aligned with the perceptual error category for the phones investigated (e.g., tongue tip is more anterior in dentalized /s/ productions than in correct /s/). A third research question examined if gradient PERCEPT Rating Scale scores predicted articulatory proximity to correct productions. Results: Estimated marginal means from linear mixed models supported 17 of 18 /r/ hypotheses, involving tongue tip and tongue body constrictions. For /s/, estimated marginal means from a second linear mixed model supported 7 of 15 hypotheses, particularly those related to the tongue tip. A third linear mixed model revealed that PERCEPT Rating Scale scores significantly predicted articulatory proximity of errored phones to correct productions. Conclusion: Inferred vocal tract variables differentiated category and magnitude of articulatory errors for /r/, and to a lesser extent for /s/, aligning with perceptual judgments. These findings support the clinical interpretability of speech inversion vocal tract variables and the PERCEPT Rating Scale in quantifying articulatory proximity to the target sound, particularly for /r/.

Subject: Audio and Speech Processing

Publish: 2025-07-02 16:57:46 UTC


#2 A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs

Authors: Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada, Daryl Cheng, Mehran Azimbagirad, John McCabe, Shahab Aslani, Ahmed H. Shahin, Yukun Zhou, The SUMMIT Consortium, Andre Altmann, Yipeng Hu, Paul Taylor, Sam M. Janes, Daniel C. Alexander, Joseph Jacob

Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.

Subjects: Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

Publish: 2025-07-02 16:52:10 UTC


#3 Autoadaptive Medical Segment Anything Model

Authors: Tyler Ward, Meredith K. Owen, O'Kira Coleman, Brian Noehren, Abdullah-Al-Zubaer Imran

Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions. Traditional, fully-supervised segmentation models rely on large amounts of labeled training data, typically obtained through manual annotation, which can be an expensive, time-consuming, and error-prone process. This signals a need for accurate, automatic, and annotation-efficient methods of training these models. We propose ADA-SAM (automated, domain-specific, and adaptive segment anything model), a novel multitask learning framework for medical image segmentation that leverages class activation maps from an auxiliary classifier to guide the predictions of the semi-supervised segmentation branch, which is based on the Segment Anything (SAM) framework. Additionally, our ADA-SAM model employs a novel gradient feedback mechanism to create a learnable connection between the segmentation and classification branches by using the segmentation gradients to guide and improve the classification predictions. We validate ADA-SAM on real-world clinical data collected during rehabilitation trials, and demonstrate that our proposed method outperforms both fully-supervised and semi-supervised baselines by double digits in limited label settings. Our code is available at: https://github.com/tbwa233/ADA-SAM.

Subjects: Image and Video Processing, Computer Vision and Pattern Recognition

Publish: 2025-07-02 15:44:32 UTC


#4 Low-Complexity Neural Wind Noise Reduction for Audio Recordings

Authors: Hesam Eftekhari, Srikanth Raj Chetupalli, Shrishti Saha Shetu, Emanuël A. P. Habets, Oliver Thiergart

Wind noise significantly degrades the quality of outdoor audio recordings, yet remains difficult to suppress in real time on resource-constrained devices. In this work, we propose a low-complexity single-channel deep neural network that leverages the spectral characteristics of wind noise. Experimental results show that our method achieves performance comparable to the state-of-the-art low-complexity ULCNet model. The proposed model, with only 249K parameters and roughly 73 MHz of computational power, is suitable for embedded and mobile audio applications.

Subjects: Audio and Speech Processing, Sound, Signal Processing

Publish: 2025-07-02 15:36:54 UTC


#5 Measurement-based Evaluation of CNN-based Detection and Estimation for ISAC Systems

Authors: Steffen Schieler, Sebastian Semper, Christian Schneider, Reiner Thomä

In wireless sensing applications, such as ISAC, one of the first crucial signal processing steps is the detection and estimation of targets from a channel estimate. Effective algorithms in this context must be robust across a broad SNR range, capable of handling an unknown number of targets, and computationally efficient for real-time implementation. During the last decade, different Machine Learning methods have emerged as promising solutions, either as standalone models or as complements to existing techniques. However, since models are often trained and evaluated on synthetic data from existing models, applying them to measurement data is challenging. At the same time, training directly on measurement data is prohibitive in complex propagation scenarios because ground truth is not available. Therefore, in this paper, we train a CNN approach for target detection and estimation on synthetic data and evaluate it on measurement data from a suburban outdoor measurement. Using knowledge of the environment as well as available ground-truth positions, we study the detection probability and accuracy of our approach. The results demonstrate that our approach works on measurement data and is suitable for joint detection and estimation of sensing targets in ISAC systems.

Subject: Signal Processing

Publish: 2025-07-02 15:19:52 UTC


#6 Robust brain age estimation from structural MRI with contrastive learning

Authors: Carlo Alberto Barbano, Benoit Dufumier, Edouard Duchesnay, Marco Grangetto, Pietro Gori

Estimating brain age from structural MRI has emerged as a powerful tool for characterizing normative and pathological aging. In this work, we explore contrastive learning as a scalable and robust alternative to supervised approaches for brain age estimation. We introduce a novel contrastive loss function, Lexp, and evaluate it across multiple public neuroimaging datasets comprising over 20,000 scans. Our experiments reveal four key findings. First, scaling pre-training on diverse, multi-site data consistently improves generalization performance, cutting external mean absolute error (MAE) nearly in half. Second, Lexp is robust to site-related confounds, maintaining low scanner-predictability as training size increases. Third, contrastive models reliably capture accelerated aging in patients with cognitive impairment and Alzheimer's disease, as shown through brain age gap analysis, ROC curves, and longitudinal trends. Lastly, unlike supervised baselines, Lexp maintains a strong correlation between brain age accuracy and downstream diagnostic performance, supporting its potential as a foundation model for neuroimaging. These results position contrastive learning as a promising direction for building generalizable and clinically meaningful brain representations.

Subjects: Image and Video Processing, Computer Vision and Pattern Recognition

Publish: 2025-07-02 15:18:03 UTC


#7 Higher-Order Tensor-Based Deferral of Gaussian Splitting for Orbit Uncertainty Propagation

Authors: G. Andrew Siciliano, Keith A. LeGrand, Jackson Kulik

Accurate propagation of orbital uncertainty is essential for a range of applications within space domain awareness. Adaptive Gaussian mixture-based approaches offer tractable nonlinear uncertainty propagation through splitting mixands to increase resolution in areas of stronger nonlinearities, as well as by reducing mixands to prevent unnecessary computational effort. Recent work introduced principled heuristics that incorporate information from the system dynamics and initial uncertainty to determine optimal directions for splitting. This paper develops adaptive uncertainty propagation methods based on these robust splitting techniques. A deferred splitting algorithm tightly integrated with higher-order splitting techniques is proposed and shown to offer substantial gains in computational efficiency without sacrificing accuracy. Second-order propagation of mixand moments is also seen to improve accuracy while retaining significant computational savings from deferred splitting. Different immediate and deferred splitting methods are compared in three representative test cases, including a geostationary orbit, a Molniya orbit, and a periodic three-body orbit.

Subjects: Signal Processing, Probability

Publish: 2025-07-02 14:56:15 UTC


#8 First Steps Towards Voice Anonymization for Code-Switching Speech

Authors: Sarina Meyer, Ekaterina Kolos, Ngoc Thang Vu

The goal of voice anonymization is to modify an audio recording such that the true identity of its speaker is hidden. Research on this task is typically limited to the same English read speech datasets, so the efficacy of current methods for other types of speech data remains unknown. In this paper, we present the first investigation of voice anonymization for the multilingual phenomenon of code-switching speech. We prepare two corpora for this task and propose adaptations to a multilingual anonymization model to make it applicable to code-switching speech. By testing the anonymization performance of this and two language-independent methods on the datasets, we find that only the multilingual system performs well in terms of privacy and utility preservation. Furthermore, we observe challenges in performing utility evaluations on this data because of its spontaneous character and the limited code-switching support of the multilingual speech recognition model.

Subject: Audio and Speech Processing

Publish: 2025-07-02 14:46:44 UTC


#9 Generalizable Detection of Audio Deepfakes

Authors: Jose A. Lopez, Georg Stemmer, Héctor Cordourier Maruri

In this paper, we present our comprehensive study aimed at enhancing the generalization capabilities of audio deepfake detection models. We investigate the performance of various pre-trained backbones, including Wav2Vec2, WavLM, and Whisper, across a diverse set of datasets, including those from the ASVspoof challenges and additional sources. Our experiments focus on the effects of different data augmentation strategies and loss functions on model performance. The results of our research demonstrate substantial enhancements in the generalization capabilities of audio deepfake detection models, surpassing the performance of the top-ranked single system in the ASVspoof 5 Challenge. This study contributes valuable insights into the optimization of audio models for more robust deepfake detection and facilitates future research in this critical area.

Subjects: Audio and Speech Processing, Sound

Publish: 2025-07-02 14:28:11 UTC


#10 Position and Velocity Estimation Accuracy in MIMO-OFDM ISAC Networks: A Fisher Information Analysis

Authors: Lorenzo Pucci, Luca Arcangeloni, Andrea Giorgetti

Integrated sensing and communication (ISAC) is a core technology for future wireless networks, enabling high-resolution sensing and reliable data transmission within a unified radio platform. This paper develops a theoretical framework to assess the estimation accuracy of target position and velocity in heterogeneous orthogonal frequency division multiplexing (OFDM)-based ISAC networks with multiple cooperative and distributed multiple-input multiple-output (MIMO) base stations (BSs). Using Fisher information analysis, we first derive closed-form Cramér-Rao lower bounds (CRLBs) for target localization in single monostatic and bistatic configurations. We then analyze the benefits of BS cooperation by deriving CRLBs for joint position and velocity estimation in a general setting that encompasses multiple cooperating monostatic systems and multistatic networks with multiple transmitters (Txs) and receivers (Rxs). The influence of key system parameters, including the number of BSs, bandwidth, antenna array configuration, and network geometry, is systematically examined. Numerical results highlight the performance gains enabled by cooperative sensing and provide insights to guide the design of future ISAC systems.

Subject: Signal Processing

Publish: 2025-07-02 14:23:10 UTC


#11 Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach

Authors: Hao Wei, Wanli Ni, Wen Wang, Wenjun Xu, Dusit Niyato, Ping Zhang

This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. GenIB-based tokenization thereby helps improve communication efficiency and reduce computational complexity. Additionally, we develop σ-GenIB to address the challenges of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom compared to baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication that supports multimodal understanding and generation, providing a potential solution for next-generation intelligent communications.

Subjects: Signal Processing, Machine Learning

Publish: 2025-07-02 14:03:01 UTC


#12 Auto-optimization of Energy Generation for Wave Energy Converters with Active Learning

Authors: Siyang Tang, Wen-Hua Chen, Cunjia Liu

This paper presents an auto-optimization control framework for wave energy converters (WECs) to maximize energy generation under unknown and changing ocean conditions. The proposed control framework consists of two levels. The high-level controller, operating at a longer time scale, aims to maximize the average energy generation over several wave periods. The generated Power Take-Off (PTO) profile serves as the reference for the low-level physical system to follow. The new auto-optimization process leverages the parameterization of the non-stationary operation condition in WECs, establishing the relationship between the average energy generation and the key design parameters of the PTO force subject to the unknown wave parameters. The high-level controller is designed based on the concept of Dual Control for Exploration and Exploitation (DCEE) to quickly learn the unknown wave parameters by actively probing the ocean condition, while generating the optimal PTO profile. During this process, the uncertainty of the estimated wave condition is quantified and embedded in the optimization cost function to enable active learning. Simulation results under unknown regular and irregular waves demonstrate the effectiveness and robustness of this novel auto-optimization WEC system with active learning, which outperforms model predictive control, extremum seeking, and classic Bang-Bang control approaches.

Subject: Systems and Control

Publish: 2025-07-02 14:02:33 UTC


#13 Re-examining the Legendre-Gauss-Lobatto Pseudospectral Methods for Optimal Control

Authors: Yilin Zou, Fanghua Jiang

Pseudospectral methods represent an efficient approach for solving optimal control problems. While Legendre-Gauss-Lobatto (LGL) collocation points have traditionally been considered inferior to Legendre-Gauss (LG) and Legendre-Gauss-Radau (LGR) points in terms of convergence properties, this paper presents a rigorous re-examination of LGL-based methods. We introduce an augmented formulation that enhances the standard LGL collocation approach by incorporating an additional degree of freedom (DOF) into the interpolation structure. We demonstrate that this augmented formulation is mathematically equivalent to the integral formulation of the LGL collocation method. Through analytical derivation, we establish that the adjoint system in both the augmented differential and integral formulations corresponds to a Lobatto IIIB discontinuous collocation method for the costate vector, thereby resolving the previously reported convergence issues. Our comparative analysis of LG, LGR, and LGL collocation methods reveals significant advantages of the improved LGL approach in terms of discretized problem dimensionality and symplectic integration properties. Numerical examples validate our theoretical findings, demonstrating that the proposed LGL-based method achieves comparable accuracy to LG and LGR methods while offering superior computational performance for long-horizon optimal control problems due to the preservation of symplecticity.

Subjects: Systems and Control, Optimization and Control

Publish: 2025-07-02 12:41:33 UTC
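
Illustration for #13: the abstract builds on Legendre-Gauss-Lobatto (LGL) collocation points. As a rough sketch of what those points are, the snippet below computes the standard LGL nodes and quadrature weights on [-1, 1] with a Newton iteration on Legendre polynomials. It does not implement the paper's augmented or integral formulations; the function names `legendre_pair` and `lgl_nodes_weights` are hypothetical and chosen only for this example.

```python
import math


def legendre_pair(n, x):
    """Return (P_{n-1}(x), P_n(x)) using the three-term recurrence (n >= 1)."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, n):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p_prev, p


def lgl_nodes_weights(n, tol=1e-14, max_iter=100):
    """Return the n + 1 LGL nodes (ascending) and their quadrature weights.

    The nodes are the roots of (1 - x^2) P_n'(x), which coincide with the
    roots of f(x) = x P_n(x) - P_{n-1}(x), and f'(x) = (n + 1) P_n(x).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    nodes = []
    for i in range(n + 1):
        # Chebyshev-Gauss-Lobatto points serve as initial guesses.
        x = -math.cos(math.pi * i / n)
        for _ in range(max_iter):
            p_prev, p = legendre_pair(n, x)
            step = (x * p - p_prev) / ((n + 1) * p)
            x -= step
            if abs(step) < tol:
                break
        nodes.append(x)
    # Standard LGL weights: w_i = 2 / (n (n + 1) P_n(x_i)^2).
    weights = [2.0 / (n * (n + 1) * legendre_pair(n, x)[1] ** 2) for x in nodes]
    return nodes, weights


if __name__ == "__main__":
    nodes, weights = lgl_nodes_weights(4)
    for x, w in zip(nodes, weights):
        print(f"x = {x:+.6f}, w = {w:.6f}")
```

Unlike LG and LGR points, the LGL set includes both endpoints -1 and +1, which is why the node list always starts and ends there. The resulting rule with n + 1 points integrates polynomials up to degree 2n - 1 exactly.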


#14 Frequency-switching Array Enhanced Physical-Layer Security in Terahertz Bands: A Movable Antenna Perspective

Authors: Cong Zhou, Changsheng You, Shuo Shi, Weidong Mei

In this paper, we propose a new frequency-switching array (FSA) enhanced physical-layer security (PLS) system in terahertz bands, where the carrier frequency can be flexibly switched and small frequency offsets can be imposed on each antenna at Alice, so as to eliminate information wiretapping by undesired eavesdroppers. First, we analytically show that by flexibly controlling the carrier frequency parameters, FSAs can effectively form uniform/non-uniform sparse arrays, hence resembling movable antennas (MAs) in the control of inter-antenna spacing and providing additional degrees of freedom (DoFs) in the beam control. Although the proposed FSA experiences additional path-gain attenuation in the received signals, it can overcome several hardware and signal processing issues incurred by MAs, such as limited positioning accuracy, considerable response latency, and demanding hardware and energy cost. To gain useful insights, we first consider a secrecy-guaranteed problem with a null-steering constraint, for which a maximum ratio transmission (MRT) beamformer is considered at Alice and the frequency offsets are set to a uniform frequency increment. Interestingly, it is shown that the proposed FSA can flexibly realize null-steering over Eve in both the angular domain (by tuning carrier frequency) and range domain (by controlling per-antenna frequency offset), thereby achieving improved PLS performance. Then, for the general case, we propose an efficient algorithm to solve the formulated non-convex problem by using the block coordinate descent (BCD) and projected gradient ascent (PGA) techniques. Finally, numerical results demonstrate the convergence of the proposed optimization algorithm and its superiority over fixed-position arrays (FPAs) in terms of secrecy-rate performance.

Subject: Signal Processing

Publish: 2025-07-02 11:54:38 UTC


#15 QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model

Authors: Shaowen Chen, Tomoki Toda

Vocoders, which encode speech signals into acoustic features and allow speech signals to be reconstructed from them, have been studied for decades. Recently, the rise of deep learning has particularly driven the development of neural vocoders to generate high-quality speech signals. On the other hand, the existing end-to-end neural vocoders suffer from a black-box nature that obscures the speech production mechanism and the intrinsic structure of speech, resulting in ambiguity in separately modeling source excitation and resonance characteristics and a loss of flexibility in synthesizing or modifying speech with high quality. Moreover, their sequence-wise waveform generation usually requires complicated networks, leading to substantial time consumption. In this work, inspired by the quasi-harmonic model (QHM) that represents speech as sparse components, we combine the neural network and QHM synthesis process to propose a novel framework for the neural vocoder. Accordingly, speech signals can be encoded into autoregressive moving average (ARMA) functions to model the resonance characteristics, yielding accurate estimates of the amplitudes and phases of quasi-harmonics at any frequency. Subsequently, the speech can be resynthesized and arbitrarily modified in terms of pitch shifting and time stretching with high quality, while the time consumption and network size decrease. The experiments indicate that the proposed method leverages the strengths of QHM, the ARMA model, and neural networks, and outperforms other methods in terms of generation speed, synthesis quality, and modification flexibility.

Subjects: Audio and Speech Processing, Sound, Signal Processing

Publish: 2025-07-02 11:28:53 UTC


#16 Enhancing Multi-Exposure High Dynamic Range Imaging with Overlapped Codebook for Improved Representation Learning

Authors: Keuntek Lee, Jaehyun Park, Nam Ik Cho

High dynamic range (HDR) imaging aims to create realistic HDR images from low dynamic range (LDR) inputs. Specifically, multi-exposure HDR imaging uses multiple LDR frames taken from the same scene to improve reconstruction performance. However, there are often discrepancies in motion among the frames, and different exposure settings for each capture can lead to saturated regions. In this work, we first propose an Overlapped codebook (OLC) scheme, which can improve the capability of the VQGAN framework for learning implicit HDR representations by modeling the common exposure bracket process in the shared codebook structure. Further, we develop a new HDR network that utilizes HDR representations obtained from a pre-trained VQ network and OLC. This allows us to compensate for saturated regions and enhance overall visual quality. We have tested our approach extensively on various datasets and have demonstrated that it outperforms previous methods both qualitatively and quantitatively.

Subject: Image and Video Processing

Publish: 2025-07-02 10:58:18 UTC


#17 Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability

Authors: Masood Jan, Wafa Njima, Xun Zhang, Alexander Artemenko

Accurate indoor localization is crucial in industrial environments. Visible Light Communication (VLC) has emerged as a promising solution, offering high accuracy, energy efficiency, and minimal electromagnetic interference. However, VLC-based indoor localization faces challenges due to environmental variability, such as lighting fluctuations and obstacles. To address these challenges, we propose a Transfer Learning (TL)-based approach for VLC-based indoor localization. Using real-world data collected at a BOSCH factory, the TL framework integrates a deep neural network (DNN) to improve localization accuracy by 47%, reduce energy consumption by 32%, and decrease computational time by 40% compared to the conventional models. The proposed solution is highly adaptable under varying environmental conditions and achieves similar accuracy with only 30% of the dataset, making it a cost-efficient and scalable option for industrial applications in Industry 4.0.

Subjects: Signal Processing, Machine Learning

Publish: 2025-07-02 10:51:38 UTC


#18 Vision-Aided ISAC in Low-Altitude Economy Networks via De-Diffused Visual Priors

Authors: Yulan Gao, Ziqiang Ye, Zhonghao Lyu, Ming Xiao, Yue Xiao, Ping Yang, Agata Manolova

Emerging low-altitude economy networks (LAENets) require agile and privacy-preserving resource control under dynamic agent mobility and limited infrastructure support. To meet these challenges, we propose a vision-aided integrated sensing and communication (ISAC) framework for UAV-assisted access systems, where onboard masked De-Diffusion models extract compact semantic tokens, including agent type, activity class, and heading orientation, while explicitly suppressing sensitive visual content. These tokens are fused with mmWave radar measurements to construct a semantic risk heatmap reflecting motion density, occlusion, and scene complexity, which guides access technology selection and resource scheduling. We formulate a multi-objective optimization problem to jointly maximize weighted energy and perception efficiency via radio access technology (RAT) assignment, power control, and beamforming, subject to agent-specific QoS constraints. To solve this, we develop the De-Diffusion-driven vision-aided risk-aware resource optimization (DeDiff-VARARO) algorithm, a novel two-stage cross-modal control scheme: the first stage reconstructs visual scenes from tokens via a De-Diffusion model for semantic parsing, while the second stage employs a deep deterministic policy gradient (DDPG)-based policy to adapt RAT selection, power control, and beam assignment based on fused radar-visual states. Simulation results show that DeDiff-VARARO consistently outperforms baselines in reward convergence, link robustness, and semantic fidelity, achieving within 4% of the performance of a raw-image upper bound while preserving user privacy and scalability in dense environments.

Subject: Systems and Control

Publish: 2025-07-02 10:50:49 UTC


#19 Time-Varying Coverage Control: A Distributed Tracker-Planner MPC Framework

Authors: Patrick Benito Eberhard, Johannes Köhler, Oliver Hüsser, Melanie N. Zeilinger, Andrea Carron

Time-varying coverage control addresses the challenge of coordinating multiple agents covering an environment where regions of interest change over time. This problem has broad applications, including the deployment of autonomous taxis and coordination in search and rescue operations. The achievement of effective coverage is complicated by the presence of time-varying density functions, nonlinear agent dynamics, and stringent system and safety constraints. In this paper, we present a distributed multi-agent control framework for time-varying coverage under nonlinear constrained dynamics. Our approach integrates a reference trajectory planner and a tracking model predictive control (MPC) scheme, which operate at different frequencies within a multi-rate framework. For periodic density functions, we demonstrate closed-loop convergence to an optimal configuration of trajectories and provide formal guarantees regarding constraint satisfaction, collision avoidance, and recursive feasibility. Additionally, we propose an efficient algorithm capable of handling nonperiodic density functions, making the approach suitable for practical applications. Finally, we validate our method through hardware experiments using a fleet of four miniature race cars.

Subjects: Systems and Control, Robotics

Publish: 2025-07-02 10:33:14 UTC


#20 Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling

Authors: Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

We present our solution for the Multi-Source COVID-19 Detection Challenge, which classifies chest CT scans from four distinct medical centers. To address multi-source variability, we employ the Spatial-Slice Feature Learning (SSFL) framework with Kernel-Density-based Slice Sampling (KDS). Our preprocessing pipeline combines lung region extraction, quality control, and adaptive slice sampling to select eight representative slices per scan. We compare EfficientNet and Swin Transformer architectures on the validation set. The EfficientNet model achieves an F1-score of 94.68%, compared to the Swin Transformer's 93.34%. The results demonstrate the effectiveness of our KDS-based pipeline on multi-source data and highlight the importance of dataset balance in multi-institutional medical imaging evaluation.

Subjects: Image and Video Processing, Computer Vision and Pattern Recognition

Publish: 2025-07-02 10:27:59 UTC


#21 Frequency Domain Design of a Reset-Based Filter: An Add-On Nonlinear Filter for Industrial Motion Control

Authors: S. Ali Hosseini, Fabian R. Quinten, Luke F. van Eijk, Dragan Kostic, S. Hassan HosseinNia

This study introduces a modified version of the Constant-in-Gain, Lead-in-Phase (CgLp) filter, which incorporates a feedthrough term in the First-Order Reset Element (FORE) to reduce the undesirable nonlinearities and achieve an almost constant gain across all frequencies. A backward calculation approach is proposed to derive the additional parameter introduced by the feedthrough term, enabling designers to easily tune the filter to generate the required phase. The paper also presents an add-on filter structure that can enhance the performance of an existing LTI controller without altering its robustness margins. A sensitivity improvement indicator is proposed to guide the tuning process, enabling designers to visualize the improvements in closed-loop performance. The proposed methodology is demonstrated through a case study of an industrial wire bonder machine, showcasing its effectiveness in addressing low-frequency vibrations and improving overall control performance.

Subject: Systems and Control

Publish: 2025-07-02 08:51:27 UTC


#22 Robust Input Shaping Control for Flexible Structures Based on Unscented Kalman Filter

Authors: Weiyi Yang, Yu Yuan, Mingsheng Shang

With the rapid development of industrial automation and smart manufacturing, the control of flexible structures and underactuated systems has become a critical research focus. Residual vibrations in these systems not only degrade operational efficiency but also pose risks to structural integrity and longevity. Traditional input shaping techniques, while effective, often suffer from performance degradation due to parameter inaccuracies and environmental disturbances. To address these challenges, this paper introduces an unscented Kalman filter-based zero vibration derivative input shaping (UZS) method. The proposed approach combines two key components: 1) a data-driven Unscented Kalman Filter for real-time system parameter identification, and 2) a zero-vibration derivative (ZVD) input shaper for robust vibration suppression. To validate the effectiveness of UZS, we conducted extensive experiments on a vertical flexible beam platform, and the results demonstrate significant improvements over state-of-the-art methods. Additionally, we have made the experimental datasets publicly available to facilitate further research. The findings highlight UZS's potential for practical applications in industrial automation, robotics, and precision engineering.
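For readers unfamiliar with the ZVD shaper mentioned above, here is a minimal sketch of the standard three-impulse ZVD design for a single vibration mode. It assumes the natural frequency `omega_n` and damping ratio `zeta` are already known. In the paper these would come from the Unscented Kalman Filter, which is not reproduced here. The function names are illustrative.

```python
import math

def zvd_shaper(omega_n, zeta):
    """Impulse times and amplitudes of the textbook ZVD shaper for one mode."""
    if omega_n <= 0 or not 0.0 <= zeta < 1.0:
        raise ValueError("need omega_n > 0 and 0 <= zeta < 1")
    root = math.sqrt(1.0 - zeta**2)
    omega_d = omega_n * root                   # damped natural frequency
    half_period = math.pi / omega_d            # half of the damped period
    k = math.exp(-zeta * math.pi / root)       # decay over half a period
    norm = (1.0 + k) ** 2
    amplitudes = [1.0 / norm, 2.0 * k / norm, k**2 / norm]
    times = [0.0, half_period, 2.0 * half_period]
    return times, amplitudes

def residual_vibration(times, amplitudes, omega_n, zeta):
    """Residual vibration amplitude after the last impulse, relative to
    the response to a single unit impulse at t = 0."""
    omega_d = omega_n * math.sqrt(1.0 - zeta**2)
    c = sum(a * math.exp(zeta * omega_n * t) * math.cos(omega_d * t)
            for t, a in zip(times, amplitudes))
    s = sum(a * math.exp(zeta * omega_n * t) * math.sin(omega_d * t)
            for t, a in zip(times, amplitudes))
    return math.exp(-zeta * omega_n * times[-1]) * math.hypot(c, s)

# Example mode: 2 Hz natural frequency, 5 % damping (made-up values).
t_imp, a_imp = zvd_shaper(2.0 * math.pi * 2.0, 0.05)
v_nominal = residual_vibration(t_imp, a_imp, 2.0 * math.pi * 2.0, 0.05)
```

Convolving a reference command with these three impulses cancels the modelled mode at the design point. The extra impulse compared with a two-impulse ZV shaper buys robustness to frequency error at the cost of a longer shaper.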

Subject: Systems and Control

Publish: 2025-07-02 08:19:11 UTC


#23 Multi-Revolution Low-Thrust Trajectory Optimization With Very Sparse Mesh Pseudospectral Method

Authors: Yilin Zou, Fanghua Jiang

Multi-revolution low-thrust trajectory optimization problems are important and challenging in space mission design. In this paper, an efficient, accurate, and widely applicable pseudospectral method is proposed to solve multi-revolution low-thrust trajectory optimization problems with various objective functions and perturbations. The method is based on the Sundman transformation and pseudospectral method, together with a sparse mesh that is monotonic, near-uniformly spaced, and uniformly scattered on the unit circle. Two methods are proposed to construct the mesh: a deterministic method based on rotation mapping and a stochastic method utilizing autocorrelated random sequences. Core mechanisms ensuring the correctness of the method are analyzed, including the dual roles of mesh points as both integration points in the temporal domain and sampling points in the angular domain, the slow dynamics of the system excluding the fast angle variable, and the nearly commutative vector fields generated by applying different control inputs. The method is demonstrated through a multi-revolution low-thrust orbital rendezvous problem. Results show that the proposed method achieves high accuracy with only a few seconds of computational time for challenging problems.
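The three mesh properties named above (monotonic, near-uniformly spaced, uniformly scattered on the unit circle) can be illustrated with a simple rotation map. The sketch below is an assumption made for illustration and may differ from the authors' construction: it places `n_points` mesh angles across `n_rev` revolutions with a constant step, and requires the two integers to be coprime so that no two points coincide on the circle.

```python
import math

def rotation_mesh(n_rev, n_points):
    """Monotonic angles L_k = k * 2*pi*n_rev / n_points, k = 0..n_points-1."""
    if n_rev <= 0 or n_points <= 0:
        raise ValueError("n_rev and n_points must be positive")
    if math.gcd(n_rev, n_points) != 1:
        raise ValueError("n_rev and n_points must be coprime")
    step = 2.0 * math.pi * n_rev / n_points
    return [k * step for k in range(n_points)]

def circle_slots(n_rev, n_points):
    """Slot index (in units of 2*pi/n_points) that each mesh point
    occupies once its angle is wrapped onto the unit circle."""
    return [(k * n_rev) % n_points for k in range(n_points)]

# 17 mesh points spread over 100 revolutions (made-up sizes).
mesh = rotation_mesh(100, 17)
slots = circle_slots(100, 17)
```

Because 100 and 17 are coprime, the wrapped angles visit each of the 17 equally spaced positions on the circle exactly once, even though consecutive mesh points are almost six revolutions apart.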

Subjects: Systems and Control, Optimization and Control

Publish: 2025-07-02 08:08:44 UTC


#24 Basis Expansion Extrapolation based Long-Term Channel Prediction for Massive MIMO OTFS Systems

Authors: Yanfeng Zhang, Xu Zhu, Yujie Liu, Yong Liang Guan, David González G., Vincent K. N. Lau

Massive multi-input multi-output (MIMO) combined with orthogonal time frequency space (OTFS) modulation has emerged as a promising technique for high-mobility scenarios. However, its performance could be severely degraded due to channel aging caused by user mobility and high processing latency. In this paper, an integrated scheme of uplink (UL) channel estimation and downlink (DL) channel prediction is proposed to alleviate channel aging in time division duplex (TDD) massive MIMO-OTFS systems. Specifically, an iterative basis expansion model (BEM) based UL channel estimation scheme is first proposed to accurately estimate UL channels with the aid of a carefully designed OTFS frame pattern. Then a set of Slepian sequences is used to model the estimated UL channels, and the dynamic Slepian coefficients are fitted by a set of orthogonal polynomials. A channel predictor is derived to predict DL channels by iteratively extrapolating the Slepian coefficients. Simulation results verify that the proposed UL channel estimation and DL channel prediction schemes outperform the existing schemes in terms of normalized mean square error of channel estimation/prediction and DL spectral efficiency, with less pilot overhead.
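As background for the Slepian-sequence modelling step, the sketch below shows how a Doppler-limited channel tap can be represented with a handful of discrete prolate spheroidal (Slepian) sequences. It covers only the basis-expansion fit. The paper's iterative estimation, polynomial fitting, and coefficient extrapolation are not reproduced. The block length, bandwidth, basis size, and test signal are assumed values chosen for illustration.

```python
import numpy as np

def slepian_basis(n_samples, half_bandwidth, n_basis):
    """Leading Slepian sequences, computed as eigenvectors of the sinc
    kernel sin(2*pi*W*d) / (pi*d), ordered by decreasing eigenvalue."""
    idx = np.arange(n_samples)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi*x)/(pi*x), so this equals sin(2*pi*W*d)/(pi*d).
    kernel = 2.0 * half_bandwidth * np.sinc(2.0 * half_bandwidth * diff)
    eigvals, eigvecs = np.linalg.eigh(kernel)
    order = np.argsort(eigvals)[::-1][:n_basis]
    return eigvecs[:, order]

def bem_fit(h, basis):
    """Project h onto an orthonormal real basis; return (coeffs, reconstruction)."""
    coeffs = basis.T @ h
    return coeffs, basis @ coeffs

# Assumed setup: 64 samples, normalised Doppler limit 0.05, 12 basis vectors.
n = np.arange(64)
basis = slepian_basis(64, 0.05, 12)
# Synthetic tap: three paths with in-band normalised Doppler shifts (made up).
dopplers = [-0.03, 0.01, 0.025]
gains = [1.0, 0.5j, 0.3]
h = sum(g * np.exp(2j * np.pi * nu * n) for g, nu in zip(gains, dopplers))
coeffs, h_hat = bem_fit(h, basis)
nmse = np.sum(np.abs(h - h_hat) ** 2) / np.sum(np.abs(h) ** 2)
```

Here 12 coefficients stand in for 64 time samples, which is what makes tracking and extrapolating the coefficients attractive compared with predicting the raw channel samples.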

Subject: Signal Processing

Publish: 2025-07-02 08:05:40 UTC


#25 SDR-Empowered Environment Sensing Design and Experimental Validation Using OTFS-ISAC Signals

Authors: Jun Wu, Yuye Shi, Weijie Yuan, Qingqing Cheng, Buyi Li, Xinyuan Wei

This paper investigates the system design and experimental validation of integrated sensing and communication (ISAC) for environmental sensing, which is expected to be a critical enabler for next-generation wireless networks. We advocate exploiting orthogonal time frequency space (OTFS) modulation for its inherent sparsity and stability in delay-Doppler (DD) domain channels, facilitating a low-overhead environment sensing design. Moreover, a comprehensive environmental sensing framework is developed, encompassing DD domain channel estimation, target localization, and experimental validation. In particular, we first explore the OTFS channel estimation in the presence of fractional delay and Doppler shifts. Given the estimated parameters, we propose a three-ellipse positioning algorithm to localize the target, followed by determining the mobile transmitter's velocity. Additionally, to evaluate the performance of our proposed design, we conduct extensive simulations and experiments using a software-defined radio (SDR)-based platform with a universal software radio peripheral (USRP). The experimental validations demonstrate that our proposed approach outperforms the benchmarks in terms of localization accuracy and velocity estimation, confirming its effectiveness in practical environmental sensing applications.

Subject: Signal Processing

Publish: 2025-07-02 07:29:01 UTC