2024-12-03 | Total: 158
Transfer learning is an umbrella term for machine learning approaches that leverage knowledge gained from solving one problem (the source domain) to improve speed, efficiency, and data requirements in solving a different but related problem (the target domain). The performance of the transferred model in the target domain is typically measured via some notion of loss function in the target domain. This paper focuses on effectively transferring control logic from a source control system to a target control system while providing approximately similar behavioral guarantees in both domains. However, in the absence of a complete characterization of behavioral specifications, this problem cannot be captured in terms of loss functions. To overcome this challenge, we use (approximate) simulation relations to characterize observational equivalence between the behaviors of two systems. Simulation relations ensure that the outputs of both systems, equipped with their corresponding controllers, remain close to each other over time, and their closeness can be quantified {\it a priori}. By parameterizing simulation relations with neural networks, we introduce the notion of \emph{neural simulation relations}, which provides a data-driven approach to transfer any synthesized controller, regardless of the specification of interest, along with its proof of correctness. Compared with prior approaches, our method eliminates the need for a closed-loop mathematical model and specific requirements for both the source and target systems. We also introduce validity conditions that, when satisfied, guarantee the closeness of the outputs of two systems equipped with their corresponding controllers, thus eliminating the need for post-facto verification. We demonstrate the effectiveness of our approach through case studies involving a vehicle and a double inverted pendulum.
The communication control delay between the inverters and the power plant controller can be caused by several factors related to the communication link between them. Under undesirable conditions, high delay values can produce oscillations in the wind power plant that can affect the rest of the power system. In this work, we present a new robust methodology for wind turbines to estimate the value of the communication control delay using PMU data. Several scenarios with simulated external faults are considered, and the performance of the algorithm is evaluated based on dynamic state estimation of the mathematical model of the wind turbine. We show that the delay can be characterized in this way, offering the transmission system operator an online tool to identify the most suitable communication delay for the plant controller models used in dynamic studies.
This paper presents a novel approach to designing millimeter-wave (mmWave) cellular communication systems, based on joint phase time array (JPTA) radio frequency (RF) frontend architecture. JPTA architecture comprises time-delay components appended to conventional phase shifters, which offer extra degrees of freedom to be exploited for designing frequency-selective analog beams. Hence, a mmWave device equipped with JPTA can receive and transmit signals in multiple directions in a single time slot per RF chain, one direction per frequency subband, which alleviates the traditional constraint of one analog beam per transceiver chain per time slot. The utilization of subband-specific analog beams offers a new opportunity in designing mmWave systems, allowing for enhanced cell capacity and reduced pilot overhead. To understand the practical feasibility of JPTA, a few challenges and system design considerations are discussed in relation to the performance and complexity of the JPTA systems. For example, frequency-selective beam gain losses are present for the subband analog beams, e.g., up to 1 dB losses for 2 subband cases, even with the state-of-the-art JPTA delay and phase optimization methods. Despite these side effects, system-level analysis reveals that the JPTA system is capable of improving cell capacity: the 5%-tile throughput by up to 65%.
Private 5G networks provide enhanced security, a wide range of optimized services through network slicing, reduced latency, and support for many IoT devices in a specific area, all under the owner's full control. Higher security and privacy to protect sensitive data is the most significant advantage of private networks, for example in smart hospitals. For the long-term sustainability and cost-effectiveness of private 5G networks, analyzing and understanding variations in energy consumption is of great significance in moving toward a green private network architecture for 6G. This paper addresses this research gap by providing energy profiling of network components using an experimental laboratory setup that mimics real private 5G networks under various network conditions, which is a missing aspect in the existing literature.
Recent studies have shown that network slices (NSs), which are logical networks supported by shared physical networks, can experience service interference due to the sharing of physical and virtual resources. Thus, from the perspective of providing end-to-end (E2E) service quality assurance in 5G/6G systems, it is crucial to discover possible service interference among the NSs in a timely manner and isolate the potential issues before they can lead to violations of service quality agreements. We study the problem of detecting service interference among NSs in 5G/6G systems, only using E2E key performance indicator measurements, and propose a new algorithm. Our numerical studies demonstrate that, even when the service interference among NSs is weak to moderate, provided that a reasonable number of measurements are available, the proposed algorithm can correctly identify most of the shared resources that can cause service interference among the NSs that utilize them.
An analogue of the describing function method is developed using square waves rather than sinusoids. Static nonlinearities map square waves to square waves, and their behavior is characterized by their response to square waves of varying amplitude - their amplitude response. The output of an LTI system to a square wave input is approximated by a square wave, to give an analogue of the describing function. The classical describing function method for predicting oscillations in feedback interconnections is generalized to this square wave setting, and gives accurate predictions when oscillations are approximately square.
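The amplitude-response idea in the preceding abstract can be illustrated with a minimal Python sketch. It assumes an odd, memoryless nonlinearity (a unit saturation is used here purely as an example); the function names are hypothetical and are not taken from the paper.

```python
def square_wave(amplitude, period, t):
    """Value at time t of a zero-mean square wave with the given amplitude."""
    return amplitude if (t % period) < period / 2 else -amplitude


def saturation(u, limit=1.0):
    """Example static nonlinearity (assumed): clip the input to [-limit, limit]."""
    return max(-limit, min(limit, u))


def amplitude_response(nonlinearity, amplitude):
    """For an odd static nonlinearity f, a square wave of amplitude A is
    mapped to a square wave of amplitude f(A)."""
    return nonlinearity(amplitude)


def apply_to_square_wave(nonlinearity, amplitude, period, times):
    """Pass sampled values of a square wave through the static nonlinearity."""
    return [nonlinearity(square_wave(amplitude, period, t)) for t in times]
```

Sampling the output at a few instants shows that it is again a square wave whose amplitude equals `amplitude_response(saturation, A)`, which is the quantity that plays the role of the describing function in this setting.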
Developing methods to process irregularly structured data is crucial in applications like gene-regulatory, brain, power, and socioeconomic networks. Graphs have been the go-to algebraic tool for modeling the structure via nodes and edges capturing their interactions, leading to the establishment of the fields of graph signal processing (GSP) and graph machine learning (GML). Key graph-aware methods include Fourier transform, filtering, sampling, as well as topology identification and spatiotemporal processing. Although versatile, graphs can model only pairwise dependencies in the data. To this end, topological structures such as simplicial and cell complexes have emerged as algebraic representations for more intricate structure modeling in data-driven systems, fueling the rapid development of novel topological-based processing and learning methods. This paper first presents the core principles of topological signal processing through the Hodge theory, a framework instrumental in propelling the field forward thanks to principled connections with GSP-GML. It then outlines advances in topological signal representation, filtering, and sampling, as well as inferring topological structures from data, processing spatiotemporal topological signals, and connections with topological machine learning. The impact of topological signal processing and learning is finally highlighted in applications dealing with flow data over networks, geometric processing, statistical ranking, biology, and semantic communication.
Deep learning models are widely used to process Computed Tomography (CT) data in the automated screening of pulmonary diseases, significantly reducing the workload of physicians. However, the three-dimensional nature of CT volumes involves an excessive number of voxels, which significantly increases the complexity of model processing. Previous screening approaches often overlook this issue, which undoubtedly reduces screening efficiency. Towards efficient and effective screening, we design a hierarchical approach to reduce the computational cost of pulmonary disease screening. The new approach re-organizes the screening workflows into three steps. First, we propose a Computed Tomography Volume Compression (CTVC) method to select a small slice subset that comprehensively represents the whole CT volume. Second, the selected CT slices are used to detect pulmonary diseases coarsely via a lightweight classification model. Third, an uncertainty measurement strategy is applied to identify samples with low diagnostic confidence, which are re-detected by radiologists. Experiments on two public pulmonary disease datasets demonstrate that our approach achieves comparable accuracy and recall while reducing the time by 50%-70% compared with the counterparts using full CT volumes. Besides, we also found that our approach outperforms previous cutting-edge CTVC methods in retaining important indications after compression.
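The third step of the workflow above, routing low-confidence cases to radiologists, could be sketched as follows. The abstract does not specify the uncertainty measure, so this sketch assumes predictive entropy of the classifier's softmax output and a hypothetical threshold; both are illustrative choices rather than the authors' method.

```python
import math


def predictive_entropy(probabilities):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def needs_radiologist_review(probabilities, threshold=0.5):
    """Flag a sample for manual review when its entropy exceeds the
    (hypothetical) threshold, i.e. when diagnostic confidence is low."""
    return predictive_entropy(probabilities) > threshold
```

Under this assumption, a near-uniform prediction such as `[0.55, 0.45]` is deferred, while a confident one such as `[0.98, 0.02]` is accepted automatically.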
Scoliosis is traditionally assessed based solely on 2D lateral deviations, but recent studies have also revealed the importance of other imaging planes in understanding the deformation of the spine. Consequently, extracting the spinal geometry in 3D would help quantify these spinal deformations and aid diagnosis. In this study, we propose an automated general framework to estimate the 3D spine shape from 2D DXA scans. We achieve this by explicitly predicting the sagittal view of the spine from the DXA scan. Using these two orthogonal projections of the spine (coronal in DXA, and sagittal from the prediction), we are able to describe the 3D shape of the spine. The prediction is learnt from over 30k paired images of DXA and MRI scans. We assess the performance of the method on a held out test set, and achieve high accuracy.
Delay-Doppler (DD) signal processing has emerged as a powerful tool for analyzing multipath and time-varying channel effects. Due to the inherent sparsity of the wireless channel in the DD domain, compressed sensing (CS) based techniques, such as orthogonal matching pursuit (OMP), are commonly used for channel estimation. However, many of these methods assume integer Doppler shifts, which can lead to performance degradation in the presence of fractional Doppler. In this paper, we propose a windowed dictionary design technique and develop a delay-aware orthogonal matching pursuit (DA-OMP) algorithm that mitigates the impact of fractional Doppler shifts on DD domain channel estimation. First, we apply receiver windowing to reduce the correlation between the columns of our proposed dictionary matrix. Second, we introduce a delay-aware interference block to quantify the interference caused by fractional Doppler. This approach removes the need for a pre-determined stopping criterion, which is typically based on the number of propagation paths, in the conventional OMP algorithm. Our simulation results confirm the effective performance of our proposed DA-OMP algorithm using the proposed windowed dictionary in terms of normalized mean square error (NMSE) of the channel estimate. In particular, our proposed DA-OMP algorithm demonstrates substantial gains compared to the standard OMP algorithm in terms of channel estimation NMSE, both with and without the windowed dictionary.
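For reference, the conventional OMP baseline mentioned above can be sketched in a few lines of NumPy. This is the standard algorithm with a stopping criterion fixed by the number of paths, not the proposed DA-OMP; the dictionary is a generic matrix, and the names `dictionary`, `y`, and `num_paths` are assumptions made for illustration.

```python
import numpy as np


def omp(dictionary, y, num_paths):
    """Standard orthogonal matching pursuit.

    dictionary: (m, n) matrix whose columns are candidate atoms.
    y:          length-m observation vector.
    num_paths:  number of atoms to select (the pre-determined stopping rule).
    Returns the sparse coefficient vector and the selected column indices.
    """
    y = np.asarray(y, dtype=complex)
    residual = y.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(num_paths):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(dictionary.conj().T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on all atoms selected so far.
        sub = dictionary[:, support]
        coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coeffs
    x = np.zeros(dictionary.shape[1], dtype=complex)
    x[support] = coeffs
    return x, support
```

The loop runs exactly `num_paths` times, which is the dependence on a known path count that the abstract says DA-OMP removes.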
Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training additional modules specifically for the task. We adopt a different strategy: we introduce a training-free approach that leverages Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models to reveal shared interpretable concepts. These concepts are passed to an open-vocabulary segmentation model for precise segmentation maps. By using frozen pre-trained models, our method achieves high generalization and establishes state-of-the-art performance in unsupervised sound-prompted segmentation, significantly surpassing previous unsupervised methods.
This paper focuses on identification of the state noise density of a linear time-varying system described by a state-space model with known measurement noise density. For this purpose, a novel method extending the capabilities of the measurement difference method (MDM) is proposed. The proposed method is based on an enhanced MDM residue calculation, in which the residue is a sum of the state and measurement noise, and on the construction of the residue sample kernel density. The state noise density is then estimated by a density deconvolution algorithm utilising the Fourier transform. The developed method is supplemented with automatic selection of the deconvolution user-defined parameters based on the proposed method of noise moment equality. The state noise density estimation performance is evaluated in numerical examples and supplemented with a MATLAB example implementation.
In a recent paper, we presented the KU Leuven audiovisual, gaze-controlled auditory attention decoding (AV-GC-AAD) dataset, in which we recorded electroencephalography (EEG) signals of participants attending to one out of two competing speakers under various audiovisual conditions. The main goal of this dataset was to disentangle the direction of gaze from the direction of auditory attention, in order to reveal gaze-related shortcuts in existing spatial AAD algorithms that aim to decode the (direction of) auditory attention directly from the EEG. Various methods based on spatial AAD do not achieve significant above-chance performances on our AV-GC-AAD dataset, indicating that previously reported results were mainly driven by eye gaze confounds in existing datasets. Still, these adverse outcomes are often discarded for reasons that are attributed to the limitations of the AV-GC-AAD dataset, such as the limited amount of data to train a working model, too much data heterogeneity due to different audiovisual conditions, or participants allegedly being unable to focus their auditory attention under the complex instructions. In this paper, we present the results of the linear stimulus reconstruction AAD algorithm and show that high AAD accuracy can be obtained within each individual condition and that the model generalizes across conditions, across new subjects, and even across datasets. Therefore, we eliminate any doubts that the inadequacy of the AV-GC-AAD dataset is the primary reason for the (spatial) AAD algorithms failing to achieve above-chance performance when compared to other datasets. Furthermore, this report provides a simple baseline evaluation procedure (including source code) that can serve as the minimal benchmark for all future AAD algorithms evaluated on this dataset.
This paper proposes to use similarities of audio captions for estimating audio-caption relevances to be used for training text-based audio retrieval systems. Current audio-caption datasets (e.g., Clotho) contain audio samples paired with annotated captions, but lack relevance information about audio samples and captions beyond the annotated ones. Besides, mainstream approaches (e.g., CLAP) usually treat the annotated pairs as positives and consider all other audio-caption combinations as negatives, assuming a binary relevance between audio samples and captions. To infer the relevance between audio samples and arbitrary captions, we propose a method that computes non-binary audio-caption relevance scores based on the textual similarities of audio captions. We measure textual similarities of audio captions by calculating the cosine similarity of their Sentence-BERT embeddings and then transform these similarities into audio-caption relevance scores using a logistic function, thereby linking audio samples through their annotated captions to all other captions in the dataset. To integrate the computed relevances into training, we employ a listwise ranking objective, where relevance scores are converted into probabilities of ranking audio samples for a given textual query. We show the effectiveness of the proposed method by demonstrating improvements in text-based audio retrieval compared to methods that use binary audio-caption relevances for training.
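The pipeline described above (cosine similarity of caption embeddings, a logistic transform to relevance scores, and a listwise ranking objective) can be sketched in plain Python. The embeddings here are stand-in vectors rather than real Sentence-BERT outputs, the `slope` and `midpoint` parameters of the logistic function are hypothetical, and the loss is a ListNet-style softmax cross-entropy chosen as one plausible listwise objective; none of these specifics are confirmed by the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def logistic_relevance(similarity, slope=10.0, midpoint=0.5):
    """Map a textual similarity to a non-binary relevance score in (0, 1).
    slope and midpoint are assumed, illustrative parameters."""
    return 1.0 / (1.0 + math.exp(-slope * (similarity - midpoint)))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(target_relevances, model_scores):
    """Cross-entropy between the ranking distribution induced by the target
    relevances and the one induced by the model's audio-text scores."""
    target = softmax(target_relevances)
    predicted = softmax(model_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))
```

With this objective, model scores that order the audio samples the same way as the relevance scores incur a lower loss than scores that reverse the order, which is the behaviour a binary positive/negative labelling cannot express.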
Safety filters ensure that control actions that are executed are always safe, no matter the controller in question. Previous work has proposed a simple and stealthy false-data injection attack for deactivating such safety filters. This attack injects false sensor measurements to bias state estimates toward the interior of a safety region, making the safety filter accept unsafe control actions. The attack does, however, require the adversary to know the dynamics of the system, the safety region used in the safety filter, and the observer gain. In this work we relax these requirements and show how a similar data-injection attack can be performed when the adversary only observes the input and output of the observer that is used by the safety filter, without any a priori knowledge about the system dynamics, safety region, or observer gain. In particular, the adversary uses the observed data to identify a state-space model that describes the observer dynamics, and then approximates a safety region in the identified embedding. We exemplify the data-driven attack on an inverted pendulum, where we show how the attack can make the system leave a safe set, even when a safety filter is supposed to stop this from happening.
Distributed fiber sensing based on correlation-aided phase-sensitive optical time domain reflectometry is presented. The focus is on correlation as an enabler for high spatial resolution. Results from different applications are presented.
A deployed fiber with in-house and underground sections is interrogated with a coherent correlation OTDR. The origin and propagation speed of a hammer-generated pressure wave in the underground section is detected and acoustic signals are monitored.
As large-scale distributed energy resources are integrated into the active distribution networks (ADNs), effective energy management in ADNs becomes increasingly prominent compared to traditional distribution networks. Although advanced reinforcement learning (RL) methods, which alleviate the burden of complicated modelling and optimization, have greatly improved the efficiency of energy management in ADNs, safety becomes a critical concern for RL applications in real-world problems. Since the design and adjustment of penalty functions, which correspond to operational safety constraints, requires extensive domain knowledge in RL and power system operation, the emerging ADN operators call for a more flexible and customized approach to address the penalty functions so that the operational safety and efficiency can be further enhanced. Empowered with strong comprehension, reasoning, and in-context learning capabilities, large language models (LLMs) provide a promising way to assist safe RL for energy management in ADNs. In this paper, we introduce the LLM to comprehend operational safety requirements in ADNs and generate corresponding penalty functions. In addition, we propose an RL2 mechanism to refine the generated functions iteratively and adaptively through multi-round dialogues, in which the LLM agent adjusts the functions' pattern and parameters based on training and test performance of the downstream RL agent. The proposed method significantly reduces the intervention of the ADN operators. Comprehensive test results demonstrate the effectiveness of the proposed method.
This paper presents a novel two-stage method for constructing channel knowledge maps (CKMs) specifically for A2G (Aerial-to-Ground) channels in the presence of non-cooperative interfering nodes (INs). We first estimate the interfering signal strength (ISS) at sampling locations based on total received signal strength measurements and the desired communication signal strength (DSS) map constructed with environmental topology. Next, an ISS map construction network (IMNet) is proposed, where a negative value correction module is included to enable precise reconstruction. Subsequently, we further execute signal-to-interference-plus-noise ratio map construction and IN localization. Simulation results demonstrate lower construction error of the proposed IMNet compared to baselines in the presence of interference.
This paper introduces a novel approach to quantify the uncertainties in fault diagnosis of motor drives using Bayesian neural networks (BNN). Conventional data-driven approaches used for fault diagnosis often rely on point-estimate neural networks, which merely provide deterministic outputs and fail to capture the uncertainty associated with the inference process. In contrast, BNNs offer a principled framework to model uncertainty by treating network weights as probability distributions rather than fixed values. This offers several advantages: (a) improved robustness to noisy data, (b) enhanced interpretability of model predictions, and (c) the ability to quantify uncertainty in the decision-making processes. To assess the robustness of the proposed BNN, it is first tested on a conservative dataset of gear fault data covering three fault types from an experimental prototype, and is then incrementally trained on new fault classes and datasets to explore its uncertainty quantification features and model interpretability under noisy data and unseen fault scenarios.
Accurate embryo morphology assessment is essential in assisted reproductive technology for selecting the most viable embryo. Artificial intelligence has the potential to enhance this process. However, the limited availability of embryo data presents challenges for training deep learning models. To address this, we trained two generative models using two datasets, one we created and made publicly available, and one existing public dataset, to generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 95% when trained solely on real data. Notably, even when trained exclusively on synthetic data and tested on real data, the model achieved a high accuracy of 94%. Furthermore, combining synthetic data from both generative models yielded better classification results than using data from a single generative model. Four embryologists evaluated the fidelity of the synthetic images through a Turing test, during which they annotated inaccuracies and offered feedback. The analysis showed the diffusion model outperformed the generative adversarial network model, deceiving embryologists 66.6% versus 25.3% and achieving lower Frechet inception distance scores.
In this paper, we propose a novel multi-functional reconfigurable intelligent surface (MF-RIS) that supports signal reflection, refraction, amplification, and target sensing simultaneously. Our MF-RIS aims to enhance integrated communication and sensing (ISAC) systems, particularly in multi-user and multi-target scenarios. Equipped with reflection and refraction components (i.e., amplifiers and phase shifters), MF-RIS is able to adjust the amplitude and phase shift of both communication and sensing signals on demand. Additionally, with the assistance of sensing elements, MF-RIS is capable of capturing the echo signals from multiple targets, thereby mitigating the signal attenuation typically associated with multi-hop links. We propose a MF-RIS-enabled multi-user and multi-target ISAC system, and formulate an optimization problem to maximize the signal-to-interference-plus-noise ratio (SINR) of sensing targets. This problem involves jointly optimizing the transmit beamforming and MF-RIS configurations, subject to constraints on the communication rate, total power budget, and MF-RIS coefficients. We decompose the formulated non-convex problem into three sub-problems, and then solve them via an efficient iterative algorithm. Simulation results demonstrate that: 1) The performance of MF-RIS varies under different operating protocols, and energy splitting (ES) exhibits the best performance in the considered MF-RIS-enabled multi-user multi-target ISAC system; 2) Under the same total power budget, the proposed MF-RIS with ES protocol attains 52.2%, 73.5% and 60.86% sensing SINR gains over active RIS, passive RIS, and simultaneously transmitting and reflecting RIS (STAR-RIS), respectively; 3) The number of sensing elements will no longer improve sensing performance after exceeding a certain number.
Given the spatial heterogeneity of land use patterns in most cities, large-scale UAM will likely be deployed in specific areas, e.g., inter-transfer traffic between suburbs and city centers. However, large-scale UAM operations connecting multiple origin-destination pairs raise concerns about air traffic safety and efficiency with respect to conflict movements, particularly at large conflict points similar to roadway junctions. In this work, we propose an operational framework that integrates route guidance and collision avoidance to achieve an elegant trade-off between air traffic safety and efficiency. The route guidance mechanism aims to optimize aircraft distribution across both spatial and temporal dimensions by regulating their paths (composed of waypoints). Given the optimized paths, the collision avoidance module aims to generate collision-free aircraft trajectories between waypoints in 3D space. To enable large-scale operations, we develop a fast approximation method to solve the optimal path planning problem and employ the velocity obstacle model for collision avoidance. The proposed route guidance strategy significantly reduces the computational requirements for collision avoidance. As far as we know, this work is one of the first to combine route guidance and collision avoidance for UAM. The results indicate that the framework can enable efficient and flexible UAM operations, such as air traffic assignment, congestion prevention, and dynamic airspace clearance. Compared to the management scheme based on air corridors, the proposed framework has considerable improvements in computational efficiency (433%), average travel speed (70.2%), and trip completion rate (130%). The proposed framework has demonstrated great potential for real-time traffic simulation and management in large-scale UAM systems.
Recent speaker verification (SV) systems have shown a trend toward adopting deeper speaker embedding extractors. Although deeper and larger neural networks can significantly improve performance, their substantial memory requirements hinder training on consumer GPUs. In this paper, we explore a memory-efficient training strategy for deep speaker embedding learning in resource-constrained scenarios. Firstly, we conduct a systematic analysis of GPU memory allocation during SV system training. Empirical observations show that activations and optimizer states are the main sources of memory consumption. For activations, we design two types of reversible neural networks which eliminate the need to store intermediate activations during back-propagation, thereby significantly reducing memory usage without performance loss. For optimizer states, we introduce a dynamic quantization approach that replaces the original 32-bit floating-point values with a dynamic tree-based 8-bit data type. Experimental results on VoxCeleb demonstrate that the reversible variants of ResNets and DF-ResNets can perform training without the need to cache activations in GPU memory. In addition, the 8-bit versions of SGD and Adam save 75% of memory costs while maintaining performance compared to their 32-bit counterparts. Finally, a detailed comparison of memory usage and performance indicates that our proposed models achieve up to 16.2x memory savings, with nearly identical parameters and performance compared to the vanilla systems. In contrast to the previous need for multiple high-end GPUs such as the A100, we can effectively train deep speaker embedding extractors with just one or two consumer-level 2080Ti GPUs.
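The reason reversible networks avoid caching activations can be seen in a small sketch. The abstract does not describe its two reversible designs, so this assumes the common additive-coupling form (as in RevNet-style blocks), with hypothetical branch functions `f_branch` and `g_branch` standing in for real sub-networks.

```python
import math


def f_branch(x):
    """Hypothetical residual branch F; any deterministic function works."""
    return [math.tanh(v) for v in x]


def g_branch(x):
    """Hypothetical residual branch G."""
    return [0.5 * v * v for v in x]


def reversible_forward(x1, x2):
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, f_branch(x2))]
    y2 = [a + b for a, b in zip(x2, g_branch(y1))]
    return y1, y2


def reversible_inverse(y1, y2):
    """Recover the inputs from the outputs alone, so the inputs need not be
    stored during the forward pass: x2 = y2 - G(y1), x1 = y1 - F(x2)."""
    x2 = [a - b for a, b in zip(y2, g_branch(y1))]
    x1 = [a - b for a, b in zip(y1, f_branch(x2))]
    return x1, x2
```

During back-propagation, each block's input is recomputed from its output with `reversible_inverse`, trading a modest amount of extra computation for activation memory that no longer grows with depth.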
Modern wireless communication systems necessitate the development of cost-effective resource allocation strategies, while ensuring maximal system performance. While commonly realizable via efficient waterfilling schemes, ergodic-optimal policies often exhibit instantaneous resource constraint fluctuations as a result of fading variability, violating prescribed specifications possibly within unacceptable margins, inducing further operational challenges and/or costs. At the other extreme, short-term-optimal policies (commonly based on deterministic waterfilling), while strictly maintaining operational specifications, are not only impractical and computationally demanding, but also suboptimal in a long-term sense. To address these challenges, we introduce a novel distributionally robust version of a classical point-to-point interference-free multi-terminal constrained stochastic resource allocation problem, by leveraging the Conditional Value-at-Risk (CVaR) as a coherent measure of power policy fluctuation risk. We derive closed-form dual-parameterized expressions for the CVaR-optimal resource policy, along with corresponding optimal CVaR quantile levels by capitalizing on (sampling) the underlying fading distribution. We subsequently develop two dual-domain schemes (one model-based and one model-free) to iteratively determine a globally-optimal resource policy. Our numerical simulations confirm the remarkable effectiveness of the proposed approach, also revealing an almost-constant character of the CVaR-optimal policy at a rather minimal ergodic rate optimality loss.
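As background for the risk measure used above, the CVaR of a set of samples can be computed from the Rockafellar-Uryasev variational form, CVaR_alpha(Z) = min_t { t + E[(Z - t)_+] / (1 - alpha) }. The sketch below is a generic empirical estimator, not the paper's dual-domain scheme; it assumes the convention that larger values of Z are worse and that `alpha` in [0, 1) is the confidence level.

```python
def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha via the Rockafellar-Uryasev formula.

    The objective is convex and piecewise linear in t with breakpoints at the
    samples, so it suffices to evaluate it at each sample value.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(samples)

    def objective(t):
        expected_excess = sum(max(z - t, 0.0) for z in samples) / n
        return t + expected_excess / (1.0 - alpha)

    return min(objective(t) for t in samples)
```

At `alpha = 0` the estimator reduces to the sample mean, and as `alpha` grows it averages an increasingly small tail of the worst outcomes, which is why constraining CVaR limits large instantaneous fluctuations rather than only the average.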