2025-06-13 | Total: 85
Row-Column Arrays (RCAs) offer an attractive alternative to fully wired 2D-arrays for 3D-ultrasound, due to their greatly simplified wiring. However, conventional RCAs face challenges related to their long elements. These include an inability to image beyond the shadow of the aperture and an inability to focus in both transmit and receive for desired scan planes. To address these limitations, we recently developed bias-switchable RCAs, also known as Top Orthogonal to Bottom Electrode (TOBE) arrays. These arrays provide novel opportunities to read out from every element of the array and achieve high-quality images. While TOBE arrays and their associated imaging schemes have shown promise, they have not yet been directly compared experimentally to conventional RCA imaging techniques. This study aims to provide such a comparison, demonstrating superior B-scan and volumetric images from two electrostrictive relaxor TOBE arrays, using a method called Fast Orthogonal Row-Column Electronic scanning (FORCES), compared to conventional RCA imaging schemes, including Tilted Plane Wave (TPW) compounding and Virtual Line Source (VLS) imaging. The study quantifies resolution and Generalized Contrast to Noise Ratio (gCNR) in phantoms, and also demonstrates volumetric acquisitions in phantom and animal models.
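The gCNR figure quoted in the abstract above is commonly defined as one minus the overlap between the intensity distributions inside and outside a target region. The following is a minimal sketch of that definition using fixed-width histograms; the function name `gcnr`, the bin count, and the sample values are illustrative assumptions and are not taken from the study.

```python
def gcnr(inside, outside, bins=64):
    """Histogram estimate of gCNR = 1 - overlap of the two intensity distributions."""
    lo = min(min(inside), min(outside))
    hi = max(max(inside), max(outside))
    if hi == lo:
        # All samples identical: the regions cannot be told apart.
        return 0.0
    width = (hi - lo) / bins

    def hist(samples):
        counts = [0] * bins
        for s in samples:
            # Clamp the maximum value into the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p_in, p_out = hist(inside), hist(outside)
    overlap = sum(min(a, b) for a, b in zip(p_in, p_out))
    return 1.0 - overlap


# Hypothetical pixel intensities: fully separated regions give 1, identical ones give 0.
print(gcnr([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
print(gcnr([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

The estimate depends on the bin count, so values computed this way are only comparable when the same binning is used for every image.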
Quality assurance is a critical but underexplored area in digital pathology, where even minor artifacts can have significant effects. Artifacts have been shown to negatively impact the performance of AI diagnostic models. In current practice, trained staff manually review digitized slides before they are released to pathologists, who then use them to render a diagnosis. Conventional image processing approaches provide a foundation for detecting artifacts on digital pathology slides. However, current tools do not leverage deep learning, which has the potential to improve detection accuracy and scalability. Despite these advancements, methods for quality assurance in digital pathology remain limited, presenting a gap for innovation. We propose an AI algorithm designed to screen digital pathology slides by analyzing tiles and categorizing them into one of 10 predefined artifact types or as background. This algorithm identifies and localizes artifacts, creating a map that highlights regions of interest. By directing human operators to specific tiles affected by artifacts, the algorithm minimizes the time and effort required to manually review entire slides for quality issues. From internal archives and The Cancer Genome Atlas, 133 whole slide images were selected and 10 artifact types were annotated using ZAPP, an internally developed software tool (Mayo Clinic, Jacksonville, FL). An ablation study of multiple models at different tile sizes and magnifications was performed, and InceptionResNet was selected. Single artifact models were trained and tested, followed by a limited multiple instance model with artifacts that performed well together (chatter, fold, and pen). From the results of this study, we suggest a hybrid design for artifact screening composed of both single artifact binary models and multiple instance models to optimize detection of each artifact.
Theory and methods to obtain parametric reduced-order models by moment matching are presented. The definition of the parametric moment is introduced, and methods (model-based and data-driven) for the approximation of the parametric moment of linear and nonlinear parametric systems are proposed. These approximations are exploited to construct families of parametric reduced-order models that match the approximate parametric moment of the system to be reduced and preserve key system properties such as asymptotic stability and dissipativity. The use of the model reduction methods is illustrated by means of a parametric benchmark model for the linear case and a large-scale wind farm model for the nonlinear case. In the illustration, a comparison of the proposed approximation methods is drawn and their advantages/disadvantages are discussed.
Medical image segmentation is a fundamental and key technology in computer-aided diagnosis and treatment. Previous methods can be broadly classified into three categories: convolutional neural network (CNN) based, Transformer based, and hybrid architectures that combine both. However, each of them has its own limitations, such as restricted receptive fields in CNNs or the computational overhead caused by the quadratic complexity of Transformers. Recently, the Receptance Weighted Key Value (RWKV) model has emerged as a promising alternative for various vision tasks, offering strong long-range modeling capabilities with linear computational complexity. Some studies have also adapted RWKV to medical image segmentation tasks, achieving competitive performance. However, most of these studies focus on modifications to the Vision-RWKV (VRWKV) mechanism and train models from scratch, without exploring the potential advantages of leveraging pre-trained VRWKV models for medical image segmentation tasks. In this paper, we propose Med-URWKV, a pure RWKV-based architecture built upon the U-Net framework, which incorporates ImageNet-based pretraining to further explore the potential of RWKV in medical image segmentation tasks. To the best of our knowledge, Med-URWKV is the first pure RWKV segmentation model in the medical field that can directly reuse a large-scale pre-trained VRWKV encoder. Experimental results on seven datasets demonstrate that Med-URWKV achieves comparable or even superior segmentation performance compared to other carefully optimized RWKV models trained from scratch. This validates the effectiveness of using a pretrained VRWKV encoder in enhancing model performance. The codes will be released.
Automotive radars are one of the essential enablers of advanced driver assistance systems (ADASs). Continuous monitoring of the functional safety and reliability of automotive radars is a crucial requirement to prevent accidents and increase road safety. One of the most critical aspects to monitor in this context is radar channel imbalances, as they are a key parameter regarding the reliability of the radar. These imbalances may originate from several parameter variations or hardware fatigues, e.g., a solder ball break (SBB), and may affect some radar processing steps, such as the angle of arrival estimation. In this work, a novel method for online estimation of automotive radar channel imbalances is proposed. The proposed method exploits a normalized least mean squares (NLMS) algorithm as a block in the processing chain of the radar to estimate the channel imbalances. The input of this block is the detected targets in the range-Doppler map of the radar on the road, without any prior knowledge of the angular parameters of the targets. This property, in combination with the low computational complexity of the NLMS, makes the proposed method suitable for online channel imbalance estimation, in parallel to the normal operation of the radar. Furthermore, it features reduced dependency on specific targets of interest and faster update rates of the channel imbalance estimation compared to the majority of state-of-the-art methods. This improvement is achieved by allowing for multiple targets in the angular spectrum, whereas most other methods are restricted to only single targets in the angular spectrum. The performance of the proposed method is validated using various simulation scenarios and is supported by measurement results.
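To make the NLMS building block named in the abstract above more concrete, here is a minimal sketch of a generic complex-valued NLMS update applied to a toy system-identification problem. It is not the paper's imbalance estimator: the helper names `nlms_step` and `identify`, the step size, and the synthetic Gaussian inputs are all assumptions chosen for illustration.

```python
import random


def nlms_step(w, x, d, mu=0.5, eps=1e-9):
    """One NLMS update for the model y = sum(w_i * x_i) with complex weights."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y                                    # a-priori error
    power = sum(abs(xi) ** 2 for xi in x) + eps  # input energy used for normalisation
    w_new = [wi + mu * e * xi.conjugate() / power for wi, xi in zip(w, x)]
    return w_new, e


def identify(w_true, steps=200, seed=0):
    """Estimate hypothetical complex gains w_true from noiseless synthetic data."""
    rng = random.Random(seed)
    w = [0j] * len(w_true)
    for _ in range(steps):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in w_true]
        d = sum(wi * xi for wi, xi in zip(w_true, x))
        w, _ = nlms_step(w, x, d)
    return w


# Two made-up channel gains; the estimate should approach them closely.
print(identify([1 + 0.5j, 0.8 - 0.2j]))
```

The normalisation by input energy is what keeps the step size well behaved when the input power varies, and the per-sample cost is linear in the number of weights.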
Various domains such as power system stability analysis, electric machine modeling, and control of power electronic converters have significantly benefited from the application of coordinate transformations. One of the main benefits is the dimensional reduction, which reduces the complexity of the problems. This paper introduces a novel general transformation based on a geometric framework that directly identifies the plane containing the locus for unbalanced quantities through bivector analysis using Geometric Algebra. The proposed method provides a direct transformation valid for any degree of unbalance in $n$-phase, $(n+1)$-wire sinusoidal systems. The transformation requires only two measurements (voltage or current) taken at different time instants, making it computationally efficient. Moreover, we demonstrate through pure geometric reasoning that our approach is general and encompasses other techniques, such as the classical Clarke transformation. Numerical simulations and experimental validation using a real-time digital simulator and a physical laboratory setup demonstrate the effectiveness of the proposed method. This generalization to multi-dimensional systems, combined with the reduced measurement requirements, represents a significant advancement over existing approaches that are typically restricted to three-phase applications or suffer from computational limitations.
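The idea in the abstract above of locating the plane of the locus from two samples can be illustrated in the three-phase special case, where the bivector formed by two measurement vectors is dual to their cross product. The sketch below covers only that three-dimensional case and is not the paper's general $n$-phase transformation; the function names and the balanced test signal are assumptions.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors (dual of the bivector a ^ b in 3D)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(sample_a, sample_b):
    """Unit normal of the plane spanned by two measurements taken at different instants."""
    n = cross(sample_a, sample_b)
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("samples are parallel; pick a different time instant")
    return tuple(c / norm for c in n)


def three_phase(theta):
    """Balanced three-phase sample at electrical angle theta (illustrative signal)."""
    return tuple(math.cos(theta - k * 2 * math.pi / 3) for k in range(3))


# For a balanced system the locus lies in the plane orthogonal to (1, 1, 1),
# which is the plane the classical Clarke transformation maps onto.
print(plane_normal(three_phase(0.0), three_phase(math.pi / 2)))
```

For an unbalanced signal the same two-sample computation returns a different normal, which is the geometric information a plane-adapted transformation would need.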
This study introduces a novel metric, the Index of Visual Similarity (IVS), to qualitatively characterize boiling heat transfer regimes using only visual data. The IVS is constructed by combining morphological similarity, through SIFT-based feature matching, with physical similarity, via vapor area estimation using Mask R-CNN. High-speed images of pool boiling on two distinct surfaces, polished copper and porous copper foam, are employed to demonstrate the generalizability of the approach. IVS captures critical changes in bubble shape, size, and distribution that correspond to transitions in heat transfer mechanisms. The metric is validated against an equivalent metric, $\Phi$, derived from measured heat transfer coefficients (HTC), showing strong correlation and reliability in detecting boiling regime transitions, including the onset of nucleate boiling and proximity to critical heat flux (CHF). Given experimental limitations in precisely measuring changes in HTC, the sensitivity of IVS to surface superheat is also examined to reinforce the credibility of IVS. IVS thus emerges as a powerful, rapid, and non-intrusive tool for real-time, image-based boiling diagnostics, with promising applications in phase change heat transfer.
Following the successful paradigm shift of large language models, leveraging pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) set a milestone on segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We start with an introduction to the fundamental concepts underpinning their development. Then, we provide a taxonomy of the different variants of SAM in terms of zero-shot, few-shot, fine-tuning, and adapters, of the recent SAM 2, of other innovative models trained on images alone, and of others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.
This paper presents a scenario based robust optimization framework for short term energy scheduling in electricity intensive industrial plants, explicitly addressing uncertainty in planning decisions. The model is formulated as a two-stage Mixed Integer Linear Program (MILP) and integrates a hybrid scenario generation method capable of representing uncertain inputs such as electricity prices, renewable generation, and internal demand. A convex objective function combining expected and worst case operational costs allows for tunable risk aversion, enabling planners to balance economic performance and robustness. The resulting schedule ensures feasibility across all scenarios and supports coordinated use of industrial flexibility assets, including battery energy storage and shiftable production. To isolate the effects of market volatility, the framework is applied to a real world cement manufacturing case study considering only day-ahead electricity price uncertainty, with all other inputs treated deterministically. Results show improved resilience to forecast deviations, reduced cost variability, and more consistent operations. The proposed method offers a scalable and risk-aware approach for industrial flexibility planning under uncertainty.
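One simple way to combine expected and worst-case cost with a tunable risk weight, as described in the abstract above, is a convex combination of the two. The sketch below assumes that form; the weight name `lam`, the function name, and the scenario numbers are hypothetical and are not taken from the paper's MILP.

```python
def risk_weighted_cost(costs, probs, lam):
    """(1 - lam) * expected cost + lam * worst-case cost over a set of scenarios."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(costs) != len(probs):
        raise ValueError("one probability per scenario is required")
    expected = sum(p * c for p, c in zip(probs, costs))
    worst = max(costs)
    return (1.0 - lam) * expected + lam * worst


# Made-up operating costs of one schedule under three price scenarios.
costs = [100.0, 120.0, 200.0]
probs = [0.5, 0.3, 0.2]
for lam in (0.0, 0.5, 1.0):
    print(lam, risk_weighted_cost(costs, probs, lam))
```

Setting `lam` to 0 recovers a purely expected-cost objective, while 1 gives a purely worst-case one; intermediate values trade economic performance against robustness.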
It is a challenging problem to jointly optimize the base station (BS) precoding matrix and the reconfigurable intelligent surface (RIS) phases in a RIS-assisted multiple-user multiple-input-multiple-output (MU-MIMO) scenario when the size of the RIS becomes extremely large. In this paper, we propose a deep reinforcement learning algorithm called sequential multi-agent advantage actor-critic (A2C) to solve this problem. In addition, the discrete phase of RISs, imperfect channel state information (CSI), and channel correlations between users are taken into consideration. The computational complexity is also analyzed, and the performance of the proposed algorithm is compared with the zero-forcing (ZF) beamformer in terms of the sum spectral efficiency (SE). The proposed algorithm has lower computational complexity than the benchmark while achieving better performance. Through simulations, it is also found that the proposed algorithm is robust to medium channel estimation error.
Large-scale ASR models have achieved remarkable gains in accuracy and robustness. However, fairness issues remain largely unaddressed despite their critical importance in real-world applications. In this work, we introduce FairASR, a system that mitigates demographic bias by learning representations that are uninformative about group membership, enabling fair generalization across demographic groups. Leveraging a multi-demographic dataset, our approach employs a gradient reversal layer to suppress demographic-discriminative features while maintaining the ability to capture generalizable speech patterns through an unsupervised contrastive loss. Experimental results show that FairASR delivers competitive overall ASR performance while significantly reducing performance disparities across different demographic groups.
We propose a variant of the Rapidly Exploring Random Tree Star (RRT$^{\star}$) algorithm to synthesize trajectories satisfying a given spatio-temporal specification expressed in a fragment of Signal Temporal Logic (STL) for linear systems. Previous approaches for planning trajectories under STL specifications using sampling-based methods leverage either mixed-integer or non-smooth optimization techniques, with poor scalability in the horizon and complexity of the task. We adopt instead a control-theoretic perspective on the problem, based on the notion of set forward invariance. Specifically, from a given STL task defined over polyhedral predicates, we develop a novel algorithmic framework by which the task is efficiently encoded into a time-varying set via linear programming, such that trajectories evolving within the set also satisfy the task. Forward invariance properties of the resulting set with respect to the system dynamics and input limitations are then proved via non-smooth analysis. We then present a modified RRT$^{\star}$ algorithm to synthesize asymptotically optimal and dynamically feasible trajectories satisfying a given STL specification, by sampling a tree of trajectories within the previously constructed time-varying set. We showcase two use cases of our approach involving an autonomous inspection of the International Space Station and a room-servicing task requiring timed revisit of a charging station.
The rapid evolution of wearable technologies, such as AR glasses, demands compact, energy-efficient sensors capable of high-precision measurements in dynamic environments. Traditional Frequency-Modulated Continuous Wave (FMCW) Laser Feedback Interferometry (LFI) sensors, while promising, falter in applications that feature small distances, high velocities, shallow modulation, and low-power constraints. We propose a novel sensor-processing pipeline that reliably extracts distance and velocity measurements at distances as low as 1 cm. As a core contribution, we introduce a four-ramp modulation scheme that resolves persistent ambiguities in beat frequency signs and overcomes spectral blind regions caused by hardware limitations. Based on measurements of the implemented pipeline, a noise model is defined to evaluate its performance and sensitivity to several algorithmic and working point parameters. We show that the pipeline generally achieves robust and low-noise measurements using state-of-the-art hardware.
Sensor-based local inference at IoT devices faces severe computational limitations, often requiring data transmission over noisy wireless channels for server-side processing. To address this, split-network Deep Neural Network (DNN) based Joint Source-Channel Coding (JSCC) schemes are used to extract and transmit relevant features instead of raw data. However, most existing methods rely on fixed network splits and static configurations, lacking adaptability to varying computational budgets and channel conditions. In this paper, we propose a novel SNR- and computation-adaptive distributed CNN framework for wireless image classification across IoT devices and edge servers. We introduce a learning-assisted intelligent Genetic Algorithm (LAIGA) that efficiently explores the CNN hyperparameter space to optimize network configuration under given FLOPs constraints and a given SNR. LAIGA intelligently discards the infeasible network configurations that exceed the computational budget at the IoT device. It also benefits from Random Forests based learning assistance to avoid a thorough exploration of the hyperparameter space and to induce application-specific bias in candidate optimal configurations. Experimental results demonstrate that the proposed framework outperforms fixed-split architectures and existing SNR-adaptive methods, especially under low SNR and limited computational resources. We achieve a 10% increase in classification accuracy compared to an existing JSCC-based SNR-adaptive multilayer framework at an SNR as low as -10 dB across a range of available computational budgets (1M to 70M FLOPs) at the IoT device.
Deep neural networks have been applied to audio spectrograms for respiratory sound classification, but it remains challenging to achieve satisfactory performance due to the scarcity of available data. Moreover, domain mismatch may be introduced into the trained models as a result of the respiratory sound samples being collected from various electronic stethoscopes, patient demographics, and recording environments. To tackle this issue, we propose a modified Masked Autoencoder (MAE) model, named Disentangling Dual-Encoder MAE (DDE-MAE), for respiratory sound classification. Two independent encoders are designed to capture disease-related and disease-irrelevant information separately, achieving feature disentanglement to reduce the domain mismatch. Our method achieves competitive performance on the ICBHI dataset.
Medical images are usually collected from multiple domains, leading to domain shifts that impair the performance of medical image segmentation models. Domain Generalization (DG) aims to address this issue by training a robust model with strong generalizability. Recently, numerous domain randomization-based DG methods have been proposed. However, these methods suffer from the following limitations: 1) constrained efficiency of domain randomization due to their exclusive dependence on image style perturbation, and 2) neglect of the adverse effects of over-augmented images on model training. To address these issues, we propose a novel domain randomization-based DG method, called content style augmentation (ConStyX), for generalizable medical image segmentation. Specifically, ConStyX 1) augments the content and style of training data, allowing the augmented training data to better cover a wider range of data domains, and 2) leverages well-augmented features while mitigating the negative effects of over-augmented features during model training. Extensive experiments across multiple domains demonstrate that our ConStyX achieves superior generalization performance. The code is available at https://github.com/jwxsp1/ConStyX.
This chapter focuses on a hardware architecture for semi-passive Reconfigurable Intelligent Surfaces (RISs) and investigates its consideration for boosting the performance of Multiple-Input Multiple-Output (MIMO) communication systems. The architecture incorporates a single or multiple radio-frequency chains to receive pilot signals via tunable absorption phase profiles realized by the metasurface front end, as well as a controller encompassing a baseband processing unit to carry out channel estimation, and consequently, the optimization of the RIS reflection coefficients. A novel channel estimation protocol, according to which the RIS receives non-orthogonal training pilot sequences from two multi-antenna terminals via tunable absorption phase profiles, and then, estimates the respective channels via its signal processing unit, is presented. The channel estimates are particularly used by the RIS controller to design the capacity-achieving reflection phase configuration of the metasurface front end. The proposed channel estimation algorithm, which is based on the Alternating Direction Method of Multipliers (ADMM), profits from the RIS random spatial absorption sampling to capture the entire signal space, and exploits the beamspace sparsity and low-rank properties of extremely large MIMO channels, which is particularly relevant for communication systems at the FR3 band and above. Our extensive numerical investigations showcase the superiority of the proposed channel estimation technique over benchmark schemes for various system and RIS hardware configuration parameters, as well as the effectiveness of using channel estimates at the RIS side to dynamically optimize the possibly phase-quantized reflection coefficients of its unit elements.
Speech recognisers usually perform optimally only in a specific environment and need to be adapted to work well in another. For adaptation to a new speaker, there is often too little data for fine-tuning to be robust, and that data is usually unlabelled. This paper proposes a combination of approaches to make adaptation to a single minute of data robust. First, instead of estimating the adaptation parameters with cross-entropy on a single error-prone hypothesis or "pseudo-label", this paper proposes a novel loss function, the conditional entropy over complete hypotheses. Using multiple hypotheses makes adaptation more robust to errors in the initial recognition. Second, a "speaker code" characterises a speaker in a vector short enough that it requires little data to estimate. On a far-field noise-augmented version of Common Voice, the proposed scheme yields a 20% relative improvement in word error rate on one minute of adaptation data, increasing on 10 minutes to 29%.
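The loss described in the abstract above replaces a single pseudo-label with an entropy taken over whole hypotheses. As a rough illustration, the sketch below computes the entropy of a posterior obtained by normalising the log scores of an N-best list. Treating an N-best list as a stand-in for the full hypothesis space is an assumption made here, as are the function name and the example scores.

```python
import math


def hypothesis_entropy(log_scores):
    """Entropy (in nats) of the softmax-normalised posterior over N-best hypotheses."""
    m = max(log_scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical log scores: a confident recogniser versus a completely undecided one.
print(hypothesis_entropy([-1.0, -9.0, -12.0]))
print(hypothesis_entropy([-3.0, -3.0, -3.0, -3.0]))
```

Minimising such a quantity pushes probability mass toward whichever hypotheses the adapted model finds consistent, without committing in advance to one possibly erroneous transcript.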
In this letter, a pinching antenna (PA) assisted rate splitting multiple access (RSMA) system with multiple waveguides is investigated to maximize the sum rate. A two-step algorithm is proposed to determine the PA activation scheme and optimize the waveguide beamforming. Specifically, a low-complexity method based on spatial correlation and distance is proposed for PA activation selection. After determining the PA activation status, a semi-definite programming (SDP) based successive convex approximation (SCA) is leveraged to obtain the optimal waveguide beamforming. Simulation results show that the proposed multiple-waveguide PA-assisted RSMA method achieves better performance than various benchmarking schemes.
Model predictive control (MPC) for tracking is a recently introduced approach, which extends standard MPC formulations by incorporating an artificial reference as an additional optimization variable, in order to track external and potentially time-varying references. In this work, we analyze the performance of such an MPC for tracking scheme without a terminal cost and terminal constraints. We derive a transient performance estimate, i.e. a bound on the closed-loop performance over an arbitrary time interval, yielding insights on how to select the scheme's parameters for performance. Furthermore, we show that in the asymptotic case, where the prediction horizon and observed time interval tend to infinity, the closed-loop solution of MPC for tracking recovers the infinite horizon optimal solution.
This paper presents a joint system modeling approach for fault simulation of an all-electric auxiliary power unit (APU), integrating starter/generator turn-to-turn short circuit (TTSC) faults with gas generator gas-path faults. To address the challenges of electromechanical coupling and of balancing simulation precision against computational efficiency, we propose a multi-rate continuous-discrete hybrid simulation architecture. This architecture treats the starter/generator as a continuous system with variable step size in Simulink, while modeling the gas generator as a discrete system with fixed step size in a dynamic-link library (DLL) environment. For the starter/generator fault modeling, a multi-loop approach is deployed to accurately simulate TTSC faults. For the gas generator, we develop an improved GasTurb-DLL modeling method (IGDM) that enhances uncertainty modeling, state-space representation, and tool chain compatibility. Finally, the proposed methodology was implemented in a case study based on the APS5000 all-electric APU structure and parameters. Model validation was conducted by comparing simulation results (covering steady-state, transient, healthy, and fault conditions) with reference data from third-party software and literature. The close agreement confirms both the model's accuracy and the effectiveness of our modeling methodology. This work establishes a modeling foundation for investigating the opportunities and challenges in fault detection and isolation (FDI) brought by the full electrification of the APU, including joint fault estimation and diagnosis and coupled electromechanical fault characteristics.
By using an automated braking system, such as the Automatic Emergency Brake (AEB), crashes can be avoided in situations where the driver is unaware of an imminent collision. However, conventional AEB systems detect potential collision adversaries with onboard sensor systems, such as radars and cameras, that may fail in non-line-of-sight situations. By leveraging vehicle-to-everything (V2X) communication, information regarding an approaching vehicle can be received by the ego vehicle at an early point in time, even if the opponent vehicle is occluded by a view obstruction. In this work, we consider a 2-stage braking cascade, consisting of a partial brake, triggered based on V2X information, and a sensor-triggered AEB. We evaluate its crash avoidance performance in real-world crash situations extracted from the German In-Depth Accident Study (GIDAS) database using an accident simulation framework. The results are compared against a sensor-triggered AEB system and a purely V2X-triggered partial brake. To further analyze the results, we identify the crash cause for each situation in which the brake function under test could not prevent the crash. The simulation results show a high added benefit of the V2X-enhanced braking systems compared to the exclusive use of visual-based sensor systems for automated collision prevention.
Ultra-reliable low-latency communications (URLLC) demand decoding algorithms that simultaneously offer high reliability and low complexity under stringent latency constraints. While iterative decoding schemes for LDPC and Polar codes offer a good compromise between performance and complexity, they fall short in approaching the theoretical performance limits in the typical URLLC short block length regime. Conversely, quasi-ML decoding schemes for algebraic codes, like Chase-II decoding, exhibit a smaller gap to optimum decoding but are computationally prohibitive for practical deployment in URLLC systems. To bridge this gap, we propose an enhanced Chase-II decoding algorithm that leverages a neural network (NN) to predict promising perturbation patterns, drastically reducing the number of required decoding trials. The proposed approach combines the reliability of quasi-ML decoding with the efficiency of NN inference, making it well-suited for time-sensitive and resource-constrained applications.
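For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of plain Chase-II decoding: flip every combination of the `p` least reliable bits, run a hard-decision decoder on each trial word, and keep the candidate closest to the received soft values. It enumerates all patterns and does not include the neural pattern predictor the abstract proposes. The single-parity-check decoder `spc_decode` is a toy stand-in chosen for brevity, and all names and LLR values are assumptions.

```python
from itertools import product


def chase2_decode(llrs, hard_decode, p=2):
    """Exhaustive Chase-II search over the p least reliable positions.

    llrs: log-likelihood ratios, positive meaning bit 0 is more likely.
    hard_decode: maps a bit list to a codeword, or None on decoding failure.
    """
    hard = [0 if l >= 0 else 1 for l in llrs]
    weakest = sorted(range(len(llrs)), key=lambda i: abs(llrs[i]))[:p]
    best, best_metric = None, float("inf")
    for flips in product([0, 1], repeat=len(weakest)):
        trial = hard[:]
        for pos, f in zip(weakest, flips):
            trial[pos] ^= f
        cand = hard_decode(trial)
        if cand is None:
            continue
        # Soft metric: total reliability of the positions where the candidate
        # disagrees with the hard decision (smaller is better).
        metric = sum(abs(l) for l, c, h in zip(llrs, cand, hard) if c != h)
        if metric < best_metric:
            best, best_metric = cand, metric
    return best


def spc_decode(word):
    """Toy decoder for an even single-parity-check code: accept valid words only."""
    return word if sum(word) % 2 == 0 else None


# Hypothetical received LLRs whose hard decision [0, 1, 0, 0] violates parity.
print(chase2_decode([2.0, -0.3, 1.5, 0.2], spc_decode))  # -> [0, 1, 0, 1]
```

The number of trials grows as 2 to the power `p`, which is the cost that motivates predicting a small set of promising perturbation patterns instead of enumerating them all.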
This work proposes an automatic control solution for the operation of conventional wastewater treatment plants (WWTPs) as energy-autonomous water resource recovery facilities. We first conceptualize a classification of the quality of treated water for three resource recovery applications (environmental, industrial, and agricultural water reuse). We then present an output-feedback model predictive controller (Output MPC) that operates a plant to produce water of specific quality class, while also producing sufficient biogas to ensure nonpositive energy costs. The controller is demonstrated in the long-term operation of a full-scale WWTP subjected to typical influent loads and periodically changing quality targets. Our results provide a proof-of-concept on the energy-autonomous operation of existing wastewater treatment infrastructure with control strategies that are general enough to accommodate a wide range of resource recovery objectives.
The semi-tensor product (STP) of vectors is a generalization of the conventional inner product of vectors, which allows the factor vectors to be of different dimensions. This paper proposes a domain-based convolutional product (CP). Combining the domain-based CP with the STP of vectors, a new CP is proposed. Since it uses neither zero padding nor any other padding, it avoids the junk information that padding introduces. Using it, an STP-based convolutional neural network (CNN) is developed. Its application to image and third-order signal identification is considered.