Electrical Engineering and Systems Science

Date: Thu, 9 May 2024 | Total: 41

#1 SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

Authors: You Zhang ; Yongyi Zang ; Jiatong Shi ; Ryuichi Yamamoto ; Jionghao Han ; Yuxun Tang ; Tomoki Toda ; Zhiyao Duan

The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specialized field requiring focused attention. To promote SVDD research, we recently proposed the "SVDD Challenge," the very first research challenge focusing on SVDD for lab-controlled and in-the-wild bonafide and deepfake singing voice recordings. The challenge will be held in conjunction with the 2024 IEEE Spoken Language Technology Workshop (SLT 2024).

#2 Cellular Traffic Prediction Using Online Prediction Algorithms

Authors: Hossein Mehri ; Hao Chen ; Hani Mehrpouyan

The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. This is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly accurate and adaptable prediction models to optimize network resource allocation and management. This paper investigates the efficacy of live prediction algorithms for forecasting cellular network traffic in real-time scenarios. We apply two live prediction algorithms to machine learning models, one of which is the recently proposed Fast LiveStream Prediction (FLSP) algorithm. We examine the performance of these algorithms under two distinct data gathering methodologies: synchronous, where all network cells report statistics simultaneously, and asynchronous, where reporting occurs across consecutive time slots. Our study delves into the impact of these gathering scenarios on the predictive performance of traffic models. It reveals that the FLSP algorithm can halve the required bandwidth for asynchronous data reporting compared to conventional online prediction algorithms, while simultaneously enhancing prediction accuracy and reducing processing load. Additionally, we conduct a thorough analysis of algorithmic complexity and memory requirements across various machine learning models. Through empirical evaluation, we provide insights into the trade-offs inherent in different prediction strategies, offering valuable guidance for network optimization and resource allocation in dynamic environments.
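
The two data gathering modes described in this abstract can be illustrated with a small sketch. It assumes that "asynchronous" means a round-robin schedule with exactly one cell reporting per time slot, and that synchronous reports arrive once per fixed period; both are illustrative assumptions, and the function names are hypothetical. The FLSP algorithm itself is not reproduced here.

```python
def synchronous_schedule(num_cells, num_slots, period):
    """All cells report together, once every `period` slots (assumed period)."""
    return [list(range(num_cells)) if t % period == 0 else []
            for t in range(num_slots)]


def asynchronous_schedule(num_cells, num_slots):
    """Cells take turns: cell (t mod num_cells) reports in slot t (assumed round-robin)."""
    return [[t % num_cells] for t in range(num_slots)]


if __name__ == "__main__":
    # With 3 cells and a period of 3 slots, both schedules deliver one report
    # per cell per period, but the asynchronous one spreads them over time.
    print(synchronous_schedule(3, 6, 3))
    print(asynchronous_schedule(3, 6))
```

In this toy setting both schedules carry the same number of reports per period; the difference is only in when each cell's statistics become available to the predictor.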

#3 Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural Networks

Authors: Sahel Vahedi Noori ; Bin Hu ; Geir Dullerud ; Peter Seiler

This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated ReLUs using known properties for the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by a scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability/performance condition and study the effect of the lifting horizon.
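
As a rough illustration of the "known properties for the scalar ReLU" mentioned above, the sketch below numerically checks three commonly cited facts for $w = \max(0, v)$: $w \ge 0$, $w \ge v$, and $w(w - v) = 0$, plus positive homogeneity for a nonnegative scale factor. Which properties the paper actually uses to build its QCs is an assumption here, and the helper names are hypothetical.

```python
def relu(v):
    """Scalar ReLU."""
    return max(0.0, v)


def satisfies_scalar_relu_properties(v, tol=1e-12):
    """Check w >= 0, w >= v, and the complementarity condition w * (w - v) = 0."""
    w = relu(v)
    nonnegative = w >= 0.0
    above_input = w >= v
    complementarity = abs(w * (w - v)) <= tol
    return nonnegative and above_input and complementarity


def is_positively_homogeneous(v, beta, tol=1e-12):
    """Check relu(beta * v) == beta * relu(v) for a scale factor beta >= 0."""
    if beta < 0:
        raise ValueError("positive homogeneity only holds for beta >= 0")
    return abs(relu(beta * v) - beta * relu(v)) <= tol


if __name__ == "__main__":
    samples = [-2.0, -0.5, 0.0, 0.3, 5.0]
    print(all(satisfies_scalar_relu_properties(v) for v in samples))
    print(all(is_positively_homogeneous(v, 2.0) for v in samples))
```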

#4 RACH Traffic Prediction in Massive Machine Type Communications

Authors: Hossein Mehri ; Hao Chen ; Hani Mehrpouyan

Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequently, there is a compelling need to design a lightweight and agile framework capable of assimilating continuously collected data from the network and accurately forecasting bursty traffic in mMTC networks. This paper addresses these challenges by presenting a machine learning-based framework tailored for forecasting bursty traffic in multi-channel slotted ALOHA networks. The proposed machine learning network comprises long short-term memory (LSTM) and a DenseNet with feed-forward neural network (FFNN) layers, where the residual connections enhance the training ability of the machine learning network in capturing complicated patterns. Furthermore, we develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network. Simulation results and complexity analysis demonstrate the superiority of our proposed algorithm in terms of both accuracy and complexity, making it well-suited for time-critical live scenarios. We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics. Comprehensive evaluations and simulations indicate that our proposed machine learning approach achieves a remarkable $52\%$ higher accuracy in long-term predictions compared to traditional methods, without imposing additional processing load on the system.
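
To make the multi-channel slotted ALOHA setting concrete, here is a minimal sketch of a single contention slot. It assumes the textbook model in which each active device picks one channel uniformly at random and a channel succeeds only if exactly one device chose it; the paper's traffic model and predictor are not reproduced, and `simulate_slot` is a hypothetical helper.

```python
import random
from collections import Counter


def simulate_slot(num_active_devices, num_channels, rng):
    """Return (successes, collisions, idle) channel counts for one slot."""
    choices = [rng.randrange(num_channels) for _ in range(num_active_devices)]
    counts = Counter(choices)
    successes = sum(1 for c in counts.values() if c == 1)   # exactly one device
    collisions = sum(1 for c in counts.values() if c > 1)   # two or more devices
    idle = num_channels - len(counts)                       # nobody transmitted
    return successes, collisions, idle


if __name__ == "__main__":
    rng = random.Random(0)
    # A burst of 200 devices contending for 54 channels mostly collides.
    print(simulate_slot(200, 54, rng))
    # A light load of 10 devices mostly succeeds.
    print(simulate_slot(10, 54, rng))
```

The base station observes only such per-slot outcomes, which is why a sudden burst is hard to anticipate without a forecasting model.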

#5 Performance Bounds for Velocity Estimation with Large Antenna Arrays

Authors: Caterina Giovannetti ; Nicolò Decarli ; Davide Dardari

Joint communication and sensing (JCS) is envisioned as an enabler of future 6G networks. One of the key features of these networks will be the use of extremely large aperture arrays (ELAAs) and high operating frequencies, which will result in significant near-field propagation effects. This unique property can be harnessed to improve sensing capabilities. In this paper, we focus on velocity sensing, as using ELAAs allows the estimation of not just the radial component but also the transverse component. We derive analytical performance bounds for both velocity components, demonstrating how they are affected by the different system parameters and geometries. These insights offer a foundational understanding of how near-field effects influence velocity sensing differently from the far field and from position estimation.
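
The radial and transverse velocity components referred to above can be illustrated with a short two-dimensional decomposition. The sketch assumes the array's reference point sits at the origin and the target is a point in the plane; this geometry and the function name are illustrative assumptions, and the paper's performance bounds are not computed here.

```python
import math


def decompose_velocity(position, velocity):
    """Split a 2-D velocity into radial and transverse parts w.r.t. the origin."""
    px, py = position
    vx, vy = velocity
    distance = math.hypot(px, py)
    if distance == 0.0:
        raise ValueError("target must not coincide with the reference point")
    ux, uy = px / distance, py / distance      # unit vector along the line of sight
    radial = vx * ux + vy * uy                 # projection onto the line of sight
    transverse = -vx * uy + vy * ux            # projection onto the perpendicular (-uy, ux)
    return radial, transverse


if __name__ == "__main__":
    # Target 10 m away on the x-axis, moving with velocity (3, 4) m/s.
    print(decompose_velocity((10.0, 0.0), (3.0, 4.0)))
```

A far-field observer at a single point senses only the radial part through Doppler; the abstract's point is that a very large aperture makes the transverse part observable as well.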

#6 Filtering and smoothing estimation algorithms from uncertain nonlinear observations with time-correlated additive noise and random deception attacks

Authors: R. Caballero-Águila ; J. Hu ; J. Linares-Pérez

This paper discusses the problem of estimating a stochastic signal from nonlinear uncertain observations with time-correlated additive noise described by a first-order Markov process. Random deception attacks are assumed to be launched by an adversary, and both this phenomenon and the uncertainty in the observations are modelled by two sets of Bernoulli random variables. Under the assumption that the evolution model generating the signal to be estimated is unknown and only the mean and covariance functions of the processes involved in the observation equation are available, recursive algorithms based on linear approximations of the real observations are proposed for the least-squares filtering and fixed-point smoothing problems. Finally, the feasibility and effectiveness of the developed estimation algorithms are verified by a numerical simulation example, where the impact of uncertain observation and deception attack probabilities on estimation accuracy is evaluated.

#7 Dissipativity Conditions for Maximum Dynamic Loadability

Authors: Riley Lawson ; Marija Ilic

In this paper we consider the possibility of stabilizing very fast electromagnetic interactions between Inverter Based Resources (IBRs), known as the Control Induced System Stability problems. We propose that when these oscillatory interactions are controlled, the ability of the grid to deliver power to loads at high rates will be greatly increased. We refer to this grid property as the dynamic grid loadability. The approach is to start by modeling the dynamical behavior of all components. Next, to avoid excessive complexity, interactions between components are captured in terms of unified technology-agnostic aggregate variables, instantaneous power and rate of change of instantaneous reactive power. Sufficient dissipativity conditions in terms of rate of change of energy conversion in components themselves and bounds on their rate of change of interactions are derived in support of achieving the maximum system loadability. These physically intuitive conditions are then used to derive methods to increase loadability using high switching frequency reactive power sources. Numerical simulations confirm the theoretical calculations and show that dynamic load-side reactive power support increases stable dynamic loadability regions.

#8 Functional Specifications and Testing Requirements of Grid-Forming Type-IV Offshore Wind Power

Authors: Sulav Ghimire ; Gabriel M. G. Guerreiro ; Kanakesh V. K. ; Emerson D. Guest ; Kim H. Jensen ; Guangya Yang ; Xiongfei Wang

Throughout the past few years, various transmission system operators (TSOs) and research institutes have defined several functional specifications for grid-forming (GFM) converters via grid codes, white papers, and technical documents. These institutes and organisations also proposed testing requirements for general inverter-based resources (IBRs) and specific GFM converters. This paper initially reviews functional specifications and testing requirements from several sources to create an understanding of GFM capabilities in general. Furthermore, it proposes an outlook of the defined GFM capabilities, functional specifications, and testing requirements for offshore wind power plant (OF WPP) applications from an original equipment manufacturer (OEM) perspective. Finally, this paper briefly establishes the relevance of new testing methodologies for equipment-level certification and model validation, focusing on GFM functional specifications.

#9 Stability And Uncertainty Propagation In Power Networks: A Lyapunov-based Approach With Applications To Renewable Resources Allocation

Authors: Mohamad Kazma ; Ahmad F. Taha

The rapid increase in the integration of intermittent and stochastic renewable energy resources (RER) introduces challenging issues related to power system stability. Interestingly, identifying grid nodes that can best support stochastic loads from RER has gained recent interest. Methods based on Lyapunov stability are commonly exploited to assess the stability of power networks. These strategies quantify system stability while considering: (i) simplified reduced-order power system models that do not model power flow constraints, or (ii) data-driven methods that are prone to measurement noise and hence can inaccurately depict stochastic loads as system instability. In this paper, while considering a nonlinear differential algebraic equation (NL-DAE) model, we introduce a new method for assessing the impact of uncertain renewable power injections on the stability of power system nodes/buses. The identification of stable nodes informs the operator/utility on how renewable injections affect the stability of the grid. The proposed method is based on optimizing metrics equivalent to the Lyapunov spectrum of exponents; its underlying properties result in a computationally efficient and scalable stable node identification algorithm for renewable energy resources allocation. The proposed method is validated on the IEEE 9-bus and 200-bus networks.

#10 HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation

Author: Jiashu Xu

Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information without increasing the computational cost by extending the perceptual field of the convolution kernel. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational power required by the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. We conduct comprehensive experiments on segmentation tasks, including skin lesion segmentation, and extensive experiments on ISIC17 and ISIC18 to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation.
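
The two convolution techniques named in this abstract have simple closed-form costs, sketched below with the standard textbook formulas for 2-D convolutions. Bias terms are ignored, square kernels are assumed, and the layer sizes in the example are hypothetical rather than taken from HC-Mamba.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2-D convolution (no bias)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    # A 3x3 kernel with dilation 2 covers a 5x5 window with the same 9 weights.
    print(effective_kernel_size(3, 2))
    # Hypothetical 64 -> 128 channel layer: 73728 vs 8768 weights.
    print(standard_conv_params(3, 64, 128), depthwise_separable_params(3, 64, 128))
```

Dilation widens the receptive field without adding weights, while the depthwise separable factorization cuts the weight count by roughly a factor of eight in this example.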

#11 Bistatic OFDM-based ISAC with Over-the-Air Synchronization: System Concept and Performance Analysis

Authors: David Brunner ; Lucas Giroto de Oliveira ; Charlotte Muth ; Silvio Mandelli ; Marcus Henninger ; Axel Diewald ; Yueheng Li ; Mohamad Basim Alabd ; Laurent Schmalen ; Thomas Zwick ; Benjamin Nuss

Integrated sensing and communication (ISAC) has been defined as one goal for 6G mobile communication systems. In this context, this article introduces a bistatic ISAC system based on orthogonal frequency-division multiplexing (OFDM). While the bistatic architecture has advantages over the monostatic one, such as not requiring full-duplex operation, it imposes the need to synchronize transmitter and receiver. To address this, the article introduces a bistatic ISAC signal processing framework where an incoming OFDM-based ISAC signal undergoes over-the-air synchronization based on preamble symbols and pilots. Afterwards, bistatic radar processing is performed using either only pilot subcarriers or the full OFDM frame. The latter approach requires estimation of the originally transmitted frame based on communication processing and therefore error-free communication, which can be achieved via appropriate channel coding. The performance and limitations of the introduced system based on both aforementioned approaches are assessed via an analysis of the impact of residual synchronization mismatches and data decoding failures on both communication and radar performances. Finally, the performed analyses are validated by proof-of-concept measurement results.

#12 HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

Authors: Zhihan Ju ; Wanting Zhou ; Longteng Kong ; Yu Chen ; Yi Li ; Zhenan Sun ; Caifeng Shan

Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Adversarial Network (HAGAN) to maintain the authenticity of structural texture and tissue cells. HAGAN contains an Attention Mixed (AttnMix) Generator, a Hierarchical Discriminator, and a Reverse Skip Connection between the Discriminator and the Generator. The AttnMix consistency differentiable regularization encourages the perception of structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The Hierarchical Discriminator introduces pixel-by-pixel discriminant feedback to the generator to enhance the saliency and discriminance of global and local details simultaneously. The Reverse Skip Connection further improves the accuracy of fine details by fusing real and synthetic distribution features. Our experimental evaluations on three datasets of different scales, i.e., COVID-CT, ACDC and BraTS2018, demonstrate that HAGAN outperforms the existing methods and achieves state-of-the-art performance at both high and low resolution.

#13 MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu ; Zhihao Fan ; Xiaofeng Chu ; Jimmy S. Ren ; Xiaoming Li ; Zongsheng Yue ; Chongyi Li ; Shangcheng Zhou ; Ruicheng Feng ; Yuekun Dai ; Peiqing Yang ; Chen Change Loy ; Senyan Xu ; Zhijing Sun ; Jiaying Zhu ; Yurui Zhu ; Xueyang Fu ; Zheng-Jun Zha ; Jun Cao ; Cheng Li ; Shu Chen ; Liang Ma ; Shiyang Zhou ; Haijin Zeng ; Kai Feng ; Yongyong Chen ; Jingyong Su ; Xianyu Guan ; Hongyuan Yu ; Cheng Wan ; Jiamin Lin ; Binnan Han ; Yajun Zou ; Zhuoyuan Wu ; Yuan Huang ; Yongsheng Yu ; Daoan Zhang ; Jizhe Li ; Xuanwu Yin ; Kunlong Zuo ; Yunfan Lu ; Yijie Xu ; Wenzong Ma ; Weiyu Guo ; Hui Xiong ; Wei Yu ; Bingchun Luo ; Sabari Nathan ; Priya Kansal

The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

#14 A leadless power transfer and wireless telemetry solutions for an endovascular electrocorticography

Authors: Zhangyu Xu ; Majid Khazaee ; Nhan Duy Truong ; Deniel Havenga ; Armin Nikpour ; Arman Ahnood ; Omid Kavehei

Endovascular brain-computer interfaces (eBCIs) offer a minimally invasive way to connect the brain to external devices, merging neuroscience, engineering, and medical technology. Achieving wireless data and power transmission is crucial for the clinical viability of these implantable devices. Typically, solutions for endovascular electrocorticography (ECoG) include a sensing stent with multiple electrodes (e.g. in the superior sagittal sinus) in the brain, a subcutaneous chest implant for wireless energy harvesting and data telemetry, and a long (tens of centimetres) cable with a set of wires in between. This long cable presents risks and limitations, especially for younger patients or those with fragile vasculature. This work introduces a wireless and leadless telemetry and power transfer solution for endovascular ECoG. The proposed solution includes an optical telemetry module and a focused ultrasound (FUS) power transfer system. The proposed system can be miniaturised to fit in an endovascular stent. Our solution uses optical telemetry for high-speed data transmission (over 2 Mbit/s, capable of transmitting 41 ECoG channels at a 2 kHz sampling rate and 24-bit resolution), and the proposed power transfer scheme provides a power budget of up to 10 mW at the site of the endovascular implants under the safety limit. Tests on bovine tissues confirmed the system's effectiveness, suggesting that future custom circuit designs could further enhance eBCI applications by removing wires and auxiliary implants, minimising complications.
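
The channel count quoted in this abstract follows from a simple raw bit-rate calculation, sketched below. The calculation ignores any framing, error-correction, or protocol overhead, which is an assumption; the helper names are hypothetical.

```python
def required_bit_rate(channels, sampling_rate_hz, bits_per_sample):
    """Raw payload bit rate in bit/s, with no framing or coding overhead."""
    return channels * sampling_rate_hz * bits_per_sample


def max_channels(link_rate_bps, sampling_rate_hz, bits_per_sample):
    """Largest whole number of channels whose raw rate fits in the link."""
    return link_rate_bps // (sampling_rate_hz * bits_per_sample)


if __name__ == "__main__":
    # 41 channels x 2 kHz x 24 bit = 1,968,000 bit/s, just under 2 Mbit/s.
    print(required_bit_rate(41, 2000, 24))
    # A 42nd channel would need 2,016,000 bit/s, so 41 is the limit at 2 Mbit/s.
    print(max_channels(2_000_000, 2000, 24))
```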

#15 Teacher-Student Network for Real-World Face Super-Resolution with Progressive Embedding of Edge Information

Authors: Zhilei Liu ; Chenggong Zhang

Traditional face super-resolution (FSR) methods trained on synthetic datasets usually have poor generalization ability for real-world face images. Recent work has utilized complex degradation models or training networks to simulate the real degradation process, but this limits the performance of these methods due to the domain differences that still exist between the generated low-resolution images and the real low-resolution images. Moreover, because of the existence of a domain gap, the semantic feature information of the target domain may be affected when synthetic data and real data are utilized to train super-resolution models simultaneously. In this study, a real-world face super-resolution teacher-student model is proposed, which considers the domain gap between real and synthetic data and progressively includes diverse edge information by using the recurrent network's intermediate outputs. Extensive experiments demonstrate that our proposed approach surpasses state-of-the-art methods in obtaining high-quality face images for real-world FSR.

#16 Communication-efficient and Differentially-private Distributed Nash Equilibrium Seeking with Linear Convergence

Authors: Xiaomeng Chen ; Wei Huo ; Kemi Ding ; Subhrakanti Dey ; Ling Shi

The distributed computation of a Nash equilibrium (NE) for non-cooperative games has recently been gaining increased attention. Due to the nature of distributed systems, privacy and communication efficiency are two critical concerns. Traditional approaches often address these critical concerns in isolation. This work introduces a unified framework, named CDP-NES, designed to improve communication efficiency in the privacy-preserving NE seeking algorithm for distributed non-cooperative games over directed graphs. Leveraging both general compression operators and the noise adding mechanism, CDP-NES perturbs local states with Laplacian noise and applies difference compression prior to their exchange among neighbors. We prove that CDP-NES not only achieves linear convergence to a neighborhood of the NE in games with restricted monotone mappings but also guarantees $\epsilon$-differential privacy, addressing privacy and communication efficiency simultaneously. Finally, simulations are provided to illustrate the effectiveness of the proposed method.
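
The "perturb, then compress the difference" step described above can be sketched generically as follows. This is not the CDP-NES update rule: the top-k compressor, the fixed noise scale, and the way the receiver's reference copy is updated are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random


def laplace_sample(scale, rng):
    """Zero-mean Laplace noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def top_k_compress(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest (assumed compressor)."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    kept = set(order[:k])
    return [x if i in kept else 0.0 for i, x in enumerate(vector)]


def prepare_message(state, reference, noise_scale, k, rng):
    """Perturb the local state, then compress its difference to the shared reference."""
    noisy = [x + laplace_sample(noise_scale, rng) for x in state]
    difference = [n - r for n, r in zip(noisy, reference)]
    compressed = top_k_compress(difference, k)
    # Sender and receivers apply the same update, so their references stay in sync.
    new_reference = [r + c for r, c in zip(reference, compressed)]
    return compressed, new_reference


if __name__ == "__main__":
    rng = random.Random(1)
    message, reference = prepare_message([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0.1, 2, rng)
    print(message, reference)
```

Only the compressed difference crosses the network, which is where the communication saving comes from; the added noise is what a differential-privacy argument would rely on.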

#17 HILCodec: High Fidelity and Lightweight Neural Audio Codec

Authors: Sunghwan Ahn ; Beom Jun Woo ; Min Hyun Han ; Chanyeong Moon ; Nam Soo Kim

The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.

#18 System Identification of the Upgraded LHPOST6 Reaction Mass at the University of California San Diego

Authors: Andres Rodriguez-Burneo ; Jose I. Restrepo ; Joel P. Conte

Upon completing the upgrade from one to six degrees of freedom of the Outdoor Shake Table at UCSD in 2019, forced vibration tests were carried out to identify the dynamic characteristics of the reaction mass and soil system. This report describes the motivation, execution, and results from such tests, which independently excited the reaction mass in four degrees of freedom: longitudinal, transverse, yaw, and vertical. The report discusses the frequency response curves and deformation patterns from which the natural frequencies, damping ratio, mode shapes, and rigid body motion were determined. The first objective of the study was to investigate whether the dynamic properties of the system had dramatically changed after the upgrade by comparing the results to those from forced vibration tests performed 20 years ago, during the construction of the facility. In addition, the most recent tests contributed results for the vertical degree of freedom, which had never been tested. The second objective was to obtain high-quality response data of the system that will be used to develop a high-fidelity computational model of the reaction mass in future research. A comparison of results showed a slight difference of 0.5 Hz in the natural frequency of 2 degrees of freedom. Moreover, maximum displacements in the recent tests were overall larger than the previous ones, with few exceptions. The report thoroughly discusses the several sources of discrepancy between the past and most recent results. Finally, test results allowed us to estimate the system's response if the shake table actuators were to be used at their maximum nominal capacity. Small displacement and high damping results were consistent with those of previous tests and further validated the design of the reaction mass.

#19 ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography

Authors: Syed Jamal Safdar Gardezi ; Lucas Aronson ; Peter Wawrzyn ; Hongkun Yu ; E. Jason Abel ; Daniel D. Shapiro ; Meghan G. Lubner ; Joshua Warner ; Giuseppe Toia ; Lu Mao ; Pallavi Tiwari ; Andrew L. Wentland

Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, which were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.
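
Four of the image-similarity metrics listed above have compact definitions, sketched here on flattened pixel lists. The sketch assumes intensities normalized to [0, 1] and uses the usual textbook formulas; the study's exact normalization and implementation are not given in the abstract, and SSIM is omitted because it needs windowed statistics.

```python
import math


def mae(a, b):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean squared error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB, for intensities in [0, max_value]."""
    err = rmse(a, b)
    return math.inf if err == 0.0 else 20.0 * math.log10(max_value / err)


def ncc(a, b):
    """Zero-mean normalized cross-correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den


if __name__ == "__main__":
    truth = [0.0, 0.5, 1.0, 0.5]        # toy "ground truth" pixels
    synthetic = [0.1, 0.5, 0.9, 0.5]    # toy "synthesized" pixels
    print(mae(truth, synthetic), rmse(truth, synthetic))
    print(psnr(truth, synthetic), ncc(truth, synthetic))
```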

#20 SingIt! Singer Voice Transformation

Authors: Amit Eliav ; Aaron Taub ; Renana Opochinsky ; Sharon Gannot

In this paper, we propose a model which can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner. We present a system comprising several available blocks, as well as a modified auto-encoder, and show how this highly complex challenge can be addressed by combining rather simple solutions. We demonstrate the applicability of the proposed system using a group of 25 non-expert listeners. Samples of the data generated from our model are provided.

#21 Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification

Authors: Mukaffi Bin Moin ; Fatema Tuj Johora Faria ; Swarnajit Saha ; Bushra Kamal Rafa ; Mohammad Shafiul Alam

Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.

#22 An Advanced Features Extraction Module for Remote Sensing Image Super-Resolution

Authors: Naveed Sultan ; Amir Hajian ; Supavadee Aramvith

In recent years, convolutional neural networks (CNNs) have achieved remarkable advancement in the field of remote sensing image super-resolution due to the complexity and variability of textures and structures in remote sensing images (RSIs), which often repeat in the same images but differ across others. Current deep learning-based super-resolution models focus less on high-frequency features, which leads to suboptimal performance in capturing contours, textures, and spatial information. State-of-the-art CNN-based methods now focus on the feature extraction of RSIs using attention mechanisms. However, these methods are still incapable of effectively identifying and utilizing key content attention signals in RSIs. To solve this problem, we propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE), which extracts features effectively by combining channel and spatial attention with the standard vision transformer (ViT). The proposed method is trained on the UCMerced dataset at scales 2, 3, and 4. The experimental results show that our proposed method helps the model focus on the specific channels and spatial locations containing high-frequency information so that the model can focus on relevant features and suppress irrelevant ones, which enhances the quality of super-resolved images. Our model achieved superior performance compared to various existing models.

#23 Visually Guided Swarm Motion Coordination via Insect-inspired Small Target Motion Reactions

Authors: Md Arif Billah ; Imraan A. Faruque

Despite progress in developing experimentally consistent models of insect in-flight sensing and feedback for individual agents, a lack of systematic understanding of the multi-agent and group performance of the resulting bio-inspired sensing and feedback approaches remains a barrier to robotic swarm implementations. This study introduces the small-target motion reactive (STMR) swarming approach by designing a concise engineering model of the small target motion detector (STMD) neurons found in insect lobula complexes. The STMD neuron model identifies the bearing angle at which peak optic flow magnitude occurs, and this angle is used to design an output feedback switched control system. A theoretical stability analysis provides bi-agent stability and state boundedness in group contexts. The approach is simulated and implemented on ground vehicles for validation and behavioral studies. The results indicate that, despite having the lowest connectivity of contemporary approaches (each agent instantaneously regards only a single neighbor), collective group motion can be achieved. STMR group-level metric analysis also highlights continuously varying polarization and decreasing heading variance.
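The two steps named in the abstract (picking the bearing of peak optic flow magnitude, then switching a control output on that bearing) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names, the sign convention (positive bearing and positive turn rate both mean "to the left"), the deadband, and the bang-bang turn law are hypothetical and are not taken from the paper's STMD model or its controller.

```python
import math

def peak_flow_bearing(bearings, flow_magnitudes):
    """Return the bearing (radians) at which optic flow magnitude is largest."""
    idx = max(range(len(bearings)), key=lambda k: abs(flow_magnitudes[k]))
    return bearings[idx]

def switched_turn_rate(bearing, max_rate=1.0, deadband=math.radians(5)):
    """Hypothetical switched law: turn toward the peak-flow bearing at a fixed rate."""
    if bearing > deadband:
        return max_rate      # target is to the left: turn left
    if bearing < -deadband:
        return -max_rate     # target is to the right: turn right
    return 0.0               # target roughly ahead: hold heading

# Hypothetical sampled visual field: five bearings and their flow magnitudes.
bearings = [-math.pi / 2, -math.pi / 4, 0.0, math.pi / 4, math.pi / 2]
flow = [0.1, 0.3, 0.2, 0.9, 0.4]

peak = peak_flow_bearing(bearings, flow)
command = switched_turn_rate(peak)
```

Because only the single strongest response is used, each agent reacts to one neighbor at a time, which matches the low-connectivity setting the abstract describes.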

#24 Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Authors: Hongjie Wang ; Difan Liu ; Yan Kang ; Yijun Li ; Zhe Lin ; Niraj K. Jha ; Yuchen Liu

Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io.
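To make the idea of attention-driven token pruning concrete, here is a minimal sketch of a generic PageRank-style ranking over a row-normalized attention map, followed by keeping the top-scoring tokens. This is not the paper's G-WPR algorithm, whose weighting is not given in the abstract; the function names, the uniform initialization, the iteration count, and the toy attention matrix are all assumptions.

```python
def rank_tokens(attention, iterations=20):
    """PageRank-style importance: a token is important if important tokens attend to it.

    attention[i][j] is how much token i attends to token j; each row sums to 1.
    """
    n = len(attention)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Propagate importance along attention edges (scores <- A^T scores).
        new = [sum(attention[i][j] * scores[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        scores = [v / total for v in new]
    return scores

def keep_top_tokens(scores, keep_ratio):
    """Return the indices of the highest-scoring tokens, in original order."""
    n_keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])

# Toy 4-token attention map (hypothetical): tokens 0 and 2 receive most attention.
attention = [
    [0.4, 0.1, 0.4, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.4, 0.1, 0.4, 0.1],
    [0.3, 0.1, 0.5, 0.1],
]
scores = rank_tokens(attention)
kept = keep_top_tokens(scores, keep_ratio=0.5)
```

A step-aware budget in the spirit of DSAP could be approximated by passing a different `keep_ratio` at each denoising step, though the schedule the authors use is not stated in the abstract.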

#25 An LSTM-Based Chord Generation System Using Chroma Histogram Representations

Author: Jack Hardwick

This paper proposes a system for generating chords for monophonic symbolic melodies using an LSTM-based model trained on chroma histogram representations of chords. Chroma representations promise more harmonically rich generation than chord label-based approaches, whilst maintaining a small number of dimensions in the dataset. This system is shown to be suitable for limited real-time use. While it does not match the state of the art for coherent long-term generation, it does show diatonic generation with cadential chord relationships. The need for further study into chroma histograms as an extracted feature in chord generation tasks is highlighted.
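As an illustration of why a chroma representation stays low-dimensional while still encoding chord content, the sketch below folds the MIDI pitches of a chord into a 12-bin pitch-class histogram. The function name, the normalization, and the use of MIDI note numbers are assumptions; the paper's exact histogram definition may differ.

```python
def chroma_histogram(midi_pitches, normalize=True):
    """Fold MIDI pitches into a 12-bin pitch-class histogram (index 0 = C)."""
    bins = [0.0] * 12
    for pitch in midi_pitches:
        bins[pitch % 12] += 1.0   # octave information is discarded
    total = sum(bins)
    if normalize and total > 0:
        bins = [b / total for b in bins]
    return bins

# C major triad: C4, E4, G4.
c_major = chroma_histogram([60, 64, 67])

# Same triad with the root doubled an octave lower: C gets more weight,
# which a plain chord label such as "C" cannot express.
c_major_doubled = chroma_histogram([48, 60, 64, 67])
```

Every chord maps to a vector of length 12 regardless of voicing, whereas a label-based vocabulary grows with each chord quality and extension it needs to cover.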