Image and Video Processing

2026-02-11 | Total: 6

#1 Intensity-based Segmentation of Tissue Images Using a U-Net with a Pretrained ResNet-34 Encoder: Application to Mueller Microscopy

Authors: Sooyong Chae, Dani Giammattei, Ajmal Ajmal, Junzhu Pei, Amanda Sanchez, Tananant Boonya-ananta, Andres Rodriguez, Tatiana Novikova, Jessica Ramella-Roman

Manual annotation of images of thin tissue sections remains a time-consuming step in Mueller microscopy and limits its scalability. We present a novel automated approach that uses only the total-intensity M11 element of the Mueller matrix as input to a U-Net architecture with a pretrained ResNet-34 encoder. The network was trained to distinguish four classes in images of murine uterine cervix sections: background, internal os, cervical tissue, and vaginal wall. With only 70 cervical tissue sections, the model achieved 89.71% pixel accuracy and an 80.96% mean tissue Dice coefficient on the held-out test dataset. Transfer learning from ImageNet enables accurate segmentation despite the limited training-set sizes typical of specialized biomedical imaging. This intensity-based framework requires minimal preprocessing and is readily extensible to other imaging modalities and tissue types, with publicly available graphical annotation tools for practical deployment.
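The two metrics quoted in this abstract can be computed from integer label maps as in the minimal sketch below. The class numbering (0 = background, 1 = internal os, 2 = cervical tissue, 3 = vaginal wall) and the reading of "mean tissue Dice" as the average over the three non-background classes are assumptions made for illustration; the abstract does not specify either.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the reference label."""
    pred, target = np.asarray(pred), np.asarray(target)
    return float((pred == target).mean())

def dice_per_class(pred, target, class_id):
    """Dice = 2|A n B| / (|A| + |B|) for one class; NaN if the class is absent from both maps."""
    pred, target = np.asarray(pred), np.asarray(target)
    a = pred == class_id
    b = target == class_id
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def mean_tissue_dice(pred, target, tissue_classes=(1, 2, 3)):
    """Average Dice over the tissue classes (assumed here to exclude background, class 0)."""
    scores = [dice_per_class(pred, target, c) for c in tissue_classes]
    return float(np.nanmean(scores))

# Toy 4x4 label map (hypothetical data, not from the paper).
target = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1],
                   [3, 3, 2, 2],
                   [3, 3, 0, 0]])
pred = target.copy()
pred[0, 2] = 0  # one internal-os pixel mislabelled as background

accuracy = pixel_accuracy(pred, target)   # 15 of 16 pixels agree
dice_os = dice_per_class(pred, target, 1) # 2*2 / (2 + 3)
mean_dice = mean_tissue_dice(pred, target)
```

A single mislabelled pixel lowers the Dice score of a small class much more than it lowers overall pixel accuracy, which is one reason the two numbers are usually reported together.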

Subjects: Image and Video Processing , Applied Physics , Biological Physics , Optics

Publish: 2026-02-10 13:47:15 UTC


#2 Camel: Frame-Level Bandwidth Estimation for Low-Latency Live Streaming under Video Bitrate Undershooting

Authors: Liming Liu, Zhidong Jia, Li Jiang, Wei Zhang, Lan Xie, Feng Qian, Leju Yan, Bing Yan, Qiang Ma, Zhou Sha, Wei Yang, Yixuan Ban, Xinggong Zhang

Low-latency live streaming (LLS) has emerged as a popular web application, with many platforms adopting real-time protocols such as WebRTC to minimize end-to-end latency. However, we observe a counter-intuitive phenomenon: even when the actual encoded bitrate does not fully utilize the available bandwidth, stalling events remain frequent. This insufficient bandwidth utilization arises from the intrinsic temporal variations of real-time video encoding, which cause conventional packet-level congestion control algorithms to misestimate available bandwidth. When a high-bitrate frame is suddenly produced, sending at the wrong rate can either trigger packet loss or increase queueing delay, resulting in playback stalls. To address these issues, we present Camel, a novel frame-level congestion control algorithm (CCA) tailored for LLS. Our insight is to use frame-level network feedback to capture the true network capacity, immune to the irregular sending pattern caused by encoding. Camel comprises three key modules: the Bandwidth and Delay Estimator and the Congestion Detector, which jointly determine the average sending rate, and the Bursting Length Controller, which governs the emission pattern to prevent packet loss. We evaluate Camel on both large-scale real-world deployments and controlled simulations. In the real-world platform with 250M users and 2B sessions across 150+ countries, Camel achieves up to a 70.8% increase in 1080P resolution ratio, a 14.4% increase in media bitrate, and up to a 14.1% reduction in stalling ratio. In simulations under undershooting, shallow buffers, and network jitter, Camel outperforms existing congestion control algorithms, with up to 19.8% higher bitrate, 93.0% lower stalling ratio, and 23.9% improvement in bandwidth estimation accuracy.
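To illustrate why frame-level feedback can reveal capacity that an average-rate view hides, the sketch below uses a generic packet-train dispersion estimate: the bits of one frame burst divided by the spread of their arrival times. This is not Camel's estimator, which the abstract does not detail; the `Packet` type, function names, and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Packet:
    frame_id: int
    size_bytes: int
    arrival_s: float  # receiver-side arrival timestamp in seconds

def frame_dispersion_rate(packets: List[Packet]) -> Optional[float]:
    """Estimate bottleneck rate (bits/s) from the arrival spread of one frame burst.

    Returns None when the burst has fewer than two packets or no measurable spread.
    """
    if len(packets) < 2:
        return None
    ordered = sorted(packets, key=lambda p: p.arrival_s)
    spread = ordered[-1].arrival_s - ordered[0].arrival_s
    if spread <= 0:
        return None
    # The first packet only marks the start of the measurement window,
    # so its bytes are not counted.
    bits = 8 * sum(p.size_bytes for p in ordered[1:])
    return bits / spread

def average_receive_rate(packets: List[Packet], window_s: float) -> float:
    """Average receive rate (bits/s) over a fixed window, as a rate-averaging view would see it."""
    return 8 * sum(p.size_bytes for p in packets) / window_s

# Hypothetical burst: five 1250-byte packets arriving 1 ms apart,
# with one such frame every 40 ms (25 fps).
burst = [Packet(frame_id=0, size_bytes=1250, arrival_s=0.001 * i) for i in range(5)]
capacity_bps = frame_dispersion_rate(burst)               # about 10 Mbit/s
average_bps = average_receive_rate(burst, window_s=0.04)  # about 1.25 Mbit/s
```

In this toy case the encoder undershoots, so the averaged rate is roughly an eighth of what the burst spacing suggests the path can carry.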

Subjects: Image and Video Processing , Multimedia

Publish: 2026-02-10 07:56:30 UTC


#3 Smaller is Better: Generative Models Can Power Short Video Preloading

Authors: Liming Liu, Jiangkai Wu, Xinggong Zhang

Preloading is widely used in short video platforms to minimize playback stalls by downloading future content in advance. However, existing strategies face a tradeoff: aggressive preloading reduces stalls but wastes bandwidth, while conservative strategies save data but increase the risk of playback stalls. This paper presents PromptStream, a computation-powered preloading paradigm that breaks this tradeoff by using local computation to reduce bandwidth demand. Instead of transmitting pixel-level video chunks, PromptStream sends compact semantic prompts that are decoded into high-quality frames using generative models such as Stable Diffusion. We propose three core techniques to enable this paradigm: (1) a gradient-based prompt inversion method that compresses frames into small sets of compact token embeddings; (2) a computation-aware scheduling strategy that jointly optimizes network and compute resource usage; and (3) a scalable search algorithm that addresses the enlarged scheduling space introduced by the scheduler. Evaluations show that PromptStream reduces both stalls and bandwidth waste by over 31%, and improves Quality of Experience (QoE) by 45%, compared to traditional strategies.

Subjects: Image and Video Processing , Multimedia

Publish: 2026-02-10 07:27:38 UTC


#4 SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy

Author: Jiahao Qin

High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional scanning enables rapid functional brain imaging but introduces severe spatiotemporal misalignment from coupled scan-direction-dependent domain shift and geometric distortion. Conventional registration methods rely on brightness constancy, an assumption violated under bidirectional scanning, leading to unreliable alignment. A unified scene-appearance separation framework is proposed to jointly address domain shift and spatial misalignment. The proposed architecture separates domain-invariant scene content from domain-specific appearance characteristics, enabling cross-domain reconstruction with geometric preservation. A scene consistency loss promotes geometric correspondence in the latent space, linking domain shift correction with spatial registration within a single framework. For in vivo mouse brain vasculature imaging, the proposed method achieves normalized cross-correlation (NCC) of 0.961 and structural similarity index (SSIM) of 0.894, substantially outperforming conventional methods. Ablation studies demonstrate that domain alignment loss is critical, with its removal causing 82% NCC reduction (0.961 to 0.175), while scene consistency and cycle consistency losses provide complementary regularization for optimal performance. The method achieves 11.2 ms inference time per frame (86 fps), substantially exceeding typical OR-PAM acquisition rates and enabling real-time processing. These results suggest that the proposed framework enables robust high-speed bidirectional OR-PAM for reliable quantitative and longitudinal functional imaging. The code will be publicly available at https://github.com/D-ST-Sword/SAS-Net
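For readers unfamiliar with the NCC figure quoted in this abstract, here is a minimal sketch of the common zero-mean normalized cross-correlation between two images, plus a check of the reported relative drop in the ablation. The exact NCC variant used in the paper is not stated in the abstract, so the zero-mean form is an assumption, and the test images are random placeholders.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation between two equally sized images, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    if denom == 0:
        raise ValueError("NCC is undefined for a constant image")
    return float((a @ b) / denom)

# Placeholder frames: the same random scene with a different gain and offset.
rng = np.random.default_rng(0)
forward = rng.random((32, 32))
backward = 2.0 * forward + 3.0
same_scene_ncc = ncc(forward, backward)  # close to 1: NCC ignores gain and offset

# Relative NCC reduction reported for removing the domain alignment loss.
relative_drop = (0.961 - 0.175) / 0.961  # about 0.82
```

Because this form of NCC is invariant to a global gain and offset, it scores geometric agreement rather than raw brightness agreement between the two frames.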

Subjects: Image and Video Processing , Artificial Intelligence , Computer Vision and Pattern Recognition

Publish: 2026-02-06 21:01:27 UTC


#5 Operator-Based Information Theory for Imaging: Entropy, Capacity, and Irreversibility in Physical Measurement Systems

Author: Charles Wood

Imaging systems are commonly described using resolution, contrast, and signal-to-noise ratio, but these quantities do not provide a general account of how physical transformations affect the flow of information. This paper introduces an operator-based formulation of information theory for imaging. The approach models the imaging chain as a composition of bounded operators acting on functions, and characterises information redistribution using the spectral properties of these operators. Three measures are developed. Operator entropy quantifies how an operator distributes energy across its singular spectrum. Operator information capacity describes the number of modes that remain recoverable above a noise-dependent threshold. An irreversibility index measures the information lost through suppression or elimination of modes and captures the accumulation of information loss under operator composition. The framework applies to linear, nonlinear, and stochastic operators and does not depend on the specific imaging modality. Analytical examples show how attenuation, blur, and sampling affect entropy, capacity, and irreversibility in different ways. The results provide a general structure for analysing the physical limits of imaging and form the basis for subsequent work on information geometry, spatiotemporal budgets, nonlinear channels, and reconstruction algorithms.
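One plausible way to make the three measures concrete for a finite-dimensional linear operator is sketched below. The definitions are assumptions chosen to match the abstract's wording, not the paper's formulas: entropy is taken as the Shannon entropy of the normalized squared singular values, capacity as the number of singular values above a threshold, and the irreversibility index as the fraction of modes at or below that threshold.

```python
import numpy as np

def singular_values(op):
    """Singular values of a matrix operator, largest first."""
    return np.linalg.svd(np.asarray(op, dtype=float), compute_uv=False)

def operator_entropy(op):
    """Shannon entropy (nats) of the energy distribution over singular modes (assumed definition)."""
    energy = singular_values(op) ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    p = energy / total
    p = p[p > 0]  # modes with zero energy contribute nothing to the sum
    return float(-(p * np.log(p)).sum())

def operator_capacity(op, threshold):
    """Number of modes whose singular value exceeds the noise-dependent threshold (assumed definition)."""
    return int((singular_values(op) > threshold).sum())

def irreversibility_index(op, threshold):
    """Fraction of modes suppressed to or below the threshold (assumed definition)."""
    s = singular_values(op)
    return 1.0 - operator_capacity(op, threshold) / s.size

# Hypothetical 4-mode blur-like operator and its composition with itself.
blur = np.diag([1.0, 0.5, 0.1, 0.0])
threshold = 0.3
capacity_once = operator_capacity(blur, threshold)          # modes 1.0 and 0.5 survive
capacity_twice = operator_capacity(blur @ blur, threshold)  # 0.5 becomes 0.25 and drops out
loss_once = irreversibility_index(blur, threshold)
loss_twice = irreversibility_index(blur @ blur, threshold)
```

Under these assumed definitions, composing the operator with itself lowers capacity and raises the irreversibility index, which mirrors the accumulation of loss under composition described in the abstract.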

Subjects: Image and Video Processing , Information Theory

Publish: 2025-12-16 18:11:16 UTC


#6 In-Hospital Stroke Prediction from PPG-Derived Hemodynamic Features

Authors: Jiaming Liu, Cheng Ding, Daoqiang Zhang

The absence of pre-hospital physiological data in standard clinical datasets fundamentally constrains the early prediction of stroke, as patients typically present only after stroke has occurred, leaving the predictive value of continuous monitoring signals such as photoplethysmography (PPG) unvalidated. In this work, we overcome this limitation by focusing on a rare but clinically critical cohort - patients who suffered stroke during hospitalization while already under continuous monitoring - thereby enabling the first large-scale analysis of pre-stroke PPG waveforms aligned to verified onset times. Using MIMIC-III and MC-MED, we develop an LLM-assisted data mining pipeline to extract precise in-hospital stroke onset timestamps from unstructured clinical notes, followed by physician validation, identifying 176 patients (MIMIC-III) and 158 patients (MC-MED) with high-quality synchronized pre-onset PPG data. We then extract hemodynamic features from PPG and employ a ResNet-1D model to predict impending stroke across multiple early-warning horizons. The model achieves F1-scores of 0.7956, 0.8759, and 0.9406 at 4, 5, and 6 hours prior to onset on MIMIC-III, and, without re-tuning, reaches 0.9256, 0.9595, and 0.9888 on MC-MED for the same horizons. These results provide the first empirical evidence from real-world clinical data that PPG contains predictive signatures of stroke several hours before onset, demonstrating that passively acquired physiological signals can support reliable early warning. They support a shift from post-event stroke recognition to proactive, physiology-based surveillance that may materially improve patient outcomes in routine clinical care.

Subjects: Machine Learning , Image and Video Processing

Publish: 2026-02-10 01:50:26 UTC