MICCAI.2025

Total: 1027

#1 Semi-Supervised Deformation-Free Image-to-Image Translation for Realistic CT Synthesis from CBCT [PDF] [Copy] [Kimi] [REL]

Authors: Han Ji Yong, Yang Su, Kim Sujeong, Kim Sunjung, Lim Sang-Heon, Yun Heejin, Kim Dahee, Yi Won-Jin

Cone-Beam Computed Tomography (CBCT) is widely used for diagnostics and treatment planning in the oral and maxillofacial field due to its low radiation dose and high spatial resolution. Still, its clinical utility is limited by low contrast and incorrect Hounsfield Unit (HU) values. In contrast, multi-detector CT (CT) provides high contrast and reliable HU measurements, albeit at a higher radiation dose. In this work, we present a novel two-stage framework for unpaired CBCT-to-CT synthesis that ensures exact preservation of anatomical structure, maintains high resolution, and achieves accurate HU values. In the first stage, we generate pseudo-paired CT images. In the second stage, we utilize a UNet++ generator enhanced with Interpolation and Convolution Upsampling (ICUP), Edge-Conditioned Skip Connections (ECSC), and a dual discriminator strategy for a semi-supervised approach. Consequently, we generate realistic CT images using the pseudo-paired CT images. Extensive quantitative and qualitative evaluations demonstrate that our method outperforms existing unpaired translation techniques, producing realistic CT images that closely match real CT in HU accuracy while exactly preserving the anatomical structure of the CBCT. The code is available at https://github.com/HANJIYONG/Semi-Supervised-Deformation-Free-I2I.
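
As a rough illustration of the upsampling idea behind ICUP (interpolation followed by a learned convolution, which avoids transposed-convolution checkerboard artifacts), here is a minimal PyTorch sketch; the block design, normalization, and activation choices are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ICUPBlock(nn.Module):
    """Interpolation + Convolution Upsampling: upsample with a fixed
    interpolation kernel, then refine with a learned convolution."""

    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.up(x))

# usage: double the spatial resolution of a feature map
feat = torch.randn(1, 64, 32, 32)
print(ICUPBlock(64, 32)(feat).shape)  # torch.Size([1, 32, 64, 64])
```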

Subject: MICCAI.2025


#2 AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays [PDF] [Copy] [Kimi] [REL]

Authors: Yi Chenlang, Xiong Zizhan, Qi Qi, Wei Xiyuan, Bathla Girish, Lin Ching-Long, Mortazavi Bobak J., Yang Tianbao

Contrastive Language-Image Pre-training (CLIP) models have demonstrated superior performance across various visual tasks including medical image classification. However, fairness concerns, including demographic biases, have received limited attention for CLIP models. This oversight leads to critical issues, particularly those related to race and gender, resulting in disparities in diagnostic outcomes and reduced reliability for underrepresented groups. To address these challenges, we introduce AdFair-CLIP, a novel framework employing adversarial feature intervention to suppress sensitive attributes, thereby mitigating spurious correlations and improving prediction fairness. We conduct comprehensive experiments on chest X-ray (CXR) datasets, and show that AdFair-CLIP significantly enhances both fairness and diagnostic accuracy, while maintaining robust generalization in zero-shot and few-shot scenarios. These results establish new benchmarks for fairness-aware learning in CLIP-based medical diagnostic models, particularly for CXR analysis.
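
The abstract does not spell out the adversarial feature intervention; one common realization of the idea is a gradient-reversal adversary that tries to predict the sensitive attribute from the shared features. The sketch below follows that assumption, with hypothetical head names and dimensions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated (scaled) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# hypothetical heads on top of a CLIP-style image encoder's features
feat_dim, n_classes, n_groups = 512, 14, 2
disease_head = nn.Linear(feat_dim, n_classes)    # diagnostic task
attribute_head = nn.Linear(feat_dim, n_groups)   # adversary predicting the sensitive attribute

def fairness_losses(features, disease_labels, group_labels, lambd=0.5):
    task_loss = nn.functional.cross_entropy(disease_head(features), disease_labels)
    # the adversary predicts the sensitive attribute; reversed gradients push the
    # encoder features to become uninformative about it
    adv_loss = nn.functional.cross_entropy(
        attribute_head(grad_reverse(features, lambd)), group_labels)
    return task_loss + adv_loss

feats = torch.randn(8, feat_dim, requires_grad=True)
loss = fairness_losses(feats, torch.randint(0, n_classes, (8,)), torch.randint(0, n_groups, (8,)))
loss.backward()
```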

Subject: MICCAI.2025


#3 Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound [PDF] [Copy] [Kimi] [REL]

Authors: Zhu Zhiyuan, Wang Jian, Jiang Yong, Han Tong, Huang Yuhao, Zhang Ang, Yang Kaiwen, Luo Mingyuan, Liu Zhe, Duan Yaofei, Ni Dong, Tang Tianhong, Yang Xin

Accurate carotid plaque grading (CPG) is vital to assess the risk of cardiovascular and cerebrovascular diseases. Due to the small size and high intra-class variability of plaques, CPG is commonly evaluated using a combination of transverse and longitudinal ultrasound views in clinical practice. However, most existing deep learning-based multi-view classification methods focus on feature fusion across different views, neglecting the importance of representation learning and the differences in class features. To address these issues, we propose a novel Corpus-View-Category Refinement Framework (CVC-RF) that processes information at the Corpus, View, and Category levels, enhancing model performance. Our contribution is four-fold. First, to the best of our knowledge, ours is the first deep-learning-based method for CPG according to the latest Carotid Plaque-RADS guidelines. Second, we propose a novel center-memory contrastive loss, which enhances the network’s global modeling capability by comparing against representative cluster centers and diverse negative samples at the Corpus level. Third, we design a cascaded down-sampling attention module to fuse multi-scale information and achieve implicit feature interaction at the View level. Finally, a parameter-free mixture-of-experts weighting strategy is introduced to leverage class-clustering knowledge to weight different experts, enabling feature decoupling at the Category level. Experimental results indicate that CVC-RF effectively models global features via multi-level refinement, achieving state-of-the-art performance on the challenging CPG task.
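
A minimal sketch of what a center-memory contrastive loss of this kind could look like, contrasting each sample against its memorized class center (positive) and against the other centers plus a bank of diverse negatives; the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def center_memory_contrastive_loss(feats, labels, centers, neg_bank, tau=0.07):
    """Prototype-style contrastive loss (assumed formulation, not the paper's exact one).

    feats:    (B, D) sample embeddings
    labels:   (B,)   class indices
    centers:  (C, D) memorized class centers (e.g. EMA of class means over the corpus)
    neg_bank: (N, D) memory bank of diverse negative embeddings
    """
    feats = F.normalize(feats, dim=1)
    centers = F.normalize(centers, dim=1)
    neg_bank = F.normalize(neg_bank, dim=1)

    pos = (feats * centers[labels]).sum(dim=1, keepdim=True) / tau       # (B, 1)
    neg_center = feats @ centers.t() / tau                               # (B, C)
    own = F.one_hot(labels, num_classes=centers.size(0)).bool()
    neg_center = neg_center.masked_fill(own, float("-inf"))              # drop own center
    neg_bank_sim = feats @ neg_bank.t() / tau                            # (B, N)

    logits = torch.cat([pos, neg_center, neg_bank_sim], dim=1)
    target = torch.zeros(feats.size(0), dtype=torch.long)                # positive sits at index 0
    return F.cross_entropy(logits, target)

loss = center_memory_contrastive_loss(
    torch.randn(16, 128), torch.randint(0, 4, (16,)),
    centers=torch.randn(4, 128), neg_bank=torch.randn(256, 128))
```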

Subject: MICCAI.2025


#4 Anatomy-Conserving Unpaired CBCT-to-CT Translation via Schrödinger Bridge [PDF] [Copy] [Kimi] [REL]

Authors: Shi Ke, Ouyang Song, Liu Gang, Luo Yong, Su Kehua, Liang Zhiwen, Du Bo

Unpaired Cone-beam CT (CBCT)-to-CT translation is pivotal for radiotherapy planning, aiming to synergize CBCT’s clinical practicality with CT’s dosimetric precision. Existing methods, limited by scarce paired data and registration errors, struggle to preserve anatomical fidelity—a critical requirement for avoiding incorrect diagnoses and inadequate treatments. Current CycleGAN-derived approaches risk structural distortions, while diffusion models oversmooth, during reverse diffusion, the high-frequency details vital for dose calculation. In this paper, we propose the Anatomy-Conserving Schrödinger Bridge (ACSB), a novel unpaired medical image translation framework leveraging entropy-regularized optimal transport to disentangle modality-specific artifacts from anatomy. We incorporate a carefully designed generator, the Anatomy-Conserving Vision Transformer (AC-ViT), to integrate multi-scale anatomical priors via attention-guided feature fusion. We further adopt frequency-aware optimization targeting radiotherapy-critical spectral components. Extensive experiments on the dataset demonstrate the superiority of the proposed ACSB, showcasing excellent generalization across anatomically distinct regions. Code: https://github.com/Lalala-iks/ACSB
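
The frequency-aware optimization is not detailed in the abstract; a simple illustrative loss that compares Fourier magnitudes and up-weights high-frequency bins (all weights here are assumptions) could look like this:

```python
import torch

def frequency_aware_loss(pred, target, hf_weight=2.0):
    """L1 loss on 2D Fourier spectra, up-weighting high-frequency bins.

    pred, target: (B, 1, H, W) synthesized and reference CT slices.
    hf_weight is an assumed hyper-parameter controlling high-frequency emphasis.
    """
    fp = torch.fft.fftshift(torch.fft.fft2(pred), dim=(-2, -1))
    ft = torch.fft.fftshift(torch.fft.fft2(target), dim=(-2, -1))

    h, w = pred.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=pred.device),
        torch.linspace(-1, 1, w, device=pred.device),
        indexing="ij")
    radius = torch.sqrt(xx ** 2 + yy ** 2)      # 0 at the DC bin, largest at the corners
    weight = 1.0 + hf_weight * radius           # emphasize high frequencies

    return (weight * (fp - ft).abs()).mean()

loss = frequency_aware_loss(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
```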

Subject: MICCAI.2025


#5 RedDino: A foundation model for red blood cell analysis [PDF] [Copy] [Kimi] [REL]

Authors: Zedda Luca, Loddo Andrea, Di Ruberto Cecilia, Marr Carsten

Red blood cells (RBCs) are fundamental to human health, and precise morphological analysis is critical for diagnosing hematological disorders. Despite the potential of foundation models for medical diagnostics, comprehensive AI solutions for RBC analysis remain limited. We introduce RedDino, a self-supervised foundation model specifically designed for RBC image analysis. Leveraging an RBC-tailored version of the DINOv2 self-supervised learning framework, RedDino is trained on an extensive, meticulously curated dataset comprising 1.25 million RBC images from diverse acquisition modalities and sources. Comprehensive evaluations demonstrate that RedDino significantly outperforms existing state-of-the-art models on the RBC shape classification task. Through systematic assessments, including linear probing and nearest neighbor classification, we validate the model’s robust feature representation and strong generalization capabilities. Our key contributions are (1) a dedicated foundation model tailored for RBC analysis, (2) detailed ablation studies exploring DINOv2 configurations for RBC modeling, and (3) a comprehensive evaluation of generalization performance. We address key challenges in computational hematology by developing RedDino, a robust and generalizable model that captures nuanced morphological characteristics and represents a substantial advancement toward reliable diagnostic tools. The source code and pretrained models for RedDino are available at https://anonymous.4open.science/r/RedDino-1F17.
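
Linear probing and nearest-neighbor classification on frozen embeddings are standard evaluation protocols; a generic sketch on precomputed features (the feature dimension and k below are placeholders, not values from the paper) is:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def evaluate_frozen_features(train_x, train_y, test_x, test_y, k=20):
    """Linear probing and k-NN classification on frozen foundation-model embeddings."""
    probe = LogisticRegression(max_iter=2000).fit(train_x, train_y)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine").fit(train_x, train_y)
    return {
        "linear_probe_acc": accuracy_score(test_y, probe.predict(test_x)),
        "knn_acc": accuracy_score(test_y, knn.predict(test_x)),
    }

# toy example with random 768-d "embeddings" standing in for extracted RBC features
rng = np.random.default_rng(0)
x, y = rng.normal(size=(200, 768)), rng.integers(0, 8, size=200)
print(evaluate_frozen_features(x[:150], y[:150], x[150:], y[150:]))
```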

Subject: MICCAI.2025


#6 Metastatic Lymph Node Station Classification in Esophageal Cancer via Prior-guided Supervision and Station-Aware Mixture-of-Experts [PDF] [Copy] [Kimi] [REL]

Authors: Li Haoshen, Wang Yirui, Yu Qinji, Zhu Jie, Yan Ke, Guo Dazhou, Lu Le, Dong Bin, Zhang Li, Ye Xianghua, Wang Qifeng, Jin Dakai

Assessing lymph node (LN) metastasis in CT is critical for esophageal cancer treatment planning. While clinical criteria are commonly used, their diagnostic accuracy is low, with sensitivities ranging from 39.7% to 67.2% in previous studies. Deep learning has the potential to improve this by learning from large-scale, accurately labeled data. However, because of the surgical procedure used in LN dissection, the pathology report only indicates the number of dissected LNs in each lymph node station (LN-station) together with the number of metastatic ones found in that LN-station. It is therefore difficult to establish a one-to-one pairing between LN instances observed in CT and their metastasis status confirmed in the pathology report. In contrast, gold-reference labels on LN-station metastasis can be readily retrieved from pathology reports at scale. Hence, instead of distinguishing LN instance metastasis, we directly classify LN-station metastasis using pathology-confirmed station labels. We first segment mediastinal LN-stations automatically to serve as input for classification. Then, to improve classification performance, we automatically segment all visible LN instances in CT and design a new LN prior-guided attention loss to explicitly regularize the network to focus on regions of suspicious LNs. Furthermore, considering the varying appearances and contexts of different LN-stations, we propose a station-aware mixture-of-experts module, where each expert is trained to specialize in a group of LN-stations by learning to route the tokens of each LN-station group to the corresponding expert. We conduct five-fold cross-validation on 1,153 esophageal cancer patients with CT and pathology reports (the largest study to date), and our method significantly outperforms state-of-the-art approaches by 2.26% in AUROC.
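
A minimal sketch of the routing idea, sending the tokens of each LN-station group to a dedicated expert; the grouping, expert architecture, and token definition are assumptions rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class StationAwareMoE(nn.Module):
    """Route the tokens of each LN-station group to a dedicated expert MLP."""

    def __init__(self, dim: int, n_groups: int = 3):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
            for _ in range(n_groups)
        ])

    def forward(self, tokens: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        """tokens: (N, D) LN-station tokens; group_ids: (N,) station-group index per token."""
        out = torch.zeros_like(tokens)
        for g, expert in enumerate(self.experts):
            mask = group_ids == g
            if mask.any():
                out[mask] = expert(tokens[mask])
        return out

moe = StationAwareMoE(dim=256, n_groups=3)
tokens, group_ids = torch.randn(10, 256), torch.randint(0, 3, (10,))
print(moe(tokens, group_ids).shape)  # torch.Size([10, 256])
```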

Subject: MICCAI.2025


#7 Multi-expert collaboration and knowledge enhancement network for multimodal emotion recognition [PDF] [Copy] [Kimi] [REL]

Authors: Wang Kun, Zhao Junyong, Zhang Liying, Zhu Qi, Zhang Daoqiang

Emotion recognition leveraging multimodal data plays a pivotal role in human-computer interaction and clinical applications such as depression, mania, and Parkinson’s Disease. However, existing emotion recognition methods are susceptible to heterogeneous feature representations across modalities. Additionally, complex emotions involve multiple dimensions, which presents challenges for achieving highly trustworthy decisions. To address these challenges, in this paper we propose a novel multi-expert collaboration and knowledge enhancement network for multimodal emotion recognition. First, we devise a cross-modal fusion module to dynamically aggregate complementary features from EEG and facial expressions through attention-guided fusion. Second, our approach incorporates a feature prototype alignment module to enhance the consistency of multimodal feature representations. Then, we design a prior knowledge enhancement module that injects original dynamic brain networks into feature learning to enhance the feature representation. Finally, we introduce a multi-expert collaborative decision module designed to refine predictions, enhancing the robustness of classification results. Experimental results on the DEAP dataset demonstrate that our proposed method surpasses several state-of-the-art emotion recognition techniques.

Subject: MICCAI.2025


#8 Indepth Integration of Multi-granularity Features from Dual-modal for Disease Classification [PDF] [Copy] [Kimi] [REL]

Authors: Wu Yeli, Zhang Xiaocai, Wu Weiwen, Jiang Haiteng, An Chao, Zhang Jianjia

Multi-granularity features can be extracted from multi-modal medical images, and how to effectively analyze these features jointly is a challenging and critical issue for computer-aided diagnosis (CAD). However, most existing multi-modal classification methods have not fully explored the interactions among intra- and inter-granularity features across modalities. To address this limitation, we propose a novel Indepth Integration of Multi-Granularity Features Network (IIMGF-Net) for a typical multi-modal task, i.e., dual-modal-based CAD. Specifically, the proposed IIMGF-Net consists of two types of key modules, i.e., Cross-Modal Intra-Granularity Fusion (CMIGF) and Multi-Granularity Collaboration (MGC). The CMIGF module enhances the attentive interactions between same-granularity features from the two modalities and derives an integrated representation at each granularity. Based on these representations, the MGC module captures inter-granularity interactions among the resulting representations of CMIGF through a coarse-to-fine and fine-to-coarse collaborative learning mechanism. Extensive experiments on two dual-modal datasets validate the effectiveness of the proposed method, demonstrating its superiority in dual-modal CAD tasks by integrating multi-granularity information.

Subject: MICCAI.2025


#9 PathVG: A New Benchmark and Dataset for Pathology Visual Grounding [PDF] [Copy] [Kimi] [REL]

Authors: Zhong Chunlin, Hao Shuang, Wu Junhua, Chang Xiaona, Jiang Jiwei, Nie Xiu, Tang He, Bai Xiang

With the rapid development of computational pathology, many AI-assisted diagnostic tasks have emerged. Cellular nuclei segmentation can segment various types of cells for downstream analysis, but it relies on predefined categories and lacks flexibility. Moreover, pathology visual question answering can perform image-level understanding but lacks region-level detection capability. To address this, we propose a new benchmark called Pathology Visual Grounding (PathVG), which aims to detect regions based on expressions with different attributes. To evaluate PathVG, we create a new dataset named RefPath, which contains 27,610 images with 33,500 language-grounded boxes. Compared to visual grounding in other domains, PathVG presents pathological images at multiple scales and contains expressions requiring pathological knowledge. In our experimental study, we found that the biggest challenge is the implicit information underlying pathological expressions. Based on this, we propose the Pathology Knowledge-enhanced Network (PKNet) as the baseline model for PathVG. PKNet leverages the knowledge-enhancement capabilities of Large Language Models (LLMs) to convert pathological terms carrying implicit information into explicit visual features, and fuses knowledge features with expression features through the designed Knowledge Fusion Module (KFM). The proposed method achieves state-of-the-art performance on the PathVG benchmark. The source code and dataset are available at https://github.com/ssecv/PathVG.

Subject: MICCAI.2025


#10 Asymmetric Matching in Abdominal Lymph Nodes of Follow-up CT Scans [PDF] [Copy] [Kimi] [REL]

Authors: Mao Yiji, Zhang Yi, Zou Xinyu, Zheng Yuling, Huang Hao, Zhang Haixian

Accurate tracking of abdominal lymph nodes (LN) across follow-up computed tomography (CT) scans is crucial for colorectal cancer staging and treatment response evaluation. However, establishing reliable LN correspondences remains underexplored due to challenges including scale variations, low resolution, difficulty distinguishing nodes from adjacent structures, inability to handle tissue deformation, and dynamic visibility. To address these challenges, we propose an asymmetric matching framework that strikes a balance between enhancing LN specificity and modeling contextual correlations. For specificity, we achieve cross-dimensional feature consistency and generate discriminative LN features via self-supervised learning on orthogonal 2D projections of 3D node volumes. For correlation, we develop a graph model capturing lymphatic topology within scans, reinforced by temporal contrastive learning that encourages consistency between matched node pairs across CT scans. To balance specificity and correlation, we propose a multi-module architecture that integrates volumetric LN features with projection embeddings through attention-based fusion, enabling confidence-calibrated similarity assessment across temporal scans. Experimental results demonstrate that our solution provides reliable lymph node correspondence for clinical follow-up and disease monitoring. Code is available at https://github.com/maoyij/Asymmetric-Matching.

Subject: MICCAI.2025


#11 PCR-MIL: Phenotype Clustering Reinforced Multiple Instance Learning for Whole Slide Image Classification [PDF] [Copy] [Kimi] [REL]

Authors: Lou Jingjiao, Pan Qingtao, Yang Qing, Ji Bing

Multiple instance learning (MIL) has proven effective in classifying whole slide images (WSIs), owing to its weakly supervised learning framework. However, existing MIL methods still face challenges, particularly overfitting due to small sample sizes or limited numbers of WSIs (bags). Pseudo-bag methods enhance MIL’s classification performance by increasing the number of training bags, but they struggle with noisy labels, as positive patches often occupy small portions of tissue and pseudo-bags are typically generated by random splitting. They also face difficulties with non-discriminative instance embeddings due to the lack of domain-specific feature extractors. To address these limitations, we propose Phenotype Clustering Reinforced Multiple Instance Learning (PCR-MIL), a novel MIL framework that integrates clustering-based pseudo-bags to improve MIL’s noise robustness and the discriminative power of instance embeddings. PCR-MIL introduces two key innovations: (i) Phenotype Clustering-based Feature Selection (PCFS) selects relevant instance embeddings for prediction. It clusters instances into phenotype-specific groups, assigns positive instances to each pseudo-bag, and then uses Grad-CAM to select the most relevant positive embeddings. This mitigates noisy-label challenges and enhances MIL’s robustness to noise. (ii) Reinforced Feature Extractor (RFE) uses reinforcement learning to train an extractor on the selected clean pseudo-bags instead of noisy samples, improving the discriminative power of the extracted instance embeddings and the feature representation capabilities of MIL. Experimental results on the publicly available BRACS and CRC-DX datasets demonstrate that PCR-MIL outperforms state-of-the-art methods. The code is available at: https://github.com/JingjiaoLou/PCR-MIL.
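
A simplified sketch of phenotype-clustering-based pseudo-bag construction, using k-means so that every pseudo-bag covers all phenotype clusters rather than being a random split; the cluster and bag counts are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def phenotype_pseudo_bags(instance_feats, n_phenotypes=8, n_pseudo_bags=4, seed=0):
    """Split one slide's instance embeddings into pseudo-bags that each cover
    all phenotype clusters, instead of splitting patches at random.

    instance_feats: (N, D) patch embeddings of a single WSI.
    Returns a list of index arrays, one per pseudo-bag.
    """
    rng = np.random.default_rng(seed)
    clusters = KMeans(n_clusters=n_phenotypes, n_init=10,
                      random_state=seed).fit_predict(instance_feats)

    bags = [[] for _ in range(n_pseudo_bags)]
    for c in range(n_phenotypes):
        idx = np.flatnonzero(clusters == c)
        rng.shuffle(idx)
        # deal this phenotype's instances round-robin so every pseudo-bag sees it
        for i, chunk in enumerate(np.array_split(idx, n_pseudo_bags)):
            bags[i].extend(chunk.tolist())
    return [np.array(sorted(b)) for b in bags]

feats = np.random.default_rng(1).normal(size=(500, 384))
for b in phenotype_pseudo_bags(feats):
    print(len(b))
```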

Subject: MICCAI.2025


#12 T2WI-BCMIC: Non-Fat Saturated T2-Weighted Imaging Dataset for Bladder Cancer Muscle Invasion Classification [PDF] [Copy] [Kimi] [REL]

Authors: Huang Han, Chen Weiyi, Wu Qiuxia, Wang Huanjun, Cai Qian, Guo Yan

Accurate classification of muscle invasion in bladder cancer using computer-aided diagnosis (CAD) is crucial for timely intervention and improved prognosis. Despite advances in deep learning for medical image analysis, muscle invasion classification remains limited by the scarcity of publicly available annotated datasets. To address this, we introduce T2WI-BCMIC, the first expert-annotated dataset for bladder cancer muscle invasion classification. T2WI-BCMIC contains non-fat-saturated T2-weighted magnetic resonance imaging (MRI) images with five-class annotations, covering various invasion depths. We establish a benchmark using several popular deep learning architectures, providing a solid foundation for future comparisons. However, achieving further performance improvements remains challenging due to the small dataset size. Therefore, we propose a novel search-based data augmentation algorithm that increases data diversity by maximizing the divergence from the class-specific manifold, while preserving the class distribution to maintain class identity. Experimental results on T2WI-BCMIC show that our algorithm outperforms existing methods, achieving significant performance improvements. The T2WI-BCMIC dataset and benchmark are available at https://github.com/T2-MI/T2WI-BCMIC for further research.

Subject: MICCAI.2025


#13 TEGDA: Test-time Evaluation-Guided Dynamic Adaptation for Medical Image Segmentation [PDF] [Copy] [Kimi] [REL]

Authors: Zhou Yubo, Wu Jianghao, Liao Wenjun, Zhang Shichuan, Zhang Shaoting, Wang Guotai

Distribution shifts of medical images seriously limit the performance of segmentation models when applied in real-world scenarios. Test-Time Adaptation (TTA) has emerged as a promising solution for ensuring robustness on images from different institutions by tuning the parameters at test time without additional labeled training data. However, existing TTA methods are limited by unreliable supervision, as they lack effective ways to monitor adaptation performance without ground truth, which makes it hard to adaptively adjust model parameters over the stream of testing samples. To address these limitations, we propose a novel Test-time Evaluation-Guided Dynamic Adaptation (TEGDA) framework for TTA of segmentation models. In the absence of ground truth, we propose a novel prediction quality evaluation metric based on Agreement with Dropout Inferences calibrated by Confidence (ADIC). ADIC is then used to guide adaptive fusion of the current features with high-ADIC features from a feature bank to obtain refined predictions for supervision, combined with an ADIC-adaptive teacher model and loss weighting for robust adaptation. Experimental results on multi-domain cardiac structure and brain tumor segmentation demonstrate that ADIC can accurately estimate segmentation quality on the fly, and that TEGDA obtains the highest average Dice and lowest average HD95, significantly outperforming several state-of-the-art TTA methods. The code is available at https://github.com/HiLab-git/TEGDA.
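
The exact ADIC formula is not given in the abstract; the sketch below assumes it combines Dice agreement between the deterministic prediction and several MC-dropout predictions with the mean foreground confidence, for a binary segmentation model:

```python
import torch

@torch.no_grad()
def adic_score(model, image, n_dropout=5, threshold=0.5, eps=1e-6):
    """Agreement with Dropout Inferences calibrated by Confidence (assumed formulation).

    image: (1, C, H, W[, D]); the model is assumed to output foreground logits
    of the same spatial size.
    """
    model.eval()
    prob = torch.sigmoid(model(image))
    pred = prob > threshold

    # enable dropout layers only, keep normalization layers in eval mode
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d, torch.nn.Dropout3d)):
            m.train()

    dices = []
    for _ in range(n_dropout):
        drop_pred = torch.sigmoid(model(image)) > threshold
        inter = (pred & drop_pred).sum().float()
        dices.append((2 * inter + eps) / (pred.sum() + drop_pred.sum() + eps))
    model.eval()

    agreement = torch.stack(dices).mean()                       # how stable the prediction is
    confidence = prob[pred].mean() if pred.any() else prob.new_tensor(0.0)
    return (agreement * confidence).item()
```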

Subject: MICCAI.2025


#14 Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications [PDF] [Copy] [Kimi] [REL]

Authors: Li Ziyu, Hu Yujian, Ding Zhengyao, Mao Yiheng, Li Haitao, Yi Fan, Zhang Hongkun, Huang Zhengxing

Cardiac Magnetic Resonance (CMR) imaging is a vital non-invasive tool for diagnosing heart diseases and evaluating cardiac health. However, the limited availability of large-scale, high-quality CMR datasets poses a major challenge to the effective application of artificial intelligence (AI) in this domain. Even the amount of unlabeled data, and the range of health status it covers, is insufficient for model pretraining, which hinders the performance of AI models on downstream tasks. In this study, we present Cardiac Phenotype-Guided CMR Generation (CPGG), a novel approach for generating diverse CMR data that covers a wide spectrum of cardiac health status. The CPGG framework consists of two stages: in the first stage, a generative model is trained using cardiac phenotypes derived from CMR data; in the second stage, a masked autoregressive diffusion model, conditioned on these phenotypes, generates high-fidelity CMR cine sequences that capture both structural and functional features of the heart in a fine-grained manner. We synthesized a massive amount of CMR data to expand the pretraining set. Experimental results show that CPGG generates high-quality synthetic CMR data, significantly improving performance on various downstream tasks, including diagnosis and cardiac phenotype prediction. These gains are demonstrated across both public and private datasets, highlighting the effectiveness of our approach. Code is available at https://github.com/Markaeov/CPGG.

Subject: MICCAI.2025


#15 Revisiting Masked Image Modeling with Standardized Color Space for Domain Generalized Fundus Photography Classification [PDF] [Copy] [Kimi] [REL]

Authors: Jang Eojin, Kang Myeongkyun, Kim Soopil, Sagong Min, Park Sang Hyun

Diabetic retinopathy (DR) is a serious complication of diabetes, requiring rapid and accurate assessment through computer-aided grading of fundus photography. To enhance the practical applicability of DR grading, domain generalization (DG) and foundation models have been proposed to improve accuracy on data from unseen domains. Despite recent advancements, foundation models trained in a self-supervised manner still exhibit limited DG capabilities, as self-supervised learning does not account for domain variations. In this paper, we revisit masked image modeling (MIM) in foundation models to advance DR grading for domain generalization. We introduce a MIM-based approach that transforms images into a standardized color representation across domains. By transforming images from various domains into this color space, the model can learn consistent representations even for unseen images, promoting domain-invariant feature learning. Additionally, we employ joint representation learning of both the original and transformed images, using cross-attention to integrate their respective strengths for DR classification. We show a performance improvement of up to nearly 4% across three datasets, positioning our method as a promising solution for domain-generalized medical image classification.

Subject: MICCAI.2025


#16 Delving into Out-of-Distribution Detection with Medical Vision-Language Models [PDF] [Copy] [Kimi] [REL]

Authors: Ju Lie, Zhou Sijin, Zhou Yukun, Lu Huimin, Zhu Zhuoting, Keane Pearse A., Ge Zongyuan

Recent advances in medical vision-language models (VLMs) demonstrate impressive performance in image classification tasks, driven by their strong zero-shot generalization capabilities. However, given the high variability and complexity inherent in medical imaging data, the ability of these models to detect out-of-distribution (OOD) data in this domain remains underexplored. In this work, we conduct the first systematic investigation into the OOD detection potential of medical VLMs. We evaluate state-of-the-art VLM-based OOD detection methods across a diverse set of medical VLMs, spanning both general-purpose and domain-specific models. To accurately reflect real-world challenges, we introduce a cross-modality evaluation pipeline for benchmarking full-spectrum OOD detection, rigorously assessing model robustness against both semantic shifts and covariate shifts. Furthermore, we propose a novel hierarchical prompt-based method that significantly enhances OOD detection performance. Extensive experiments are conducted to validate the effectiveness of our approach. The code is available at https://github.com/PyJulie/Medical-VLMs-OOD-Detection.
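
One standard VLM-based OOD score that such benchmarks typically include is maximum concept matching over the in-distribution class prompts; it is shown here on precomputed embeddings to stay model-agnostic (the paper's hierarchical prompting is not reproduced, and the threshold is an assumption):

```python
import torch
import torch.nn.functional as F

def mcm_ood_score(image_embeds, text_embeds, temperature=0.01):
    """Maximum-concept-matching style score: higher means more in-distribution.

    image_embeds: (B, D) image features from a (medical) VLM image encoder
    text_embeds:  (K, D) text features of the K in-distribution class prompts
    """
    img = F.normalize(image_embeds, dim=-1)
    txt = F.normalize(text_embeds, dim=-1)
    probs = F.softmax(img @ txt.t() / temperature, dim=-1)   # (B, K)
    return probs.max(dim=-1).values                          # (B,)

scores = mcm_ood_score(torch.randn(4, 512), torch.randn(10, 512))
is_ood = scores < 0.5   # the decision threshold is application-specific
```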

Subject: MICCAI.2025


#17 MARSeg: Enhancing Medical Image Segmentation with MAR and Adaptive Feature Fusion [PDF] [Copy] [Kimi] [REL]

Authors: Hwang Jeonghyun, Rhee Seungyeon, Kim Minjeong, Viriyasaranon Thanaporn, Choi Jang-Hwan

Recent advances in Masked Autoregressive (MAR) models highlight their ability to preserve fine-grained details through continuous vector representations, making them highly suitable for tasks requiring precise pixel-level delineation. Motivated by these strengths, we introduce MARSeg, a novel segmentation framework tailored for medical images. Our method first pre-trains a MAR model on large-scale CT scans, capturing both global structures and local details without relying on vector quantization. We then propose a Generative Parallel Adaptive Feature Fusion (GPAF) module that effectively unifies spatial and channel-wise attention, thereby combining latent features from the pre-trained MAE encoder and decoder. This approach preserves essential boundary information while enhancing the robustness of organ and tumor segmentation. Experimental results on multiple CT datasets from the Medical Segmentation Decathlon (MSD) demonstrate that MARSeg outperforms existing state-of-the-art methods in terms of Dice Similarity Coefficient (DSC) and Intersection over Union (IoU), confirming its efficacy in handling complex anatomical and pathological variations. The code is available at https://github.com/Ewha-AI/MARSeg.

Subject: MICCAI.2025


#18 Predicting Longitudinal Brain Development via Implicit Neural Representations [PDF] [Copy] [Kimi] [REL]

Authors: Dannecker Maik, Rueckert Daniel

Predicting individualized perinatal brain development is crucial for understanding personalized neurodevelopmental trajectories but remains challenging due to limited longitudinal data. While population-based atlases model generic trends, they fail to capture subject-specific growth patterns. In this work, we propose a novel approach leveraging Implicit Neural Representations (INRs) to predict individualized brain growth over multiple weeks. Our method learns from a limited dataset of fewer than 100 paired fetal and neonatal subjects, sampled from the developing Human Connectome Project. The trained model demonstrates accurate personalized future and past trajectory predictions from a single calibration scan. By incorporating conditional external factors such as birth age or birth weight, our model further allows the simulation of neurodevelopment under varying conditions. We evaluate our method against established perinatal brain atlases, demonstrating higher prediction accuracy and fidelity up to 20 weeks. Finally, we explore the method’s ability to reveal subject-specific cortical folding patterns under varying factors such as birth weight, further advocating its potential for personalized neurodevelopmental analysis.
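
A minimal sketch of a growth INR of this kind: a coordinate MLP conditioned on gestational age and a per-subject latent code, where calibrating to a new subject would amount to optimizing only its latent code from one scan. All sizes and the conditioning scheme are assumptions; additional factors such as birth weight could simply be appended to the conditioning vector:

```python
import torch
import torch.nn as nn

class GrowthINR(nn.Module):
    """Implicit neural representation f(x, y, z, age, subject_code) -> image value."""

    def __init__(self, latent_dim: int = 64, hidden: int = 256, n_subjects: int = 100):
        super().__init__()
        self.codes = nn.Embedding(n_subjects, latent_dim)   # one latent code per training subject
        self.net = nn.Sequential(
            nn.Linear(3 + 1 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords, age_weeks, subject_ids):
        """coords: (N, 3) in [-1, 1]; age_weeks: (N, 1); subject_ids: (N,)."""
        z = self.codes(subject_ids)
        return self.net(torch.cat([coords, age_weeks, z], dim=-1))

inr = GrowthINR()
pred = inr(torch.rand(1024, 3) * 2 - 1,                 # spatial coordinates
           torch.full((1024, 1), 32.0),                 # query gestational age (weeks)
           torch.zeros(1024, dtype=torch.long))         # subject index
print(pred.shape)  # torch.Size([1024, 1])
```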

Subject: MICCAI.2025


#19 DISCLOSE the Neurodegeneration Dynamics: Individualized ODE Discovery for Alzheimer’s Disease Precision Medicine [PDF] [Copy] [Kimi] [REL]

Authors: Jung Wooseok, Park Joonhyuk, Kim Won Hwa

Monitoring progression from Mild Cognitive Impairment due to Alzheimer’s Disease (MCI-AD) is critical for patient care. However, current approaches to modeling AD progression overlook the complex, interrelated neurodegeneration across different regions of the brain and how AD pathology and genotypes influence it. This study defines neurodegeneration dynamics and proposes the Dynamics Individualized by Static Covariates without LOngitudinal ScrEening (DISCLOSE) framework. This method predicts individualized neurodegeneration dynamics from only baseline amyloid-beta deposition and the number of APOE4 alleles with an Ordinary Differential Equation (ODE). We evaluated DISCLOSE using longitudinal MRI samples in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. The results demonstrate that DISCLOSE outperforms existing methods in long-term trajectory prediction, particularly for predictions beyond three years. This work presents a significant step toward modeling individualized disease trajectories. Moreover, DISCLOSE can quantitatively interpret the effects of AD-related genotypes and pathophysiology on regional atrophy progression.
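
As a toy illustration of the idea (not the paper's model), regional atrophy rates can be parameterized by the two static covariates and integrated forward with an ordinary ODE solver; the coefficients below are made up for demonstration only:

```python
import numpy as np
from scipy.integrate import solve_ivp

def atrophy_rate(amyloid_suvr: float, apoe4_count: int,
                 base=0.01, w_amyloid=0.02, w_apoe4=0.015):
    """Annual atrophy rate k predicted from static baseline covariates.
    The coefficients are illustrative placeholders, not fitted values."""
    return base + w_amyloid * max(amyloid_suvr - 1.0, 0.0) + w_apoe4 * apoe4_count

def predict_trajectory(v0: np.ndarray, amyloid_suvr: float, apoe4_count: int, years=10.0):
    """Integrate dV/dt = -k * V for regional volumes V over `years`."""
    k = atrophy_rate(amyloid_suvr, apoe4_count)
    sol = solve_ivp(lambda t, v: -k * v, t_span=(0.0, years), y0=v0,
                    t_eval=np.linspace(0.0, years, 21))
    return sol.t, sol.y   # times, (n_regions, n_times)

t, vols = predict_trajectory(np.array([7.5, 3.2]),   # e.g. two regional volumes in ml
                             amyloid_suvr=1.4, apoe4_count=1)
```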

Subject: MICCAI.2025


#20 Hybrid State-Space Models and Denoising Training for Unpaired Medical Image Synthesis [PDF] [Copy] [Kimi] [REL]

Authors: Zhang Junming, Jiang Shancheng

Unsupervised medical image synthesis faces significant challenges due to the absence of paired data, often resulting in global anatomical distortions and local detail loss. Existing approaches primarily rely on convolutional neural networks (CNNs) for local feature extraction; however, their limited receptive fields hinder effective global anatomical modeling. Recently, Vision Mamba (ViM) has demonstrated efficient global modeling capabilities via state-space models, yet its potential in this task remains unexplored. To address this gap, we propose a hybrid architecture, CRAViM (Convolutional Residual Attention Vision Mamba), which integrates the precise local anatomical feature extraction of CNNs with the long-range dependency modeling of state-space models, thereby enhancing the structural fidelity and detail preservation of synthesized images. Furthermore, we introduce a cycle denoise consistency-based training framework that incorporates transport loss and random denoise loss to jointly optimize global structural constraints and local detail restoration. Experimental results on two public medical imaging datasets demonstrate that CRAViM achieves notable improvements in key metrics such as SSIM and NMI over existing methods, effectively maintaining global anatomical consistency while enhancing local details, thus validating the effectiveness of our approach. The code for this paper can be found at https://github.com/jmzhang-cv/CRAViM.

Subject: MICCAI.2025


#21 Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Experts [PDF] [Copy] [Kimi] [REL]

Authors: Xu Gelei, Duan Yuying, Liu Zheyuan, Li Xueyang, Jiang Meng, Lemmon Michael, Jin Wei, Shi Yiyu

AI-based systems have achieved high accuracy in skin disease diagnostics but often exhibit biases across demographic groups, leading to inequitable healthcare outcomes and diminished patient trust. Most existing bias mitigation methods attempt to eliminate the correlation between sensitive attributes and diagnostic predictions, but this often degrades performance due to the loss of clinically relevant diagnostic cues. In this work, we propose an alternative approach that incorporates sensitive attributes to achieve fairness. We introduce FairMoE, a framework that employs layer-wise mixture-of-experts modules to serve as group-specific learners. Unlike traditional methods that rigidly assign data based on group labels, FairMoE dynamically routes data to the most suitable expert, making it particularly effective for handling cases near group boundaries. Experimental results show that, unlike previous fairness approaches that reduce performance, FairMoE achieves substantial accuracy improvements while preserving comparable fairness metrics.
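
A minimal sketch of a gated, group-specific mixture-of-experts layer under this reading: a learned router softly weights the experts so that samples near group boundaries can draw on several of them. The expert and router designs here are assumptions, not the FairMoE architecture:

```python
import torch
import torch.nn as nn

class GatedGroupMoE(nn.Module):
    """Group-specific experts with a learned soft router, so samples near
    demographic-group boundaries can mix the outputs of several experts."""

    def __init__(self, dim: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)                 # (B, E) routing weights
        expert_out = torch.stack([e(x) for e in self.experts], 1)    # (B, E, D)
        return (gate.unsqueeze(-1) * expert_out).sum(dim=1)          # (B, D)

layer = GatedGroupMoE(dim=128, n_experts=3)   # e.g. one expert per demographic group
print(layer(torch.randn(4, 128)).shape)       # torch.Size([4, 128])
```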

Subject: MICCAI.2025


#22 TemSAM: Temporal-aware Segment Anything Model for Cerebrovascular Segmentation in Digital Subtraction Angiography Sequences [PDF] [Copy] [Kimi] [REL]

Authors: Zhang Liang, Jiang Xixi, Ding Xiaohuan, Huang Zihang, Zhao Tianyu, Yang Xin

Digital Subtraction Angiography (DSA) is the gold standard in vascular disease imaging, but it poses challenges due to its dynamic frame changes. Early frames often lack detail in small vessels, while late frames may obscure vessels visible in earlier phases, necessitating time-consuming expert interpretation. Existing methods primarily focus on single-frame analysis or basic temporal integration, treating all frames uniformly and failing to exploit complementary inter-frame information. Furthermore, existing pre-trained models like the Segment Anything Model (SAM), while effective for general medical video segmentation, fall short in handling the unique dynamics of DSA sequences driven by contrast agents. To overcome these limitations, we introduce TemSAM, a novel temporal-aware segment anything model for cerebrovascular segmentation in DSA sequences. TemSAM integrates two main components: (1) a multi-level Minimum Intensity Projection (MIP) global prompt that enhances temporal representation through a MIP-guided Global Attention (MGA) module, utilizing the global information provided by the MIP, and (2) a complementary information fusion module, which includes a frame selection module and a Masked Cross-Temporal Attention module, enabling additional foreground information extraction from complementary frames. Experimental results demonstrate that TemSAM significantly outperforms existing methods. Our code is available at https://github.com/zhang-liang-hust/TemSAM.
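
In subtracted angiograms the contrast-filled vessels appear dark, so a temporal minimum intensity projection accumulates the full vascular tree across frames. A minimal sketch of deriving such a multi-level global prompt (the pyramid scales are assumptions):

```python
import torch

def temporal_mip_prompt(frames: torch.Tensor, levels=(1, 2, 4)):
    """Multi-level minimum intensity projection over a DSA sequence.

    frames: (T, H, W) subtracted sequence; vessels are dark, so the per-pixel
    minimum over time aggregates the vascular tree seen across all frames.
    Returns the MIP at several spatial scales (downsampling factors are assumed).
    """
    mip = frames.min(dim=0).values                      # (H, W) temporal MIP
    pyramid = []
    for s in levels:
        pooled = torch.nn.functional.avg_pool2d(mip[None, None], kernel_size=s)
        pyramid.append(pooled[0, 0])
    return pyramid

seq = torch.rand(20, 256, 256)
for level in temporal_mip_prompt(seq):
    print(level.shape)
```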

Subject: MICCAI.2025


#23 Exemplar Med-DETR: Toward Generalized and Robust Lesion Detection in Mammogram Images and Beyond [PDF] [Copy] [Kimi] [REL]

Authors: Bhat Sheethal, Georgescu Bogdan, Panambur Adarsh Bhandary, Zinnen Mathias, Nguyen Tri-Thien, Mansoor Awais, Elbarbary Karim Khalifa, Bayer Siming, Ghesu Florin-Cristian, Grbic Sasa, Maier Andreas

Detecting abnormalities in medical images poses unique challenges due to differences in feature representations and the intricate relationship between anatomical structures and abnormalities. This is especially evident in mammography, where dense breast tissue can obscure lesions, complicating radiological interpretation. Despite leveraging anatomical and semantic context, existing detection methods struggle to learn effective class-specific features, limiting their applicability across different tasks and imaging modalities. In this work, we introduce Exemplar Med-DETR, a novel multi-modal contrastive detector that enables feature-based detection. It employs cross-attention with inherently derived, intuitive class-specific exemplar features and is trained with an iterative strategy. We achieve state-of-the-art performance across three distinct imaging modalities from four public datasets. On Vietnamese dense breast mammograms, we attain an mAP50 of 0.7 for mass detection and 0.55 for calcifications, an absolute improvement of 16 percentage points over the previous state-of-the-art. Additionally, a radiologist-supported evaluation of 100 mammograms from an out-of-distribution Chinese cohort demonstrates a twofold gain in lesion detection performance. For chest X-rays and angiography, we achieve an mAP50 of 0.25 for mass and 0.37 for stenosis detection, improving results by 4 and 7 percentage points, respectively. These results highlight the potential of our approach to advance robust and generalizable detection systems for medical imaging.

Subject: MICCAI.2025


#24 Surface-based Multi-Axis Longitudinal Disentanglement Using Contrastive Learning for Alzheimer’s Disease [PDF] [Copy] [Kimi] [REL]

Authors: Zhang Jianwei, Shi Yonggang

Accurate modeling of disease progression is essential for comprehending heterogeneous neuropathologies such as Alzheimer’s Disease (AD). Traditional neuroimaging analyses often confound disease effects with normal aging, complicating differential diagnosis. Recent advancements in deep learning have catalyzed the development of disentanglement techniques in autoencoder networks, aiming to segregate longitudinal changes attributable to aging from those due to disease-specific alterations within the latent space. However, existing longitudinal disentanglement methods usually model disease as a single-axis factor, which ignores the complexity and heterogeneity of Alzheimer’s Disease. In response to this issue, we propose a novel Surface-based Multi-axis Disentanglement framework. This framework posits multiple disease axes within the latent space, enhancing the model’s capacity to encapsulate the multifaceted nature of AD, which includes various disease trajectories. To assign axes to data trajectories without explicit ground-truth labels, we implement a longitudinal contrastive loss leveraging self-supervision, thereby refining the separation of disease trajectories. Evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset (N=1321), our model demonstrates superior performance in delineating cognitively normal (CN), mild cognitive impairment (MCI), and AD subjects, as well as in classifying stable vs. converting MCI and amyloid status, compared to the single-axis model. This is further substantiated through an ablation study on the contrastive loss, underscoring the utility of our multi-axis approach in capturing the complex progression patterns of AD. The code is available at: https://github.com/jianweizhang17/MultiAxisDisentanglement.git

Subject: MICCAI.2025


#25 T2GS: Comprehensive Reconstruction of Dynamic Surgical Scenes with Gaussian Splatting [PDF] [Copy] [Kimi] [REL]

Authors: Xu Jinjing, Li Chenyang, Liu Peng, Pfeiffer Micha, Liu Liwen, Docea Reuben, Wagner Martin, Speidel Stefanie

Surgical scene reconstruction from endoscopic video is crucial for many applications in computer- and robot-assisted surgery. However, existing methods primarily focus on soft tissue deformation while often neglecting the dynamic motion of surgical tools, limiting the completeness of the reconstructed scene. To bridge this research gap, we propose T^2GS, a novel and efficient surgical scene reconstruction framework that enables efficient spatio-temporal modelling of both deformable tissues and dynamically interacting surgical tools. T^2GS leverages Gaussian Splatting for dynamic scene reconstruction; it integrates a recent tissue deformation modelling technique and, most importantly, introduces a novel efficient tool motion model (ETMM). At its core, ETMM decomposes the modelling of tool motion into global trajectory modelling and local shape-change modelling. We additionally propose pose-informed pointcloud fusion (PIPF), which holistically initializes the tools’ Gaussians for improved tool motion modelling. Extensive experiments on public datasets demonstrate T^2GS’s superior performance for comprehensive endoscopic scene reconstruction compared to previous methods. Moreover, as we specifically designed our method with efficiency in mind, T^2GS also showcases promising reconstruction efficiency (3 minutes) and rendering speed (71 fps), highlighting its potential for intraoperative applications. Our code is available at https://gitlab.com/nct_tso_public/ttgs.

Subject: MICCAI.2025