Recent advances in computational pathology have introduced whole slide image (WSI)-level multimodal large language models (MLLMs) for automated pathological analysis. However, current WSI-level MLLMs face two critical challenges: limited explainability in their decision-making process and insufficient attention to the morphological features crucial for accurate diagnosis. To address these challenges, we first introduce WSI-Bench, a large-scale morphology-aware benchmark containing 180k visual question answering (VQA) pairs from 9,850 WSIs across 30 cancer types, specifically designed to evaluate MLLMs' understanding of these morphological characteristics. To the best of our knowledge, WSI-Bench is the first benchmark to systematically evaluate morphological understanding in WSI analysis. To enhance model explainability, we present WSI-LLaVA, an MLLM framework for gigapixel WSI understanding trained with a three-stage strategy, which provides detailed morphological findings to explain its final answer. For more precise model assessment in pathological contexts, we develop two specialized WSI metrics: WSI-Precision and WSI-Relevance. Extensive evaluation on WSI-Bench reveals both the capabilities and limitations of current WSI MLLMs in morphological analysis and various pathology tasks, while demonstrating WSI-LLaVA's superior performance across all capabilities.