Although attention-based multi-instance learning (MIL) algorithms have achieved impressive performance on slide-level whole slide image (WSI) classification, they are prone to focusing on irrelevant patterns such as staining conditions and tissue morphology, which leads to incorrect patch-level predictions and unreliable interpretability. In this position paper, we analyze why attention-based methods tend to rely on spurious correlations. We then revisit max-pooling-based approaches and examine why existing methods of this kind underperform. We argue that well-trained max-pooling-based MIL models can base their predictions on causal factors rather than spurious correlations. Building on these insights, we propose a simple yet effective max-pooling-based MIL method, FocusMIL, that outperforms existing mainstream attention-based methods on two datasets. We advocate renewed attention to max-pooling-based methods as a route to more robust and interpretable predictions.
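To make the contrast between the two pooling families concrete, the following is a minimal NumPy sketch of generic max pooling and generic softmax-attention pooling over a bag of patch features. It is not the FocusMIL method, whose details the abstract does not give. The function names, the toy features, and the weight vectors `w_cls` and `w_attn` are all hypothetical, chosen so that the attention scores follow a nuisance feature (standing in for something like stain intensity) while the classifier reads the other feature.

```python
import numpy as np


def max_pool_bag(instance_logits):
    """Max pooling: the bag logit is the largest instance logit.

    Returns the bag logit and the index of the patch that produced it,
    so the slide-level prediction is tied to one patch-level prediction.
    """
    idx = int(np.argmax(instance_logits))
    return float(instance_logits[idx]), idx


def attention_pool_bag(features, w_attn, w_cls):
    """Attention pooling: softmax-weighted mean of features, then a linear classifier.

    Returns the bag logit and the attention weights over patches.
    """
    scores = features @ w_attn
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    bag_feature = weights @ features
    return float(bag_feature @ w_cls), weights


# Toy bag of 4 patches with 2 features each (hypothetical values).
# Column 0 plays the role of a causal signal, column 1 a nuisance signal.
features = np.array([
    [0.1, 2.0],
    [0.2, 1.9],
    [3.0, 0.1],  # the only patch with a strong causal signal
    [0.0, 2.1],
])
w_cls = np.array([1.0, 0.0])   # hypothetical classifier: reads column 0 only
w_attn = np.array([0.0, 1.0])  # hypothetical attention: scores follow column 1

instance_logits = features @ w_cls
bag_logit, top_patch = max_pool_bag(instance_logits)
attn_logit, attn_weights = attention_pool_bag(features, w_attn, w_cls)

print("max pooling picks patch", top_patch, "with bag logit", bag_logit)
print("attention weights:", np.round(attn_weights, 3), "bag logit:", round(attn_logit, 3))
```

In this contrived setup, max pooling selects the patch with the strong causal signal, while the attention weights concentrate on the other patches because the attention scores track the nuisance feature. Real models learn both sets of weights jointly, so the sketch only illustrates the mechanism and does not predict how a trained model behaves.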