2606.06950

Total: 1

#1 When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT [PDF] [Copy] [Kimi] [REL]

Authors: Md Enamul Hoq, Sharafat Hossain, Imraul Emmaka, Linda Larson-Prior, Lawrence Tarbox, Jonathan Bona, Donald Johann Jr. and Fred Prior

Three-dimensional models are widely assumed preferable for volumetric medical imaging, yet their practical value depends on whether performance gains justify added computational cost and complexity. Rather than proposing a new architecture, we study how input dimensionality (2D, 2.5D, 3D) affects model behavior across convolutional neural networks (CNNs) and Vision Transformers (ViTs) under a fixed training protocol. Using a leakage-free NLST cohort (n = 1,977) with supporting LIDC-IDRI data, we find that the 2.5D CNN offers the most favorable discrimination-stability trade-off in our comparison (ROC-AUC 0.682, 95% CI [0.546, 0.799]) with a stable operating point. In contrast, 3D CNNs show threshold instability, and transformers exhibit degenerate predictions, such as all-positive predictions. Confidence intervals are wide and overlapping, so we present these results as a controlled resource-performance frontier and a failure-mode taxonomy rather than as definitive superiority claims. For class-imbalanced lung cancer screening classification, 2D and 2.5D inputs provide a more reliable trade-off between performance, stability, and computational efficiency than full 3D representations.

Subjects: Computer Vision and Pattern Recognition , Artificial Intelligence

Publish: 2026-06-05 06:24:20 UTC