The performance of learning-based object detection algorithms, which aim to both classify and localize objects within images, is determined largely by the quality of the annotated dataset used for training. Two types of labeling noise are prevalent: incorrectly classified objects (categorization noise) and inaccurate bounding boxes (localization noise); both typically occur together in large-scale datasets. In this paper we propose a distillation-based method for training object detectors that accounts for both categorization and localization noise. The key insight underpinning our method is that the early-learning phenomenon, in which models trained on a mixture of clean and false labels tend to fit the clean data first and memorize the false labels only later, manifests earlier for localization noise than for categorization noise. Our method uses a model from the early-learning phase, before overfitting to the noisy labels occurs, as a teacher network. We implement the method as a plug-in module compatible with general object detection architectures and validate its performance against the state of the art on the PASCAL VOC, MS COCO, and VinDr-CXR medical detection datasets.
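To make the early-teacher idea concrete, the following is a minimal PyTorch-style sketch, not the paper's actual module: it freezes a copy of the model at an early-learning epoch and adds a distillation term pulling the student toward that snapshot's predictions rather than the noisy labels. The names `snapshot_epoch` and `distill_weight` are hypothetical, and the sketch uses a plain classification loss for brevity; the full method additionally handles localization noise, which this sketch omits.

```python
import copy
import torch
import torch.nn.functional as F

def train_with_early_teacher(model, loader, optimizer, num_epochs,
                             snapshot_epoch=2, distill_weight=1.0):
    """Train on noisy labels while distilling from an early-learning
    snapshot of the model itself (illustrative setup only)."""
    teacher = None
    for epoch in range(num_epochs):
        if epoch == snapshot_epoch:
            # Freeze a copy of the model from the early-learning phase,
            # before it has started memorizing the false labels.
            teacher = copy.deepcopy(model).eval()
            for p in teacher.parameters():
                p.requires_grad_(False)
        for images, targets in loader:
            logits = model(images)                      # class logits
            loss = F.cross_entropy(logits, targets)     # noisy-label loss
            if teacher is not None:
                with torch.no_grad():
                    teacher_logits = teacher(images)
                # Distillation term: match the early-phase teacher's soft
                # predictions instead of trusting the noisy labels alone.
                loss = loss + distill_weight * F.kl_div(
                    F.log_softmax(logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean",
                )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

Because localization noise is memorized earlier than categorization noise, a detector-specific variant of this scheme could plausibly snapshot the teacher at different epochs for the localization and classification branches; the choice above of a single `snapshot_epoch` is a simplification.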