271@2022@IJCAI

Total: 1

#1 Robust High-Dimensional Classification From Few Positive Examples

Authors: Deepayan Chakrabarti; Benjamin Fauber

We tackle an extreme form of imbalanced classification, with up to 10^5 features but as few as 5 samples from the minority class. This problem occurs in applications such as tumor type prediction and fraud detection, among others. Standard imbalanced classification methods are not designed for such severe data scarcity. Sampling-based methods need too many samples due to the high dimensionality, while cost-based methods must place too high a weight on the limited minority samples. Our proposed method, called DIRECT, bypasses sample generation by training the classifier over a robust smoothed distribution of the minority class. DIRECT is fast, simple, robust, parameter-free, and easy to interpret. We validate DIRECT on several real-world datasets spanning document, image, and medical classification. DIRECT is up to 5x-7x better than SMOTE-like methods, 30-200% better than ensemble methods, and 3x-7x better than cost-sensitive methods. The greatest gains are for settings with the fewest samples in the minority class, where DIRECT's robustness is most helpful.
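
The abstract does not spell out DIRECT's training procedure, so the following is only a hypothetical sketch of the general idea it describes: rather than generating synthetic minority samples, train the classifier against a smoothed distribution of the minority class, so the smoothing enters the loss directly. The linear model, squared loss, Gaussian smoothing width `sigma2`, and all function names below are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def fit_smoothed(X_maj, X_min, sigma2=1.0, n_steps=1000):
    """Linear classifier trained on the majority points plus the *expected*
    squared loss when each minority point x is replaced by x + eps, with
    eps ~ N(0, sigma2 * I). For a linear score w.x + b that expectation has
    the closed form (w.x + b - y)^2 + sigma2 * ||w||^2, so no synthetic
    samples are ever drawn."""
    # Append a constant 1 feature so the bias is simply the last weight.
    A = np.hstack([X_maj, np.ones((len(X_maj), 1))])  # majority, label -1
    B = np.hstack([X_min, np.ones((len(X_min), 1))])  # minority, label +1
    w = np.zeros(A.shape[1])
    # Safe step size from a Lipschitz bound on this quadratic objective.
    L = 2 * (np.linalg.norm(A, 2) ** 2 / len(A)
             + np.linalg.norm(B, 2) ** 2 / len(B) + sigma2)
    for _ in range(n_steps):
        grad = (2 * A.T @ (A @ w + 1.0) / len(A)      # majority term
                + 2 * B.T @ (B @ w - 1.0) / len(B)    # minority term
                + 2 * sigma2 * np.r_[w[:-1], 0.0])    # smoothing: ridge on w, not bias
        w -= grad / L
    return w[:-1], w[-1]

# Toy usage: 5 minority samples in 1,000 dimensions vs. 500 majority samples.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(500, 1000))
X_min = rng.normal(0.5, 1.0, size=(5, 1000))
w, b = fit_smoothed(X_maj, X_min, sigma2=1.0)
scores = X_min @ w + b   # higher score => predicted minority class
```

In this sketch the Gaussian smoothing of the minority class collapses to a ridge-like penalty sigma2 * ||w||^2, which is what lets it avoid drawing synthetic samples; DIRECT's actual robust smoothed distribution and training procedure may differ substantially.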