Optimal and Efficient Binary Questioning for Accelerated Annotation

Authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo

Even though data annotation is extremely important for the interpretability, research, and development of artificial intelligence solutions, annotating data remains costly. Research efforts such as active learning and few-shot learning alleviate the cost by increasing sample efficiency, yet the problem of annotating data more quickly has received comparatively little attention. Leveraging a predictor has been shown to reduce annotation cost in practice, but this approach has not been studied theoretically. We ask the following question: to annotate a binary classification dataset with N samples, can the annotator answer fewer than N yes/no questions? Framing this question-and-answer (Q&A) game as an optimal encoding problem, we find a positive answer given by the Huffman encoding of the possible labelings. Unfortunately, the algorithm is computationally intractable even for small dataset sizes. As a practical method, we propose to minimize a cost function a few steps ahead, similarly to lookahead minimization in optimal control. This solution is analyzed, compared with the optimal one, and evaluated on several synthetic and real-world datasets. The method yields a significant improvement (23-86%) in annotation efficiency on real-world datasets.
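
The Huffman-encoding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a predictor that supplies an independent probability p[i] that sample i is positive, and the function names (labeling_probs, huffman_depths, expected_questions, entropy_bits) are hypothetical. Each bit of a labeling's Huffman codeword is read as one yes/no question, so the expected codeword length is the expected number of questions. The enumeration over all 2^N labelings also shows why the exact approach stops being feasible beyond very small N.

```python
import heapq
import itertools
import math


def labeling_probs(p):
    """Probability of each of the 2^N labelings.

    Assumption (not from the source): labels are independent and
    p[i] is the predictor's probability that sample i is positive.
    """
    probs = {}
    for labels in itertools.product((0, 1), repeat=len(p)):
        prob = 1.0
        for y, p_i in zip(labels, p):
            prob *= p_i if y == 1 else 1.0 - p_i
        probs[labels] = prob
    return probs


def huffman_depths(probs):
    """Return {labeling: Huffman codeword length}.

    Each codeword bit is interpreted as one yes/no question.
    """
    # Heap entries: (probability, unique tie-breaker, labelings in subtree).
    heap = [(pr, i, [lab]) for i, (lab, pr) in enumerate(probs.items())]
    heapq.heapify(heap)
    depths = {lab: 0 for lab in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        for lab in group1 + group2:
            depths[lab] += 1
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return depths


def expected_questions(p):
    """Expected number of yes/no questions under the Huffman strategy."""
    probs = labeling_probs(p)
    depths = huffman_depths(probs)
    return sum(probs[lab] * depths[lab] for lab in probs)


def entropy_bits(p):
    """Entropy (in bits) of the labeling distribution under independence."""
    return sum(
        -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))
        for q in p
        if 0.0 < q < 1.0
    )
```

Calling expected_questions([0.9] * 4) gives a value below N = 4, because a confident predictor concentrates probability on a few labelings, whereas an uninformative predictor such as [0.5] * 3 yields exactly 3 questions. By the standard Huffman bound, the expected count always lies between the entropy and the entropy plus one.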

Subject: AAAI.2025 - Humans and AI