2025.emnlp-main.315@ACL

#1 RAV: Retrieval-Augmented Voting for Tactile Descriptions Without Training

Authors: Jinlin Wang, Yulong Ji, Hongyu Yang

Tactile perception is essential for human-environment interaction, and deriving tactile descriptions from multimodal data is a key challenge for embodied intelligence to understand human perception. Conventional approaches that rely on extensive parameter learning for multimodal perception are rigid and computationally inefficient. To address this, we introduce Retrieval-Augmented Voting (RAV), a parameter-free method that constructs visual-tactile cross-modal knowledge directly. RAV retrieves similar visual-tactile data for given visual and tactile inputs and generates tactile descriptions through a voting mechanism. In experiments, we applied three voting strategies (SyncVote, DualVote, and WeightVote), achieving performance comparable to large-scale cross-modal models without training. Comparative experiments across datasets of varying quality, defined by annotation accuracy and data diversity, demonstrate that RAV's performance improves with higher-quality data at no additional computational cost. Code and model checkpoints are open-sourced at https://github.com/PluteW/RAV.
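The abstract does not detail the implementation; the following is a minimal Python sketch of the retrieve-then-vote idea it describes. The names retrieve_top_k, vote_description, and the knowledge-bank fields are hypothetical, and the voting rule shown is a simple majority over pooled neighbors, not any of the paper's SyncVote, DualVote, or WeightVote strategies.

```python
import numpy as np
from collections import Counter

def retrieve_top_k(query_emb, bank_embs, k=5):
    """Return indices of the k bank entries most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

def vote_description(visual_emb, tactile_emb, bank, k=5):
    """Retrieve neighbors in each modality and return the majority tactile description."""
    vis_idx = retrieve_top_k(visual_emb, bank["visual_embs"], k)
    tac_idx = retrieve_top_k(tactile_emb, bank["tactile_embs"], k)
    ballots = [bank["descriptions"][i] for i in np.concatenate([vis_idx, tac_idx])]
    return Counter(ballots).most_common(1)[0][0]

# Toy usage with random embeddings standing in for a real visual-tactile knowledge base.
bank = {
    "visual_embs": np.random.rand(100, 64),
    "tactile_embs": np.random.rand(100, 64),
    "descriptions": ["smooth"] * 50 + ["rough"] * 50,
}
print(vote_description(np.random.rand(64), np.random.rand(64), bank))
```

Because the method only indexes and votes over existing paired data, swapping in a higher-quality knowledge base changes the output quality without any retraining, which is the trade-off the abstract highlights.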

Subject: EMNLP.2025 - Main