Continually Evolving Skill Knowledge in Vision Language Action Model

Authors: Yuxuan Wu, Guangming Wang, Zhiheng Yang, Maoqing Yao, Brian Sheil, Hesheng Wang

Developing general robot intelligence in open environments requires continual skill learning. Recent Vision-Language-Action (VLA) models leverage massive pretraining data to support diverse manipulation tasks, but they still depend heavily on task-specific fine-tuning, revealing a lack of continual learning capability. Existing continual learning methods are also too resource-intensive to scale to VLA models. We propose Stellar VLA, a knowledge-driven continual learning framework with two variants: T-Stellar, which models a task-centric knowledge space, and TS-Stellar, which captures a hierarchical task-skill structure. Stellar VLA enables self-supervised knowledge evolution through joint learning of the task latent representation and the knowledge space, reducing annotation needs. Knowledge-guided expert routing provides task specialization without extra network parameters, lowering training overhead. Experiments on the LIBERO benchmark and real-world tasks show an average improvement of over 50 percentage points in final success rates relative to baselines. TS-Stellar further excels in complex action inference, and in-depth analyses verify effective knowledge retention and discovery. Our code will be released soon.

Subjects: Robotics, Artificial Intelligence

Publish: 2025-11-22 15:00:08 UTC

2511.18085
