Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving

Authors: Hyunki Seong, Jeong-Kyun Lee, Heesoo Myeong, Yongho Shin, Hyun-Mook Cho, Duck Hoon Kim, Pranav Desai, Monu Surana

Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency and diversity in motion selection. Test-time scaling with Warm-K enhances behavioral consistency and reactivity without retraining, mitigating covariate shift and reducing performance discrepancies. Demo videos are available in the supplementary material.
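
The abstract names "group relative advantage maximization" without giving a formula. As a rough illustration only, the sketch below assumes the GRPO-style convention of normalizing each rollout's reward by the mean and standard deviation of its own group; the function name, the `eps` parameter, and the reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts in its group.

    Assumption: GRPO-style normalization (reward minus group mean,
    divided by group standard deviation). The paper's exact GRBO
    objective, including its human regularization term, is not shown here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts sampled from the same scenario
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Under this assumption, rollouts that score above their group average receive positive advantages and those below receive negative ones, so no separate learned value baseline is required.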

Subjects: Robotics, Computer Vision and Pattern Recognition

Publish: 2025-12-15 12:18:50 UTC

2512.13262
