Learning to Reason from Feedback at Test-Time

2025.acl-long.262@ACL

Authors: Yanyang Li, Michael R. Lyu, Liwei Wang

Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.

Subject: ACL.2025 - Long Papers
