MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks

Authors: Wantong Xie, Yi-Xiang Hu, Jieyang Xu, Feng Wu, Xiangyang Li

Optimization plays a central role in Operations Research (OR) and numerous industrial applications, yet automating the end-to-end translation of natural language descriptions into executable optimization programs remains a formidable challenge. While recent efforts have applied Large Language Models (LLMs) to this task, existing approaches are hindered by high inference costs, limited robustness across domains, and weak verification mechanisms. In this work, we propose MURKA, a framework based on reinforcement learning and knowledge distillation that enhances LLM-driven optimization modeling via collaborative agent alignment. MURKA orchestrates three specialized agents (Extractor, Solver, and Checker) to achieve accurate problem understanding, robust formulation, and verifiable execution. The Extractor is trained using group relative policy optimization with a composite reward function that incorporates semantic correctness and execution fidelity. The Solver is trained by knowledge distillation from a powerful teacher model, yielding structurally valid and executable formulations in AMPL. The Checker iteratively verifies solution correctness via solver feedback. Experiments on eight diverse OR benchmarks, including NLP4LP, ComplexOR, and NL4Opt, show that MURKA, built on the LLaMa3-8B backbone, achieves a 5.9% absolute improvement in solution accuracy and a 5.1% increase in execution success rate compared to leading baselines. These results establish MURKA as an effective and scalable paradigm for LLM-driven optimization, with strong potential for deployment in real-world OR applications.
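To make the Extractor's training signal concrete, the following is a minimal sketch of how a composite reward could feed the group-relative advantage used in group relative policy optimization. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two reward terms are combined, so the weighted sum, the weights `w_sem` and `w_exec`, the scores in `group`, and both function names are hypothetical. Only the normalization step (each reward minus the group mean, divided by the group standard deviation) reflects the standard group-relative formulation.

```python
from statistics import mean, pstdev


def composite_reward(semantic, execution, w_sem=0.5, w_exec=0.5):
    """Combine two reward signals into one scalar.

    Assumption: a weighted sum with equal weights. The abstract names the
    two terms (semantic correctness, execution fidelity) but not the rule
    used to combine them.
    """
    return w_sem * semantic + w_exec * execution


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each response is scored relative to the other responses sampled for the
    same prompt, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (semantic, execution) scores for four sampled extractions
# of the same problem description.
group = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.5, 1.0)]

rewards = [composite_reward(s, e) for s, e in group]
advantages = group_relative_advantages(rewards)

for (s, e), r, a in zip(group, rewards, advantages):
    print(f"semantic={s:.1f} execution={e:.1f} reward={r:.2f} advantage={a:+.3f}")
```

In this sketch, responses that score above the group average receive positive advantages and are reinforced, while those below it receive negative advantages; the advantages within a group sum to approximately zero.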

Subject: NeurIPS.2025 - Poster

f4pvPNf9ox@OpenReview
