Leveraging Unpaired Feedback for Long-Term LLM-based Recommendation Tuning

Authors: Jizhi Zhang, Chongming Gao, Wentao Shi, Xin Chen, Jingang Wang, Xunliang Cai, Fuli Feng

Most recommender systems focus on short-term objectives such as click-through rate, often at the expense of long-term user satisfaction. This can lead to echo chambers, where users are repeatedly exposed to redundant content. While recent efforts integrate Large Language Models (LLMs) into recommendation, they typically inherit this short-sighted focus. In this work, we highlight unpaired feedback as a key challenge for long-term recommendation: implicit signals such as continued engagement (positive) or silent disengagement (negative) that lack explicit contrastive labels. Effectively learning from such feedback is crucial for improving LLM-based recommenders in dynamic user environments. To this end, we propose ULRec (Unpaired Feedback for Long-Term LLM-based Recommendation Tuning), a simple framework that fine-tunes LLMs using both positive and negative unpaired feedback. ULRec leverages the KTO (Kahneman-Tversky Optimization) algorithm to incorporate these signals without requiring paired supervision. Despite its simplicity, ULRec consistently improves long-term recommendation performance, demonstrating the value of modeling unpaired user feedback.
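To illustrate why KTO suits unpaired feedback, here is a minimal sketch of the KTO objective as formulated in the original KTO work. It is not ULRec's implementation. Each example is scored on its own as desirable or undesirable, so no positive/negative pair is ever required. Several details are assumptions made for illustration: mapping continued engagement to "desirable" and silent disengagement to "undesirable", the `UnpairedExample` type, and the default values of `beta`, `lambda_d`, and `lambda_u`. The reference point `kl_ref` (z_0) is passed in as a constant. In KTO it is estimated from the batch as a KL term between the policy and the reference model, clamped at zero, and not backpropagated through.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class UnpairedExample:
    # Sequence log-probability of the recommended item under the policy being tuned.
    policy_logp: float
    # Sequence log-probability of the same item under the frozen reference model.
    ref_logp: float
    # Assumed mapping: True for positive feedback (e.g. continued engagement),
    # False for negative feedback (e.g. silent disengagement).
    desirable: bool


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kto_loss(batch: List[UnpairedExample], kl_ref: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Mean KTO loss over a batch of unpaired examples (illustrative defaults)."""
    # Reference point z_0: treated as a constant and clamped at 0.
    z0 = max(kl_ref, 0.0)
    total = 0.0
    for ex in batch:
        # Implied reward: log-ratio of policy to reference probability.
        r = ex.policy_logp - ex.ref_logp
        if ex.desirable:
            # Reward above the reference point raises the value, lowering the loss.
            total += lambda_d - lambda_d * sigmoid(beta * (r - z0))
        else:
            # Reward below the reference point raises the value, lowering the loss.
            total += lambda_u - lambda_u * sigmoid(beta * (z0 - r))
    return total / len(batch)
```

If the policy matches the reference (r = 0) and z_0 = 0, each term equals λ(1 − σ(0)) = 0.5λ. Training lowers the loss by increasing the likelihood of engaged-with items and decreasing it for items followed by disengagement, with no pairing step in between. A real trainer would compute the log-probabilities with the LLM and differentiate through `policy_logp` only. Plain floats are used here for readability.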

Subject: EMNLP.2025 - Findings

2025.findings-emnlp.1332@ACL