Characterizing users and items through vector representations is crucial for various tasks in recommender systems. Recent approaches attempt to apply Large Language Models (LLMs) to recommendation through a question-and-answer format, where real items (e.g., Item No. 2024) are represented with compound words formed from in-vocabulary tokens (e.g., ``item``, ``20``, ``24``). However, these tokens are not suitable for representing items, as their meanings are shaped by pre-training on natural language tasks, which limits the model's ability to capture user-item relationships effectively. In this paper, we explore how to effectively characterize users and items in LLM-based recommender systems from the perspective of token construction. We demonstrate the necessity of using out-of-vocabulary (OOV) tokens to characterize items and users, and propose a principled way to construct these OOV tokens. By clustering the representations learned from historical user-item interactions, we make the token combinations that represent users and items share the same OOV tokens when those users or items have similar properties. This construction allows us to capture user-item relationships well (memorization) while preserving the diversity of the descriptions of users and items (diversity). Furthermore, integrating these OOV tokens into the LLM’s vocabulary allows for better distinction between users and items and better capture of user-item relationships during fine-tuning on downstream tasks. Our proposed framework outperforms existing state-of-the-art methods across various downstream recommendation tasks.
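To make the clustering idea concrete, the following is a minimal sketch, not the paper's actual construction. It assumes that each item already has a learned embedding, splits that embedding into equal chunks, clusters each chunk with a small k-means, and emits one OOV token per chunk. The function names (`kmeans`, `build_oov_tokens`), the chunking scheme, the token format `<item_chunk_cluster>`, and the toy embeddings are all hypothetical choices made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Returns one cluster index per input point.
    """
    # Initialization: start from the first point, then repeatedly add the
    # point that is farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_oov_tokens(embeddings, prefix, n_chunks=2, k=2):
    """Map each id to a tuple of OOV token strings, one token per chunk.

    Assumption: the embedding dimension is divisible by n_chunks.
    """
    ids = sorted(embeddings)
    size = len(embeddings[ids[0]]) // n_chunks
    tokens = {i: [] for i in ids}
    for ch in range(n_chunks):
        sub = [embeddings[i][ch * size:(ch + 1) * size] for i in ids]
        for i, lab in zip(ids, kmeans(sub, k)):
            tokens[i].append(f"<{prefix}_{ch}_{lab}>")
    return {i: tuple(t) for i, t in tokens.items()}


# Toy item embeddings (hypothetical values): items 2024 and 2025 agree on
# the first half of the vector and differ on the second half.
item_emb = {
    7:    [0.0, 0.1, 0.0, 0.1],
    8:    [0.1, 0.0, 1.0, 0.9],
    2024: [0.9, 1.0, 0.1, 0.0],
    2025: [1.0, 0.9, 0.9, 1.0],
}
item_tokens = build_oov_tokens(item_emb, prefix="item")
for item_id, toks in item_tokens.items():
    print(item_id, " ".join(toks))
```

In this toy setup, items that are similar in one chunk share that chunk's token, while the full token tuple still distinguishes every item. If a Hugging Face tokenizer and model were used, the resulting strings could then be registered with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` before fine-tuning; whether the paper does exactly this is not stated here.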