Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems


Authors: Taaha Kazi, Ruiliang Lyu, Sizhe Zhou, Dilek Hakkani-Tur, Gokhan Tur

Traditionally, offline datasets have been used to evaluate task-oriented dialogue (TOD) models. These datasets lack context awareness, making them suboptimal benchmarks for conversational systems. In contrast, user-agents, which are context-aware, can simulate the variability and unpredictability of human conversations, making them better alternatives as evaluators. Prior research has utilized large language models (LLMs) to develop user-agents. Our work builds upon this by using LLMs to create user-agents for the evaluation of TOD systems. This involves prompting an LLM, using in-context examples as guidance, and tracking the user-goal state. Our evaluation of diversity and task completion metrics for the user-agents shows improved performance with the use of better prompts. Additionally, we propose methodologies for the automatic evaluation of TOD models within this dynamic framework.
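The abstract names three ingredients for the user-agent: prompting an LLM, guiding it with in-context examples, and tracking the user-goal state. The sketch below shows one way those pieces could fit together. It is an illustration under stated assumptions, not the authors' implementation: `UserGoalState`, `build_prompt`, and `next_user_turn` are hypothetical names, the substring-based goal tracking is a deliberately naive stand-in, and `llm` is any caller-supplied function that maps a prompt string to a completion string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserGoalState:
    """Hypothetical tracker for which goal constraints the user-agent has conveyed."""
    constraints: Dict[str, str]
    conveyed: List[str] = field(default_factory=list)

    def pending(self) -> Dict[str, str]:
        # Constraints the simulated user has not yet mentioned.
        return {k: v for k, v in self.constraints.items() if k not in self.conveyed}

    def update(self, user_utterance: str) -> None:
        # Naive assumption: a slot counts as conveyed once its value
        # appears verbatim (case-insensitive) in the user utterance.
        text = user_utterance.lower()
        for slot, value in self.constraints.items():
            if slot not in self.conveyed and value.lower() in text:
                self.conveyed.append(slot)

    def is_complete(self) -> bool:
        return not self.pending()


def build_prompt(state: UserGoalState, examples: List[str], history: List[str]) -> str:
    """Assemble instructions, in-context examples, goal state, and dialogue history."""
    pending = ", ".join(f"{k}={v}" for k, v in state.pending().items()) or "none"
    parts = [
        "You are simulating a user talking to a task-oriented dialogue system.",
        "Example dialogues:",
        *examples,
        f"Goal constraints still to convey: {pending}",
        "Dialogue so far:",
        *history,
        "USER:",
    ]
    return "\n".join(parts)


def next_user_turn(
    llm: Callable[[str], str],
    state: UserGoalState,
    examples: List[str],
    history: List[str],
) -> str:
    """Generate one simulated user turn, then update the goal state and history."""
    utterance = llm(build_prompt(state, examples, history)).strip()
    state.update(utterance)
    history.append("USER: " + utterance)
    return utterance
```

In use, the dialogue system's reply would be appended to `history` after each call, and the loop would stop once `state.is_complete()` returns true or a turn limit is reached. Keeping the goal state outside the prompt text makes it possible to check task completion independently of what the LLM claims.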

Subjects: Computation and Language, Artificial Intelligence

Publish: 2024-11-15 06:05:45 UTC

2411.09972
