Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics

Authors: Srishti Palani, Vidya Setlur

Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains challenging: existing approaches require programming expertise, overlook real-world complexity, and lack interpretable metrics for multi-format outputs (visualizations and text). Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria, and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-Judge methods; and (iii) an interactive toolkit that supports experimental setup and multi-format, multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness in guiding appropriate model and prompt selection.

Subjects: Human-Computer Interaction, Artificial Intelligence

Publish: 2026-03-06 02:30:55 UTC

2603.05832
