Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness

2025.emnlp-main.1595@ACL

Total: 1

#1 Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness [PDF] [Copy] [Kimi] [REL]

Insensitivity to semantically-preserving variations of prompts (paraphrases) is crucial for reliable behavior and real-world deployment of large language models. However, language models exhibit significant performance degradation with semantically equivalent but differently phrased prompts, and existing solutions either depend on trial-and-error prompt engineering or require computationally expensive inference-time algorithms. In this study, built on the key insight that worst-case prompts exhibit a drift in embedding space, we present Latent Adversarial Paraphrasing (LAP), a dual-loop adversarial framework that optimizes a trainable perturbation as “latent continuous paraphrase” and language model performance on these perturbations iteratively. Extensive experiments are conducted to demonstrate the effectiveness of LAP across multiple backbones on the RobustAlpaca benchmark with a 0.5%-4% absolution improvement on worst-case win-rate.

Subject: EMNLP.2025 - Main