WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness

#1 WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness [PDF] [Copy] [Kimi] [REL]

Watermarking is a prominent technique to trace the usage of specific large language models (LLMs) by injecting patterns into model-generated content. An ideal watermark should be imperceptible, easily detectable, and robust to text alterations, yet existing methods typically face trade-offs among these properties. This paper utilizes a key-centered scheme to unify existing methods by decomposing a watermark into two components: a key module and a mark module. We show that the trade-off issue is the reflection of the conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection within the key module. To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space for imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate seamlessly with existing watermarking techniques, significantly enhancing their performance, achieving near-optimal imperceptibility, and markedly improving their detection efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).

Subject: NAACL.2025 - Long Papers

2025.naacl-long.209@ACL

#1 WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness [PDF] [Copy] [Kimi] [REL]