2025.acl-long.1365@ACL

Total: 1

#1 Tokenisation is NP-Complete [PDF1] [Copy] [Kimi3] [REL]

Authors: Philip Whittington, Gregor Bachmann, Tiago Pimentel

In this work, we prove the NP-completeness of two variants of tokenisation, defined here as the problem of compressing a dataset to at most 𝛿 symbols by either finding a vocabulary directly (_direct_ tokenisation), or selecting a sequence of merge operations (_bottom-up_ tokenisation).

Subject: ACL.2025 - Long Papers