Are there any conditions under which a generative model’s outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak [ICML 2023]. They define _near access-freeness (NAF)_ and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection---foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being _tainted_. Then, we introduce our _blameless copy protection framework_ for defining meaningful guarantees, and instantiate it with _clean-room copy protection_. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual "clean-room setting." Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is _golden_, a copyright deduplication requirement.
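For reference, here is a rough sketch of the NAF condition as we understand it from Vyas, Kakade, and Barak [ICML 2023]; the notation below is illustrative rather than the paper's exact statement. Informally, a model $p$ is $k_x$-NAF on a prompt $x$ if its output distribution stays close, under some divergence $\Delta$ (e.g., max-KL), to that of a "safe" model trained without access to each copyrighted work $c$.

```latex
% Sketch of near access-freeness (NAF), paraphrasing Vyas, Kakade, and Barak [ICML 2023].
% Notation is illustrative: C is the set of copyrighted training works, safe_c a model
% obtained without access to the work c, and \Delta a divergence (e.g., max-KL).
\[
  p \text{ is } k_x\text{-NAF on prompt } x
  \iff
  \forall c \in C:\quad
  \Delta\bigl(p(\cdot \mid x)\,\big\|\,\mathrm{safe}_c(\cdot \mid x)\bigr) \le k_x .
\]
```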