Are there any conditions under which a generative model’s outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak [ICML 2023]. They define _near access-freeness (NAF)_ and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection---foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being _tainted_. Then, we introduce our _blameless copy protection framework_ for defining meaningful guarantees, and instantiate it with _clean-room copy protection_. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual "clean-room setting." Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is _golden_, a copyright deduplication requirement.
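For reference, here is a rough sketch of the NAF condition as we understand it from Vyas, Kakade, and Barak [ICML 2023]; the notation below is illustrative rather than the paper's exact statement. Informally, a model $p$ is $k_x$-NAF on a prompt $x$ if its output distribution stays close, under some divergence $\Delta$ (e.g., max-KL), to that of a "safe" model trained without access to each copyrighted work $c$.

```latex
% Sketch of near access-freeness (NAF), paraphrasing Vyas, Kakade, and Barak [ICML 2023].
% Notation is illustrative: C is the set of copyrighted training works, safe_c a model
% obtained without access to the work c, and \Delta a divergence (e.g., max-KL).
\[
  p \text{ is } k_x\text{-NAF on prompt } x
  \iff
  \forall c \in C:\quad
  \Delta\bigl(p(\cdot \mid x)\,\big\|\,\mathrm{safe}_c(\cdot \mid x)\bigr) \le k_x .
\]
```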