Total: 1
With the development of large language models (LLMs), numerous online applications based on these models have emerged. As system prompts significantly influence the performance of LLMs, many such applications conceal their system prompts and regard them as intellectual property. Consequently, numerous efforts have been made to steal these system prompts. However, for applications that do not publicly disclose their system prompts, previously stolen prompts have low confidence. This is because previous methods rely on confirmation from application developers, which is unrealistic since developers may be unwilling to acknowledge that their system prompts have been leaked. We observed a phenomenon: when an LLM performs repetitive tasks, it accurately repeats based on the context rather than relying on its internal model parameters. We validated this phenomenon by comparing the results of two different inputs—repetitive tasks and knowledge-based tasks—under conditions of normal execution, contaminated execution, and partially restored execution. By contaminating the input nouns and then partially restoring them using data from the normal execution's intermediate layers, we measured the accuracies of both task types across these three execution processes. Based on this phenomenon, we propose a high-confidence leakage method called RepeatLeakage. By specifying the range that the model needs to repeat and encouraging the model not to change the format, we manage to extract its system prompt and conversation contexts. We validated the repetition phenomenon on multiple open-source models and successfully designed prompts using RepeatLeakage to leak contents from the actual system prompts of GPT-Store and publicly available ChatGPT conversation contexts. Finally, we tested RepeatLeakage in real environments such as ChatGPT web, successfully leaking their system prompts and conversation contexts.