Can ChatGPT keep a secret? This is the question that started this challenge. If ChatGPT knew a secret, would it reveal it to the user? Could instructions be crafted to prevent it from ever revealing the secret?
Using Custom GPTs, we tested ChatGPT’s ability to keep a secret with a range of techniques and strategies. Each Custom GPT was given a secret phrase and instructions not to reveal it. It quickly became apparent that the GPTs would frequently and easily reveal their secret phrases despite those instructions. We then worked to find instructions that reduce both the number of ways a GPT can be made to reveal its secret phrase and the likelihood that it does so.
There are seven Custom GPTs, each with different instructions and a different secret phrase. They generally progress from the easiest to the hardest to extract the secret phrase from. However, a prompt that fails at one level may still work at a higher level, because the GPTs use different strategies to protect the secret. For example, one Custom GPT may use a self-check strategy that is effective at keeping the secret phrase out of generated content, while another may use a very specific set of directives that is effective at blocking access to its instructions but not at preventing leaks during generation. Overall, though, the security of the secret phrase increases as you go up the levels.
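As a rough illustration of the self-check idea mentioned above, here is a minimal Python sketch of an output filter that inspects a draft reply before it is returned. This is only an analogy: the Custom GPTs in this challenge rely on natural-language instructions, not code, and the names `SECRET_PHRASE`, `leaks_secret`, and `guarded_reply` are hypothetical and do not come from this repository.

```python
import re

# Hypothetical placeholder; not the secret phrase of any level.
SECRET_PHRASE = "example secret phrase"


def _normalize(text: str) -> str:
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def leaks_secret(reply: str, secret: str = SECRET_PHRASE) -> bool:
    """Return True if the reply contains the secret, forwards or reversed,
    ignoring case, spacing, and punctuation."""
    haystack = _normalize(reply)
    needle = _normalize(secret)
    return needle in haystack or needle[::-1] in haystack


def guarded_reply(draft: str, secret: str = SECRET_PHRASE) -> str:
    """Self-check step: replace the draft with a refusal if it would leak the secret."""
    if leaks_secret(draft, secret):
        return "I can't share that."
    return draft
```

A check like this catches only simple obfuscations such as added punctuation, changed case, or reversal. It would not catch a translation, an encoding, or a description of the phrase, which is one way to see why a single strategy may stop some prompts and not others.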
Level 7 represents the most secure approach found so far. If or when it is compromised, we will create a Level 8.
The seven Custom GPTs are public in the OpenAI Custom GPT marketplace. We encourage you to try to get the secret phrases yourself. Start at Level 1 and move up.
- Level 1: https://chatgpt.com/g/g-tIGJEhe0p-ai-secret-keeper-level-1
- Level 2: https://chatgpt.com/g/g-d4rDRFRIC-ai-secret-keeper-level-2
- Level 3: https://chatgpt.com/g/g-aVjEolqx3-ai-secret-keeper-level-3
- Level 4: https://chatgpt.com/g/g-hzl9RdVsK-ai-secret-keeper-level-4
- Level 5: https://chatgpt.com/g/g-p2jNWrDTy-ai-secret-keeper-level-5
- Level 6: https://chatgpt.com/g/g-OUv2IEeXe-ai-secret-keeper-level-6
- Level 7: https://chatgpt.com/g/g-sxG7PtzS3-ai-secret-keeper-level-7
If you manage to get a secret phrase, email it to: [email protected]
The instructions for each GPT can be found in GPT_INSTRUCTIONS. You may want to try to get the secret phrases from the GPTs before reading them. That said, reading them first may not matter much: even with the instructions in hand, it is often not obvious how to get a GPT to reveal its secret phrase.
Contact Ryan Semerau at [email protected] for information about the implications of this work and for in-depth findings on ChatGPT's ability to keep information private and on which strategies are most effective.