ryan321/aisecretkeeper

AI Secret Keeper

Can ChatGPT keep a secret? This is the question that started this challenge. If ChatGPT knew a secret, would it reveal it to the user? Could instructions be crafted to prevent it from ever revealing the secret?

Custom GPTs

Different techniques and strategies were tested with Custom GPTs to probe ChatGPT's ability to keep a secret. Each Custom GPT was given a secret phrase and instructed not to reveal it. It quickly became apparent that the GPTs revealed their secret phrases frequently and easily, despite those instructions. Efforts then turned to finding instructions that would reduce both the number of ways the secret phrase could be extracted and the likelihood of the GPT revealing it.

Seven Levels

There are seven Custom GPTs, each with its own instructions and its own secret phrase. They generally progress from easiest to hardest. However, a prompt that fails at one level may still succeed at a higher level, because the GPTs use different strategies to protect the secret. For example, one Custom GPT may use a self-check strategy that is effective at keeping the secret phrase out of generated content, while another may use a very specific set of directives that is effective at protecting its instructions but less effective during generation. Overall, though, the secret phrase becomes more secure as you go up the levels.

Level 7 represents the most secure approach found so far. If it is compromised, we will create a Level 8.

Try it yourself

The seven Custom GPTs are public in the OpenAI Custom GPT marketplace. We encourage you to try to get the secret phrases yourself. Start at Level 1 and move up.

Custom GPT Links

  1. Level 1: https://chatgpt.com/g/g-tIGJEhe0p-ai-secret-keeper-level-1
  2. Level 2: https://chatgpt.com/g/g-d4rDRFRIC-ai-secret-keeper-level-2
  3. Level 3: https://chatgpt.com/g/g-aVjEolqx3-ai-secret-keeper-level-3
  4. Level 4: https://chatgpt.com/g/g-hzl9RdVsK-ai-secret-keeper-level-4
  5. Level 5: https://chatgpt.com/g/g-p2jNWrDTy-ai-secret-keeper-level-5
  6. Level 6: https://chatgpt.com/g/g-OUv2IEeXe-ai-secret-keeper-level-6
  7. Level 7: https://chatgpt.com/g/g-sxG7PtzS3-ai-secret-keeper-level-7

When you get a secret phrase

Email it to: [email protected]

GPT Instructions

The instructions for each GPT can be found in GPT_INSTRUCTIONS. You may want to try to get the secret phrases before reading the instructions, although it may not matter much: even with the instructions in hand, it is often not obvious how to get a GPT to reveal its secret phrase.

What's next

Contact Ryan Semerau at [email protected] for more on the implications of this challenge, including in-depth findings on what it reveals about ChatGPT's ability to keep information private and which strategies are most effective.
