- cross-posted to:
- aistuff@lemdro.id
By “attack” they mean “jailbreak”. It’s also nothing like a buffer overflow.
The article is interesting, though, and the approach to generating these jailbreak prompts is creative. It looks a bit like the unspeakable tokens thing: https://www.vice.com/en/article/epzyva/ai-chatgpt-tokens-words-break-reddit
That seems like they left debugging code enabled/accessible.
No, this is actually a completely different type of problem. LLMs also aren't code, and they aren't manually configured, set up, or written by humans. In fact, we don't really know what's going on internally when an LLM performs inference.
The actual software side of it is more like a video player that “plays” the LLM.
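To make the video player analogy a bit more concrete, here's a toy sketch in Python. Everything in it is made up for illustration: `play`, `VOCAB`, and the two weight tables are hypothetical, the numbers are typed in by hand rather than learned, and a real LLM uses transformer layers with billions of weights instead of a 4x4 table. The point is only that the "player" code stays the same while the behavior comes entirely from the numbers it is given.

```python
# Toy illustration only: not how any real LLM runtime is implemented.
VOCAB = ["hello", "world", "foo", "bar"]


def play(weights, start, steps):
    """Generic 'player': repeatedly pick the highest-scoring next token.

    `weights[i][j]` is the score for token j following token i.
    The function knows nothing about what the model "says".
    """
    token = start
    out = [VOCAB[token]]
    for _ in range(steps):
        scores = weights[token]  # row of numbers for the current token
        token = max(range(len(scores)), key=scores.__getitem__)
        out.append(VOCAB[token])
    return out


# Two different "models": same player code, different numbers.
# (Hand-written here; in a real LLM these come out of training.)
weights_a = [
    [0.1, 0.9, 0.0, 0.0],  # after "hello", prefer "world"
    [0.8, 0.1, 0.05, 0.05],  # after "world", prefer "hello"
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
weights_b = [
    [0.0, 0.1, 0.9, 0.0],  # after "hello", prefer "foo"
    [0.8, 0.1, 0.05, 0.05],
    [0.0, 0.0, 0.0, 1.0],  # after "foo", prefer "bar"
    [0.0, 0.0, 1.0, 0.0],  # after "bar", prefer "foo"
]

result_a = play(weights_a, start=0, steps=3)
result_b = play(weights_b, start=0, steps=3)
print(result_a)
print(result_b)
```

Nothing in `play` was changed between the two runs, and there's no "debugging code" in it to leave enabled. If one of these toy models did something unwanted, the cause would be somewhere in the numbers, which is why reading the software doesn't tell you much about what the model will do.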