A New Attack Impacts ChatGPT—and No One Knows How to Stop It

CAVOK@lemmy.world · 1 year ago

Kerfuffle@sh.itjust.works · 1 year ago

By “attack” they mean “jailbreak”. It’s also nothing like a buffer overflow.

The article is interesting, though, and the approach to generating these jailbreak prompts is creative. It looks a bit similar to the “unspeakable tokens” thing: https://www.vice.com/en/article/epzyva/ai-chatgpt-tokens-words-break-reddit

dan1101@lemmy.world · 1 year ago

That seems like they left debugging code enabled/accessible.

Kerfuffle@sh.itjust.works · 1 year ago

> That seems like they left debugging code enabled/accessible.

No, this is actually a completely different type of problem. LLMs aren’t code in the usual sense: they’re huge sets of numeric weights learned during training, not logic that humans write or configure by hand. In fact, we don’t really know what’s going on internally when an LLM performs inference.

The actual software side of it is more like a video player that “plays” the LLM.
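
To make the video-player analogy a bit more concrete, here's a toy sketch. It's purely illustrative: `VOCAB`, `WEIGHTS`, and `generate` are made-up names, and a real LLM has billions of learned weights and a much more involved forward pass. The point is only that the "model" is data and the "player" is generic code that follows the numbers.

```python
# Toy illustration only: the "model" is pure data, the "player" is generic code.
# All names here are hypothetical; no real LLM runtime looks like this.

VOCAB = ["the", "cat", "sat", "down", "."]

# Stand-in "weights": row i scores which token should follow token i.
# In a real LLM these numbers come out of training, not from a human typing them.
WEIGHTS = [
    [0.0, 2.0, 0.1, 0.1, 0.1],  # after "the"
    [0.1, 0.0, 2.0, 0.1, 0.1],  # after "cat"
    [0.1, 0.1, 0.0, 2.0, 0.1],  # after "sat"
    [0.1, 0.1, 0.1, 0.0, 2.0],  # after "down"
    [2.0, 0.1, 0.1, 0.1, 0.0],  # after "."
]


def generate(weights, vocab, start, max_tokens):
    """Generic 'player': it knows nothing about language, it just follows the numbers."""
    out = [start]
    current = vocab.index(start)
    for _ in range(max_tokens):
        scores = weights[current]
        # Greedy decoding: pick the index with the highest score.
        current = max(range(len(scores)), key=scores.__getitem__)
        out.append(vocab[current])
    return out


print(" ".join(generate(WEIGHTS, VOCAB, "the", 4)))  # prints: the cat sat down .
```

Swap in a different `WEIGHTS` table and the output changes without touching a line of `generate`. That's roughly why this isn't like leftover debugging code: the behavior lives in the numbers, so there's no branch in the player you could just delete.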