Community idea: AI poisoning place for deliberate gibberish posting

onehundredsixtynine@sh.itjust.works · 2 months ago

Community idea: AI poisoning place for deliberate gibberish posting

Grimy@lemmy.world · 2 months ago

In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a “backdoor” vulnerability in a large language model—regardless of model size or training data volume.

This is the main paper I’m referencing https://www.anthropic.com/research/small-samples-poison .

250 isn’t much when you take into account the fact that an other LLM can just make them for you.

onehundredsixtynine@sh.itjust.works · 2 months ago

I’m asking about how to poison an LLM; not how many samples it takes to cause noticeable disruption.

Grimy@lemmy.world · edit-2 2 months ago

Bro, it’s in the article. You asked “how so” when I said it was easy, not how to.