[wlug] New Attack On AI Bots: Prompt Injection

17 Sep 2022

      A Twitter bot account run by a company that posts remote job listings
was programmed to respond to tweets directed to it with general
inoffensive comments touting the positive aspects of remote work.

However
<https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack/>,
some other Twitter users discovered that if they began their tweet with
directives to “ignore previous instructions”, they could get the bot to
say just about anything. And much hilarity ensued.

Basically, the way the bot program works is it begins with some
standard instructions on what to say, provided by the people who set up
the account, to which is appended whatever tweet content was sent to the
bot by the other user. This is then processed by the AI system to
produce a response.

This attack has been dubbed “prompt injection”, by analogy with “SQL
injection” and other similar attacks on non-AI systems. But whereas
those attacks can be blocked using basic, well-known syntactic
quoting techniques, there seem to be no easy equivalents for stopping an
AI paying attention to instructions it should not be obeying.

[wlug] New Attack On AI Bots: Prompt Injection

Lawrence D'Oliveiro