
A Twitter bot account run by a company that posts remote job listings was programmed to respond to tweets directed to it with general inoffensive comments touting the positive aspects of remote work. However <https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack/>, some other Twitter users discovered that if they began their tweet with directives to “ignore previous instructions”, they could get the bot to say just about anything. And much hilarity ensued. Basically, the way the bot program works is it begins with some standard instructions on what to say, provided by the people who set up the account, to which is appended whatever tweet content was sent to the bot by the other user. This is then processed by the AI system to produce a response. This attack has been dubbed “prompt injection”, by analogy with “SQL injection” and other similar attacks on non-AI systems. But whereas those attacks can be blocked using basic, well-known syntactic quoting techniques, there seem to be no easy equivalents for stopping an AI paying attention to instructions it should not be obeying.