The rise of AI Worms

Generative AI systems are being used more and more as they develop, such as OpenAI’s ChatGPT and Google’s Gemini. AI agents and ecosystems are being developed by startups and tech businesses to sit atop systems that can automate tedious tasks for you. Examples of these include scheduling appointments and possibly even making purchases. However, the increased flexibility granted to the tools also raises the possibility of attacks against them.

As an example of the dangers associated with interconnected, self-governing AI ecosystems, a team of academics has developed what they describe as the first generative AI worms, capable of propagating across several platforms and perhaps stealing information or infecting them with malware. As explained by Cornell Tech researcher Ben Nassi, “it basically means that you have the ability to conduct or perform a new kind of cyberattack that hasn’t been seen before.”

The worm, known as Morris II, was developed by Nassi and fellow academics Ron Bitton and Stav Cohen as an homage to the 1988 internet debacle caused by the first Morris computer worm. In an exclusive research paper and website for WIRED, the researchers demonstrate how the AI worm may breach some security measures in ChatGPT and Gemini by attacking a generative AI email assistance, stealing email data and sending spam.

The work was done in test environments rather than on a publicly accessible email assistant, and it comes at a time when large language models (LLMs) are becoming more and more multimodal—that is, capable of producing video and images in addition to text. Although there haven’t been any reports of generative AI worms in the wild, some experts claim that startups, developers, and tech corporations should be wary of this security danger.

The majority of generative AI systems operate by receiving language cues, which direct the tools to generate an image or respond to a query. Still, it is possible to use these prompts as a weapon against the system. Prompt injection attacks have the ability to provide a chatbot with secret instructions, whereas jailbreaks have the ability to cause a system to ignore its security measures and display offensive or hazardous content. An attacker might, for instance, obfuscate content on a website instructing an LLM to pose as a con artist and request your bank information.

The researchers used a self-replicating stimulus dubbed “adversarial” to develop the generative AI worm. According to the researchers, the generative AI model responds to this prompt by producing another prompt. Simply put, the AI system is instructed to generate a series of additional commands in its responses. The researchers claim that this is essentially comparable to classic SQL injection and buffer overflow attacks.

The researchers connected ChatGPT, Gemini, and open-source LLM, LLaVA, to develop an email system that could send and receive messages using generative AI in order to demonstrate how the worm may function. They then discovered two ways to take use of the system: one was to use a self-replicating prompt that was text-based, and the other was to embed the question within an image file.

In one case, the researchers took on the role of attackers and composed an email with the adversarial text prompt. This email “poisons” the email assistant’s database by utilizing retrieval-augmented generation (RAG), which allows LLMs to retrieve more data from outside their system. According to Nassi, the RAG “jailbreaks the GenAI service” when it retrieves an email in response to a user query and sends it to GPT-4 or Gemini Pro to generate a response. This ultimately results in the theft of data from the emails. When the generated response with the sensitive user data is utilized to respond to an email sent to a new client and then kept in the new client’s database, Nassi claims, it later infects new hosts.

According to the researchers, the second way involves the email helper forwarding the message to others by including a harmful prompt in an image. Any form of image carrying spam, abuse content, or even propaganda can be transferred further to new clients once the initial email has been delivered, according to Nassi, because the self-replicating command is encoded within the image.

A video showcasing the findings shows the email system repeatedly forwarding a message. Also, according to the experts, data extraction from emails is possible. According to Nassi, it may include names, phone numbers, credit card numbers, SSNs, or anything else deemed private.

The discovery represents a warning about “bad architecture design” inside the larger AI ecosystem, the researchers claim, even though it violates certain of ChatGPT and Gemini’s safety protocols. They nevertheless informed Google and OpenAI of their discoveries. An OpenAI representative states, “They seem to have discovered a way to exploit prompt-injection type vulnerabilities by relying on user input that hasn’t been checked or filtered.” The company is attempting to make its systems “more resilient,” and developers are advised to “use methods that ensure they are not working with harmful input.” Google said it would not comment on the study. Nassi’s messages reveal that the researchers at the company asked to meet in order to discuss the topic.

Although the worm’s presentation occurs in a largely controlled setting, developers should be aware of the potential threat posed by generative AI worms in the future, according to many security experts who have evaluated the research. This is especially true when users grant AI applications permission to act on their behalf, such as scheduling appointments or sending emails, and when these tasks may need collaboration with other AI agents. Security experts from China and Singapore have demonstrated in another recent study how they might jailbreak a million LLM agents in less than five minutes.

Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany, notes that worms may be possible when AI models take in data from external sources or the AI agents can work autonomously. Abdelnabi worked on some of the first demonstrations of prompt injections against LLMs in May 2023 and highlighted that worms may be possible. According to Abdelnabi, she finds the concept of dispersing injections to be highly likely. Everything is dependent upon the kind of applications that make use of these models. Although this type of attack is currently being simulated, according to Abdelnabi, it might not remain theoretical for very long.

Nassi and the other researchers report that they expect to see generative AI worms in the wild within the next two to three years in a publication that summarizes their findings. According to the research paper, numerous businesses in the sector are actively developing GenAI ecosystems by including GenAI capabilities into their automobiles, smartphones, and operating systems.

Nevertheless, there are ways to protect generative AI systems against possible worms, such as by employing conventional security techniques. Adam Swanda, a threat researcher with AI business security company Robust Intelligence, thinks that “proper secure application design and monitoring could address parts of this with a lot of these issues.” Generally speaking, you should not rely on LLM output in any part of your application.

Swanda adds that a critical mitigation that can be implemented is involving humans in the process—that is, making sure AI bots aren’t permitted to act without authorization. It is not desirable for an LLM who has read your email to be able to write another one. There need to be a line drawn there. Swanda notes that if a question is being repeated thousands of times within Google and OpenAI’s systems, that will generate a lot of “noise” and could be simple to identify.

Numerous mitigation strategies are reiterated by both Nassi and the research. In the end, Nassi asserts, anyone developing AI helpers must be cognizant of the hazards. According to him, you should ascertain whether the development of the ecosystem and applications within your organization mostly adheres to one of these strategies. “Because this needs to be considered if they do.”

Source link