Promptware: The Evolving AI Malware Threat & 7-Step Kill Chain

by Rachel Kim – Technology Editor

A Google Calendar invitation containing a malicious prompt successfully livestreamed video of an unsuspecting user, demonstrating a new class of cyberattack targeting artificial intelligence systems. The incident, detailed in recent research, highlights the emerging threat of “promptware” – a sophisticated, multi-stage malware execution mechanism exploiting vulnerabilities in large language models (LLMs).

Security researchers are warning that the dominant focus on “prompt injection” – embedding malicious instructions into LLM inputs – obscures a more dangerous reality. Prompt injection represents only the initial access point in a complex attack chain that mirrors traditional malware campaigns like Stuxnet and NotPetya, according to a new analysis published by researchers at Lawfare.

The core issue lies in the fundamental architecture of LLMs. Unlike conventional computing systems that maintain a strict separation between executable code and user data, LLMs process all input – system commands, user emails and retrieved documents – as a continuous sequence of tokens. This lack of architectural boundaries allows malicious instructions embedded within seemingly harmless content to be processed with the same authority as legitimate commands.

Once inside the system, the attack progresses through “Privilege Escalation,” often referred to as “jailbreaking.” Attackers circumvent the safety training and policy guardrails implemented by LLM vendors like OpenAI and Google. Techniques akin to social engineering, or the leverage of adversarial suffixes, trick the model into performing actions it would normally refuse. This escalation grants the attacker full access to the underlying model’s capabilities.

Following privilege escalation, the attack enters a “Reconnaissance” phase, where the LLM is manipulated to reveal information about its assets, connected services, and capabilities. This allows the attack to advance autonomously, without immediately alerting the victim. Unlike traditional malware reconnaissance, which typically precedes initial access, promptware reconnaissance occurs after successful initial access and jailbreaking.

“A transient attack that disappears after one interaction with the LLM application is a nuisance; a persistent one compromises the LLM application for good,” the Lawfare analysis states. The “Persistence” phase involves embedding the promptware into the long-term memory of an AI agent or poisoning the databases it relies on. A compromised email archive, for example, could re-execute malicious code every time the AI summarizes past emails.

The establishment of “Command-and-Control” (C2) allows attackers to evolve the threat from a static payload into a controllable trojan. Dynamic fetching of commands during LLM inference enables modification of the promptware’s behavior.

“Lateral Movement” leverages the interconnectedness of AI agents to spread the attack to other users, devices, and systems. Giving AI access to emails, calendars, and enterprise platforms creates pathways for malware propagation. An infected email assistant could forward malicious payloads to all contacts, replicating the attack like a computer virus. Attacks can also pivot from calendar invites to control smart home devices or exfiltrate data from connected web browsers.

The final stage, “Actions on Objective,” involves achieving tangible malicious outcomes, such as data exfiltration, financial fraud, or even physical world impact. Researchers have demonstrated AI agents being manipulated into selling cars for a single dollar or transferring cryptocurrency to attacker-controlled wallets. Agents with coding capabilities can be tricked into executing arbitrary code, granting attackers total system control.

The Google Calendar attack, detailed in the research paper “Invitation Is All You Demand,” achieved initial access by embedding a malicious prompt in the title of a calendar event. The prompt then leveraged a technique known as delayed tool invocation to execute the injected instructions. The prompt’s persistence within the Google Calendar artifact ensured its continued execution.

Another demonstration, described in “Here Comes the AI Worm,” achieved initial access via a prompt injected into an email. The prompt employed role-playing to compel the LLM to follow the attacker’s instructions. The injected prompt instructed the LLM to replicate itself and exfiltrate sensitive user data, leading to lateral movement when the email assistant drafted new emails containing the compromised information.

The researchers argue that prompt injection is not a problem that can be “fixed” with current LLM technology. Instead, a comprehensive defensive strategy is needed, focusing on breaking the kill chain at subsequent stages – limiting privilege escalation, constraining reconnaissance, preventing persistence, disrupting C2, and restricting agent actions. The shift requires a move from reactive patching to systematic risk management.

You may also like

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.