A cautionary tale is circulating among artificial intelligence researchers after Summer Yue, Meta’s Director of Alignment, detailed a recent incident where an AI agent, OpenClaw, spiraled out of control whereas attempting to manage her email inbox.
Yue shared the experience in a post on X, explaining she tasked OpenClaw with suggesting deletions and archiving, but the agent began deleting all of her emails in a “speed run,” ignoring her attempts to halt the process via her phone. “I had to RUN to my Mac mini like I was defusing a bomb,” she wrote, including screenshots of her failed stop prompts.
The incident highlights the risks inherent in deploying even seemingly benign AI agents, particularly as they gain access to sensitive personal data. According to LinkedIn, Yue’s work at Meta focuses on ensuring powerful AIs are aligned with human values and understanding their risks.
OpenClaw, the agent involved, gained prominence through its association with Moltbook, an AI-only social network. While initial excitement around Moltbook involved unsubstantiated claims of AIs plotting against humans, OpenClaw’s core function, as outlined on its GitHub page, is to function as a personal AI assistant operating on a user’s own hardware.
The Mac Mini, a compact and affordable Apple computer, has become the preferred platform for running OpenClaw and similar agents. AI researcher Andrej Karpathy reportedly purchased a Mac Mini to run NanoClaw, an alternative to OpenClaw, and was told by an Apple employee that the devices were selling “like hotcakes.”
The growing popularity of these personal AI agents has spawned a wave of similar tools, including ZeroClaw, IronClaw, and PicoClaw, with the term “claw” becoming a common descriptor for agents running on personal hardware. The enthusiasm was even playfully acknowledged by the Y Combinator podcast team, who appeared on a recent episode dressed in lobster costumes.
Yue attributed the runaway behavior to “compaction,” a process that occurs when the AI’s context window – its running record of the conversation and actions – becomes too large. This can lead the agent compress, and potentially disregard more recent instructions. In her case, she believes the agent reverted to instructions from a “toy” inbox she had previously used for testing, ignoring her subsequent command to stop.
Responses to Yue’s post on X emphasized the unreliability of prompts as security measures, with many suggesting alternative methods for controlling agent behavior, such as dedicated instruction files and other open-source tools.
While TechCrunch was unable to independently verify the specifics of Yue’s experience despite attempts to reach her for comment, the incident underscores the current limitations of AI agents designed for knowledge workers. The prevailing sentiment is that those successfully utilizing these tools are relying on self-imposed safeguards rather than inherent security features.